Efficient feature selection for sleep staging based on maximal overlap discrete wavelet transform and SVM
Urbano Nunes, Teresa Sousa
Six electroencephalographic (EEG) and two electro-oculographic (EOG) channels were used in this study. The maximal overlap discrete wavelet transform decomposes the signals at different resolutions, and a set of significant features is selected by mRMR, a powerful feature selection method. Finally, the selected feature set is classified by support vector machines (SVMs) into four stages: wakefulness, shallow sleep, deep sleep, and REM.

The sleep-wake cycle is categorized into awake, non-REM (NREM) and REM sleep [1]; NREM sleep is further divided into three stages: S1, S2 and S3 [2]. Sleep scoring by experts is a very time-consuming task that normally may require hours, and it is also a somewhat subjective procedure in which the concordance between scorers is limited. Accordingly, the development of automatic systems is highly desirable.

Zoubek et al. suggested feature selection algorithms to find the most relevant features from polysomnography (PSG) signals; Jo et al. reported a related system. As concerns the multiclass case, reported values of classification accuracy vary widely, and rigorous comparisons between the reported systems cannot be done since they differ in recording conditions.

In the proposed system, features are selected by a minimum-redundancy maximum-relevance (mRMR) algorithm [10]. Furthermore, a median filter is used to enhance the classification accuracies.

II.
The proposed system can be organized in various interoperating parts as illustrated in Fig.
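mRMR scores each candidate feature by its relevance to the class labels minus its average redundancy with the features already selected. A minimal greedy sketch follows; absolute Pearson correlation stands in for the mutual-information estimates of the actual mRMR algorithm, and all data are synthetic:

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy mRMR sketch: pick the feature maximizing
    relevance minus mean redundancy with the selected set."""
    n_features = X.shape[1]
    # Relevance: |corr(feature_j, y)| for each feature
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
    # Redundancy: pairwise |corr| between features
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]        # start with the most relevant
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [relevance[j] - corr[j, selected].mean() for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

rng = np.random.default_rng(0)
a, b = rng.standard_normal(300), rng.standard_normal(300)
y = a + 0.5 * b                                   # labels driven by a and b
X = np.column_stack([a,                           # strongly relevant
                     a + 0.3 * rng.standard_normal(300),  # redundant copy of a
                     b])                          # relevant, non-redundant
selected_features = mrmr_select(X, y, 2)
print(selected_features)
```

Note how the redundant copy of the first feature is skipped in favor of the weaker but non-redundant third feature, which is exactly the trade-off mRMR is designed to make.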
Several studies have reported the development of such systems, e.g., by Ebrahimi et al.

A. Preprocessing and Feature Extraction
Before computing the feature vectors, the signals are sampled at 50 Hz and filtered with a bandpass Butterworth filter. The db4 wavelet is applied to every 30-second epoch, and a set of statistical wavelet-based features is extracted (see Fig.).

2) Frequency and Temporal Features
EEG is traditionally analyzed in the frequency domain, since each sleep stage is characterized by a specific pattern of frequency contents.
However, further useful information can be extracted from temporal analysis of EOG and EEG signals. Moreover, EEG signals are non-stationary, so time-frequency analysis is also appropriate. Thus, after preprocessing, features are extracted using several methods in the time-frequency, temporal and frequency domains. Regarding the importance of spectral and temporal analysis, some features are extracted as suggested in [4], [7], [13], [14], [15]; these features are discussed in the experimental results section.

C. Feature Transformation and Normalization
The extracted features are transformed and normalized in order to reduce the influence of extreme values. After a thorough experimental evaluation of each transform operator over the extracted features, it was empirically verified that some of those transformations improved the classification results.

The discrete wavelet transform (DWT) generates coefficients that are local in both time and frequency. The maximal overlap discrete wavelet transform (MODWT) [11] is a DWT in which the operation of subsampling from an output filter is omitted.
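The omitted subsampling is easiest to see at a single decomposition level. The sketch below implements a one-level MODWT with the rescaled Haar filter pair (an illustrative choice; the paper uses db4) and demonstrates that the coefficient series keep the input length, that detail and smooth series sum back to the original, and that circularly shifting the input simply shifts the coefficients:

```python
import numpy as np

def modwt_haar_level1(x):
    """One level of the MODWT with the rescaled Haar filter pair.

    No subsampling is performed, so both coefficient series keep the
    length of the input, and the transform commutes with circular shifts.
    """
    x = np.asarray(x, dtype=float)
    x_prev = np.roll(x, 1)          # circular (periodic) filtering
    detail = (x - x_prev) / 2.0     # wavelet (detail) coefficients
    smooth = (x + x_prev) / 2.0     # scaling (smooth) coefficients
    return detail, smooth

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
d, s = modwt_haar_level1(x)

assert d.shape == x.shape                    # no subsampling
assert np.allclose(d + s, x)                 # components sum to the original
d_shift, _ = modwt_haar_level1(np.roll(x, 5))
assert np.allclose(d_shift, np.roll(d, 5))   # circular-shift invariance
print("all MODWT properties hold")
```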
As a result, in the MODWT, the wavelet and scaling components are not mutually orthogonal, but their sum is equal to the original time series. Additionally, the detail and smooth coefficients of a MODWT are associated with zero-phase filters. Thereby, this transform was adopted in the overall sleep staging system.

To avoid features in greater numeric ranges dominating those in smaller numeric ranges, and to avoid numerical difficulties during the classification, each feature of the transformed matrix is independently normalized to the [0, 1] range.
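The per-feature [0, 1] normalization can be sketched as follows; the specific transform operator selected in the paper is not restated here, so arcsinh is used as an assumed stand-in that damps extreme values:

```python
import numpy as np

def transform_and_normalize(F):
    """Transform features, then min-max normalize each column to [0, 1].

    arcsinh is an assumed stand-in for the paper's transform operator;
    it compresses extreme values while preserving order.
    """
    T = np.arcsinh(F)
    mins = T.min(axis=0)
    spans = T.max(axis=0) - mins
    spans[spans == 0] = 1.0         # guard against constant features
    return (T - mins) / spans

F = np.array([[1.0, 200.0],
              [2.0, -5.0],
              [3.0, 0.0]])
N = transform_and_normalize(F)
print(N.min(axis=0), N.max(axis=0))   # each feature spans exactly [0, 1]
```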
Furthermore, the MODWT is invariant to circularly shifting the original time series: shifting the time series circularly shifts the coefficients accordingly. This property does not hold for the ordinary DWT. Effects of transformation and normalization on the classification process are discussed in the experimental results section.

D.
Regarding the sensitivity and the specificity, the higher the sensitivity, the lower the chance of misclassifying the short-survival patients; on the other hand, the higher the specificity, the lower the chance of misclassifying the long-survival patients.
Interestingly, our model predicts the short-survival patients with more confidence than the long-survival patients. To further investigate the effectiveness of our proposed method, we also draw a Kaplan-Meier plot based on the model output. The result indicates that our model can well separate subjects with long OS from those with short OS.
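A Kaplan-Meier curve can be computed directly from follow-up times and event indicators. A small self-contained estimator (illustrative, not the plotting code used in the paper):

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimator S(t) = prod over event times t_i <= t
    of (1 - d_i / n_i).

    times  : follow-up times; events : 1 = death observed, 0 = censored.
    Returns the distinct event times and the survival estimate after each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = int((times >= t).sum())                # still under observation
        deaths = int(((times == t) & (events == 1)).sum())
        s *= 1.0 - deaths / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Toy cohort: deaths at t = 5, 8, 15; censored observations at 8 and 12.
t, s = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 0, 1])
print(t, s)   # survival drops to 0.8, then 0.6, then 0.0
```

Plotting each group's curve separately (predicted long vs. short OS) and checking their separation is exactly the comparison described above.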
Therefore, our K-M plot should be interpreted carefully. To further validate the effectiveness of the proposed algorithm in predicting OS for glioma patients, we test our trained model on a newly collected dataset. This dataset consists of 25 patients, each with the same modalities and channels as the dataset described in Section Data Acquisition.
The statistical information about the 25 patients is shown in Table 3. We preprocessed the data following the same procedures described in Section Data Preprocessing. With the preprocessed data, we adopt the trained neural networks to extract feature representations for all 25 patients. Then, we apply the trained SVM model for classification based on the extracted features. Note that we only consider the limited demographic and tumor-related features and the fc7 features as the final features for the SVM model.
The experimental results on this independent dataset are presented in Table 4. The performances reported in Table 4 are generally consistent with those in Table 2, especially for accuracy, sensitivity and specificity.
This further demonstrates the robustness of the proposed method. Accurate pre-operative prognosis for high-grade glioma can lead to better treatment planning. Conventional survival prediction based on clinical information is prone to be subjective and may not be accurate enough.
In this paper, we propose a multi-modality, multi-channel deep learning method to automatically learn feature representations from the imaging data, followed by a binary SVM model for the final tumor OS classification.
Due to the use of a powerful deep learning model, we can learn useful features from imaging data automatically. This avoids the subjectivity of features hand-designed by radiologists and makes it possible to discover useful but hard-to-design features. Furthermore, our proposed deep feature learning method can be adapted to both single-channel and multi-channel imaging data. This is a major advantage in clinical application, as it is common for medical imaging data to have an uncertain number of channels.
However, we by no means aim to underrate or criticize the traditional OS prediction model; rather, we test the feasibility of a deep-learning-based OS prediction model, as this type of method has many advantages, such as automatic feature learning, high-level feature learning, and a better ability to fuse multi-channel images.
Other studies also used conventional features, or radiomics features, but may include more features, such as the KPS daily-living score, resection percentage, and some genomics features.
But obtaining these features involves enormous work and resources. We provide the reasons why we did not include them below. Moreover, all our patients have KPS scores larger than 90, making this factor account for very little of the variability in the survival data.
Second, resection percentage is a treatment-related factor, which is beyond the scope of this study. Therefore, our goal can be simply summarized as: to predict OS based on deep learning upon multimodal tumor imaging obtained presurgically, given guideline-optimized treatment later on.
Third, genomic data are not available for most of the subjects, given that the study commenced before collecting genetic information became clinical routine. Collectively, this scope serves our preset research goal and simply conveys our proposed method. More importantly, our proposed framework is able to work well on a small dataset. In our study, the minimum unit (sample) in the feature-learning stage is a patch, rather than a whole brain image. That is, for each subject we can extract hundreds of patches with the same label; therefore, we can eventually have enough samples to train the networks.
In other words, we train the networks at the patch level, rather than at a whole-image or subject level. Because the features learned by the deep learning framework are more accurate and at a much higher level, the subsequent SVM can perform better than SVMs using features extracted by traditional methods. Moreover, our proposed framework can fuse multi-modality information, drawing on different imaging modalities to determine the final classification.
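Patch-level sampling can be sketched as follows; every patch drawn from a subject's tumor volume inherits that subject's long/short OS label, so a single subject yields many training samples (names, sizes and counts are illustrative, not the paper's settings):

```python
import numpy as np

def sample_patches(volume, n_patches, size, label, rng):
    """Draw random cubic patches from one subject's tumor volume.

    Every patch inherits the subject-level OS label, so one subject
    contributes many training samples.
    """
    D, H, W = volume.shape
    patches = np.empty((n_patches, size, size, size), volume.dtype)
    for i in range(n_patches):
        z = rng.integers(0, D - size + 1)
        y = rng.integers(0, H - size + 1)
        x = rng.integers(0, W - size + 1)
        patches[i] = volume[z:z + size, y:y + size, x:x + size]
    return patches, np.full(n_patches, label)   # same label for all patches

rng = np.random.default_rng(42)
volume = rng.standard_normal((40, 40, 40))      # stand-in tumor cuboid
X_patches, y_patches = sample_patches(volume, n_patches=200, size=16,
                                      label=1, rng=rng)
print(X_patches.shape, y_patches.shape)
```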
The results of additional experiments on this issue are detailed in the Experiments and Results section. We run the same proposed framework to extract features from each single modality, and train an SVM using the extracted features.
In this way, we can justify the importance of fusing multi-modal imaging data in predicting OS. The quantitative results are shown in Fig. Among the single-modality classification performances, the features from rs-fMRI yield the best performance (figure: OS prediction results using different imaging modalities). Another experiment compares the supervised feature extraction in our proposed method with traditional unsupervised feature extraction approaches.
To analyze the importance of the features for predicting OS, we also count the number of features selected from each modality in the prediction based on multi-modal images. To do this, we use an L1-regularized SVM for classification, which internally enforces the selection of the most discriminative features from the outputs of the fc7 layers of the CNNs. However, the T1 MRI has only a single channel of data, while the other modalities have multiple channels. Therefore, we further normalize these counts by the total number of channels in the corresponding modality.
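The feature-counting step can be sketched with scikit-learn's `LinearSVC`: the L1 penalty drives most coefficients exactly to zero, so the surviving (non-zero) features can be counted per modality. The data below are synthetic, and the C value is an arbitrary choice, not the paper's:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Synthetic stand-in for fc7 outputs: 5 informative + 45 noise features.
rng = np.random.default_rng(0)
X_inf = rng.standard_normal((200, 5))
y = (X_inf.sum(axis=1) > 0).astype(int)       # label depends on first 5 dims
X = np.hstack([X_inf, rng.standard_normal((200, 45))])

# L1 penalty zeroes most weights; the non-zero ones are the "selected" features.
clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000).fit(X, y)
selected = np.flatnonzero(np.abs(clf.coef_[0]) > 1e-6)
print(len(selected), "of", X.shape[1], "features kept")
```

Grouping the indices in `selected` by the modality each fc7 dimension came from, and dividing by the channel count of that modality, gives the normalized measures described above.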
The normalized measures are 1, 0. Specifically, we adopted a 2D version of the CNN architecture shown on page 13, in which the inputs are 2D patches along the axial plane, and the feature maps are all reduced to 2D.
Moreover, we employ the same strategy shown on page 14 to train multi-channel deep networks with 2D features. The experimental results are presented in Fig. To show the advantage of our 3D-CNN-based supervised feature learning, we also compare against several unsupervised feature extraction techniques that are popular in both the computer vision and medical imaging fields.
Specifically, we adopt the scale-invariant feature transform (SIFT) [52], a commonly used unsupervised image descriptor in image reconstruction, alignment and recognition tasks, as a comparison feature extraction approach. As our medical images are stored in 3D format, we employ a spatial-temporal descriptor based on 3D gradients [53] to extract features from the tumor regions.
Each patch in an image is represented by a certain visual vocabulary, and finally the image can be represented by a histogram of the visual vocabularies. We also extract Haar-like features from tumor patches; these were originally proposed in [55] for object detection and have been applied in many settings due to their efficiency. Note that we use a variant of the Haar-like features [56], calculated as the difference between the mean values of two cubic regions randomly located within an image patch.
The size of each cubic region is randomly chosen from a given range. Since we have multiple modalities of data from which to extract features, we first extract features from each modality separately and then use PCA to reduce their dimensionality. Next, we concatenate the features from the different modalities with the handcrafted features, and finally train an SVM model.
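The Haar-like variant described above reduces to a few lines: each feature is the mean-intensity difference of two randomly placed cubes, with side lengths drawn from a range (the range below is an assumed example, not the paper's setting):

```python
import numpy as np

def haar_like_features(patch, n_features, rng, size_range=(2, 6)):
    """Variant Haar-like features: each feature is the difference between
    the mean intensities of two cubic regions placed at random inside the
    patch, with side lengths drawn from size_range."""
    def random_cube_mean():
        s = int(rng.integers(size_range[0], size_range[1] + 1))
        z, y, x = (int(rng.integers(0, dim - s + 1)) for dim in patch.shape)
        return patch[z:z + s, y:y + s, x:x + s].mean()
    return np.array([random_cube_mean() - random_cube_mean()
                     for _ in range(n_features)])

rng = np.random.default_rng(7)
patch = rng.standard_normal((16, 16, 16))
feats = haar_like_features(patch, n_features=32, rng=rng)
print(feats.shape)   # one scalar per random region pair
```

On a constant patch every feature is zero, which is a quick sanity check that the descriptor responds only to intensity contrast.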
The experimental results are shown in Fig. The Haar-like features give the worst performance, and the proposed deep-learning-based features give the best. To investigate the impact of combining the two models, we design comparison experiments with an SVM-alone method and a CNN-alone method. With SVM alone, we use manually designed features. For CNN-alone classification, the CNN combines feature learning and classification, directly generating the soft label as the final output of the network.
The rationale for designing our model as CNN-based feature extraction plus SVM-based classification is that SVM generally performs better and more robustly in a study with a limited sample size. Deep-learning-based models hierarchically process the input data (imaging data in our case) and output highly semantic features oriented toward the target.
In our study, we use the outputs of the last two layers of the CNN as features. These features are highly semantic and quite effective for tumor OS classification, as they are learned under supervision. However, it is currently difficult to investigate which imaging features really help improve the accuracy and what they actually represent.
This is also difficult in our study, although we have tried to investigate which regions of the imaging data contribute most to the useful features. As reported in Section Experiments and Results, we first validate our proposed method on the dataset of 68 patients in a 3-fold cross-validation fashion. We then further validate it with extra testing on a new dataset of 25 patients.
The experimental results on these two datasets indicate that our proposed method is robust. Furthermore, many similar studies based on deep learning models have recently been proposed and have achieved great success.
For example, Setio et al. and Esteva et al. reported such successes, and some of these studies have even been applied in clinical trials. Thus, we believe our proposed method is useful in developing a new tumor OS prediction model. It is worth indicating the limitations of our work.
For example, we only use limited clinical information in our study and thus obtain a weak clinical model. The newly revised WHO grading system even suggests using some genetic features to grade gliomas. Unfortunately, we did not have such information for all the subjects, because our data were collected several years ago, when collecting genetic information had not yet become clinical routine.
For the newly enrolled subjects, since they were admitted recently and have been followed up for only a short time, we do not yet have their OS information. In a future follow-up study, as more subjects accrue both genetic information and OS data, we will include genetic information in OS prediction. In the experimental design, we deliberately enrolled subjects with total or gross-total resection; most of them received postsurgical adjuvant radiotherapy and chemotherapy with the same protocol suggested by the guideline.
With these specifically selected subjects, we can reduce the confounding effect of treatment and focus on the prognostic value of neuroimaging. Of note, we acknowledge that treatment is very important to OS, and we are carrying out an ongoing study to predict OS based on presurgical, treatment and genetic features, so that future treatment can be better tailored to each individual.
As discussed earlier, our experiments are conducted on a dataset of 68 subjects and a new dataset of 25 patients. The number of subjects is relatively small; to obtain better generalizability, we need to enroll more subjects in the future. Also, we simply concatenate the features (fc6 or fc7) extracted from different modalities and use them for OS prediction, without considering the relationships between modalities.
We plan to take this into consideration in future work. Moreover, we resized the tumor cuboids to make them consistent in size; this operation obviously affects some geometric properties of the tumor. This issue could be resolved by applying a multi-instance learning framework. Lastly, we choose a hard threshold to classify the patients into two categories (long or short OS), which limits the precision of our predictive results.
Besides, we could further categorize the patients into more categories. Since these features come from different domains, more advanced feature learning and integration methods need to be developed. Moreover, since our ultimate goal is to predict overall survival in a form better suited to clinical practice, we will treat it as a continuous variable with a rigorous machine-learning or deep-learning-based regression model in future work.
Overall, in this study we have proposed a 3D deep learning model to predict long or short OS time for patients with brain glioma. The extracted features were then fed into a binary SVM classifier.
The performance of our supervised CNN-learned features was compared with that of several other state-of-the-art methods, including those using traditional handcrafted features. Experimental results showed that our supervised-learned features significantly improved the predictive accuracy of OS time for glioma patients. This also indicates that our 3D deep learning framework can drive computational models to extract useful features for such neuro-oncological applications.
Overall, our proposed method shows great promise for multi-modal MRI-based diagnosis or prognosis across a wider spectrum of neurological and psychiatric diseases.
Then, a binary classifier (i.e., an SVM) is trained. Different from a conventional CNN that stacks multi-channel inputs at the beginning, we run independent convolution streams for each input channel in the early layers and then fuse them in deeper layers for high-level feature extraction.
Of note, to augment the dataset, we flip the bounding box along the three directions (x, y, z) separately for all metrics. A CNN derives high-level features from the low-level input, and these high-level features directly contribute to the classification of the input data. The network architecture usually consists of a number of layers.
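The flip augmentation amounts to three `np.flip` calls, one per axis, so each bounding box yields four training samples (the axis-to-direction mapping is an assumption about the array layout):

```python
import numpy as np

def flip_augment(box):
    """Return the original box plus one copy flipped along each axis
    (assumed to correspond to the x, y and z directions)."""
    return [box,
            np.flip(box, axis=0),
            np.flip(box, axis=1),
            np.flip(box, axis=2)]

box = np.arange(8).reshape(2, 2, 2)   # toy 3D bounding box
augmented = flip_augment(box)
print(len(augmented))                 # 4 samples from one bounding box
```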
As we go deeper into the network, the layers generate higher-level features; for example, the last layer represents more intrinsic features than the earlier layers. Inspired by the very deep convolutional networks (VGGNet) [62], we design our CNN architecture with four convolutional layer groups and three fully-connected layers.
The detailed configurations of the four convolutional layer groups (conv1 to conv4) are shown in Fig. Each convolutional operation produces a 3D output patch of the same size as its input, followed by max-pooling to downsample the patch. The last three layers in the CNN are fully connected (fc5 to fc7).
These fully-connected layers contain neurons connected to all outputs of their preceding layers, as in conventional neural networks. The last layer (fc7) has 2 neurons, whose outputs correspond to the probabilities of classifying the patient into the long or the short OS group.
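Mapping the two fc7 activations to class probabilities is conventionally done with a softmax; a minimal sketch (the activation values and the class ordering are hypothetical):

```python
import numpy as np

def softmax(z):
    """Squash a vector of activations into probabilities that sum to 1.
    Subtracting the max keeps exp() numerically stable."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 0.5]))   # hypothetical fc7 outputs
print(p)                            # two class probabilities summing to 1
```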
An illustration of the CNN architecture for the single-channel feature extraction from 3D patches. There are four convolutional layer groups and three fully-connected layers. The supervision on the classification of the training data leads to a back-propagation procedure for learning the most relevant features in the CNN.
Specifically, we regard the outputs of the last two layers of the CNN (fc6 and fc7) as the learned high-level appearance features of each input patch. The efficiency and effectiveness of the extracted features are verified in the experiments.
Each metric map corresponds to an input channel when learning the high-level appearance features. To effectively employ all multi-channel data for providing complementary information about the brain tumor, we propose a new multi-channel CNN (mCNN) architecture and train one mCNN for each modality. Inspired by the multi-modal deep Boltzmann machine [63], we extend our single-channel 3D CNN architecture to deal with multi-channel data. Specifically, in the proposed mCNN, the same convolutional layer groups are applied to each channel separately.
Then, a fusion layer integrates the outputs of the last convolutional layer group (conv4) from all channels by concatenating them, and three fully-connected layers are further incorporated to finally extract the features. The mCNN architecture is illustrated in Fig. The other layers, including the convolutional layers and the fully-connected layers, follow the same configuration.
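The fusion-by-concatenation idea can be shown at the shape level. Below, `conv_stream` and `fc` are random stand-ins for the trained layer groups, so only the data flow is meaningful: independent streams per channel, concatenation in a fusion layer, then shared fully-connected layers:

```python
import numpy as np

def mcnn_fuse(channel_patches, conv_stream, fc):
    """Shape-level sketch of the mCNN fusion described above."""
    per_channel = [conv_stream(p) for p in channel_patches]  # one stream per channel
    fused = np.concatenate(per_channel)                      # fusion by concatenation
    return fc(fused)                                         # shared fc layers

rng = np.random.default_rng(3)
P = rng.standard_normal((128, 16 * 16 * 16))   # stand-in for conv1-conv4
W = rng.standard_normal((64, 3 * 128))         # stand-in fc weights

def conv_stream(patch):
    return P @ patch.ravel()                   # project each channel to 128 dims

def fc(v):
    return np.maximum(W @ v, 0.0)              # ReLU fully-connected layer

channels = [rng.standard_normal((16, 16, 16)) for _ in range(3)]
features = mcnn_fuse(channels, conv_stream, fc)
print(features.shape)                          # fused high-level feature vector
```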
It is important to note that the statistical properties of different channels of the input data can vary greatly, which makes it difficult for a single-channel model to directly encode multi-channel data. In contrast, our proposed mCNN model is much better able to model multi-channel input data and fuse them to generate high-level features.
Once we complete training a CNN (Fig.), the patch(es) from single or multiple channels of the metrics go through the convolutional networks, which convert the input patch(es) to high-level features and produce the survival estimation in the final layer.
In particular, the high-level features extracted at the last two layers (fc6 and fc7) of our CNN architectures are perceived to be suitable image-level descriptors. In this way, each patch associates its high-level features with the survival time of the patient under consideration. Note that while the fc6 layer is high-dimensional, the last fc7 layer comprises two neurons.
In addition to these high-level appearance features, limited demographic and tumor-related features (dtf) are also included in our experiments. These consist of generic brain tumor features, including gender, age at diagnosis, tumor location, tumor size, and WHO grade. Tumor location is defined by two metrics; for example, a tumor distribution of 1 denotes that the tumor appears in only one brain lobe.
The size of the tumor was calculated from T1 contrast-enhanced MRI by manually delineating the volume with abnormal intensity. This was done by one neurosurgeon with 8 years of experience (JL) to ensure consistent tumor delineation criteria. Although other factors were not included among the limited demographic and tumor-related features, we think they are unlikely to make a significant contribution to individualized OS prediction.
Since the numbers of features in the fc6 and fc7 layers are large, we further conduct feature reduction or selection for the features from each modality separately.
Curran, W. et al. Recursive partitioning analysis of prognostic factors in three Radiation Therapy Oncology Group malignant glioma trials.
Gittleman, H. et al. An independently validated nomogram for individualized estimation of survival among patients with newly diagnosed glioblastoma: NRG Oncology RTOG. Neuro-Oncology 19.
Lacroix, M. et al.
A multivariate analysis of patients with glioblastoma multiforme: prognosis, extent of resection, and survival. Journal of Neurosurgery 95.
DeAngelis, L.