Catalogue Search | MBRL

Deep learning approach for microarray cancer data classification

by Basavegowda, Hema Shekar , Dagnew, Guesh in 7-layer deep neural network architecture , Accuracy , adaptive moment estimation

2020

Analysis of microarray data is a highly challenging problem due to the inherent complexity of the data: high dimensionality, small sample size, imbalanced classes, noisy data structure, and high variance of feature values. This leads to lower classification accuracy and over-fitting. In this work, the authors aimed to develop a deep feedforward method to classify the given microarray cancer data into a set of classes for subsequent diagnosis purposes. They used a 7-layer deep neural network architecture with different parameters for each dataset. The small sample size and dimensionality problems are addressed with a well-known dimensionality reduction technique, principal component analysis. The feature values are scaled using the Min–Max approach, and the proposed approach is validated on eight standard microarray cancer datasets. Binary cross-entropy is used to measure the loss, and adaptive moment estimation is used for optimisation. The performance of the proposed approach is evaluated using classification accuracy, precision, recall, f-measure, log-loss, receiver operating characteristic curve, and confusion matrix. A comparative analysis with state-of-the-art methods is carried out, and the proposed approach performs better than many of the existing methods.
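Two of the building blocks named in this abstract, Min–Max scaling and binary cross-entropy, can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation; the function names, the handling of constant columns, the clipping constant `eps`, and the sample values are all assumptions made for the example.

```python
import math


def min_max_scale(column):
    """Rescale one feature column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical expression values of one gene across four samples.
scaled = min_max_scale([2.0, 4.0, 6.0, 10.0])

# Hypothetical labels and predicted probabilities for two samples.
loss = binary_cross_entropy([1, 0], [0.9, 0.1])
```

In a full pipeline the scaled features would typically pass through principal component analysis before reaching the network, and the loss would be minimised by an optimiser such as Adam (adaptive moment estimation).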

Journal Article

Classification of cancer microarray data using a two-step feature selection framework with moth-flame optimization and extreme learning machine

by Sucharita, Swati , Sahu, Barnali , Meher, Saroj K. in Artificial neural networks , Cancer , Classification

2024

Analysis of microarray gene expression data for the detection/classification of cancer is one of the common approaches adopted worldwide. However, many genes (features) with correlated and irrelevant information in these data sets become the bottleneck for a classification model and significantly deteriorate its performance. A large number of features with fewer samples makes the classification task even more cumbersome. Several feature selection methods (both filter and wrapper) have been proposed individually to address this issue, but choosing the best one among them is an open challenge. Our objective in the present study is to simplify the search for the best feature selection method without relying completely on individual methods and to propose a two-step hybrid approach. In the first step, we use an ensemble of filter-based heterogeneous feature selection methods. These selected features then undergo the second step of wrapper-based selection. We propose to use the bio-inspired method called Moth-flame optimization (MFO) with an extreme learning machine (ELM) as its fitness function in this step. The motivation for using ELM is to leverage its learning strategy with one-pass processing of samples. Using this hybrid feature selection method, we propose a classification model for cancer microarray data, where ELM is also used as the classifier. The work demonstrates the superiority of the proposed model over other state-of-the-art methods in classifying cancer data from four different microarray gene expression datasets. Several measurement indexes are used for the performance evaluation of models.

Journal Article

Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data

by Zhou, Yi , Takagi, Tatsuya , Song, Jiangning in Accuracy , Algorithms , Analysis

2023

Background: Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is “large p and small n”: the data contain a small number of subjects but a large number of genes. This may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer classification. Results: This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA integrates the manifold learning algorithm Isomap into the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies–Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmark gene selection methods, leading to good classification accuracy with fewer critical genes selected. Conclusions: The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
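The Davies–Bouldin index mentioned in this abstract scores a clustering without needing a classifier; lower values indicate tighter, better-separated clusters. The sketch below is a plain-Python rendering of the standard definition (Euclidean distances, mean distance to centroid as the scatter term). How Iso-GA applies the index to Isomap embeddings is not described here, so the toy clusters and function names are assumptions for illustration only.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = [
        sum(math.dist(p, cen) for p in c) / len(c)
        for c, cen in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical one-dimensional embedding with two well-separated clusters.
toy_clusters = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]
db_index = davies_bouldin(toy_clusters)
```

For the toy data each cluster has scatter 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2.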

Journal Article

A novel and innovative cancer classification framework through a consecutive utilization of hybrid feature selection

by Mahto, Rajul , Shah, Mohd Asif , Ahmed, Saboor Uddin in Accuracy , Algorithms , Analysis

2023

Cancer prediction in the early stage is a topic of major interest in medicine, since it allows accurate and efficient actions for successful medical treatment of cancer. Most cancer datasets contain gene expression levels as features with few samples, so similar features must first be eliminated to permit a faster convergence rate of classification algorithms. These features (genes) enable us to identify cancer, choose the best prescription to prevent cancer, and discover deviations among different techniques. To resolve this problem, we propose a novel hybrid technique, CSSMO-based gene selection, for cancer classification. First, we alter the fitness of spider monkey optimization (SMO) with the cuckoo search algorithm (CSA), yielding CSSMO, for feature selection; this combines the benefits of both metaheuristic algorithms to discover a subset of genes that helps predict cancer at an early stage. Further, to enhance the accuracy of the CSSMO algorithm, we apply a cleaning process, minimum redundancy maximum relevance (mRMR), to reduce the gene expression data of the cancer datasets. Next, these subsets of genes are classified using deep learning (DL) to identify different groups or classes related to a particular cancer. Eight different benchmark microarray gene expression datasets of cancer have been used to analyze the performance of the proposed approach with different evaluation metrics such as recall, precision, F1-score, and confusion matrix. The proposed gene selection method with DL achieves much better classification accuracy than other existing DL and machine learning classification models on all large cancer gene expression datasets.

Journal Article

Two-Stage Hybrid Gene Selection Using Mutual Information and Genetic Algorithm for Cancer Data Classification

by Devaraj, D , M Jansi Rani in Cancer , Classification , Colon

2019

Cancer is a deadly disease which requires a very complex and costly treatment. Microarray data classification plays an important role in cancer treatment. An efficient gene selection technique to select the more promising genes is necessary for cancer classification. Here, we propose a Two-stage MI-GA Gene Selection algorithm for selecting informative genes in cancer data classification. In the first stage, Mutual Information based gene selection is applied which selects only the genes that have high information related to the cancer. The genes which have high mutual information value are given as input to the second stage. The Genetic Algorithm based gene selection is applied in the second stage to identify and select the optimal set of genes required for accurate classification. For classification, Support Vector Machine (SVM) is used. The proposed MI-GA gene selection approach is applied to Colon, Lung and Ovarian cancer datasets and the results show that the proposed gene selection approach results in higher classification accuracy compared to the existing methods.
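The first stage described in this abstract, ranking genes by mutual information with the class label and passing only the top-scoring ones on, can be sketched as follows. This is an illustrative plain-Python version, not the authors' code: the median-based binarisation, the function names, and the toy data are assumptions, and the genetic-algorithm second stage is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def binarize(values):
    """Assumed discretisation: 1 if the value is above the median, else 0."""
    s = sorted(values)
    mid = len(s) // 2
    median = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    return [1 if v > median else 0 for v in values]


def top_k_genes(expression, labels, k):
    """Return the k gene names with the highest mutual information to the labels."""
    scores = {
        gene: mutual_information(binarize(values), labels)
        for gene, values in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: four samples, two genes, two classes.
expression = {
    "gene_a": [0.1, 0.2, 0.9, 1.1],  # tracks the class label
    "gene_b": [0.5, 1.5, 0.4, 1.6],  # unrelated to the class label
}
labels = [0, 0, 1, 1]
selected = top_k_genes(expression, labels, k=1)
```

The genes returned by `top_k_genes` would form the reduced search space for a wrapper stage such as the genetic algorithm, which then evaluates candidate subsets with a classifier.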

Journal Article

Optimizing cancer classification: a hybrid RDO-XGBoost approach for feature selection and predictive insights

in Accuracy , Breast cancer , Cancer

2024

The identification of relevant biomarkers from high-dimensional cancer data remains a significant challenge due to the complexity and heterogeneity inherent in various cancer types. Conventional feature selection methods often struggle to effectively navigate the vast solution space while maintaining high predictive accuracy. In response to these challenges, we introduce a novel feature selection approach that integrates Random Drift Optimization (RDO) with XGBoost, specifically designed to enhance the performance of cancer classification tasks. Our proposed framework not only improves classification accuracy but also offers valuable insights into the underlying biological mechanisms driving cancer progression. Through comprehensive experiments conducted on real-world cancer datasets, including Central Nervous System (CNS), Leukemia, Breast, and Ovarian cancers, we demonstrate the efficacy of our method in identifying a smaller subset of unique and relevant genes. This selection results in significantly improved classification efficiency and accuracy. When compared with popular classifiers such as Support Vector Machine, K-Nearest Neighbor, and Naive Bayes, our approach consistently outperforms these models in terms of both accuracy and F-measure metrics. For instance, our framework achieved an accuracy of 97.24% in the CNS dataset, 99.14% in Leukemia, 95.21% in Ovarian, and 87.62% in Breast cancer, showcasing its robustness and effectiveness across different types of cancer data. These results underline the potential of our RDO-XGBoost framework as a promising solution for feature selection in cancer data analysis, offering enhanced predictive performance and valuable biological insights.

Journal Article

Classification of cancer cells and gene selection based on microarray data using MOPSO algorithm

by Makarem, Dorna , Rahimi, Mohammad Reza , Armaghan, Seyed Mostafa in Accuracy , Algorithms , Bayes Theorem

2023

Purpose: Microarray information is crucial for the identification and categorisation of malignant tissues. The very limited sample size in the microarray has always been a challenge for classification design in cancer research. As a result, gene selection is applied as a pre-processing step, and genes lacking information are removed from the microarray data prior to categorisation. In essence, an appropriate gene selection technique can significantly increase the accuracy of illness (cancer) classification. Methods: For the classification of high-dimensional microarray data, a novel approach based on the hybrid model of multi-objective particle swarm optimisation (MOPSO) is proposed in this research. First, a binary vector representing each particle’s position is generated at random. Each bit represents a gene: bit 0 denotes that the corresponding characteristic (gene) is not selected, while bit 1 denotes that it is selected. Therefore, the position of each particle represents a set of genes, and the linear Bayesian discriminant analysis classification algorithm calculates each particle’s degree of fitness to assess the quality of the gene set that particle has chosen. The suggested methodology is applied to four different cancer database sets, and the results are contrasted with those of other approaches currently in use. Results: The proposed algorithm has been applied to four cancer databases and its results have been compared with other existing methods. The results of the implementation show that the improvement in classification accuracy of the proposed algorithm over other methods across the four databases is 25.84% on average. Specifically, accuracy improved by 18.63% on the blood cancer database, 24.25% on the lung cancer database, 27.73% on the breast cancer database, and 32.80% on the prostate cancer database. Therefore, the proposed algorithm is able to identify a small set of informative genes that increases the classification accuracy. Conclusion: Our proposed solution is used for data classification and also improves classification accuracy. This is possible because the MOPSO model reduces the number of redundant genes by considering how genes are correlated with each other.
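The binary encoding described in the Methods part of this abstract (one bit per gene, 1 meaning selected) can be illustrated with a short sketch. The gene names, the fixed seed, and the function names below are hypothetical, and the fitness evaluation by a discriminant-analysis classifier is deliberately left out.

```python
import random


def random_position(n_genes, rng):
    """Draw a random binary position vector: one bit per gene."""
    return [rng.randint(0, 1) for _ in range(n_genes)]


def selected_genes(position, gene_names):
    """Decode a particle position into the gene subset it represents."""
    return [name for bit, name in zip(position, gene_names) if bit == 1]


# Hypothetical gene identifiers.
gene_names = ["g1", "g2", "g3", "g4", "g5"]

# A hand-written position: bits 1, 4 and 5 are set, so those genes are selected.
position = [1, 0, 0, 1, 1]
subset = selected_genes(position, gene_names)

# A randomly initialised particle, seeded only to make the example repeatable.
rng = random.Random(0)
particle = random_position(len(gene_names), rng)
```

Each decoded subset would then be scored by training and evaluating a classifier on only those genes, with the optimiser trading off accuracy against subset size.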

Journal Article

Feature-specific quantile normalization and feature-specific mean–variance normalization deliver robust bi-directional classification and feature selection performance between microarray and RNAseq data

by Ghosh, Sunita , Spratlin, Jennifer , Skubleny, Daniel in Algorithms , Automatic classification , Bioinformatics

2024

Background: Cross-platform normalization seeks to minimize technological bias between microarray and RNAseq whole-transcriptome data. Incorporating multiple gene expression platforms permits external validation of experimental findings, and augments training sets for machine learning models. Here, we compare the performance of Feature Specific Quantile Normalization (FSQN) to a previously used but unvalidated and uncharacterized method we label as Feature Specific Mean Variance Normalization (FSMVN). We evaluate the performance of these methods for bidirectional normalization in the context of nested feature selection. Results: FSQN and FSMVN provided clinically equivalent bidirectional model performance with and without feature selection for colon CMS and breast PAM50 classification. Using principal component analysis, we determine that these methods eliminate batch effects related to technological platforms. Without feature selection, no statistical difference was identified between the performance of FSQN and FSMVN of cross-platform data compared to within-platform distributions. Under optimal feature selection conditions, balanced accuracy for FSQN and FSMVN was statistically equivalent to the within-platform distribution performance in multivariable linear regression analysis. FSQN and FSMVN also provided similar performance to within-platform distributions as the number of selected genes used to create models decreases. Conclusions: In the context of generating supervised machine learning classifiers for molecular subtypes, FSQN and FSMVN are equally effective. Under optimal modeling conditions, FSQN and FSMVN provide equivalent model accuracy performance on cross-platform normalization data compared to within-platform data. Using cross-platform data should still be approached with caution as subtle performance differences may exist depending on the classification problem, training, and testing distributions.

Journal Article

Comparison of RNA-seq and microarray-based models for clinical endpoint prediction

by Chierici, Marco , Yin, Ye , Furlanello, Cesare in Adolescent , Adult , Annotations

2015

Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

Journal Article

Deep learning assisted cancer disease prediction from gene expression data using WT-GAN

by Gunavathi, C. , Ravindran, U. in Accuracy , Algorithms , Analysis

2024

Several diverse fields including the healthcare system and drug development sectors have benefited immensely through the adoption of deep learning (DL), which is a subset of artificial intelligence (AI) and machine learning (ML). Cancer makes up a significant percentage of the illnesses that cause early human mortality across the globe, and this situation is likely to rise in the coming years, especially when non-communicable illnesses are not considered. As a result, cancer patients would greatly benefit from precise and timely diagnosis and prediction. Deep learning (DL) has become a common technique in healthcare due to the abundance of computational power. Gene expression datasets are frequently used in major DL-based applications for illness detection, notably in cancer therapy. The quantity of medical data, on the other hand, is often insufficient to fulfill deep learning requirements. Microarray gene expression datasets are used for training procedures despite their extreme dimensionality, limited volume of data samples, and sparsely available information. Data augmentation is commonly used to expand the training sample size for gene data. The Wasserstein Tabular Generative Adversarial Network (WT-GAN) model is used for the data augmentation process for generating synthetic data in this proposed work. The correlation-based feature selection technique selects the most relevant characteristics based on threshold values. Deep FNN and ML algorithms train and classify the gene expression samples. The augmented data give better classification results (> 97%) when using WT-GAN for cancer diagnosis.

Journal Article
