468 result(s) for "Lee, Hyunju"
Prediction of Alzheimer’s disease using blood gene expression data
Identification of AD (Alzheimer’s disease)-related genes obtained from blood samples is crucial for early AD diagnosis. We used three public datasets, ADNI, AddNeuroMed1 (ANM1), and ANM2, for this study. Five feature selection methods and five classifiers were used to curate AD-related genes and discriminate AD patients, respectively. In the internal validation (five-fold cross-validation within each dataset), the best average values of the area under the curve (AUC) were 0.657, 0.874, and 0.804 for ADNI, ANM1, and ANM2, respectively. In the external validation (training and test sets from different datasets), the best AUCs were 0.697 (training: ADNI to testing: ANM1), 0.764 (ADNI to ANM2), 0.619 (ANM1 to ADNI), 0.79 (ANM1 to ANM2), 0.655 (ANM2 to ADNI), and 0.859 (ANM2 to ANM1). These results suggest that although the classification performance on ADNI is lower than that on ANM1 and ANM2, classifiers trained using blood gene expression can be used to classify AD on other datasets. In addition, pathway analysis showed that AD-related genes were enriched in inflammation, mitochondrial, and Wnt signaling pathways. Our study suggests that blood gene expression data are useful for AD classification.
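The internal-validation protocol described above (five-fold cross-validation, scored by AUC) can be sketched in plain Python. This is a minimal illustration with toy data and a rank-based AUC, not the paper's pipeline; the fold-splitting and scoring functions are illustrative.

```python
import random

def auc(labels, scores):
    # Rank-based AUC: probability that a positive sample scores above a negative one.
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return 0.5  # degenerate fold: no information
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def five_fold_indices(n, seed=0):
    # Shuffle sample indices once, then deal them into five folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::5] for k in range(5)]

# Toy data: the score correlates with the label, so AUC should exceed 0.5.
labels = [i % 2 for i in range(100)]
scores = [l + random.Random(i).random() for i, l in enumerate(labels)]

fold_aucs = [
    auc([labels[i] for i in test], [scores[i] for i in test])
    for test in five_fold_indices(len(labels))
]
mean_auc = sum(fold_aucs) / 5
```

In a real pipeline the held-out fold would be scored by a classifier trained on the remaining four folds; here the toy scores stand in for classifier outputs.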
Biomedical named entity recognition using deep neural networks with contextual information
Background In biomedical text mining, named entity recognition (NER) is an important task used to extract information from biomedical articles. Previously proposed methods for NER include dictionary- and rule-based methods and machine learning approaches. However, these traditional approaches are heavily reliant on large-scale dictionaries, target-specific rules, or well-constructed corpora. These approaches to NER have been superseded by deep learning-based approaches that are independent of hand-crafted features. However, although such NER methods employ additional conditional random fields (CRF) to capture important correlations between neighboring labels, they often do not incorporate all the contextual information from the text into the deep learning layers. Results We propose herein an NER system for biomedical entities that incorporates n-grams with bi-directional long short-term memory (BiLSTM) and CRF; this system is referred to as a contextual long short-term memory network with CRF (CLSTM). We assess the CLSTM model on three corpora: the disease corpus of the National Center for Biotechnology Information (NCBI), the BioCreative II Gene Mention corpus (GM), and the BioCreative V Chemical Disease Relation corpus (CDR). Our framework was compared with several deep learning approaches, such as BiLSTM, BiLSTM with CRF, GRAM-CNN, and BERT. On the NCBI corpus, our model recorded an F-score of 85.68% for the NER of diseases, an improvement of 1.50% over previous methods. Moreover, although BERT used transfer learning by incorporating more than 2.5 billion words, our system showed performance similar to that of BERT, with an F-score of 81.44% for gene NER on the GM corpus, and a superior F-score of 86.44% for the NER of chemicals and diseases on the CDR corpus. We conclude that our method significantly improves performance on biomedical NER tasks. Conclusion The proposed approach is robust in recognizing biological entities in text.
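The abstract mentions incorporating n-grams as contextual features alongside the BiLSTM-CRF layers, without specifying the exact scheme. As a rough illustration only, here is one common variant: character-level n-grams with boundary markers, which capture prefix and suffix cues useful for entity names.

```python
def char_ngrams(token, n=3):
    # Pad with boundary markers so prefixes and suffixes form distinct n-grams.
    padded = "<" + token + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# Example: n-grams for a gene mention.
grams = char_ngrams("BRCA1")
```

Features like these would typically be embedded and concatenated with word embeddings before the BiLSTM layer; the details here are assumptions, not the paper's design.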
Identifying disease-gene associations using a convolutional neural network-based model by embedding a biological knowledge graph with entity descriptions
Understanding the role of genes in human disease is of high importance. However, identifying genes associated with human diseases requires laborious experiments that involve considerable effort and time. Therefore, computational approaches to predicting candidate genes related to complex diseases, including cancer, have been extensively studied. In this study, we propose a convolutional neural network-based knowledge graph-embedding model (KGED), which is based on a biological knowledge graph with entity descriptions, to infer relationships between biological entities. As an application demonstration, we generated gene-interaction networks for each cancer type using gene-gene relationships inferred by KGED. We then analyzed the constructed gene networks using network centrality measures, including betweenness, closeness, degree, and eigenvector centrality metrics, to rank the central genes of the network and identify highly correlated cancer genes. Furthermore, we evaluated our proposed approach for prostate, breast, and lung cancers by comparing its performance with that of existing approaches. The KGED model showed improved performance in predicting cancer-related genes using the inferred gene-gene interactions. Thus, we conclude that gene-gene interactions inferred by KGED can be helpful for future research, such as studies of the pathogenic mechanisms of human diseases, and can contribute to the discovery of disease treatments.
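The centrality-based ranking step described above can be sketched without any graph library. This toy example (with an invented four-gene network, so the gene names and edges are illustrative only) computes degree centrality directly and eigenvector centrality by power iteration, then ranks genes.

```python
# Toy gene-interaction network as an adjacency structure.
edges = [("TP53", "BRCA1"), ("TP53", "EGFR"), ("TP53", "MYC"), ("BRCA1", "EGFR")]
genes = sorted({g for e in edges for g in e})
adj = {g: set() for g in genes}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Degree centrality: fraction of the other nodes each gene touches.
n = len(genes)
degree = {g: len(adj[g]) / (n - 1) for g in genes}

# Eigenvector centrality via power iteration on the adjacency structure.
x = {g: 1.0 for g in genes}
for _ in range(100):
    nxt = {g: sum(x[h] for h in adj[g]) for g in genes}
    norm = max(nxt.values())
    x = {g: v / norm for g, v in nxt.items()}

# Rank genes by degree centrality, breaking ties by eigenvector centrality.
ranked = sorted(genes, key=lambda g: (degree[g], x[g]), reverse=True)
```

In this toy network the hub gene connected to all others comes out on top, which is the behavior the centrality analysis relies on.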
HUBO and QUBO models for prime factorization
The security of the RSA cryptosystem is based on the difficulty of factoring a large number N into prime numbers p and q satisfying N = p × q. This paper presents a prime factorization method using a D-Wave quantum computer that could threaten the RSA cryptosystem in the future. The starting point for this method is very simple: representing the two prime numbers as qubits. Then, we set the difference between N and the product of the two prime numbers expressed in qubits as a cost function, and we find the solution when the cost function is minimized. D-Wave's quantum annealer can find the minimum value of any quadratic problem. However, the cost function must be a higher-order unconstrained binary optimization (HUBO) model because it contains third- or higher-order terms. We used a hybrid solver accessible via Leap, D-Wave’s real-time quantum cloud service, and the dimod package provided by the D-Wave Ocean software development kit (SDK) to solve the HUBO problem. We successfully factorized 102,454,763 with 26 logical qubits. In addition, we factorized 1,000,070,001,221 using the range-dependent Hamiltonian algorithm.
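The cost-function idea above can be demonstrated at toy scale without a quantum annealer: encode the candidate primes in binary variables (with the least-significant bit fixed to 1 so both are odd), define the cost as the squared difference between N and their product, and minimize. Here the minimization is brute force over all bit assignments, standing in for the annealer; the encoding widths are illustrative.

```python
from itertools import product

N = 15  # toy semiprime: 3 x 5

def cost(bits):
    # Odd candidates with the least-significant qubit fixed to 1:
    # p = 1 + 2*b0 + 4*b1, q = 1 + 2*b2 + 4*b3.
    p = 1 + 2 * bits[0] + 4 * bits[1]
    q = 1 + 2 * bits[2] + 4 * bits[3]
    # Squared residual; zero exactly when p * q == N.
    return (N - p * q) ** 2

# Brute-force stand-in for the annealer: try all 2^4 qubit assignments.
best = min(product([0, 1], repeat=4), key=cost)
p = 1 + 2 * best[0] + 4 * best[1]
q = 1 + 2 * best[2] + 4 * best[3]
```

Because p and q are each linear in their bits, p·q is quadratic and the squared residual is quartic in the bits, which is why the real cost function is a HUBO rather than a QUBO model.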
CrossFeat: a transformer-based cross-feature learning model for predicting drug side effect frequency
Background Safe drug treatment requires an understanding of the potential side effects. Identifying the frequency of drug side effects can reduce the risks associated with drug use. However, existing computational methods for predicting drug side effect frequencies heavily depend on known drug side effect frequency information. Consequently, these methods face challenges when predicting the side effect frequencies of new drugs. Although a few methods can predict the side effect frequencies of new drugs, they exhibit unreliable performance owing to the exclusion of drug-side effect relationships. Results This study proposed CrossFeat, a model based on a convolutional neural network-transformer architecture with cross-feature learning that can predict the occurrence and frequency of drug side effects for new drugs, even in the absence of information regarding drug-side effect relationships. CrossFeat facilitates the concurrent learning of drug and side effect information within its transformer architecture. This simultaneous exchange of information enables drugs to learn about their associated side effects, while side effects concurrently acquire information about the respective drugs. Such bidirectional learning allows for the comprehensive integration of drug and side effect knowledge. Our five-fold cross-validation experiments demonstrated that CrossFeat outperforms existing methods in predicting side effect frequencies for new drugs without prior knowledge. Conclusions Our model offers a promising approach for predicting drug side effect frequencies, particularly for new drugs for which prior information is limited. CrossFeat’s superior performance in cross-validation experiments, along with evidence from case studies and ablation experiments, highlights its effectiveness.
Range dependent Hamiltonian algorithms for numerical QUBO formulation
With the advent and development of quantum computers, various quantum algorithms that can solve linear equations and eigenvalue problems faster than classical computers have been developed. In particular, a hybrid solver provided by D-Wave’s Leap quantum cloud service can utilize up to two million variables. Using this technology, quadratic unconstrained binary optimization (QUBO) models have been proposed for linear systems, eigenvalue problems, RSA cryptosystems, and computed tomography (CT) image reconstruction. Generally, a QUBO formulation is obtained through simple arithmetic operations, which offers great potential for future development as quantum computers progress. A common method is to binarize the variables and match them to multiple qubits. To achieve 64-bit accuracy per variable, 64 logical qubits must be used. Finding the global minimum energy in quantum optimization becomes more difficult as more logical qubits are used; thus, a quantum parallel computing algorithm that can create and compute multiple QUBO models is introduced here. This new algorithm divides the entire domain of each variable into multiple subranges to generate QUBO models. This paper demonstrates the superior performance of this new algorithm, particularly when it is combined with an algorithm for binary variables.
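The binarization and range-splitting ideas above can be sketched concretely. In this illustration (the function names and widths are assumptions, not the paper's notation), a variable is decoded from a fixed number of qubits via a binary expansion, and splitting the domain into subranges lets each sub-QUBO reach the same resolution with fewer qubits per variable.

```python
def decode(bits, lo, hi):
    # Map m qubits to a value in [lo, hi) via a fixed-point binary expansion.
    frac = sum(b * 2.0 ** -(k + 1) for k, b in enumerate(bits))
    return lo + (hi - lo) * frac

def subranges(lo, hi, parts):
    # Range-splitting idea: each sub-QUBO searches a narrower interval,
    # so fewer qubits per variable achieve the same resolution.
    step = (hi - lo) / parts
    return [(lo + i * step, lo + (i + 1) * step) for i in range(parts)]

# Four subranges of [0, 8): 3 qubits per subrange give resolution 2/2^3 = 0.25,
# matching 5 qubits over the whole range (8/2^5 = 0.25).
rs = subranges(0.0, 8.0, 4)
val = decode([1, 0, 1], *rs[2])  # third subrange, [4, 6)
```

The subrange QUBO models are independent, which is what makes the parallel evaluation described in the abstract possible.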
Quantum optimization algorithms for CT image segmentation from X-ray data
Computed tomography (CT) is an important imaging technique used in medical analysis of the internal structure of the human body. Previously, segmented CT images were obtained by applying segmentation methods to already reconstructed CT images, making the results susceptible to errors from both the reconstruction and segmentation algorithms. However, this paper introduces a new approach using an advanced quantum optimization algorithm based on quadratic unconstrained binary optimization (QUBO) for CT image segmentation. This algorithm allows CT image reconstruction and segmentation to be performed simultaneously. It segments CT images by minimizing the difference between a sinogram in a superposition state with qubits, obtained using a mathematical projection including the Radon transform, and the sinogram acquired experimentally from X-ray images at various angles. Furthermore, we leveraged X-ray mass attenuation coefficients to reduce the number of logical qubits required for our quantum optimization algorithm, and we employed D-Wave’s hybrid solver to solve the optimization problem. We compared the segmentation results of our algorithm with those of classical algorithms using X-ray images of actual tooth samples to validate our results. The comparison revealed that, after appropriate image post-processing, our algorithm’s segmentation results matched those of classical algorithms that perform segmentation after reconstruction, except for some pixels at the boundary. We expect that the new quantum optimization CT algorithm will bring about great advancements in medical imaging.
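A sinogram is just the collection of line-sum projections of an image at different angles. As a minimal sketch of that concept only (restricted to the two axis-aligned angles so no interpolation is needed; the real method uses the full Radon transform over many angles):

```python
# 4x4 toy image: a bright 2x2 block in the center.
img = [[0, 0, 0, 0],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]

def projection(img, angle):
    # Discrete Radon transform restricted to axis-aligned angles:
    # 0 degrees sums along rows, 90 degrees sums along columns.
    if angle == 0:
        return [sum(row) for row in img]
    return [sum(col) for col in zip(*img)]

sinogram = {a: projection(img, a) for a in (0, 90)}
```

The optimization described in the abstract would express `img` in qubits and minimize the squared difference between such computed projections and the measured sinogram.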
Molecular data representation based on gene embeddings for cancer drug response prediction
Cancer drug response prediction is a crucial task in precision medicine, but existing models have limitations in effectively representing the molecular profiles of cancer cells. Specifically, when these models represent molecular omics data such as gene expression, they employ a one-hot encoding-based approach, where a fixed gene set is selected for all samples and omics data values are assigned to specific positions in a vector. However, this approach restricts the utilization of embedding-vector-based methods, such as attention-based models, and limits the flexibility of gene selection. To address these issues, our study proposes gene embedding-based fully connected neural networks (GEN) that utilize gene embedding vectors as input data for cancer drug response prediction. GEN allows for the use of embedding-vector-based architectures and different gene sets for each sample, providing enhanced flexibility. To validate the efficacy of GEN, we conducted experiments on three cancer drug response datasets. Our results demonstrate that GEN outperforms other recently developed methods in cancer drug prediction tasks and offers improved gene representation capabilities. All source codes are available at https://github.com/DMCB-GIST/GEN/ .
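The contrast between one-hot encoding and gene embeddings can be made concrete with a toy example. Here each sample is represented by combining the embeddings of whichever genes it reports, weighted by expression, so the gene set can differ per sample; the embedding values, gene names, and aggregation rule are all invented for illustration and are not GEN's actual architecture.

```python
# Hypothetical 4-dimensional gene embeddings (toy values).
emb = {
    "TP53": [0.1, 0.2, 0.0, 0.3],
    "EGFR": [0.4, 0.0, 0.1, 0.2],
}

def sample_repr(expr):
    # Weight each gene's embedding by its expression value and sum.
    # Any gene with an embedding can appear, so unlike one-hot encoding,
    # the gene set may differ from sample to sample.
    vec = [0.0] * 4
    for gene, value in expr.items():
        for k, e in enumerate(emb[gene]):
            vec[k] += value * e
    return vec

v = sample_repr({"TP53": 2.0, "EGFR": 1.0})
```

A one-hot scheme would instead fix a position per gene and break whenever a sample measures a different gene set; the embedding view sidesteps that constraint.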
Joint triplet loss with semi-hard constraint for data augmentation and disease prediction using gene expression data
The accurate prediction of patients with complex diseases, such as Alzheimer’s disease (AD), as well as disease stages, including early- and late-stage cancer, is challenging owing to substantial variability among patients and the limited availability of clinical data. Deep metric learning has emerged as a promising approach for addressing these challenges by improving data representation. In this study, we propose a joint triplet loss model with a semi-hard constraint (JTSC) to represent data when only a small number of samples is available. JTSC strictly selects semi-hard samples by switching anchors and positive samples during the learning process in triplet embedding, and it combines a triplet loss function with an angular loss function. Our results indicate that JTSC significantly improves the number of appropriately represented samples during training when applied to the gene expression data of AD and to cancer stage prediction tasks. Furthermore, we demonstrate that using an embedding vector from JTSC as an input to the classifiers for AD and cancer stage prediction significantly improves classification performance by extracting more accurate features. In conclusion, we show that feature embedding through JTSC can aid classification when the number of samples is small relative to the number of features.
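The core ingredients named above, the triplet loss and the semi-hard constraint, can be sketched generically. This is the standard semi-hard formulation (a negative farther than the positive but within the margin), not JTSC's anchor-positive switching or its angular-loss term; distances and margin are illustrative.

```python
def sqdist(u, v):
    # Squared Euclidean distance between two embedding vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))

def semi_hard_triplet_loss(anchor, positive, negatives, margin=0.2):
    d_ap = sqdist(anchor, positive)
    # Semi-hard negatives: farther than the positive but inside the margin.
    semi = [n for n in negatives if d_ap < sqdist(anchor, n) < d_ap + margin]
    pool = semi if semi else negatives  # fall back when no semi-hard negative exists
    d_an = min(sqdist(anchor, n) for n in pool)
    return max(0.0, d_ap - d_an + margin)

anchor   = [0.0, 0.0]
positive = [0.3, 0.0]     # d_ap = 0.09
negatives = [[0.4, 0.0],  # d = 0.16: semi-hard, since 0.09 < 0.16 < 0.29
             [2.0, 0.0]]  # d = 4.00: easy negative, contributes no gradient
loss = semi_hard_triplet_loss(anchor, positive, negatives)
```

Mining semi-hard negatives keeps the loss informative: easy negatives yield zero loss, while the hardest negatives can destabilize training early on.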
DeepPIG: deep neural network architecture with pairwise connected layers and stochastic gates using knockoff frameworks for feature selection
Selecting relevant feature subsets is essential for machine learning applications. Among feature selection techniques, the knockoff filter procedure provides a unique framework that controls the false discovery rate (FDR). However, employing a deep neural network architecture within the knockoff filter framework requires higher detection power. Using the knockoff filter framework, we present a Deep neural network with PaIrwise connected layers integrated with stochastic Gates (DeepPIG) as a feature selection model. DeepPIG exhibited better detection power on synthetic data than the baseline and recent models such as Deep feature selection using Paired-Input Nonlinear Knockoffs (DeepPINK), Stochastic Gates (STG), and SHapley Additive exPlanations (SHAP), while not violating the preselected FDR level, especially when the signals of the features were weak. The features selected by DeepPIG demonstrated superior classification performance compared with the baseline model in real-world data analyses, including prognosis prediction for certain cancers and classification tasks using microbiome and single-cell datasets. In conclusion, DeepPIG is a robust feature selection approach even when the signals of features are weak. Source code is available at https://github.com/DMCB-GIST/DeepPIG .
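The FDR control at the heart of the knockoff filter comes down to a data-driven threshold on per-feature statistics W_j (large positive values suggest a real feature, negatives suggest a knockoff won). Below is a sketch of the standard knockoff+ threshold rule with invented toy statistics; it illustrates the generic filter, not DeepPIG's network-derived statistics.

```python
def knockoff_threshold(W, q=0.2):
    # Knockoff+ rule: smallest t such that (1 + #{W_j <= -t}) / #{W_j >= t} <= q.
    # The negative tail estimates how many selections at level t are false.
    for t in sorted(abs(w) for w in W if w != 0):
        neg = sum(1 for w in W if w <= -t)
        pos = sum(1 for w in W if w >= t)
        if pos and (1 + neg) / pos <= q:
            return t
    return float("inf")  # no threshold achieves the target FDR

# Toy statistics: large positive W suggests a genuine feature.
W = [4.0, 3.5, 3.0, 2.5, -0.5, 0.2, -0.1, 0.3]
t = knockoff_threshold(W, q=0.25)
selected = [j for j, w in enumerate(W) if w >= t]
```

With these toy values the threshold lands at 2.5, selecting the four clearly strong features while the weak, sign-mixed ones are rejected.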