Catalogue Search | MBRL
142 result(s) for "Hong, Huixiao"
Persistent Organic Pollutants in Food: Contamination Sources, Health Effects and Detection Methods
by Guo, Wenjing; Pan, Bohu; Ge, Weigong
in Environmental Monitoring; Environmental Pollutants - chemistry; Food Contamination - analysis
2019
Persistent organic pollutants (POPs) present in foods have been a major concern for food safety due to their persistence and toxic effects. To ensure food safety and protect human health from POPs, it is critical to achieve a better understanding of POP pathways into food and develop strategies to reduce human exposure. POPs can be present in food at the raw stage, transferred from the environment, or artificially introduced during food preparation steps. Exposure to these pollutants may cause various health problems such as endocrine disruption, cardiovascular diseases, cancers, diabetes, birth defects, and dysfunctional immune and reproductive systems. This review describes potential sources of POP food contamination, analytical approaches to measure POP levels in food and efforts to control food contamination with POPs.
Journal Article
Integrating Molecular Dynamics, Molecular Docking, and Machine Learning for Predicting SARS-CoV-2 Papain-like Protease Binders
by Liu, Jie; Varghese, Ann; Patterson, Tucker A.
in Antiviral agents; Antiviral Agents - chemistry; Antiviral Agents - pharmacology
2025
Coronavirus disease 2019 (COVID-19) produced devastating health and economic impacts worldwide. While progress has been made in vaccine development, effective antiviral treatments remain limited, particularly those targeting the papain-like protease (PLpro) of SARS-CoV-2. PLpro plays a key role in viral replication and immune evasion, making it an attractive yet underexplored target for drug repurposing. In this study, we combined machine learning, molecular dynamics, and molecular docking to identify potential PLpro inhibitors in existing drugs. We performed long-timescale molecular dynamics simulations on PLpro–ligand complexes at two known binding sites, followed by structural clustering to capture representative structures. These were used for molecular docking, including a training set of 127 compounds and a library of 1107 FDA-approved drugs. A random forest model, trained on the docking scores of the representative conformations, yielded 76.4% accuracy via leave-one-out cross-validation. Applying the model to the drug library and filtering results based on prediction confidence and the applicability domain, we identified five drugs as promising candidates for repurposing for COVID-19 treatment. Our findings demonstrate the power of integrating computational modeling with machine learning to accelerate drug repurposing against emerging viral targets.
Journal Article
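The leave-one-out cross-validation mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract reports a random forest trained on docking scores, whereas this sketch swaps in a 1-nearest-neighbour stand-in so that it runs with the standard library alone. The function names and the toy docking scores are hypothetical.

```python
def loocv_accuracy(X, y, fit_predict):
    """Leave-one-out CV: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if fit_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def nearest_neighbor(train_X, train_y, query):
    """Stand-in classifier (NOT a random forest): label of the closest sample."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    best = min(range(len(train_X)), key=lambda j: sq_dist(train_X[j], query))
    return train_y[best]


# Hypothetical docking scores for two representative conformations per compound;
# 1 = binder, 0 = non-binder. These numbers are made up for illustration.
scores = [[-9.1, -8.7], [-8.8, -9.0], [-8.5, -8.9],
          [-5.2, -4.9], [-4.8, -5.5], [-5.0, -5.1]]
labels = [1, 1, 1, 0, 0, 0]

print(loocv_accuracy(scores, labels, nearest_neighbor))
```

Passing the classifier in as a callable keeps the cross-validation loop independent of the model, so a random forest from a machine learning library could be substituted without changing `loocv_accuracy`.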
Development of Decision Forest Models for Prediction of Drug-Induced Liver Injury in Humans Using A Large Set of FDA-approved Drugs
2017
Drug-induced liver injury (DILI) presents a significant challenge to drug development and regulatory science. The FDA’s Liver Toxicity Knowledge Base (LTKB) evaluated >1000 drugs for their likelihood of causing DILI in humans, of which >700 drugs were classified into three categories (most-DILI, less-DILI, and no-DILI). Based on this dataset, we developed and compared 2-class and 3-class DILI prediction models using the machine learning algorithm of Decision Forest (DF) with Mold2 structural descriptors. The models were evaluated through 1000 iterations of 5-fold cross-validations, 1000 bootstrapping validations and 1000 permutation tests (that assessed the chance correlation). Furthermore, prediction confidence analysis was conducted, which provides an additional parameter for proper interpretation of prediction results. We revealed that the 3-class model not only had a higher resolution to estimate DILI risk but also showed an improved capability to differentiate most-DILI drugs from no-DILI drugs in comparison with the 2-class DILI model. We demonstrated the utility of the models for drug ingredients with warnings very recently issued by the FDA. Moreover, we identified informative molecular features important for assessing DILI risk. Our results suggested that the 3-class model presents a better option than the binary model (which most publications are focused on) for drug safety evaluation.
Journal Article
Deep learning architectures for multi-label classification of intelligent health risk prediction
by Weng, Heng; Hong, Huixiao; Ou, Aihua
in Algorithms; Artificial neural networks; Bioinformatics
2017
Background
Multi-label classification of data remains a challenging problem. Because of the complexity of the data, it is sometimes difficult to infer information about classes that are not mutually exclusive. For medical data, patients could have symptoms of multiple different diseases at the same time and it is important to develop tools that help to identify problems early. Intelligent health risk prediction models built with deep learning architectures offer a powerful tool for physicians to identify patterns in patient data that indicate risks associated with certain types of chronic diseases.
Results
Physical examination records of 110,300 anonymous patients were used to predict diabetes, hypertension, fatty liver, a combination of these three chronic diseases, and the absence of disease (8 classes in total). The dataset was split into training (90%) and testing (10%) sub-datasets. Ten-fold cross validation was used to evaluate prediction accuracy with metrics such as precision, recall, and F-score. Deep Learning (DL) architectures were compared with standard and state-of-the-art multi-label classification methods. Preliminary results suggest that Deep Neural Networks (DNN), a DL architecture, when applied to multi-label classification of chronic diseases, produced accuracy that was comparable to that of common methods such as Support Vector Machines. We have implemented DNNs to handle both problem transformation and algorithm adaption type multi-label methods and compare both to see which is preferable.
Conclusions
Deep Learning architectures have the potential of inferring more information about the patterns of physical examination data than common classification methods. The advanced techniques of Deep Learning can be used to identify the significance of different features from physical examination data as well as to learn the contributions of each feature that impact a patient’s risk for chronic diseases. However, accurate prediction of chronic disease risks remains a challenging problem that warrants further studies.
Journal Article
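The abstract above mentions problem-transformation methods and 8 classes built from three diseases. One common problem transformation, the label powerset, maps each presence/absence combination of the labels to a single class, which gives 2^3 = 8 classes. The sketch below illustrates that encoding. Whether the authors used exactly this transformation is an assumption, and the function names are hypothetical.

```python
from itertools import product

DISEASES = ("diabetes", "hypertension", "fatty liver")


def powerset_classes(labels):
    """Enumerate every presence/absence combination of the given labels."""
    return list(product((0, 1), repeat=len(labels)))


def encode(label_vector):
    """Map a binary label vector to one class index (first label = highest bit)."""
    index = 0
    for bit in label_vector:
        if bit not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        index = index * 2 + bit
    return index


def decode(index, n_labels):
    """Inverse of encode: recover the binary label vector from a class index."""
    return tuple((index >> shift) & 1 for shift in range(n_labels - 1, -1, -1))


# A patient with diabetes and fatty liver but no hypertension.
patient = (1, 0, 1)
class_id = encode(patient)
print(len(powerset_classes(DISEASES)), class_id, decode(class_id, len(DISEASES)))
```

After this transformation an ordinary multi-class classifier can be trained on the class index, at the cost of a class count that grows exponentially with the number of labels.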
Structure–activity relationship-based chemical classification of highly imbalanced Tox21 datasets
2020
The specificity of toxicant-target biomolecule interactions lends to the very imbalanced nature of many toxicity datasets, causing poor performance in Structure–Activity Relationship (SAR)-based chemical classification. Undersampling and oversampling are representative techniques for handling such an imbalance challenge. However, removing inactive chemical compound instances from the majority class using an undersampling technique can result in information loss, whereas increasing active toxicant instances in the minority class by interpolation tends to introduce artificial minority instances that often cross into the majority class space, giving rise to class overlapping and a higher false prediction rate. In this study, in order to improve the prediction accuracy of imbalanced learning, we employed SMOTEENN, a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, to oversample the minority class by creating synthetic samples, followed by cleaning the mislabeled instances. We chose the highly imbalanced Tox21 dataset, which consisted of 12 in vitro bioassays for > 10,000 chemicals that were distributed unevenly between binary classes. With Random Forest (RF) as the base classifier and bagging as the ensemble strategy, we applied four hybrid learning methods, i.e., RF without imbalance handling (RF), RF with Random Undersampling (RUS), RF with SMOTE (SMO), and RF with SMOTEENN (SMN). The performance of the four learning methods was compared using nine evaluation metrics, among which F1 score, Matthews correlation coefficient and Brier score provided a more consistent assessment of the overall performance across the 12 datasets. The Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that SMN significantly outperformed the other three methods. We also found that a strong negative correlation existed between the prediction accuracy and the imbalance ratio (IR), which is defined as the number of inactive compounds divided by the number of active compounds. SMN became less effective when IR exceeded a certain threshold (e.g., > 28). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. This work demonstrates that the performance of SAR-based, imbalanced chemical toxicity classification can be significantly improved through the use of data rebalancing.
Journal Article
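The imbalance ratio and three of the metrics named in the abstract above can be computed as in the following sketch. The IR definition (inactive count divided by active count) comes from the abstract. The F1, Matthews correlation coefficient, and Brier score formulas are the standard textbook ones, and using them here is an assumption about what the authors computed. Function names and example counts are hypothetical.

```python
import math


def imbalance_ratio(n_inactive, n_active):
    """IR as defined in the abstract: inactive compounds / active compounds."""
    if n_active <= 0:
        raise ValueError("need at least one active compound")
    return n_inactive / n_active


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Made-up assay: 9700 inactive and 300 active compounds.
print(imbalance_ratio(9700, 300))
print(f1_score(tp=150, fp=100, fn=150), mcc(tp=150, tn=9600, fp=100, fn=150))
```

With an IR this high, plain accuracy would be dominated by the inactive class, which is why metrics that account for both classes are preferred for such datasets.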
Multimodal feature fusion machine learning for predicting chronic injury induced by engineered nanomaterials
2025
Chronic injuries (e.g., fibrosis and carcinogenesis) induced by nanoparticles have raised public health concerns and need to be rapidly assessed in hazard identification. Although in silico analysis is commonly used for risk assessment of chemicals, predicting chronic in vivo nanotoxicity remains challenging due to the intricate interactions at multiple interfaces like nano-biofluids and nano-subcellular organelles. Herein, we develop a multimodal feature fusion analysis framework to predict the fibrogenic potential of metal oxide nanoparticles (MeONPs) in female mice. Treating each nano-bio interface as an independent entity, eighty-seven features derived from MeONP-lung interactions are used to develop a machine learning-based predictive framework for lung fibrosis. We identify cell damage and cytokine (IL-1β and TGF-β1) production in macrophages and epithelial cells as key events closely associated with particle size, surface charge, and lysosome interactions. Experimental validations show that the developed in silico model has 85% accuracy. Our findings demonstrate the potential usefulness of this predictive model for risk assessment of nanomaterials and in assisting regulatory decision-making. While the model is developed based on 52 MeONPs, further validation using a larger nanoparticle library is necessary to confirm its broader applicability.
The prediction of chronic toxicity is a major challenge in nanotoxicity studies. Here, the authors present an in silico framework for predicting ENM-induced lung fibrosis, displaying 85% accuracy in experimental validation and leading to identification of key events at nano-bio interfaces that allow mechanistic interpretation of ENM-induced lung fibrosis.
Journal Article
Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method
by Yang, Jingcheng; Chen, Qingwang; Hong, Huixiao
in Algorithms; Animal Genetics and Genomics; Base Composition
2023
Background
Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms are proposed to facilitate data integration. However, their respective advantages and limitations are not adequately assessed in terms of omics types, the performance metrics, and the application scenarios.
Results
As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability of accurately clustering cross-batch samples into their own donors. The ratio-based method, i.e., by scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than others, especially when batch effects are completely confounded with biological factors of study interests. We further provide practical guidelines for implementing the ratio based approach in increasingly large-scale multiomics studies.
Conclusions
Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
Journal Article
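The ratio-based method described in the abstract above scales the absolute feature values of study samples relative to those of reference material profiled in the same batch. A minimal sketch of that idea follows, assuming one reference sample per batch, strictly positive values, and a plain ratio on the original scale. The data layout, function name, and toy numbers are hypothetical, and a real workflow may differ (for example by averaging reference replicates or working on a log scale).

```python
def ratio_scale(batch, reference_id):
    """Divide each study sample's features by the same-batch reference values.

    batch: dict mapping sample id -> dict mapping feature name -> value.
    Returns the scaled study samples only (the reference itself is dropped).
    """
    reference = batch[reference_id]
    if any(value == 0 for value in reference.values()):
        raise ValueError("reference values must be non-zero")
    scaled = {}
    for sample_id, features in batch.items():
        if sample_id == reference_id:
            continue
        scaled[sample_id] = {
            name: value / reference[name] for name, value in features.items()
        }
    return scaled


# Toy example: batch 2 measures everything twice as high as batch 1.
batch1 = {"ref": {"g1": 10.0, "g2": 4.0}, "s1": {"g1": 20.0, "g2": 2.0}}
batch2 = {"ref": {"g1": 20.0, "g2": 8.0}, "s1": {"g1": 40.0, "g2": 4.0}}

print(ratio_scale(batch1, "ref"))
print(ratio_scale(batch2, "ref"))
```

Because the reference is measured alongside the study samples, a multiplicative batch shift affects both equally and cancels in the ratio, so the two batches yield the same scaled profile for `s1`.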
Analysis of Structures of SARS-CoV-2 Papain-like Protease Bound with Ligands Unveils Structural Features for Inhibiting the Enzyme
2025
The COVID-19 pandemic, driven by the novel coronavirus SARS-CoV-2, has drastically reshaped global health and socioeconomic landscapes. The papain-like protease (PLpro) plays a critical role in viral polyprotein cleavage and immune evasion, making it a prime target for therapeutic intervention. Numerous compounds have been identified as inhibitors of SARS-CoV-2 PLpro, with many characterized through crystallographic studies. To date, over 70 three-dimensional (3D) structures of PLpro complexed with ligands have been deposited in the Protein Data Bank, offering valuable insight into ligand-binding features that could aid the discovery and development of effective COVID-19 treatments targeting PLpro. In this study, we reviewed and analyzed these 3D structures, focusing on the key residues involved in ligand interactions. Our analysis revealed that most inhibitors bind to PLpro’s substrate recognition sites S3/S4 and SUb2. While these sites are highly attractive and have been extensively explored, other potential binding regions, such as SUb1 and the Zn(II) domain, are less explored and may hold untapped potential for future COVID-19 drug discovery and development. Our structural analysis provides insights into the molecular features of PLpro that could accelerate the development of novel therapeutics targeting this essential viral enzyme.
Journal Article
Deep Learning Methods for Omics Data Imputation
by Song, Meng; Deng, Hong-Wen; Huang, Lei
in Biological Sciences; data collection; Deep learning
2023
One common problem in omics data analysis is missing values, which can arise due to various reasons, such as poor tissue quality and insufficient sample volumes. Instead of discarding missing values and related data, imputation approaches offer an alternative means of handling missing data. However, the imputation of missing omics data is a non-trivial task. Difficulties mainly come from high dimensionality, non-linear or non-monotonic relationships within features, technical variations introduced by sampling methods, sample heterogeneity, and the non-random missingness mechanism. Several advanced imputation methods, including deep learning-based methods, have been proposed to address these challenges. Due to its capability of modeling complex patterns and relationships in large and high-dimensional datasets, many researchers have adopted deep learning models to impute missing omics data. This review provides a comprehensive overview of the currently available deep learning-based methods for omics imputation from the perspective of deep generative model architectures such as autoencoder, variational autoencoder, generative adversarial networks, and Transformer, with an emphasis on multi-omics data imputation. In addition, this review also discusses the opportunities that deep learning brings and the challenges that it might face in this field.
Journal Article
Similarities and differences between variants called with human reference genome HG19 or HG38
2019
Background
Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed.
Methods
We conducted an analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly, we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs.
Results
The conversion rates from HG38 to HG19 (average 95%) were lower than the conversion rates from HG19 to HG38 (average 99%). The conversion rates varied slightly among the various calling pipelines. Around 1.5% of SNVs were discordantly converted between HG19 and HG38. The conversions from HG38 to HG19 had more SNVs which failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were predominated by G/C alleles (52% observed versus 42% expected).
Conclusion
A significant number of SNVs could not be converted between HG19 and HG38. Based on careful review of our comparisons, we recommend HG38 (the newer version) for NGS SNV analysis. To summarize, our findings suggest caution when translating identified SNVs between different versions of the human reference genome.
Journal Article