Catalogue Search | MBRL

A data science roadmap for open science organizations engaged in early-stage drug discovery

by Haibe-Kains, Benjamin , Wang, Yanli , Schütt, Kristof T. in Artificial Intelligence

2024

The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, as many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery. We present here the recommendations of a working group composed of experts from both the public and private sectors. Robust data management requires precise ontologies and standardized vocabulary while a centralized database architecture across laboratories facilitates data integration into high-value datasets. Lab automation and opening electronic lab notebooks to data mining push the boundaries of data sharing and data modeling. Important considerations for building robust machine-learning models include transparent and reproducible data processing, choosing the most relevant data representation, defining the right training and test sets, and estimating prediction uncertainty. Beyond data-sharing, cloud-based computing can be harnessed to build and disseminate machine-learning models. Important vectors of acceleration for hit and chemical probe discovery will be (1) the real-time integration of experimental data generation and modeling workflows within design-make-test-analyze (DMTA) cycles, openly and at scale, and (2) the adoption of a mindset where data scientists and experimentalists work as a unified team, and where data science is incorporated into the experimental design. Artificial intelligence is greatly accelerating research in drug discovery, but its development is still hindered by the lack of available data. Here the authors present data management and data science recommendations to help reach AI’s potential in the field.

Journal Article

One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening

by Zakharov, Alexey V. , Maxfield, Travis , Jain, Sankalp in Accuracy , Best practice , Chemistry

2025

Traditional best practices for quantitative structure activity relationship (QSAR) modeling recommend dataset balancing and balanced accuracy (BA) as the key desired objective of model development. This study explores the value of the conventional norms in the context of using QSAR models for virtual screening of modern large and ultra-large chemical libraries. For this increasingly common task, we now recommend the use of models with the highest positive predictive value (PPV) built on imbalanced training sets as preferred virtual screening tools. This recommendation stems from practical considerations of how the results of virtual screening are used in experimental laboratories where only a small fraction of virtually screened molecules can be tested using standard well plates. As a proof of concept, we have developed QSAR models for five expansive datasets with different ratios of active and inactive molecules and compared model performance in virtual screening using BA, PPV, and other metrics. We show that training on imbalanced datasets achieves a hit rate at least 30% higher than using balanced datasets, and that the PPV metric captured this difference of performance with no parameter tuning. Importantly, hit rates were estimated for top scoring compounds organized in batches of the size of plates (for instance, 128 molecules) used in the experimental high throughput screening. Based on the results of our studies, we posit that QSAR models trained on imbalanced datasets with the highest PPV should be relied upon to identify and test hit compounds in early drug discovery studies.
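The metrics contrasted in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy labels, and the toy scores are all assumptions made up for illustration. It computes balanced accuracy, PPV, and the hit rate within a top-scoring batch (the abstract mentions plate-sized batches of, for instance, 128 molecules).

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels (1 = active, 0 = inactive)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def positive_predictive_value(y_true, y_pred):
    """Fraction of predicted actives that are true actives (assumes at least one predicted active)."""
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp)


def top_batch_hit_rate(y_true, scores, batch_size=128):
    """Fraction of true actives among the batch_size highest-scoring compounds."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(-scores, kind="stable")[:batch_size]
    return float(y_true[top].mean())


# Toy data (made up for illustration): 2 actives among 8 compounds.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.1, 0.2, 0.3, 0.05, 0.0]
```

On this toy set the balanced accuracy is about 0.67 while the PPV is 0.5. The top-batch hit rate depends only on the ranking of the scores, which is the quantity that matters when only one plate of compounds can be tested.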

Journal Article

Synthesis of 5-Benzylamino and 5-Alkylamino-Substituted Pyrimido[4,5-c]quinoline Derivatives as CSNK2A Inhibitors with Antiviral Activity

by Dickmander, Rebekah J. , Smith, Jeffery L. , Asressu, Kesatebrhan Haile in Analysis , antiviral , Antiviral agents

2024

A series of 5-benzylamine-substituted pyrimido[4,5-c]quinoline derivatives of the CSNK2A chemical probe SGC-CK2-2 were synthesized with the goal of improving kinase inhibitor cellular potency and antiviral phenotypic activity while maintaining aqueous solubility. Among the range of analogs, those bearing electron-withdrawing (4c and 4g) or donating (4f) substituents on the benzyl ring as well as introduction of non-aromatic groups such as the cyclohexylmethyl (4t) were shown to maintain CSNK2A activity. The CSNK2A activity was also retained with N-methylation of SGC-CK2-2, but α-methyl substitution of the benzyl substituent led to a 10-fold reduction in potency. CSNK2A inhibition potency was restored with indene-based compound 4af, with activity residing in the S-enantiomer (4ag). Analogs with the highest CSNK2A potency showed good activity for inhibition of Mouse Hepatitis Virus (MHV) replication. Conformational analysis indicated that analogs with the best CSNK2A inhibition (4t, 4ac, and 4af) exhibited smaller differences between their ground state conformation and their predicted binding pose. Analogs with reduced activity (4ad, 4ae, and 4ai) required more substantial conformational changes from their ground state within the CSNK2A protein pocket.

Journal Article

A Novel Machine Learning Model and a Web Portal for Predicting the Human Skin Sensitization Effects of Chemical Agents

by Moreira-Filho, José Teófilo , Martin, Holli-Joi , Tropsha, Alexander in Accuracy , Algorithms , Allergy

2024

Skin sensitization is a significant concern for chemical safety assessments. Traditional animal assays often fail to predict human responses accurately, and ethical constraints limit the collection of human data, creating a need for reliable in silico models of skin sensitization prediction. This study introduces HuSSPred, an in silico tool based on the Human Predictive Patch Test (HPPT). HuSSPred aims to enhance the reliability of predicting human skin sensitization effects for chemical agents to support their regulatory assessment. We have curated an extensive HPPT database and performed chemical space analysis and grouping. Binary and multiclass QSAR models were developed with Bayesian hyperparameter optimization. Model performance was evaluated via five-fold cross-validation. We performed model validation with reference data from the Defined Approaches for Skin Sensitization (DASS) app. HuSSPred models demonstrated strong predictive performance with correct classification rate (CCR) ranging from 55 to 88%, sensitivity between 48 and 89%, and specificity between 37 and 92%. The positive predictive value (PPV) ranged from 84 to 97%, versus negative predictive value (NPV) from 22 to 65%, and coverage was between 75 and 93%. Our models exhibited comparable or improved performance compared to existing tools, and the external validation showed the high accuracy and sensitivity of the developed models. HuSSPred provides a reliable, open-access, and ethical alternative to traditional testing for skin sensitization. Its high accuracy and reasonable coverage make it a valuable resource for regulatory assessments, aligning with the 3Rs principles. The publicly accessible HuSSPred web tool offers a user-friendly interface for predicting skin sensitization based on chemical structure.

Journal Article

Development and Validation of Machine Learning Methods and Tools to Support Virtual Screening of Ultra Large Chemical Libraries and Artificial Intelligence

by Wellnitz, James in Artificial intelligence , Bioinformatics , Computational chemistry

2025

In recent years there has been an explosion in the size of readily accessible chemical space, with both purchasable commercial catalogs and high throughput screening libraries spanning billions of unique small molecules. Leveraging the improved diversity of chemicals and the large number of new biological activity datapoints in tandem with virtual screening methods has been shown to be an effective way to improve the success of the early, hit-identification stages in drug discovery. However, how to effectively navigate and utilize these new and vast datasets in combination with computational tools is still an open problem. Many existing computational tools face challenges when it comes to scaling effectively to handle billions of chemicals. Further, a lack of best practices and rigorous benchmarks makes it difficult to assess the performance of different approaches. Yet even once these two issues are resolved, a substantial portion of the relevant data that could benefit from these new approaches will remain inaccessible to public and academic researchers. The limitation of data will also slow down progress in effectively developing better solutions, placing a burden on the whole process. Here we outline the development of new computational workflows for both structure-based and ligand-based virtual screening, accelerated with artificial intelligence to handle continuously growing volumes of data efficiently. We show that these methods outperform brute force and existing approaches and are capable of locating chemical hits with experimentally confirmed desired properties. Further, I outline an open-source data generation framework, alongside a new database, to easily share novel, large scale chemical activity datasets with the community. This new database will be released with a series of new benchmarks and validation approaches, along with new open-source software packages to establish a reproducible and easily comparable assessment of virtual screening performance. In parallel, I use these datasets and open-source framework in a proof-of-concept application to show their value in a drug discovery setting. Overall, this work aims to lay the foundations required to promote development and identification of improved virtual screening methods in the era of big data.

Dissertation

DeTox: an In-Silico Alternative to Animal Testing for Predicting Developmental Toxicity Potential

by Tieghi, Ricardo Scheufen , Moreira-Filho, José Teófilo , Martin, Holli-Joi

2025

Medication use among pregnant women is common, yet the safety of these medications for the developing fetus/baby is widely understudied. Quantitative Structure-Activity Relationship (QSAR) models can be used to predict the overall and trimester-specific developmental toxicity potential of chemicals, supporting the development of safer medications for pregnant women and regulatory assessment aligned with the 3Rs (Refining, Reducing, and Replacing) of animal testing. This study aimed to collect and curate a database of compounds classified according to their developmental toxicity potential, use this database to develop and validate QSAR models for predicting prenatal developmental toxicity, and implement models via a user-friendly online platform to support regulatory assessments of drug candidates. We compiled and curated data from the FDA and Teratogen Information System (TERIS) databases and validated annotations with rigorous literature searches. The database was leveraged to create QSAR models using machine learning algorithms (RF, SVM, LightGBM) with Bayesian hyperparameter optimization. These models were implemented into a web tool. We built a binary classification QSAR model for overall pregnancy risk, and separate QSAR models for trimester-specific risk, exhibiting correct classification rates of and 76% (overall), 80% (1st trimester), 95% (2nd trimester), and 95% (3rd trimester). Models showed a sensitivity between 53% and 90%, specificity between 46% and 100%, and coverage of 76% assessed using a five-fold external validation protocol. We established a publicly accessible web portal (https://detox.mml.unc.edu/) for predicting both overall and trimester-specific developmental toxicity. DeTox can be employed to support regulatory assessment of pharmaceutical and cosmetic products aligned with the 3Rs of animal testing and to guide the development of safer drugs for pregnant populations. The curated dataset of developmental toxicants is publicly available, and all models are implemented in a public, user-friendly web tool, DeTox (Developmental Toxicity), at https://detox.mml.unc.edu/. https://doi.org/10.1289/EHP15307.

Journal Article

Look mom, no experimental data! Learning to score protein-ligand interactions from simulations

by Popov, Konstantin I , Tropsha, Alexander , Wellnitz, James in Binding , Deep learning , Free energy

2025

Despite recent advances in protein-ligand structure prediction, deep learning methods remain limited in their ability to accurately predict binding affinities, particularly for novel protein targets dissimilar from the training set. In contrast, physics-based binding free energy calculations offer high accuracy across chemical space but are computationally prohibitive for large-scale screening. We propose a hybrid approach that approximates the accuracy of physics-based methods by training target-specific neural networks on molecular dynamics simulations of the protein in complex with random small molecules. Our method uses force matching to learn an implicit free energy landscape of ligand binding for each target. Evaluated on six proteins, our approach achieves competitive virtual screening performance using 100-500 μs of MD simulations per target. Notably, this approach achieves state-of-the-art early enrichment when using the true pose for active compounds. These results highlight the potential of physics-informed learning for virtual screening on novel targets. We publicly release the code for this paper at https://github.com/molecularmodelinglab/lfm under the MIT license.
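For readers unfamiliar with force matching, its generic objective can be written as below. This is the textbook form and not necessarily the exact loss used in the paper. The notation is assumed here rather than taken from the abstract: U_θ is the learned potential, x_i the ligand coordinates in simulation frame i, F_i the reference force recorded from the simulation, and N the number of frames.

```latex
\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\lVert -\nabla_{x} U_{\theta}(x_i) - F_i \right\rVert^{2}
```

Minimizing this loss pushes the negative gradient of the learned potential toward the simulated forces, which is why a network trained this way can act as an implicit free energy surface.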

Paper

Open-Source DNA-Encoded Library Package for Design, Decoding, and Analysis: DELi

by Zhilinskaya, Ivanna , Novy, Brandon Cole , Maxfield, Travis in Benzimidazoles , Calorimetry , Deoxyribonucleic acid

2025

DNA-encoded library (DEL) technology has become a powerful tool in modern drug discovery. Fully harnessing its potential requires the use of advanced computational methodologies, which are often available only through proprietary software. This limitation restricts flexibility and accessibility for academic researchers and small biotech companies, hindering the growth of the technology. Here, we present DELi, an open-source DEL informatics platform designed for library design, NGS decoding and calling, and enrichment analysis. To showcase its capabilities, we used DELi to design an in-house custom library (UNC-DEL006), a benzimidazole-based DEL, and performed proof-of-concept selection experiments against Bromodomain-containing Protein 4 (BRD4). The DELi decoding and analysis modules identified top-performing compounds, leading to the off-DNA synthesis of UNC 002-080, which was confirmed as a nanomolar BRD4 binder via isothermal titration calorimetry (ITC). In contrast, a chemically similar compound not prioritized by DELi, UNC 002-083, showed no measurable binding. These results demonstrate DELi as an effective tool for DEL design and analysis. Further, its open-source nature will promote ongoing development and contributions from the DEL community to expand its applications and capabilities. Competing Interest Statement: The authors have declared no competing interest. Footnote: https://github.com/Popov-Lab-UNC/DELi

Paper

Expansion of DNA-Encoded Library Hits Using Generative Chemistry and Ultra-Large Compound Catalogs

by Novy, Brandon , Maxfield, Travis , Lin, Shu-Hang in Biophysics

2025

DNA-encoded libraries (DELs) are powerful tools for initial hit identification, yet the combinatorial chemistries and building block choices used in their construction can restrict chemical space coverage and hit drug-likeness, limiting efficient hit expansion. Generative artificial intelligence (AI), by contrast, can in principle explore drug-like chemical space around any given compound, but it often struggles with the synthesizability of generated molecules and requires a set of validated hits to initiate exploration. Here, we present a synergistic methodology that overcomes these mutual limitations by leveraging experimentally validated DEL data to initialize and bias an AI-powered virtual screening pipeline, expanding initial DEL hits with both de novo and purchasable compounds from ultra-large chemical libraries. Using this approach, we identified novel, commercially available hits from the Enamine REAL Space for the chromatin reader protein 53BP1 and validated them in a time-resolved fluorescence resonance energy transfer (TR-FRET) displacement assay. Three compounds demonstrated TR-FRET IC50 values ≤50 μM, while 11 exhibited IC50 values ≤100 μM. Critically, the AI-nominated hits exhibited greater chemical diversity, improved drug-likeness, and were readily purchasable off-the-shelf compared to compounds from the initial DEL selection. This work demonstrates a streamlined platform in which empirical DEL data and generative chemistry models are combined to enable rapid hit expansion from initially screened libraries into diverse, commercially available chemical matter.

Journal Article
