Catalogue Search | MBRL

Identification of Novel Antibacterials Using Machine Learning Techniques

by Baimiev, Alexey Kh , Skvortsov, Dmitry A. , Veselov, Mark S. in Antibacterial activity , Antibiotics , Discriminant analysis

2019

Many pharmaceutical companies are avoiding the development of novel antibacterials due to a range of rational reasons and the high risk of failure. However, there is an urgent need for novel antibiotics especially against resistant bacterial strains. Available models suffer from many drawbacks and, therefore, are not applicable for scoring novel molecules with high structural diversity by their antibacterial potency. Considering this, the overall aim of this study was to develop an efficient model able to find compounds that have plenty of chances to exhibit antibacterial activity. Based on a proprietary screening campaign, we have accumulated a representative dataset of more than 140,000 molecules with antibacterial activity against assessed in the same assay and under the same conditions. This intriguing set has no analogue in the scientific literature. We applied six techniques to mine these data. For external validation, we used 5,000 compounds with low similarity towards training samples. The antibacterial activity of the selected molecules against was assessed using a comprehensive biological study. Kohonen-based nonlinear mapping was used for the first time and provided the best predictive power (av. 75.5%). Several compounds showed an outstanding antibacterial potency and were identified as translation machinery inhibitors and . For the best compounds, MIC and CC values were determined to allow us to estimate a selectivity index (SI). Many active compounds have a robust IP position.

Journal Article

Identification of pyrrolo-pyridine derivatives as novel class of antibacterials

by Sofronova, Alina A , Osterman, Ilya A , Kartsev, Victor G

2020

A series of 5-oxo-4H-pyrrolo[3,2-b]pyridine derivatives was identified as a novel class of highly potent antibacterial agents during an extensive large-scale high-throughput screening (HTS) program utilizing a unique double-reporter system, pDualrep2. The construction of the reporter system allows us to perform visual inspection of the underlying mechanism of action due to two genes, Katushka2S and RFP, which encode proteins with different imaging signatures. Antibacterial activity of the compounds was evaluated during the initial HTS round and the subsequent rescreen procedure. The most active molecule demonstrated a MIC value of 3.35 µg/mL against E. coli with some signs of translation blockage (low Katushka2S signal) and no SOS response. The compound did not demonstrate cytotoxicity in a standard cell viability assay. Subsequent structural morphing and follow-up synthesis may result in novel compounds with a meaningful antibacterial potency which can be reasonably regarded as an attractive starting point for further in vivo investigation and optimization.

Journal Article

2-Pyrazol-1-yl-thiazole derivatives as novel highly potent antibacterials

by Matniyazov, Rustam , Malyshev, Alexander S , Iarovenko, Svetlana in Antibiotics , Antifungal agents , Automation

2019

The present report describes our efforts to identify new structural classes of compounds having promising antibacterial activity using the previously published double-reporter system pDualrep2. This semi-automated high-throughput screening (HTS) platform has been applied to perform a large-scale screen of a diverse small-molecule compound library. We have selected a set of more than 125,000 molecules and evaluated them for their antibacterial activity. On the basis of HTS results, eight compounds containing the 2-pyrazol-1-yl-thiazole scaffold exhibited moderate-to-high activity against ΔTolC Escherichia coli. Minimum inhibitory concentration (MIC) values for these molecules were in the range of 0.037–8 μg ml−1. The most active compound, 8, demonstrated high antibacterial potency (MIC = 0.037 μg ml−1), which significantly exceeded that measured for erythromycin (MIC = 2.5 μg ml−1) and was comparable with the activity of levofloxacin (MIC = 0.016 μg ml−1). Unfortunately, this compound showed only moderate selectivity toward the HEK293 eukaryotic cell line. In contrast, compound 7 was less potent (MIC = 0.8 μg ml−1) but displayed only slight cytotoxicity. Thus, 2-pyrazol-1-yl-thiazoles can be considered a valuable starting point for subsequent optimization and morphing.

Journal Article

Deep learning enables rapid identification of potent DDR1 kinase inhibitors

by Zhebrak, Alexander , Polykovskiy, Daniil A. , Kuznetsov, Maksim D. in 631/154/309/2144 , 631/154/309/606 , 631/61/338/2248

2019

We have developed a deep generative model, generative tensorial reinforcement learning (GENTRL), for de novo small-molecule design. GENTRL optimizes synthetic feasibility, novelty, and biological activity. We used GENTRL to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases, in 21 days. Four compounds were active in biochemical assays, and two were validated in cell-based assays. One lead candidate was tested and demonstrated favorable pharmacokinetics in mice. A machine learning model allows the identification of new small-molecule kinase inhibitors in days.

Journal Article

A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models

by Mou, Zhenzhen , Ivanenkov, Yan , Song, Dandan in 631/114/1305 , 692/308/153 , 692/699/1785

2025

Idiopathic pulmonary fibrosis (IPF) is an aggressive interstitial lung disease with a high mortality rate. Putative drug targets in IPF have failed to translate into effective therapies at the clinical level. We identify TRAF2- and NCK-interacting kinase (TNIK) as an anti-fibrotic target using a predictive artificial intelligence (AI) approach. Using AI-driven methodology, we generated INS018_055, a small-molecule TNIK inhibitor, which exhibits desirable drug-like properties and anti-fibrotic activity across different organs in vivo through oral, inhaled or topical administration. INS018_055 possesses anti-inflammatory effects in addition to its anti-fibrotic profile, validated in multiple in vivo studies. Its safety and tolerability as well as pharmacokinetics were validated in a randomized, double-blinded, placebo-controlled phase I clinical trial (NCT05154240) involving 78 healthy participants. A separate phase I trial in China, CTR20221542, also demonstrated comparable safety and pharmacokinetic profiles. This work was completed in roughly 18 months from target discovery to preclinical candidate nomination and demonstrates the capabilities of our generative AI-driven drug-discovery pipeline. An AI-generated small-molecule inhibitor treats fibrosis in vivo and in phase I clinical trials.

Journal Article

Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models

by Zhebrak, Alexander , Artamonov, Aleksey , Tatanov, Oktai in benchmark , Datasets , Deep learning

2020

Generative models are becoming a tool of choice for exploring the molecular space. These models learn on a large training dataset and produce novel molecular structures with similar properties. Generated structures can be utilized for virtual screening or for training semi-supervised predictive models in downstream tasks. While there are plenty of generative models, it is unclear how to compare and rank them. In this work, we introduce a benchmarking platform called Molecular Sets (MOSES) to standardize training and comparison of molecular generative models. MOSES provides training and testing datasets, and a set of metrics to evaluate the quality and diversity of generated structures. We have implemented and compared several molecular generation models and suggest using our results as reference points for further advancements in generative chemistry research. The platform and source code are available at https://github.com/molecularsets/moses .

Journal Article

Molecular LEGION: incalculably large coverage of chemical space around the NLRP3 target

by Zhavoronkov, Alex , Ilin, Ivan , Vasileva, Anna in 639/638/309/507 , 639/638/309/630 , 639/638/563/606

2026

The exploration and mapping of chemical space remain a central challenge in modern drug discovery. Traditional compound libraries and databases cover only a minute fraction of this space, limiting the discovery of novel, bioactive, and patentable chemotypes. Here, we present a unique dataset containing approximately 110 M molecular structures of potential NLRP3 inhibitors enabled by the LEGION ( Latent Enumeration, Generation, Integration, Optimization, and Navigation ) workflow, which integrates generative AI, AI-guided screening within the Chemistry42 platform and auxiliary cheminformatics tools to enable large-scale exploration of chemical space around specific drug targets. Using the structural data of NLRP3 co-crystals, a clinically relevant target, LEGION combined ligand- and structure-based design strategies, in-house algorithms for 3D pharmacophore-aware scaffold extraction, and distinct library enumeration methods to identify over 34,000 unique scaffolds, which can be multiplied into a dataset of 123B molecular structures within the provided code. The resulting dataset of unprecedented size proved effective for scaffold hopping, chemical space navigation, and supporting intellectual property applications by generating structurally diverse and synthetically accessible structures.

Journal Article

Longevity Bench: Are SotA LLMs ready for aging research?

by Zhavoronkov, Alex , Zagirova, Diana , Naumov, Vladimir in Aging , Biometrics , DNA methylation

2026

Aging is a core biological process observed in most species and tissues, which is studied with a vast array of technologies. We argue that the abilities of AI systems to emulate aging and to accurately interpret biodata in its context are the key criteria to judge an LLM's utility in biomedical research. Here, we present LongevityBench -- a collection of tasks designed to assess whether foundation models grasp the fundamental principles of aging biology and can use low-level biodata to arrive at phenotype-level conclusions. The benchmark covers a variety of prediction targets including human time-to-death, mutations' effect on lifespan, and age-dependent omics patterns. It spans all common biodata types used in longevity research: transcriptomes, DNA methylation profiles, proteomes, genomes, clinical blood tests and biometrics, as well as natural language annotations. After ranking state-of-the-art foundation models using LongevityBench, we highlight their weaknesses and outline procedures to maximize their utility in aging research and life sciences.

Competing Interest Statement: AZ, DS, VN, SP, DZ, VA, AA, and FG are employees of Insilico Medicine, a publicly traded drug development company (HKEX:3696.HK) developing AI applications for target discovery and drug design.

Footnotes: This revision includes updated experimental results, restructured framing, and expanded discussion addressing feedback we received from Derya Unumtaz, who is now included as a co-author.

Results update: Table 2 and Figures 2, 3, and 6 have been updated to reflect changes to the prompt formats, which reduced the model refusal rate. Gemini 3 Pro emerges as the top-ranked model overall, achieving first place in 7 of 17 tasks, compared to the previous version's top-3 rank of this model. In Figure 3, panels relating to "General population survival - pairwise" and "Transcriptomic aging - pairwise" have been affected by the change. In Figure 6, the panel relating to "Transcriptomic aging - age groups" has been affected. Corresponding metrics in Table 2 and model average ranks across the 17 tasks have been recalculated. Corresponding sections of the Discussion have been updated to reflect these changes.

Introduction restructuring: Apart from stylistic and grammatical corrections, we have dedicated parts of the introduction to emphasizing the significance of benchmarking in biomedical AI and explained in a bit more detail what makes aging biology such a compelling field to build benchmarks for.

Discussion revisions: We moderated claims regarding model reliability, acknowledging that newer untested models may perform differently and that rankings will shift as models evolve. We added explicit scope clarification distinguishing what LongevityBench measures from what remains for future work, e.g. mechanistic understanding verification. We expanded the interpretation of format-dependent performance variance, arguing that ranking instability across prompt variations suggests absence of coherent biological representations. We added practical recommendations for researchers based on task-specific findings. We included acknowledgment of the moving target problem in LLM evaluation and our commitment to maintaining updated leaderboards.

Methods clarification: We still intend to continue the experiments with the train portion of the prompts, but since this preprint describes only our experiments with the holdout set, we have removed any mentions of the training set and SFT to avoid confusion. This change also affected the content of Table 1, which now reports only the counts for the holdout set.

New sections: We have extended the preprint with the Abbreviations section to improve reading comprehension. The Contributions section has been added to recognize each team member's input to this project.

https://bench.insilico.com/

Paper

When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs

by Zhavoronkov, Alex , Ilin, Ivan , Schutski, Roman in Benchmarks , Large language models , Performance evaluation

2026

Recent progress has expanded the use of large language models (LLMs) in drug discovery, including synthesis planning. However, objective evaluation of retrosynthesis performance remains limited. Existing benchmarks and metrics typically rely on published synthetic procedures and Top-K accuracy based on single ground-truth, which does not capture the open-ended nature of real-world synthesis planning. We propose a new benchmarking framework for single-step retrosynthesis that evaluates both general-purpose and chemistry-specialized LLMs using ChemCensor, a novel metric for chemical plausibility. By emphasizing plausibility over exact match, this approach better aligns with human synthesis planning practices. We also introduce CREED, a novel dataset comprising millions of ChemCensor-validated reaction records for LLM training, and use it to train a model that improves over the LLM baselines under this benchmark.

Paper

MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery

by Schutski, Roman , Ilin, Ivan , Kaymak-Loveless, Kaeli in Drug development , Functional groups , Large language models

2026

General-purpose large language models (LLMs) that rely on in-context learning do not reliably deliver the scientific understanding and performance required for drug discovery tasks. Simply increasing model size or introducing reasoning tokens does not yield significant performance gains. To address this gap, we introduce the MMAI Gym for Science, a one-stop shop for molecular data formats and modalities as well as task-specific reasoning, training, and benchmarking recipes designed to teach foundation models the 'language of molecules' in order to solve practical drug discovery problems. We use MMAI Gym to train an efficient Liquid Foundation Model (LFM) for these applications, demonstrating that smaller, purpose-trained foundation models can outperform substantially larger general-purpose or specialist models on molecular benchmarks. Across essential drug discovery tasks (including molecular optimization, ADMET property prediction, retrosynthesis, drug-target activity prediction, and functional group reasoning), the resulting model achieves near specialist-level performance and, in the majority of settings, surpasses larger models, while remaining more efficient and broadly applicable in the domain.

Paper
