9 results for "Dittwald, Piotr"
Computational planning of the synthesis of complex natural products
Training algorithms to computationally plan multistep organic syntheses has been a challenge for more than 50 years [1–7]. However, the field has progressed greatly since the development of early programs such as LHASA [1,7], for which reaction choices at each step were made by human operators. Multiple software platforms [6,8–14] are now capable of completely autonomous planning. But these programs ‘think’ only one step at a time and have so far been limited to relatively simple targets, the syntheses of which could arguably be designed by human chemists within minutes, without the help of a computer. Furthermore, no algorithm has yet been able to design plausible routes to complex natural products, for which much more far-sighted, multistep planning is necessary [15,16] and closely related literature precedents cannot be relied on. Here we demonstrate that such computational synthesis planning is possible, provided that the program’s knowledge of organic chemistry and data-based artificial intelligence routines are augmented with causal relationships [17,18], allowing it to ‘strategize’ over multiple synthetic steps. Using a Turing-like test administered to synthesis experts, we show that the routes designed by such a program are largely indistinguishable from those designed by humans. We also successfully validated three computer-designed syntheses of natural products in the laboratory. Taken together, these results indicate that expert-level automated synthetic planning is feasible, pending continued improvements to the reaction knowledge base and further code optimization.

A synthetic route-planning algorithm, augmented with causal relationships that allow it to strategize over multiple steps, can design complex natural-product syntheses that are indistinguishable from those designed by human experts.
Inferring serum proteolytic activity from LC-MS/MS data
Background: In this paper we model the serum proteolysis process from tandem mass spectrometry data. The parameters of the peptide degradation process inferred from LC-MS/MS data correspond directly to the activity of specific enzymes present in the serum samples of patients and healthy donors. Our approach integrates the existing knowledge about peptidases' activity stored in the MEROPS database with an efficient procedure for estimating the model parameters.

Results: Taking into account the inherent stochasticity of the process, the proteolytic activity is modeled with the Chemical Master Equation (CME). Assuming stationarity of the Markov process, we calculate the expected abundances of digested peptides under the model. The parameters are fitted to minimize the discrepancy between those expected values and the peptide abundances observed in the MS data. The constrained optimization problem is solved with the Levenberg-Marquardt algorithm.

Conclusions: Our results demonstrate the feasibility and potential of high-level analysis of LC-MS proteomic data. The estimated enzyme activities give insights into the molecular pathology of colorectal cancer. Moreover, the developed framework is general and can be applied to study proteolytic activity in different systems.
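To make the fitting step concrete, here is a minimal sketch of a Levenberg-Marquardt-style fit of enzyme-activity parameters, so that model-predicted steady-state peptide abundances match observed MS intensities. The exponential-decay model, the cleavage matrix, and all numbers are invented stand-ins, not the paper's CME or its MEROPS-derived inputs.

```python
# A minimal sketch, not the paper's model: fit enzyme-activity parameters so
# that predicted steady-state peptide abundances match observed intensities.
# The exponential-decay model and all numbers below are invented stand-ins
# for the Chemical Master Equation and MEROPS-derived inputs.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Hypothetical cleavage matrix: C[i, j] = susceptibility of peptide i to
# enzyme j (in the paper this structure comes from MEROPS specificity data).
C = rng.uniform(0.0, 1.0, size=(20, 3))
observed = rng.uniform(0.1, 1.0, size=20)  # stand-in MS intensities

def expected_abundance(activity):
    """Toy stationary model: abundance decays with total cleavage pressure."""
    return np.exp(-C @ activity)

def residuals(activity):
    return expected_abundance(activity) - observed

# SciPy's method="lm" is MINPACK's Levenberg-Marquardt but cannot handle
# bounds; with the non-negativity constraint SciPy uses a trust-region
# variant instead, which plays the same role here.
fit = least_squares(residuals, x0=np.full(3, 0.5), bounds=(0.0, np.inf))
print("estimated enzyme activities:", fit.x)
```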
BRAIN 2.0: Time and Memory Complexity Improvements in the Algorithm for Calculating the Isotope Distribution
Recently, an elegant iterative algorithm called BRAIN (Baffling Recursive Algorithm for Isotopic distributioN calculations) was presented. The algorithm is based on the classic polynomial method for calculating aggregated isotope distributions, and it introduces algebraic identities using the Newton-Girard and Viète formulae to solve the problem of polynomial expansion. Owing to the iterative nature of the BRAIN method, the calculations must start from the lightest isotope variant. As such, the complexity of BRAIN scales quadratically with the mass of the putative molecule, since it depends on the number of aggregated peaks that need to be calculated. In this manuscript, we suggest two improvements to the algorithm that decrease both the time and memory complexity of obtaining the aggregated isotope distribution. We also illustrate a concept for representing the element isotope distributions in a generic manner. This representation allows the root calculation of the element polynomial required in the original BRAIN method to be omitted. A generic formulation for the roots is of special interest for higher-order element polynomials, so that root-finding algorithms and their inaccuracies can be avoided.
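For context, the "classic polynomial method" that BRAIN builds on can be sketched directly: the aggregated distribution is the coefficient vector of a product of one isotope polynomial per atom, computed by repeated convolution. This is a baseline illustration under approximate textbook abundances, not the Newton-Girard recursion the papers describe; the insulin formula is just an example input.

```python
# Baseline sketch of the classic polynomial-expansion method that BRAIN
# improves on: the aggregated isotope distribution is the coefficient vector
# of the product of one isotope polynomial per atom, computed by repeated
# convolution. Abundances are approximate textbook values.
import numpy as np

# Index k = number of extra neutrons relative to the lightest isotope.
ISOTOPES = {
    "C": [0.9893, 0.0107],                        # 12C, 13C
    "H": [0.999885, 0.000115],                    # 1H, 2H
    "N": [0.99636, 0.00364],                      # 14N, 15N
    "O": [0.99757, 0.00038, 0.00205],             # 16O, 17O, 18O
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],   # 32S, 33S, 34S, -, 36S
}

def aggregated_distribution(formula, n_peaks=10):
    """Convolve one polynomial per atom; keep the first n_peaks, renormalize."""
    dist = np.array([1.0])
    for element, count in formula.items():
        for _ in range(count):
            dist = np.convolve(dist, ISOTOPES[element])[:n_peaks]
    return dist / dist.sum()

# Human insulin, C257 H383 N65 O77 S6, a common benchmark molecule:
print(aggregated_distribution({"C": 257, "H": 383, "N": 65, "O": 77, "S": 6}))
```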
An Efficient Method to Calculate the Aggregated Isotopic Distribution and Exact Center-Masses
In this article, we present a computation- and memory-efficient method to calculate the probabilities of occurrence and exact center-masses of the aggregated isotopic distribution of a molecule. The method uses fundamental mathematical properties of polynomials given by the Newton-Girard theorem and Viète's formulae. The calculation is based on the atomic composition of the molecule and the natural abundances of the elemental isotopes in normal terrestrial matter. To evaluate the performance of the proposed method, which we named BRAIN, we compare it with the results obtained from five existing software packages (IsoPro, Mercury, Emass, NeutronCluster, and IsoDalton) for 10 biomolecules. Additionally, we compare the computed mass centers with the results obtained by calculating, and subsequently aggregating, the fine isotopic distribution for two of the exemplary biomolecules. The algorithm will be made available as a Bioconductor package in R, and is also available upon request.
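The "exact center-masses" referred to here are the probability-weighted mean masses of the fine variants that collapse into each aggregated peak. A minimal sketch, assuming approximate isotope masses, that extends the convolution baseline above by propagating a probability-weighted mass array alongside the probabilities; it illustrates the quantity being computed, not the BRAIN implementation.

```python
# Hedged sketch: center-masses of aggregated peaks, computed by convolving
# (probability, probability-weighted mass) pairs per atom. Approximate
# isotope masses and abundances; illustrative, not the BRAIN implementation.
import numpy as np

ISOTOPES = {  # (abundance, monoisotopic mass) per extra-neutron index
    "C": [(0.9893, 12.0), (0.0107, 13.003355)],
    "H": [(0.999885, 1.007825), (0.000115, 2.014102)],
    "O": [(0.99757, 15.994915), (0.00038, 16.999132), (0.00205, 17.999160)],
}

def center_masses(formula, n_peaks=6):
    prob = np.array([1.0])    # P(k extra neutrons)
    pmass = np.array([0.0])   # sum over variants with k extra neutrons of p * mass
    for element, count in formula.items():
        p = np.array([a for a, _ in ISOTOPES[element]])
        pm = np.array([a * m for a, m in ISOTOPES[element]])
        for _ in range(count):
            # Masses add when an atom is appended, so the weighted-mass array
            # gains one 'pm' term against the old probabilities per step.
            pmass = (np.convolve(pmass, p) + np.convolve(prob, pm))[:n_peaks]
            prob = np.convolve(prob, p)[:n_peaks]
    return prob / prob.sum(), pmass / prob

probs, centers = center_masses({"C": 6, "H": 12, "O": 6})  # glucose
for k, (q, m) in enumerate(zip(probs, centers)):
    print(f"peak {k}: p={q:.5f}, center-mass={m:.5f}")
```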
A computer algorithm to discover iterative sequences of organic reactions
Iterative syntheses comprise sequences of organic reactions in which the substrate molecules grow with each iteration and the functional groups that enable the growth step are regenerated to allow sustained cycling. Typically, iterative sequences can be automated, for example as in the transformative examples of the robotized syntheses of peptides, oligonucleotides, polysaccharides and even some natural products. However, iterations are not easy to identify, in particular for sequences with cycles more complex than protection and deprotection steps. Indeed, the number of catalogued examples is in the tens to maybe a hundred. Here, a computer algorithm using a comprehensive knowledge base of individual reactions constructs and evaluates myriads of putative, but chemically plausible, sequences and discovers an unprecedented number of iterative sequences. Some of these iterations are validated by experiment and result in the synthesis of motifs commonly found in natural products. This computer-driven discovery expands the pool of iterative sequences that may be automated in the future.

Iterative sequences of organic reactions can be automated but are rare and challenging to identify. Now, a computer-driven strategy is reported for the systematic discovery and evaluation of such sequences. Several of the iterative sequences are validated experimentally and enable the syntheses of useful motifs in natural product targets.
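One way to picture the search described here: treat reactions as directed edges between functional-group states of the growing molecule, so that every directed cycle is a candidate iteration, since the reactive terminus is regenerated. The toy network below is invented for illustration; the paper's knowledge base and plausibility scoring are far richer.

```python
# Hedged sketch: view reactions as directed edges between functional-group
# "states" and enumerate simple cycles as candidate iterative sequences.
# The tiny network and labels are invented, not the paper's knowledge base.
import networkx as nx

G = nx.DiGraph()
# edge = a reaction transforming the reactive terminus of the growing molecule
G.add_edge("boronate", "alkenyl-halide", reaction="Suzuki coupling")
G.add_edge("alkenyl-halide", "boronate", reaction="Miyaura borylation")
G.add_edge("aldehyde", "alcohol", reaction="allylation")
G.add_edge("alcohol", "aldehyde", reaction="oxidation")
G.add_edge("alcohol", "ether", reaction="protection")  # dead end, no cycle back

# Every simple directed cycle regenerates its starting functional group,
# so the sequence can run again: a candidate iteration.
for cycle in nx.simple_cycles(G):
    steps = [G.edges[u, v]["reaction"]
             for u, v in zip(cycle, cycle[1:] + cycle[:1])]
    print(" -> ".join(cycle), "| steps:", steps)
```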
On the Fine Isotopic Distribution and Limits to Resolution in Mass Spectrometry
Mass spectrometry enables the study of increasingly larger biomolecules at increasingly higher resolution, which can distinguish between fine isotopic variants having the same additional nucleon count but slightly different masses. The analysis of the fine isotopic distribution therefore becomes an interesting research topic with important practical applications. In this paper, we propose a comprehensive methodology for studying the basic characteristics of the fine isotopic distribution. Our approach uses a broad spectrum of methods, ranging from generating functions, which allow us to estimate the variance and the information-theoretic entropy of the distribution, to the theory of thermal energy fluctuations. Having characterized the variance, spread, shape, and size of the fine isotopic distribution, we are able to indicate limitations to high-resolution mass spectrometry. Moreover, the analysis of “thermorelativistic” effects (i.e., mass uncertainty attributable to relativistic effects coupled with the statistical mechanical uncertainty of the energy of an isolated ion) in turn gives us an estimate of impassable limits of isotopic resolution (understood as the ability to distinguish fine-structure peaks), which can be moved further only by cooling the ions. The presented approach highlights the potential of theoretical analysis of the fine isotopic distribution, which allows modeling the data more accurately, in support of successful experimental measurements.
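The generating-function argument for the variance has a simple concrete core: each atom draws its isotope independently, so the mean and variance of the molecular mass distribution are sums of per-atom means and variances. A sketch under approximate textbook abundances and masses; it reproduces that elementary result, not the paper's full entropy or thermal analysis.

```python
# Hedged sketch: mean and variance of a molecule's mass distribution as sums
# of per-atom means and variances (atoms draw isotopes independently, which
# is also what the abstract's generating functions exploit). Approximate
# textbook abundances and masses.
ISOTOPES = {  # (abundance, monoisotopic mass)
    "C": [(0.9893, 12.0), (0.0107, 13.003355)],
    "H": [(0.999885, 1.007825), (0.000115, 2.014102)],
    "O": [(0.99757, 15.994915), (0.00038, 16.999132), (0.00205, 17.999160)],
}

def mass_mean_and_variance(formula):
    mean = var = 0.0
    for element, count in formula.items():
        mu = sum(p * m for p, m in ISOTOPES[element])
        second = sum(p * m * m for p, m in ISOTOPES[element])
        mean += count * mu
        var += count * (second - mu * mu)  # Var(sum) = sum Var, independence
    return mean, var

mean, var = mass_mean_and_variance({"C": 6, "H": 12, "O": 6})  # glucose
print(f"average mass = {mean:.5f} Da, std dev = {var ** 0.5:.5f} Da")
```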
Human endogenous retroviral elements promote genome instability via non-allelic homologous recombination
Background: Recurrent rearrangements of the human genome resulting in disease or variation are mainly mediated by non-allelic homologous recombination (NAHR) between low-copy repeats. However, other genomic structures, including AT-rich palindromes and retroviruses, have also been reported to underlie recurrent structural rearrangements. Notably, recurrent deletions of Yq12 conveying azoospermia, as well as non-pathogenic reciprocal duplications, are mediated by human endogenous retroviral elements (HERVs). We hypothesized that HERV elements throughout the genome can serve as substrates for genomic instability and result in human copy-number variation (CNV).

Results: We developed parameters to identify HERV elements similar to those that mediate Yq12 rearrangements as well as recurrent deletions of 3q13.2q13.31. We used these parameters to identify HERV pairs genome-wide that may cause instability. Our analysis highlighted 170 pairs, flanking 12.1% of the genome. We cross-referenced these predicted susceptibility regions with CNVs from our clinical databases for potentially HERV-mediated rearrangements and identified 78 CNVs. We subsequently molecularly confirmed recurrent deletion and duplication rearrangements at four loci in ten individuals, including reciprocal rearrangements at two loci. Breakpoint sequencing revealed clustering in regions of high sequence identity enriched in PRDM9-mediated recombination hotspot motifs.

Conclusions: The presence of deletions and reciprocal duplications suggests NAHR as the causative mechanism of HERV-mediated CNV, even though the length and the sequence homology of the HERV elements are less than currently thought to be required for NAHR. We propose that, in addition to HERVs, other repetitive elements, such as long interspersed elements, may also be responsible for the formation of recurrent CNVs via NAHR.
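As a rough illustration of the pair-finding step, the sketch below scans a list of annotated HERV elements and keeps directly oriented pairs whose spacing and percent identity fall within chosen windows. Every threshold, element, and the identity function are invented placeholders; the paper derived its actual parameters from the Yq12 and 3q13.2q13.31 events.

```python
# Hedged sketch of the pair-finding idea: keep directly oriented HERV pairs
# whose spacing and sequence identity fall inside chosen windows. All
# thresholds and elements below are invented for illustration only.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Herv:
    name: str
    start: int   # genomic coordinate (bp), elements assumed sorted by start
    end: int
    strand: str  # '+' or '-'

def identity(a: Herv, b: Herv) -> float:
    """Stand-in for an alignment-based percent identity (e.g. from BLAST)."""
    return 0.95  # placeholder value

ELEMENTS = [
    Herv("HERV-A", 1_000_000, 1_009_000, "+"),
    Herv("HERV-B", 2_500_000, 2_509_500, "+"),
    Herv("HERV-C", 2_512_000, 2_521_000, "-"),
]

MIN_IDENTITY, MIN_SPAN, MAX_SPAN = 0.94, 50_000, 10_000_000  # invented

candidates = [
    (a.name, b.name)
    for a, b in combinations(ELEMENTS, 2)
    if a.strand == b.strand                      # direct orientation: del/dup
    and MIN_SPAN <= b.start - a.end <= MAX_SPAN  # plausible NAHR spacing
    and identity(a, b) >= MIN_IDENTITY
]
print("candidate NAHR substrate pairs:", candidates)
```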