11 results for "Osthege, Michael"
PyMC: a modern, and comprehensive probabilistic programming framework in Python
PyMC is a probabilistic programming library for Python that provides tools for constructing and fitting Bayesian models. It offers an intuitive, readable syntax that is close to the natural syntax statisticians use to describe models. PyMC leverages the symbolic computation library PyTensor, allowing it to be compiled into a variety of computational backends, such as C, JAX, and Numba, which in turn offer access to different computational architectures including CPU, GPU, and TPU. Being a general modeling framework, PyMC supports a variety of models including generalized hierarchical linear regression and classification, time series, ordinary differential equations (ODEs), and non-parametric models such as Gaussian processes (GPs). We demonstrate PyMC’s versatility and ease of use with examples spanning a range of common statistical models. Additionally, we discuss the positive role of PyMC in the development of the open-source ecosystem for probabilistic programming.
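To illustrate the model syntax the abstract describes, here is a minimal sketch of a PyMC model; the observations and prior choices are invented for illustration.

```python
import numpy as np
import pymc as pm

# Hypothetical observations, invented for illustration.
y = np.array([4.8, 5.1, 5.3, 4.9, 5.0])

with pm.Model() as model:
    # Priors on the unknown mean and noise scale.
    mu = pm.Normal("mu", mu=5.0, sigma=1.0)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    # Likelihood of the observed data.
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    # Draw posterior samples with the default NUTS sampler.
    idata = pm.sample()
```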
Bayesian calibration, process modeling and uncertainty quantification in biotechnology
High-throughput experimentation has revolutionized data-driven experimental sciences and opened the door to the application of machine learning techniques. Nevertheless, the quality of any data analysis strongly depends on the quality of the data and specifically the degree to which random effects in the experimental data-generating process are quantified and accounted for. Accordingly, calibration, i.e. the quantitative association between observed quantities and measurement responses, is a core element of many workflows in experimental sciences. Particularly in life sciences, univariate calibration, often involving non-linear saturation effects, must be performed to extract quantitative information from measured data. At the same time, the estimation of uncertainty is inseparably connected to quantitative experimentation. Adequate calibration models that describe not only the input/output relationship in a measurement system but also its inherent measurement noise are required. Due to its mathematical nature, statistically robust calibration modeling remains a challenge for many practitioners, at the same time being extremely beneficial for machine learning applications. In this work, we present a bottom-up conceptual and computational approach that solves many problems of understanding and implementing non-linear, empirical calibration modeling for quantification of analytes and process modeling. The methodology is first applied to the optical measurement of biomass concentrations in a high-throughput cultivation system, then to the quantification of glucose by an automated enzymatic assay. We implemented the conceptual framework in two Python packages, calibr8 and murefi, with which we demonstrate how to make uncertainty quantification for various calibration tasks more accessible. Our software packages enable more reproducible and automatable data analysis routines compared to commonly observed workflows in life sciences. Subsequently, we combine the previously established calibration models with a hierarchical Monod-like ordinary differential equation model of microbial growth to describe multiple replicates of Corynebacterium glutamicum batch cultures. Key process model parameters are learned by both maximum likelihood estimation and Bayesian inference, highlighting the flexibility of the statistical and computational framework.
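The core idea of non-linear calibration with saturation can be sketched generically; the code below uses scipy rather than the calibr8 API (which additionally models the measurement noise, e.g. with Student-t distributions), and the logistic model form and data points are assumed purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit, brentq

# Toy calibration data (invented): analyte concentrations
# and the corresponding measurement responses.
concentration = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0])
response = np.array([0.05, 0.22, 0.41, 0.70, 0.92, 1.02])

def logistic(x, lower, upper, k, x50):
    """Saturating input/output relationship of the measurement system."""
    return lower + (upper - lower) / (1.0 + np.exp(-k * (x - x50)))

# Fit the calibration curve to the observed responses.
popt, pcov = curve_fit(logistic, concentration, response, p0=[0.0, 1.0, 1.0, 1.5])

# Invert the fitted curve to quantify an unknown sample from its response.
def infer_concentration(y, popt, lo=0.0, hi=10.0):
    return brentq(lambda x: logistic(x, *popt) - y, lo, hi)

print(infer_concentration(0.5, popt))
```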
Biosensor-based growth-coupling as an evolutionary strategy to improve heme export in Corynebacterium glutamicum
The iron-containing porphyrin heme is of high interest for the food industry for the production of artificial meat as well as for medical applications. Recently, the biotechnological platform strain Corynebacterium glutamicum has emerged as a promising host for animal-free heme production. Beyond engineering of complex heme biosynthetic pathways, improving heme export offers significant yet untapped potential for enhancing production strains. In this study, a growth-coupled biosensor was designed to impose a selection pressure on the increased expression of the hrtBA operon encoding an ABC-type heme exporter in C. glutamicum. For this purpose, the promoter region of the growth-regulating genes pfkA (phosphofructokinase) and aceE (pyruvate dehydrogenase) was replaced with that of PhrtB, creating biosensor strains with a selection pressure for hrtBA activation. The resulting sensor strains were used for plate-based selections and for a repetitive batch f(luorescent)ALE using a fully automated laboratory platform. Genome sequencing of isolated clones featuring increased hrtBA expression revealed three distinct mutational hotspots: (i) chrS, (ii) chrA, and (iii) cydD. Mutations in the genes of the ChrSA two-component system, which regulates hrtBA in response to heme levels, were identified as a promising target to enhance export activity. Furthermore, causal mutations within cydD, encoding an ABC-transporter essential for cytochrome bd oxidase assembly, were confirmed by the construction of a deletion mutant. Reverse-engineered strains showed strongly increased hrtBA expression as well as increased cellular heme levels. These results further support the proposed role of CydDC as a heme transporter in bacteria. The mutations identified in this study therefore underline the potential of biosensor-based growth coupling and provide promising engineering targets to improve microbial heme production.
Why are different estimates of the effective reproductive number so different? A case study on COVID-19 in Germany
The effective reproductive number Rt has taken a central role in the scientific, political, and public discussion during the COVID-19 pandemic, with numerous real-time estimates of this quantity routinely published. Disagreement between estimates can be substantial and may lead to confusion among decision-makers and the general public. In this work, we compare different estimates of the national-level effective reproductive number of COVID-19 in Germany in 2020 and 2021. We consider the agreement between estimates from the same method but published at different time points (within-method agreement) as well as retrospective agreement across eight different approaches (between-method agreement). Concerning the former, estimates from some methods are very stable over time and hardly subject to revisions, while others display considerable fluctuations. To evaluate between-method agreement, we reproduce the estimates generated by different groups using a variety of statistical approaches, standardizing analytical choices to assess how they contribute to the observed disagreement. These analytical choices include the data source, data pre-processing, assumed generation time distribution, statistical tuning parameters, and various delay distributions. We find that in practice, these auxiliary choices in the estimation of Rt may affect results at least as strongly as the selection of the statistical approach. They should thus be communicated transparently along with the estimates.
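As a concrete example of the kind of approach compared here, the sketch below computes the instantaneous reproduction number from the renewal equation (in the spirit of the Cori et al. method); the incidence series and generation time distribution are invented for illustration, and real estimators add smoothing, delay corrections, and uncertainty intervals.

```python
import numpy as np

# Toy daily case counts (invented for illustration).
incidence = np.array([10, 12, 15, 20, 26, 30, 38, 45, 50, 58], dtype=float)

# Assumed discretized generation time distribution: w[s-1] is the
# probability that today's infection was caused s days ago.
w = np.array([0.10, 0.25, 0.30, 0.20, 0.10, 0.05])

def instantaneous_rt(incidence, w):
    """Renewal-equation estimate: R_t = I_t / sum_s(w_s * I_{t-s})."""
    rt = np.full(len(incidence), np.nan)
    for t in range(len(w), len(incidence)):
        # I_{t-1}, ..., I_{t-len(w)}, aligned with w_1, ..., w_len(w).
        past = incidence[t - len(w):t][::-1]
        denom = np.sum(w * past)
        if denom > 0:
            rt[t] = incidence[t] / denom
    return rt

print(np.round(instantaneous_rt(incidence, w), 2))
```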
Turbid but accurate: automating lysostaphin quantification including uncertainty quantification
Conventional methods for measuring antibacterial activity, such as disk-diffusion assays, have limitations in quantitative reliability and require long incubation times, making them unsuitable for high-throughput applications. To address these limitations, we automated a turbidity-based assay using readily available equipment and Bayesian data analysis, enabling accurate and precise antibacterial quantification from high-throughput experiments. In this study, we demonstrate the method applied to lysostaphin, a potent anti-staphylococcal agent and promising candidate for therapeutic applications. The turbidity assay monitors optical density changes upon lysostaphin-induced lysis of a susceptible Staphylococcus strain. We validated the use of autoclaved Staphylococcus carnosus TM300 as a suitable indicator strain and optimized assay conditions for a dynamic range of 0.63–10 mg L−1 lysostaphin. Our integrated approach provides a robust, scalable, and reproducible platform for quantifying active lysostaphin, paving the way for its application in high-throughput screening and process development. We believe that the approach is adaptable to other turbidity-based assays, such as those assessing endolysin activity.
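The turbidity readout can be sketched with an assumed first-order lysis model (the study's actual analysis is Bayesian and more involved); all numbers, names, and the exponential model form below are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy data (invented): optical density over time for one well after
# adding lysostaphin to a suspension of autoclaved indicator cells.
t = np.linspace(0, 60, 13)            # minutes
od = 1.0 * np.exp(-0.04 * t) + 0.05   # noise-free for brevity

# Assume first-order lysis kinetics: OD(t) = od0 * exp(-k * t) + floor.
def lysis(t, od0, k, floor):
    return od0 * np.exp(-k * t) + floor

(od0, k, floor), _ = curve_fit(lysis, t, od, p0=[1.0, 0.01, 0.0])

# The fitted rate constant k would then be mapped to an absolute
# lysostaphin concentration through a calibration curve.
print(f"apparent lysis rate: {k:.3f} 1/min")
```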
bletl ‐ A Python package for integrating BioLector microcultivation devices in the Design‐Build‐Test‐Learn cycle
Microbioreactor (MBR) devices have emerged as powerful cultivation tools for tasks of microbial phenotyping and bioprocess characterization and provide a wealth of online process data in a highly parallelized manner. Such datasets are difficult to interpret in a short time with manual workflows. In this study, we present the Python package bletl and show how it enables robust data analyses and the application of machine learning techniques without tedious data parsing and preprocessing. bletl reads raw result files from BioLector I, II and Pro devices to make all the contained information available to Python-based data analysis workflows. Together with standard tooling from the Python scientific computing ecosystem, interactive visualizations and spline-based derivative calculations can be performed. Additionally, we present a new method for unbiased quantification of the time-variable specific growth rate μt, based on unsupervised switch-point detection with Student-t distributed random walks. With an adequate calibration model, this method enables practitioners to quantify time-variable growth rates with Bayesian uncertainty quantification and automatically detect switch-points that indicate relevant metabolic changes. Finally, we show how time series feature extraction enables the application of machine learning methods to MBR data, resulting in unsupervised phenotype characterization. As an example, t-distributed Stochastic Neighbor Embedding (t-SNE) is performed to visualize datasets comprising a variety of growth/DO/pH phenotypes.
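The basic entry point looks roughly like this, a sketch based on the package's documented usage; the file name, filterset key, and well ID are placeholders.

```python
import bletl

# Parse a raw BioLector result file into a data structure that exposes
# all filtersets (backscatter, DO, pH, ...). The path is a placeholder.
bl_data = bletl.parse("my_biolector_run.csv")

# Retrieve the time vector and measurements of one well from one
# filterset; "BS3" and "A01" are assumed example keys.
time, backscatter = bl_data["BS3"].get_timeseries("A01")
print(time[:5], backscatter[:5])
```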
Control of parallelized bioreactors II: probabilistic quantification of carboxylic acid reductase activity for bioprocess optimization
Autonomously operated parallelized mL-scale bioreactors are considered the key to reducing bioprocess development cost and time. However, their application is often limited to products with very simple analytics. In this study, we investigated enhanced protein expression conditions of a carboxylic acid reductase from Nocardia otitidiscaviarum in E. coli. Cells were produced with exponential feeding in an L-scale bioreactor. After the desired cell density for protein expression was reached, the cells were automatically transferred to 48 mL-scale bioreactors operated by a liquid handling station, where protein expression studies were conducted. During protein expression, the feed rate and the inducer concentration were varied. At the end of the protein expression phase, the enzymatic activity was estimated by performing automated whole-cell biotransformations in a deep-well plate. The results were analyzed with hierarchical Bayesian modelling methods to account for biomass growth during the biotransformation and biomass interference with the subsequent product assay, and to predict absolute and specific enzyme activities at optimal expression conditions. Lower feed rates seemed to be beneficial for high specific and absolute activities. At the best investigated expression conditions, an activity of 1153 U mL−1 was estimated, with a 90% credible interval of [992, 1321] U mL−1. This is about 40-fold higher than the highest published data for the enzyme under investigation. With the proposed setup, 192 protein expression conditions were studied during four experimental runs with minimal manual workload, showing the reliability and potential of automated and digitalized bioreactor systems.
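The hierarchical structure described can be sketched in PyMC as follows; this is a generic illustration of partial pooling across expression conditions, not the authors' actual model, and all data and names are invented.

```python
import numpy as np
import pymc as pm

# Invented example: measured activities for three expression
# conditions, with two replicate measurements per condition.
condition_idx = np.array([0, 0, 1, 1, 2, 2])
activity = np.array([850.0, 900.0, 1100.0, 1150.0, 700.0, 730.0])

with pm.Model() as model:
    # Population-level (hyper)priors shared across conditions.
    mu_pop = pm.Normal("mu_pop", mu=1000.0, sigma=500.0)
    sigma_pop = pm.HalfNormal("sigma_pop", sigma=300.0)
    # Condition-level means drawn from the population distribution.
    mu_cond = pm.Normal("mu_cond", mu=mu_pop, sigma=sigma_pop, shape=3)
    # Measurement noise and likelihood of the replicate observations.
    sigma_obs = pm.HalfNormal("sigma_obs", sigma=100.0)
    pm.Normal("obs", mu=mu_cond[condition_idx], sigma=sigma_obs, observed=activity)
    idata = pm.sample()
```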
PeakPerformance - a tool for Bayesian inference-based fitting of LC-MS/MS peaks
A major bottleneck of chromatography-based analytics has been the elusive fully automated identification and integration of peak data without the need for extensive human supervision. The presented Python package PeakPerformance applies Bayesian inference to chromatographic peak fitting and provides an automated approach featuring model selection and uncertainty quantification. Currently, its application is focused on data from targeted liquid chromatography tandem mass spectrometry (LC-MS/MS), but its design allows for an expansion to other chromatographic techniques. PeakPerformance is implemented in Python and the source code is available on GitHub. It is unit-tested on Linux and Windows and accompanied by general introductory documentation, as well as example notebooks. The presented PeakPerformance tool performs automated chromatographic peak data fitting using Bayesian methodology. Accordingly, it innovates by delivering built-in uncertainty quantification for each peak, thus taking the measurement noise into account. Using a convergence statistic and based on the determined peak uncertainties, the differentiation of signals into peaks and noise was improved, and false positives and negatives were largely eliminated. The provided documentation and the implemented convenience functions are meant to lower the barrier of entry for users with little programming experience. Lastly, the modular design of the software enables modification and expansion to data from different chromatographic methods.
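The underlying idea of Bayesian peak fitting can be sketched as follows; this is not the PeakPerformance API but a minimal illustration of fitting a single Gaussian peak on a flat baseline, with all data invented.

```python
import numpy as np
import pymc as pm

# Toy chromatogram (invented): a Gaussian peak on a flat baseline.
t = np.linspace(0, 10, 100)
intensity = 0.2 + 3.0 * np.exp(-0.5 * ((t - 5.0) / 0.4) ** 2)
intensity += np.random.default_rng(0).normal(0, 0.05, t.size)

with pm.Model() as model:
    baseline = pm.Normal("baseline", mu=0.0, sigma=1.0)
    height = pm.HalfNormal("height", sigma=5.0)
    center = pm.Normal("center", mu=5.0, sigma=1.0)
    width = pm.HalfNormal("width", sigma=1.0)
    noise = pm.HalfNormal("noise", sigma=0.5)
    # Expected signal: baseline plus Gaussian peak.
    mu = baseline + height * pm.math.exp(-0.5 * ((t - center) / width) ** 2)
    pm.Normal("obs", mu=mu, sigma=noise, observed=intensity)
    idata = pm.sample()

# Posterior samples of height and width yield the peak area with
# uncertainty, e.g. area = height * width * sqrt(2 * pi).
```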
PeakPerformance - A software tool for fitting LC-MS/MS peaks including uncertainty quantification by Bayesian inference
A major bottleneck of chromatography-based analytics has been the elusive fully automated identification and integration of peak data without the need for extensive human supervision. The presented Python package PeakPerformance applies Bayesian inference to chromatographic peak fitting and provides an automated approach featuring model selection and uncertainty quantification. Currently, its application is focused on data from targeted liquid chromatography tandem mass spectrometry (LC-MS/MS), but its design allows for an expansion to other chromatographic techniques. Availability and implementation: PeakPerformance is implemented in Python and the source code is available on https://github.com/JuBiotech/peak-performance. It is unit-tested on Linux and Windows and accompanied by general introductory documentation, as well as example notebooks. Contact: s.noack@fz-juelich.de
Bayesian calibration, process modeling and uncertainty quantification in biotechnology
High-throughput experimentation has revolutionized data-driven experimental sciences and opened the door to the application of machine learning techniques. Nevertheless, the quality of any data analysis strongly depends on the quality of the data and specifically the degree to which random effects in the experimental data-generating process are quantified and accounted for. Accordingly, calibration, i.e. the quantitative association between observed quantities and measurement responses, is a core element of many workflows in experimental sciences. Particularly in life sciences, univariate calibration, often involving non-linear saturation effects, must be performed to extract quantitative information from measured data. At the same time, the estimation of uncertainty is inseparably connected to quantitative experimentation. Adequate calibration models that describe not only the input/output relationship in a measurement system but also its inherent measurement noise are required. Due to its mathematical nature, statistically robust calibration modeling remains a challenge for many practitioners, at the same time being extremely beneficial for machine learning applications. In this work, we present a bottom-up conceptual and computational approach that solves many problems of understanding and implementing non-linear, empirical calibration modeling for quantification of analytes and process modeling. The methodology is first applied to the optical measurement of biomass concentrations in a high-throughput cultivation system, then to the quantification of glucose by an automated enzymatic assay. We implemented the conceptual framework in two Python packages, with which we demonstrate how it makes uncertainty quantification for various calibration tasks more accessible. Our software packages enable more reproducible and automatable data analysis routines compared to commonly observed workflows in life sciences. Subsequently, we combine the previously established calibration models with a hierarchical Monod-like differential equation model of microbial growth to describe multiple replicates of Corynebacterium glutamicum batch microbioreactor cultures. Key process model parameters are learned by both maximum likelihood estimation and Bayesian inference, highlighting the flexibility of the statistical and computational framework.