Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
81 result(s) for "Mathieu, Emile"
De novo design of protein structure and function with RFdiffusion
by Courbet, Alexis; Ragotte, Robert J.; Ovchinnikov, Sergey
in 101/28; 631/114/1305; 631/114/469
2023
There has been considerable recent progress in designing new proteins using deep-learning methods1–9. Despite this progress, a general deep-learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher-order symmetric architectures, has yet to be described. Diffusion models10,11 have had considerable success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence–structure relationships. Here we show that by fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of designed symmetric assemblies, metal-binding proteins and protein binders. The accuracy of RFdiffusion is confirmed by the cryogenic electron microscopy structure of a designed binder in complex with influenza haemagglutinin that is nearly identical to the design model. In a manner analogous to networks that produce images from user-specified inputs, RFdiffusion enables the design of diverse functional proteins from simple molecular specifications.
Fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks yields a generative model for protein design that achieves outstanding performance on a wide range of protein structure and function design challenges.
Journal Article
Geometry and Representation Learning in Deep Generative Models
2021
Deep generative models have de facto emerged as state of the art when it comes to density estimation and sampling high-dimensional and multi-modal data. They combine the abstract yet practical mathematical description of probabilistic modelling with the flexibility and scalability brought by neural networks. They have been successfully applied to a wide spectrum of problems ranging from computer vision and natural language, to the realms of physical sciences. Probabilistic modelling, and in particular Bayesian statistics, have long enabled scientists and practitioners to include prior knowledge that they may have about the data into models. Building well-specified models is beneficial as it leads to improved generalisation capacity, data efficiency and interpretability of the model. In contrast, principled methods to encode such inductive biases in deep generative models are still under development. This thesis presents three pieces of work aimed at addressing this problem. First, we propose a principled approach to introduce inductive biases in variational auto-encoders. Taking a Bayesian perspective, we show that encoding a desired structure into the prior distribution, and applying a proper regularisation, can lead to the desired decomposition in the learnt encodings. We demonstrate this approach on a variety of computer vision datasets and successfully learn representations with sparsity, clustering, and even intricate hierarchical dependency relationships. Next, we introduce an extension of variational auto-encoders to model data with underlying hierarchical structure. As hyperbolic spaces are perfectly suited to embed tree-like data, in contrast to Euclidean geometry, we endow the latent space with hyperbolic geometry. We do so by deriving the necessary methods to work with two main Gaussian generalisations and geometry-aware architectures for the encoder and decoder networks. 
Finally, we leverage the formalism of Riemannian geometry to define flexible distributions for data that are assumed to live on a given manifold. We do so by extending continuous normalising flows and parametrising manifold-valued diffeomorphisms as solutions of ordinary differential equations.
Dissertation
Riemannian Continuous Normalizing Flows
by Nickel, Maximilian; Mathieu, Emile
in Differential equations; Ordinary differential equations; Parameterization
2020
Normalizing flows have shown great promise for modelling flexible probability distributions in a computationally tractable way. However, whilst data is often naturally described on Riemannian manifolds such as spheres, tori, and hyperbolic spaces, most normalizing flows implicitly assume a flat geometry, making them either misspecified or ill-suited in these situations. To overcome this problem, we introduce Riemannian continuous normalizing flows, a model which admits the parametrization of flexible probability measures on smooth manifolds by defining flows as the solution to ordinary differential equations. We show that this approach can lead to substantial improvements on both synthetic and real-world data when compared to standard flows or previously introduced projected flows.
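As a rough illustration of the core idea in this abstract — defining a flow on a manifold as the solution of an ordinary differential equation — the sketch below integrates a toy vector field on the unit sphere with projected Euler steps. This is a minimal sketch, not the authors' implementation; the vector field and step counts are illustrative assumptions.

```python
import numpy as np

def project_to_tangent(x, v):
    # Remove the radial component so v lies in the tangent space at x.
    return v - np.dot(v, x) * x

def flow_on_sphere(x0, vector_field, t_max=1.0, steps=100):
    # Integrate dx/dt = f(x) on the unit sphere with projected Euler steps,
    # re-normalising after each step so the trajectory stays on the manifold.
    x = x0 / np.linalg.norm(x0)
    dt = t_max / steps
    for _ in range(steps):
        v = project_to_tangent(x, vector_field(x))
        x = x + dt * v
        x = x / np.linalg.norm(x)  # retraction back onto the sphere
    return x

# Toy vector field: rotation about the z-axis (illustrative assumption).
field = lambda x: np.array([-x[1], x[0], 0.0])
x_final = flow_on_sphere(np.array([1.0, 0.0, 0.0]), field)
```

A learned, parametrised vector field in place of the hand-written one would give a (non-Riemannian-corrected) caricature of a continuous normalizing flow on the sphere; the paper additionally tracks the change of density along the flow, which this sketch omits.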
Improved motif-scaffolding with SE(3) flow matching
Protein design often begins with knowledge of a desired function, encoded in a structural motif, around which motif-scaffolding aims to construct a functional protein. Recently, generative models have achieved breakthrough success in designing scaffolds for a range of motifs. However, generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow, without additional training. On a benchmark of 24 biologically meaningful motifs, we show our method achieves 2.5 times more designable and unique motif-scaffolds than the state of the art. Code: https://github.com/microsoft/protein-frame-flow.
Journal Article
On Contrastive Representations of Stochastic Processes
by Teh, Yee Whye; Foster, Adam; Mathieu, Emile
in Machine learning; Periodic functions; Reconstruction
2021
Learning representations of stochastic processes is an emerging problem in machine learning with applications from meta-learning to physical object models to time series. Typical methods rely on exact reconstruction of observations, but this approach breaks down as observations become high-dimensional or noise distributions become complex. To address this, we propose a unifying framework for learning contrastive representations of stochastic processes (CReSP) that does away with exact reconstruction. We dissect potential use cases for stochastic process representations, and propose methods that accommodate each. Empirically, we show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes. Our methods tolerate noisy high-dimensional observations better than traditional approaches, and the learned representations transfer to a range of downstream tasks.
A framework for conditional diffusion modelling with applications in motif scaffolding for protein design
2024
Many protein design applications, such as binder or enzyme design, require scaffolding a structural motif with high precision. Generative modelling paradigms based on denoising diffusion processes have emerged as a leading candidate to address this motif scaffolding problem and have shown early experimental success in some cases. In the diffusion paradigm, motif scaffolding is treated as a conditional generation task, and several conditional generation protocols have been proposed or imported from the computer vision literature. However, most of these protocols are motivated heuristically, e.g. via analogies to Langevin dynamics, and lack a unifying framework, obscuring connections between the different approaches. In this work, we unify conditional training and conditional sampling procedures under one common framework based on the mathematically well-understood Doob's h-transform. This new perspective allows us to draw connections between existing methods and propose a new variation on existing conditional training protocols. We illustrate the effectiveness of this new protocol on both image outpainting and motif scaffolding, and find that it outperforms standard methods.
Diffusion Models for Constrained Domains
2024
Denoising diffusion models are a novel class of generative algorithms that achieve state-of-the-art performance across a range of domains, including image generation and text-to-image tasks. Building on this success, diffusion models have recently been extended to the Riemannian manifold setting, broadening their applicability to a range of problems from the natural and engineering sciences. However, these Riemannian diffusion models are built on the assumption that their forward and backward processes are well-defined for all times, preventing them from being applied to an important set of tasks that consider manifolds defined via a set of inequality constraints. In this work, we introduce a principled framework to bridge this gap. We present two distinct noising processes based on (i) the logarithmic barrier metric and (ii) the reflected Brownian motion induced by the constraints. As existing diffusion model techniques cannot be applied in this setting, we derive new tools to define such models in our framework. We then demonstrate the practical utility of our methods on a number of synthetic and real-world tasks, including applications from robotics and protein design.
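One of the two noising processes this abstract describes — reflected Brownian motion induced by inequality constraints — can be caricatured in one dimension: simulate Brownian increments and fold any excursion past the boundary back into the feasible set. This is a minimal sketch under assumed parameters (interval constraint, unit diffusion coefficient), not the paper's method on general manifolds.

```python
import numpy as np

def reflect(x, lo=0.0, hi=1.0):
    # Fold x back into [lo, hi] by repeated reflection at the two walls.
    width = hi - lo
    y = (x - lo) % (2.0 * width)
    return lo + np.where(y > width, 2.0 * width - y, y)

def reflected_brownian_noising(x0, t=1.0, steps=1000, sigma=1.0, rng=None):
    # Euler-Maruyama simulation of Brownian motion on [0, 1], reflecting
    # each increment at the boundary so samples never leave the constraint set.
    rng = np.random.default_rng(0) if rng is None else rng
    dt = t / steps
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = reflect(x + sigma * np.sqrt(dt) * rng.normal(size=x.shape))
    return x

# Noise a batch of identical data points; by t = 1 the samples are close
# to the uniform stationary distribution of reflected Brownian motion.
samples = reflected_brownian_noising(np.full(5000, 0.2))
```

Learning the corresponding backward (denoising) process is the substantive contribution of the paper and is not attempted here.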
SE(3) Equivariant Augmented Coupling Flows
by Antorán, Javier; Hernández-Lobato, José Miguel; Midgley, Laurence I.
in Alanine; Boltzmann distribution; Cartesian coordinates
2024
Coupling normalizing flows allow for fast sampling and density evaluation, making them the tool of choice for probabilistic modeling of physical systems. However, the standard coupling architecture precludes endowing flows that operate on the Cartesian coordinates of atoms with the SE(3) and permutation invariances of physical systems. This work proposes a coupling flow that preserves SE(3) and permutation equivariance by performing coordinate splits along additional augmented dimensions. At each layer, the flow maps atoms' positions into learned SE(3) invariant bases, where we apply standard flow transformations, such as monotonic rational-quadratic splines, before returning to the original basis. Crucially, our flow preserves fast sampling and density evaluation, and may be used to produce unbiased estimates of expectations with respect to the target distribution via importance sampling. When trained on the DW4, LJ13, and QM9-positional datasets, our flow is competitive with equivariant continuous normalizing flows and diffusion models, while allowing sampling more than an order of magnitude faster. Moreover, to the best of our knowledge, we are the first to learn the full Boltzmann distribution of alanine dipeptide by only modeling the Cartesian positions of its atoms. Lastly, we demonstrate that our flow can be trained to approximately sample from the Boltzmann distribution of the DW4 and LJ13 particle systems using only their energy functions.
On conditional diffusion models for PDE simulations
by Bergamin, Federico; Shysheya, Aliaksandra; Hernández-Lobato, José Miguel
in Comparative studies; Conditioning; Data assimilation
2024
Modelling partial differential equations (PDEs) is of crucial importance in science and engineering, and it includes tasks ranging from forecasting to inverse problems, such as data assimilation. However, most previous numerical and machine learning approaches that target forecasting cannot be applied out-of-the-box for data assimilation. Recently, diffusion models have emerged as a powerful tool for conditional generation, being able to flexibly incorporate observations without retraining. In this work, we perform a comparative study of score-based diffusion models for forecasting and assimilation of sparse observations. In particular, we focus on diffusion models that are either trained in a conditional manner, or conditioned after unconditional training. We address the shortcomings of existing models by proposing 1) an autoregressive sampling approach that significantly improves performance in forecasting, 2) a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths, and 3) a hybrid model which employs flexible pre-training conditioning on initial conditions and flexible post-training conditioning to handle data assimilation. We empirically show that these modifications are crucial for successfully tackling the combination of forecasting and data assimilation, a task commonly encountered in real-world scenarios.
Metropolis Sampling for Constrained Diffusion Models
by Klarner, Leo; de Bortoli, Valentin; Fishman, Nic
in Brownian motion; Constraint modelling; Riemann manifold
2023
Denoising diffusion models have recently emerged as the predominant paradigm for generative modelling on image domains. In addition, their extension to Riemannian manifolds has facilitated a range of applications across the natural sciences. While many of these problems stand to benefit from the ability to specify arbitrary, domain-informed constraints, this setting is not covered by the existing (Riemannian) diffusion model methodology. Recent work has attempted to address this issue by constructing novel noising processes based on the reflected Brownian motion and logarithmic barrier methods. However, the associated samplers are either computationally burdensome or only apply to convex subsets of Euclidean space. In this paper, we introduce an alternative, simple noising scheme based on Metropolis sampling that affords substantial gains in computational efficiency and empirical performance compared to the earlier samplers. Of independent interest, we prove that this new process corresponds to a valid discretisation of the reflected Brownian motion. We demonstrate the scalability and flexibility of our approach on a range of problem settings with convex and non-convex constraints, including applications from geospatial modelling, robotics and protein design.
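The Metropolis scheme this abstract proposes can be caricatured for a uniform target on a constrained set: propose a symmetric Gaussian move and reject it (staying put) whenever it violates the constraint — the Metropolis acceptance ratio is 1 inside the set and 0 outside. The sketch below runs this on the unit disc; the domain, step size, and chain count are illustrative assumptions, and the paper's actual contribution (using this scheme as a diffusion-model noising process and proving it discretises reflected Brownian motion) is not reproduced here.

```python
import numpy as np

def metropolis_constrained_step(x, step_scale, in_domain, rng):
    # Symmetric Gaussian proposal; for a uniform target on the domain the
    # Metropolis ratio is 1 inside and 0 outside, so acceptance reduces to
    # a feasibility check: infeasible proposals are rejected and the chain
    # stays at its current state.
    proposal = x + step_scale * rng.normal(size=x.shape)
    accepted = in_domain(proposal)
    return np.where(accepted, proposal, x)

# Constraint set: the unit disc in 2D, an inequality-constrained domain.
in_disc = lambda p: (p ** 2).sum(axis=-1, keepdims=True) <= 1.0

rng = np.random.default_rng(0)
x = np.zeros((2000, 2))          # 2000 independent chains started at the origin
for _ in range(500):
    x = metropolis_constrained_step(x, 0.1, in_disc, rng)
```

After enough steps the chains are approximately uniform on the disc (for which the expected squared radius is 1/2), while every iterate satisfies the constraint exactly — the computational appeal over reflection or barrier samplers is that each step is a single draw plus a feasibility check.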