Catalogue Search | MBRL
31 result(s) for "Nebgen, Benjamin"
Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning
by Nebgen, Benjamin T.; Smith, Justin S.; Tretiak, Sergei
in 119/118, 639/638/440, 639/638/563/606
2019
Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist’s toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable, but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and billions of times faster than CCSD(T)/CBS calculations.
Computational modelling of chemical systems requires a balance between accuracy and computational cost. Here the authors use transfer learning to develop a general purpose neural network potential that approaches quantum-chemical accuracy for reaction thermochemistry, isomerization, and drug-like molecular torsions.
Journal Article
Teaching a neural network to attach and detach electrons from molecules
by Nebgen, Benjamin T.; Smith, Justin S.; Isayev, Olexandr
in 119/118, 639/638/563/606, 639/638/563/758
2021
Interatomic potentials derived with machine learning algorithms such as deep neural networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and enable massive simulations. Most DNN potentials were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we propose an improved machine learning framework for simulating open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which can predict molecular energies for an arbitrary combination of molecular charge and spin multiplicity with errors of about 2–3 kcal/mol and spin-charges with errors of ~0.01e for small and medium-sized organic molecules, compared to the reference QM simulations. The AIMNet-NSE model allows one to fully bypass QM calculations and derive the ionization potential, electron affinity, and conceptual Density Functional Theory quantities like electronegativity, hardness, and condensed Fukui functions. We show that these descriptors, along with learned atomic representations, could be used to model chemical reactivity through an example of regioselectivity in electrophilic aromatic substitution reactions.
Quantum mechanical calculations of molecular ionized states are computationally quite expensive. This work reports a successful extension of a previous deep-neural networks approach towards transferable neural-network models for predicting multiple properties of open shell anions and cations.
Journal Article
Automated discovery of a robust interatomic potential for aluminum
by Nam, Hai Ah; Smith, Justin S.; Tretiak, Sergei
in 639/638/563/606, 639/638/563/980, 639/638/563/981
2021
Machine learning, trained on quantum mechanics (QM) calculations, is a powerful tool for modeling potential energy surfaces. A critical factor is the quality and diversity of the training dataset. Here we present a highly automated approach to dataset construction and demonstrate the method by building a potential for elemental aluminum (ANI-Al). In our active learning scheme, the ML potential under development is used to drive non-equilibrium molecular dynamics simulations with time-varying applied temperatures. Whenever a configuration is reached for which the ML uncertainty is large, new QM data is collected. The ML model is periodically retrained on all available QM data. The final ANI-Al potential makes very accurate predictions of the radial distribution function in the melt, the liquid–solid coexistence curve, and crystal properties such as defect energies and barriers. We perform a 1.3M-atom shock simulation and show that ANI-Al force predictions are in excellent agreement with new reference DFT calculations.
The accuracy of a machine-learned potential is limited by the quality and diversity of the training dataset. Here the authors propose an active learning approach that automatically constructs general-purpose machine-learning potentials, demonstrated here for the case of aluminum.
Journal Article
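The active-learning loop this abstract describes can be caricatured in a few lines. The nearest-neighbour surrogate, the distance-based uncertainty proxy, and the bounded random walk below are deliberately crude stand-ins for the ML potential, its uncertainty estimate, and the non-equilibrium MD driver:

```python
import math
import random

random.seed(1)

def qm_energy(x):
    # stands in for an expensive reference ("QM") calculation
    return math.sin(x) + 0.2 * x * x

def predict_1nn(data, x):
    # nearest-neighbour surrogate standing in for the ML potential
    return min(data, key=lambda p: abs(p[0] - x))[1]

def uncertainty(data, x):
    # crude proxy: distance to the nearest training configuration
    return min(abs(p[0] - x) for p in data)

data = [(x, qm_energy(x)) for x in (-2.0, 0.0, 2.0)]   # seed data set
grid = [k / 20 for k in range(-40, 41)]

def mean_error(data):
    return sum(abs(predict_1nn(data, x) - qm_energy(x)) for x in grid) / len(grid)

error_before = mean_error(data)
x, collected = 0.0, 0
for _ in range(500):
    x = max(-2.0, min(2.0, x + random.uniform(-0.2, 0.2)))  # crude "MD" walk
    if uncertainty(data, x) > 0.2:       # uncertainty passes the threshold
        data.append((x, qm_energy(x)))   # collect a new QM data point
        collected += 1
error_after = mean_error(data)
```

The essential structure matches the abstract: the driver visits configurations, the uncertainty estimate decides which of them earn a reference calculation, and the surrogate improves as the data set grows.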
The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules
by Smith, Justin S.; Tretiak, Sergei; Barros, Kipton
in 639/638/563/979, 639/638/563/980, 639/638/630
2020
Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.
Measurement(s): Quantum Mechanics • energy • force • multipole moment • atomic charge
Technology Type(s): computational modeling technique
Factor Type(s): atom
Sample Characteristic - Environment: organic molecule
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12046440
Journal Article
Exploring the frontiers of condensed-phase chemistry with a general reactive machine learning potential
by Nebgen, Benjamin T.; Jadrich, Ryan B.; Smith, Justin S.
in 639/638/563/606, 639/638/563/934, 639/638/563/981
2024
Atomistic simulation has a broad range of applications from drug design to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive ab initio simulations. For this reason, chemistry and materials science would greatly benefit from a general reactive MLIP, that is, an MLIP that is applicable to a broad range of reactive chemistry without the need for refitting. Here we develop a general reactive MLIP (ANI-1xnr) through automated sampling of condensed-phase reactions. ANI-1xnr is then applied to study five distinct systems: carbon solid-phase nucleation, graphene ring formation from acetylene, biofuel additives, combustion of methane and the spontaneous formation of glycine from early earth small molecules. In all studies, ANI-1xnr closely matches experiment (when available) and/or previous studies using traditional model chemistry methods. As such, ANI-1xnr proves to be a highly general reactive MLIP for C, H, N and O elements in the condensed phase, enabling high-throughput in silico reactive chemistry experimentation.
Atomistic simulations have a broad range of applications from drug design to materials discovery. Machine learning interatomic potentials (MLIPs) have become an efficient alternative to computationally expensive ab initio simulations. Now a general reactive MLIP (called ANI-1xnr) has been developed and validated against a broad range of condensed-phase reactive systems.
Journal Article
Uncertainty-driven dynamics for active learning of interatomic potentials
2023
Machine learning (ML) models, if trained to data sets of high-fidelity quantum simulations, produce accurate and efficient interatomic potentials. Active learning (AL) is a powerful tool to iteratively generate diverse data sets. In this approach, the ML model provides an uncertainty estimate along with its prediction for each new atomic configuration. If the uncertainty estimate passes a certain threshold, then the configuration is included in the data set. Here we develop a strategy to more rapidly discover configurations that meaningfully augment the training data set. The approach, uncertainty-driven dynamics for active learning (UDD-AL), modifies the potential energy surface used in molecular dynamics simulations to favor regions of configuration space for which there is large model uncertainty. The performance of UDD-AL is demonstrated for two AL tasks: sampling the conformational space of glycine and sampling the promotion of proton transfer in acetylacetone. The method is shown to efficiently explore the chemically relevant configuration space, which may be inaccessible using regular dynamical sampling at target temperature conditions.
Journal Article
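The core UDD-AL idea, biasing the surface that drives the dynamics so trajectories seek out regions of high model uncertainty, can be illustrated in one dimension. The quadratic model surface, the distance-based uncertainty proxy, and the bias strength below are invented for illustration and are not the paper's functional forms:

```python
import math

train_x = [0.0, 0.5, 1.0]        # configurations already in the data set

def model_energy(x):
    # surrogate model PES: a single well centred on the training data
    return (x - 0.5) ** 2

def model_uncertainty(x):
    # crude proxy: grows with distance from the nearest training point
    return min(abs(x - t) for t in train_x)

def biased_energy(x, strength=2.0):
    # UDD idea: lower the effective energy where the model is uncertain
    return model_energy(x) - strength * model_uncertainty(x)

def descend(f, x0, lr=0.01, steps=3000, h=1e-4):
    # plain gradient descent with a numerical gradient
    x = x0
    for _ in range(steps):
        grad = (f(x + h) - f(x - h)) / (2 * h)
        x -= lr * grad
    return x

x_plain = descend(model_energy, 0.2)    # relaxes into the well
x_udd = descend(biased_energy, 0.2)     # drifts toward unexplored space
```

On the unbiased surface the walker simply relaxes into the minimum, which is already well sampled; on the biased surface the same walker settles where the uncertainty is locally largest, which is exactly the kind of configuration worth labeling.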
Intermolecular conical intersections in molecular aggregates
2021
Conical intersections (CoIns) of multidimensional potential energy surfaces are ubiquitous in nature and control pathways and yields of many photo-initiated intramolecular processes. Such topologies can be potentially involved in the energy transport in aggregated molecules or polymers but are yet to be uncovered. Here, using ultrafast two-dimensional electronic spectroscopy (2DES), we reveal the existence of intermolecular CoIns in molecular aggregates relevant for photovoltaics. Ultrafast, sub-10-fs 2DES tracks the coherent motion of a vibrational wave packet on an optically bright state and its abrupt transition into a dark state via a CoIn after only 40 fs. Non-adiabatic dynamics simulations identify an intermolecular CoIn as the source of these unusual dynamics. Our results indicate that intermolecular CoIns may effectively steer energy pathways in functional nanostructures for optoelectronics.
Two-dimensional electronic spectroscopy reveals the existence of intermolecular conical intersections in molecular aggregates relevant for photovoltaics.
Journal Article
Multi-fidelity learning for interatomic potentials: low-level forces and high-level energies are all you need
by Matin, Sakib; Messerly, Mitchell; Allen, Alice E A
in Accuracy, Computation, computational chemistry
2025
The promise of machine learning interatomic potentials (MLIPs) has led to an abundance of public quantum mechanical (QM) training datasets. The quality of an MLIP is directly limited by the accuracy of the energies and atomic forces in the training dataset. Unfortunately, most of these datasets are computed with relatively low-accuracy QM methods, e.g. density functional theory with a moderate basis set. Due to the increased computational cost of more accurate QM methods, e.g. coupled-cluster theory with a complete basis set (CBS) extrapolation, most high-accuracy datasets are much smaller and often do not contain atomic forces. The lack of high-accuracy atomic forces is quite troubling, as training with force data greatly improves the stability and quality of the MLIP compared to training to energy alone. Because most datasets are computed with a unique level of theory, traditional single-fidelity (SF) learning is not capable of leveraging the vast amounts of published QM data. In this study, we apply multi-fidelity learning (MFL) to train an MLIP to multiple QM datasets of different levels of accuracy, i.e. levels of fidelity. Specifically, we perform three test cases to demonstrate that MFL with both low-level forces and high-level energies yields an extremely accurate MLIP—far more accurate than a SF MLIP trained solely to high-level energies and almost as accurate as a SF MLIP trained directly to high-level energies and forces. Therefore, MFL greatly alleviates the need for generating large and expensive datasets containing high-accuracy atomic forces and allows for more effective training to existing high-accuracy energy-only datasets. Indeed, low-accuracy atomic forces and high-accuracy energies are all that are needed to achieve a high-accuracy MLIP with MFL.
Journal Article
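The multi-fidelity setup this abstract argues for (abundant low-level forces plus a handful of high-level energies) can be demonstrated with a toy potential. The quadratic model and the two synthetic "levels of theory" are illustrative assumptions; the point is only that forces alone cannot fix the energy offset, while a few accurate energies can:

```python
def model(p, x):
    # simple parametric PES standing in for an MLIP
    a, b, c = p
    return a + b * x + c * x * x

def model_force(p, x):
    _, b, c = p
    return -(b + 2.0 * c * x)

def true_energy(x):      # "high level", e.g. CCSD(T)/CBS energies
    return 1.3 + 0.7 * x + 0.5 * x * x

def low_force(x):        # "low level" forces: accurate derivative information
    return -(0.7 + 1.0 * x)

# abundant low-level forces, a handful of high-level energies (no forces)
force_data = [(k / 4, low_force(k / 4)) for k in range(-8, 9)]
energy_data = [(-1.0, true_energy(-1.0)), (1.0, true_energy(1.0))]

def train(force_data, energy_data, lr=0.02, epochs=4000):
    p = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, f in force_data:          # force loss (low fidelity)
            err = model_force(p, x) - f
            p[1] -= lr * err * (-1.0)
            p[2] -= lr * err * (-2.0 * x)
        for x, e in energy_data:         # energy loss (high fidelity)
            err = model(p, x) - e
            p[0] -= lr * err
            p[1] -= lr * err * x
            p[2] -= lr * err * x * x
    return p

p_mfl = train(force_data, energy_data)   # multi-fidelity: forces + energies
p_sf = train([], energy_data)            # single fidelity: energies only
```

The forces pin the shape of the surface and the two high-level energies pin its absolute placement; with energies alone the model is underdetermined and interpolates poorly away from the labeled points.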
A neural network for determination of latent dimensionality in non-negative matrix factorization
by Kuksova, Svetlana; Nebgen, Benjamin T; Alexandrov, Boian S
in Anomalies, Classifiers, Data mining
2021
Non-negative matrix factorization (NMF) has proven to be a powerful unsupervised learning method for uncovering hidden features in complex and noisy data sets with applications in data mining, text recognition, dimension reduction, face recognition, anomaly detection, blind source separation, and many other fields. An important input for NMF is the latent dimensionality of the data, that is, the number of hidden features, K, present in the explored data set. Unfortunately, this quantity is rarely known a priori. The existing methods for determining latent dimensionality, such as automatic relevance determination (ARD), are mostly heuristic and utilize different characteristics to estimate the number of hidden features. However, all of them require human presence to make a final determination of K. Here we utilize a supervised machine learning approach in combination with a recent method for model determination, called NMFk, to determine the number of hidden features automatically. NMFk performs a set of NMF simulations on an ensemble of matrices, obtained by bootstrapping the initial data set, and determines which K produces stable groups of latent features that reconstruct the initial data set well. We then train a multi-layer perceptron (MLP) classifier network to determine the correct number of latent features utilizing the statistics and characteristics of the NMF solutions obtained from NMFk. In order to train the MLP classifier, a training set of 58 660 matrices with predetermined latent features was factorized with NMFk. The MLP classifier in conjunction with NMFk maintains a greater than 95% success rate when applied to a held-out test set. Additionally, when applied to two well-known benchmark data sets, the swimmer and MIT face data, NMFk/MLP correctly recovered the established number of hidden features. Finally, we compared the accuracy of our method to ARD, AIC and stability-based methods.
Journal Article
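The latent-dimensionality question NMFk addresses can be illustrated with a small synthetic matrix of known rank. A simple reconstruction-error threshold stands in here for NMFk's bootstrap stability analysis and the MLP classifier; the data, seeds, and threshold are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic data with exactly two hidden features (block structure)
W_true = np.zeros((20, 2))
W_true[:10, 0] = rng.uniform(0.5, 1.5, 10)
W_true[10:, 1] = rng.uniform(0.5, 1.5, 10)
H_true = np.zeros((2, 12))
H_true[0, :6] = rng.uniform(0.5, 1.5, 6)
H_true[1, 6:] = rng.uniform(0.5, 1.5, 6)
X = W_true @ H_true

def nmf(X, k, iters=800, seed=1):
    # classic multiplicative-update NMF
    r = np.random.default_rng(seed)
    W = r.uniform(0.1, 1.0, (X.shape[0], k))
    H = r.uniform(0.1, 1.0, (k, X.shape[1]))
    eps = 1e-9
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(X, k):
    W, H = nmf(X, k)
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)

errors = {k: relative_error(X, k) for k in (1, 2, 3, 4)}
# smallest K that reconstructs the data well, a crude stand-in for NMFk
latent_k = min(k for k, e in errors.items() if e < 0.1)
```

One hidden feature cannot reconstruct the two-block data, while two (or more) can; NMFk's contribution is making this selection automatic and robust on noisy real data, where a bare error threshold would not suffice.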
Deep learning of dynamically responsive chemical Hamiltonians with semiempirical quantum mechanics
by Zhou, Guoqing; Lubbers, Nicholas; Nebgen, Benjamin
in Accuracy, Bonding strength, Chemical bonds
2022
Conventional machine-learning (ML) models in computational chemistry learn to directly predict molecular properties using quantum chemistry only for reference data. While these heuristic ML methods show quantum-level accuracy with speeds several orders of magnitude faster than traditional quantum chemistry methods, they suffer from poor extensibility and transferability; i.e., their accuracy degrades on large or new chemical systems. Incorporating quantum chemistry frameworks into the ML models directly solves this problem. Here we take the structure of semiempirical quantum mechanics (SEQM) methods to construct dynamically responsive Hamiltonians. SEQM methods use empirical parameters fitted to experimental properties to construct reduced-order Hamiltonians, facilitating much faster calculations than ab initio methods but with compromised accuracy. By replacing these static parameters with machine-learned dynamic values inferred from the local environment, we greatly improve the accuracy of the SEQM methods. Trained on molecular energies and atomic forces, these dynamically generated Hamiltonian parameters show a strong correlation with atomic hybridization and bonding. Trained with only about 60,000 small organic molecular conformers, the resulting model retains interpretability, extensibility, and transferability when testing on much larger chemical systems and predicting various molecular properties. Overall, this work demonstrates the virtues of incorporating physics-based descriptions with ML to develop models that are simultaneously accurate, transferable, and interpretable.
Journal Article
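The contrast between static empirical parameters and environment-dependent ("dynamically responsive") ones can be shown with a 2x2 Hückel-like Hamiltonian. The functional forms, the reference curve, and the linear "learned" model for the hopping parameter are invented for illustration, not taken from the paper:

```python
import math

alpha = -5.0                          # fixed on-site energy

def ground_energy(beta):
    # lower eigenvalue of the 2x2 Hamiltonian [[alpha, beta], [beta, alpha]]
    return alpha - abs(beta)

def reference_energy(r):
    # stands in for high-level ab initio data across bond lengths r
    return alpha - 2.0 * math.exp(-r)

rs = [0.8, 1.0, 1.2, 1.4, 1.6]
targets = [2.0 * math.exp(-r) for r in rs]   # the beta each geometry "wants"

# static SEQM-style parameter: one empirical beta for every geometry
beta_static = sum(targets) / len(targets)

# environment-dependent parameter: least-squares linear model beta(r)
n = len(rs)
sx, sy = sum(rs), sum(targets)
sxx = sum(r * r for r in rs)
sxy = sum(r * t for r, t in zip(rs, targets))
c1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
c0 = (sy - c1 * sx) / n

def mean_abs_error(beta_of_r):
    return sum(abs(ground_energy(beta_of_r(r)) - reference_energy(r))
               for r in rs) / n

err_static = mean_abs_error(lambda r: beta_static)
err_dynamic = mean_abs_error(lambda r: c0 + c1 * r)
```

A single fitted constant cannot track how the coupling changes with geometry, while even a trivial environment-dependent model can; in the paper the linear fit is replaced by a neural network inferring the SEQM parameters from the local atomic environment.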