Catalogue Search | MBRL

The generative capacity of probabilistic protein sequence models

by Haldane, Allan , Novinger, Quentin , Hauri, Sandro in 631/114/1305 , 631/114/2397 , 639/766/530/2804

2021

Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict mutation effects. Despite encouraging results, current model evaluation metrics leave unclear whether GPSMs faithfully reproduce the complex multi-residue mutational patterns observed in natural sequences due to epistasis. Here, we develop a set of sequence statistics to assess the “generative capacity” of three current GPSMs: the pairwise Potts Hamiltonian, the VAE, and the site-independent model. We show that the Potts model’s generative capacity is largest, as the higher-order mutational statistics generated by the model agree with those observed for natural sequences, while the VAE’s lies between the Potts and site-independent models. Importantly, our work provides a new framework for evaluating and interpreting GPSM accuracy which emphasizes the role of higher-order covariation and epistasis, with broader implications for probabilistic sequence models in general. Generative models have become increasingly popular in protein design, yet rigorous metrics that allow the comparison of these models are lacking. Here, the authors propose a set of such metrics and use them to compare three popular models.
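As a rough illustration of the difference between two of the models compared in this abstract, the sketch below scores a sequence under a site-independent model (per-position terms only) and under a pairwise Potts Hamiltonian (per-position terms plus pairwise couplings). The function names, the toy parameters, and the sign convention P(s) ∝ exp(-E(s)) are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def site_independent_energy(seq, fields):
    """Energy with per-position terms only: E(s) = -sum_i h_i(s_i)."""
    return -sum(fields[i][a] for i, a in enumerate(seq))


def potts_energy(seq, fields, couplings):
    """Pairwise Potts energy: E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    `couplings` maps a position pair (i, j) with i < j to a dict keyed by
    the pair of residues at those positions; missing entries count as 0.
    """
    energy = site_independent_energy(seq, fields)
    for i, j in combinations(range(len(seq)), 2):
        energy -= couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


# Toy example: 3 positions, 2-letter alphabet {0, 1}. All numbers are made up.
fields = [[0.5, -0.5], [0.0, 0.0], [0.2, -0.2]]
# Positions 0 and 2 are rewarded for carrying the same residue (covariation).
couplings = {(0, 2): {(0, 0): 1.0, (1, 1): 1.0}}

coupled = potts_energy([0, 1, 0], fields, couplings)    # coupling term applies
uncoupled = potts_energy([0, 1, 1], fields, couplings)  # no coupling term applies
```

When no coupling applies to a sequence, the Potts energy reduces to the site-independent energy. Only the coupling terms let the model reward or penalize specific residue pairs.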

Journal Article

From Sports to Physics: Deep Representation Learning in Real World Problems

by Hauri, Sandro in Artificial intelligence , Computer science

2023

Machine learning has recently made significant progress due to modern neural network architectures and training procedures. When neural networks learn a task, they create internal representations of the input data. The specific neural network architecture, training process, and task being addressed will influence the way in which the neural network interprets and explains the patterns in the data. The goal of representation learning is to train the neural network to create representations that effectively capture the overall structure of the data. However, the process by which these representations are generated is not fully understood because of the complexity of neural network data manipulations. This makes it difficult to choose the correct training procedure in real world applications. In this dissertation, we apply representation learning to improve the performance of neural networks in three different areas: NBA movement data, material property prediction, and generative protein modeling. First, we propose a novel deep learning approach for predicting human trajectories in sporting events using advanced object tracking data. Our method leverages recent advances in deep learning techniques, including the use of recurrent neural networks and long short-term memory cells, to accurately predict the future movements of players and the ball in a basketball game. We evaluate our approach using data from the NBA’s advanced object tracking system and demonstrate improved performance compared to existing methods. Our results have the potential to inform real-time decision making in sports analytics and improve the understanding of player behavior and strategy. Next, we focus on group activity recognition (GAR) in basketball. In basketball, players engage in various activities, both collaborative and adversarial, in order to win the game. Identifying and analyzing these activities is important for sports analytics as it can inform better strategies and decisions by players and coaches. We introduce a novel deep learning approach for GAR in team sports called NETS. NETS utilizes a Transformer-based architecture combined with LSTM embedding and a team-wise pooling layer to recognize group activity. We test NETS using tracking data from 632 NBA games and find that it learns group activities with high accuracy. Additionally, self- and weak-supervised training in NETS improves the accuracy of GAR. Then, we study an application of neural networks to protein modeling. Recent work on autoregressive direct coupling analysis (arDCA) has shown promising potential to efficiently train a generative protein sequence model (GPSM) to adequately model protein sequence data. We propose an extension to this work by adding a higher order coupling estimator to build a model called autoregressive higher order coupling analysis (arHCA). We show that our model can correctly identify higher order couplings in a synthetic dataset and that our model improves the performance of arDCA when trained on real-world sequence data. Finally, we study material property prediction. Incorporation of physical principles in a machine learning (ML) architecture is a fundamental step toward the continued development of AI for inorganic materials. Inspired by Pauling’s rules, we propose that structure motifs in inorganic crystals can serve as a central input to a machine learning framework. To demonstrate the use of structure motif information, a motif-centric learning framework is created by combining motif information with atom-based graph neural networks to form an atom-motif dual graph network (AMDNet), which is more accurate in predicting electronic structure related properties of metal oxides, such as band gaps. The work illustrates the route toward fundamental design of graph neural network learning architecture for complex materials by incorporating beyond-atom physical principles.

Dissertation

Group Activity Recognition in Basketball Tracking Data -- Neural Embeddings in Team Sports (NETS)

by Hauri, Sandro , Vucetic, Slobodan in Activity recognition , Basketball , Decision analysis

2022

Like many team sports, basketball involves two groups of players who engage in collaborative and adversarial activities to win a game. Players and teams execute various complex strategies to gain an advantage over their opponents. Defining, identifying, and analyzing different types of activities is an important task in sports analytics, as it can lead to better strategies and decisions by the players and coaching staff. The objective of this paper is to automatically recognize basketball group activities from tracking data representing locations of players and the ball during a game. We propose a novel deep learning approach for group activity recognition (GAR) in team sports called NETS. To efficiently model the player relations in team sports, we combine a Transformer-based architecture with an LSTM embedding and a team-wise pooling layer to recognize the group activity. Training such a neural network generally requires a large amount of annotated data, which incurs a high labeling cost. To address the scarcity of manual labels, we generate weak labels and pretrain the neural network on a self-supervised trajectory prediction task. We use a large tracking data set from 632 NBA games to evaluate our approach. The results show that NETS is capable of learning group activities with high accuracy, and that self- and weak-supervised training in NETS has a positive impact on GAR accuracy.

Paper

Structure motif centric learning framework for inorganic crystalline systems

by Hauri, Sandro , Hautier, Geoffroy , Zhang, Shanshan in Algorithms , Artificial intelligence , Clustering

2020

Incorporation of physical principles in a network-based machine learning (ML) architecture is a fundamental step toward the continued development of artificial intelligence for materials science and condensed matter physics. In this work, inspired by Pauling's rules, we propose that structure motifs (polyhedra formed by cations and surrounding anions) in inorganic crystals can serve as a central input to a machine learning framework for crystalline inorganic materials. Taking metal oxides as examples, we demonstrate that an unsupervised learning algorithm, Motif2Vec, is able to convert the presence of structure motifs and their connections in a large set of crystalline compounds into unique vector representations. The connections among complex materials can be largely determined by the presence of different structure motifs, and their clustering information is identified by our Motif2Vec algorithm. To demonstrate the novel use of structure motif information, we show that a motif-centric learning framework can be effectively created by combining motif information with the recently developed atom-based graph neural networks to form an atom-motif dual graph network (AMDNet). Taking advantage of node and edge information at both the atomic and motif levels, AMDNet is more accurate than an atom graph network in predicting electronic structure related material properties of metal oxides, such as band gaps. The work illustrates the route toward fundamental design of graph neural network learning architecture for complex material properties by incorporating beyond-atom physical principles.

Paper

Multi-Modal Trajectory Prediction of NBA Players

by Hauri, Sandro , Vucetic, Slobodan , Radosavljevic, Vladan in Decision making , Players , Professional basketball

2020

National Basketball Association (NBA) players are highly motivated and skilled experts who solve complex decision-making problems at every time point during a game. As a step towards understanding how players make their decisions, we focus on their movement trajectories during games. We propose a method that captures the multi-modal behavior of players, where they might consider multiple trajectories and select the most advantageous one. The method is built on an LSTM-based architecture predicting multiple trajectories and their probabilities, trained by a multi-modal loss function that updates the best trajectories. Experiments on large, fine-grained NBA tracking data show that the proposed method outperforms the state of the art. In addition, the results indicate that the approach generates more realistic trajectories and that it can learn individual playing styles of specific players.
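The abstract does not spell out the loss, so the following is only a generic sketch of one common way to train a multi-trajectory predictor, often called a winner-takes-all or best-of-many loss. Only the candidate closest to the observed trajectory receives a regression penalty, and the predicted probability of that candidate is pushed up. The function names, the squared-error distance, and the equal weighting of the two terms are assumptions, not the authors' formulation.

```python
import math


def trajectory_error(pred, target):
    """Mean squared Euclidean distance between two trajectories of (x, y) points."""
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(target)


def best_mode_loss(candidates, probs, target):
    """Winner-takes-all loss over several candidate trajectories.

    Returns (loss, best_index). Only the best candidate contributes a
    regression error; the classification term is the negative log of the
    probability the model assigned to that best candidate.
    """
    errors = [trajectory_error(c, target) for c in candidates]
    best = min(range(len(errors)), key=errors.__getitem__)
    loss = errors[best] - math.log(probs[best])
    return loss, best


# Toy example: two candidate futures (move right vs. move up); the player moved up.
candidates = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)]]
probs = [0.5, 0.5]
target = [(0.0, 0.0), (0.0, 1.0)]

loss, best = best_mode_loss(candidates, probs, target)
```

In a real training loop, these quantities would be computed on tensors so that gradients flow only through the selected candidate and through the probability outputs.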

Paper
