Catalogue Search | MBRL
88 result(s) for "Di Gioacchino, Andrea"
A transfer-learning approach to predict antigen immunogenicity and T-cell receptor specificity
by Di Gioacchino, Andrea; Monasson, Rémi; Bravi, Barbara
in Antigens, Cancer, Computational and Systems Biology
2023
Antigen immunogenicity and the specificity of binding of T-cell receptors to antigens are key properties underlying effective immune responses. Here we propose diffRBM, an approach based on transfer learning and Restricted Boltzmann Machines, to build sequence-based predictive models of these properties. DiffRBM is designed to learn the distinctive patterns in amino-acid composition that, on the one hand, underlie the antigen’s probability of triggering a response, and on the other hand the T-cell receptor’s ability to bind to a given antigen. We show that the patterns learnt by diffRBM allow us to predict putative contact sites of the antigen-receptor complex. We also discriminate immunogenic and non-immunogenic antigens, antigen-specific and generic receptors, reaching performances that compare favorably to existing sequence-based predictors of antigen immunogenicity and T-cell receptor specificity.
Journal Article
Generative and interpretable machine learning for aptamer design and analysis of in vitro sequence selection
2022
Selection protocols such as SELEX, where molecules are selected over multiple rounds for their ability to bind to a target of interest, are popular methods for obtaining binders for diagnostic and therapeutic purposes. We show that Restricted Boltzmann Machines (RBMs), an unsupervised two-layer neural network architecture, can successfully be trained on sequence ensembles from single rounds of SELEX experiments for thrombin aptamers. RBMs assign scores to sequences that can be directly related to their fitnesses estimated through experimental enrichment ratios. Hence, RBMs trained from sequence data at a given round can be used to predict the effects of selection at later rounds. Moreover, the parameters of the trained RBMs are interpretable and identify functional features contributing most to sequence fitness. To exploit the generative capabilities of RBMs, we introduce two different training protocols: one taking into account sequence counts, capable of identifying the few best binders, and another based on unique sequences only, generating more diverse binders. We then use the trained RBMs to generate novel aptamers with putative disruptive mutations or good binding properties, and validate the generated sequences with gel shift assay experiments. Finally, we compare the RBM’s performance with different supervised learning approaches that include random forests and several deep neural network architectures.
Journal Article
The Heterogeneous Landscape and Early Evolution of Pathogen-Associated CpG Dinucleotides in SARS-CoV-2
by Komarova, Anastassia V; Šulc, Petr; Di Gioacchino, Andrea
in Amino acids, Antiviral drugs, Bias
2021
COVID-19 can lead to acute respiratory syndrome, which can be due to dysregulated immune signaling. We analyze the distribution of CpG dinucleotides, a pathogen-associated molecular pattern, in the SARS-CoV-2 genome. We characterize CpG content by a CpG force that accounts for statistical constraints acting on the genome at the nucleotide and amino acid levels. The CpG force, as the CpG content, is overall low compared with other pathogenic betacoronaviruses; however, it widely fluctuates along the genome, with a particularly low value, comparable with the circulating seasonal HKU1, in the spike coding region and a greater value, comparable with SARS and MERS, in the highly expressed nucleocapsid coding region (N ORF), whose transcripts are relatively abundant in the cytoplasm of infected cells and present in the 3′UTRs of all subgenomic RNA. This dual nature of CpG content could confer to SARS-CoV-2 the ability to avoid triggering pattern recognition receptors upon entry, while eliciting a stronger response during replication. We then investigate the evolution of synonymous mutations since the outbreak of the COVID-19 pandemic, finding a signature of CpG loss in regions with a greater CpG force. Sequence motifs preceding the CpG-loss-associated loci in the N ORF match recently identified binding patterns of the zinc finger antiviral protein. Using a model of the viral gene evolution under human host pressure, we find that synonymous mutations seem driven in the SARS-CoV-2 genome, and particularly in the N ORF, by the viral codon bias, the transition–transversion bias, and the pressure to lower CpG content.
Journal Article
Deciphering the Code of Viral-Host Adaptation Through Maximum-Entropy Nucleotide Bias Models
by Lecce, Ivan; Di Gioacchino, Andrea; Monasson, Rémi
in Discoveries, Entropy, Evolution, Molecular
2025
How viruses evolve largely depends on their hosts. To quantitatively characterize this dependence, we introduce Maximum Entropy Nucleotide Bias models (MENB) learned from single, di- and tri-nucleotide usage of viral sequences that infect a given host. We first use MENB to classify the viral family and the host of a virus from its genome, among four families of ssRNA viruses and three hosts. We show that both the viral family and the host leave a fingerprint in nucleotide motif usages that MENB models decode. Benchmarking our approach against state-of-the-art methods based on deep neural networks shows that MENB is rapid, interpretable and robust. Our approach is able to predict, with good accuracy, both the viral family and the host from a whole genomic sequence or a portion of it. MENB models also display promising out of sample generalization ability on viral sequences of new host taxa or new viral families. Our approach is also capable of identifying, within the limitations imposed by the three-host setting, intermediate hosts for well-known pathogenic strains of Influenza A subtypes and Human Coronavirus and recombinations and reassortments on specific genomic regions. Finally, MENB models can be used to track the adaptation to the new host, to shed light on the more relevant selective pressures that acted on motif usage during this process and to design new sequences with altered nucleotide usage at fixed amino-acid content.
Journal Article
Designing molecular RNA switches with Restricted Boltzmann machines
2025
Riboswitches are structured allosteric RNA molecules capable of switching between competing conformations in response to a metabolite binding event, eventually triggering a regulatory response. Computational modelling of these molecules is complicated by complex tertiary contacts, conditioned to the presence of their cognate metabolite. In this work, we show that Restricted Boltzmann machines (RBM), a simple two-layer machine learning model, capture intricate sequence dependencies induced by secondary and tertiary structure, as well as the switching mechanism, resulting in a model that can be successfully used for the design of allosteric RNA. As a case study we consider the aptamer domain of SAM-I riboswitches. To validate the functionality of designed sequences experimentally by SHAPE-MaP, we develop a tailored analysis pipeline adequate for high-throughput probing of diverse homologous sequences. We find that among the 84 probed RBM-designed sequences, showing up to 20% divergence from any natural sequence, about 28% (and 47% of the 45 among them having low RBM effective energies) are correctly structured and undergo a structural allosteric switch in response to SAM. Finally, we show how the flexibility of the molecule to switch conformations is connected to fine energetic features of its structural components.
Journal Article
Large deviations of the free energy in the p-spin glass spherical model
by Rotondo, Pietro; Pastore, Mauro; Andrea Di Gioacchino
in Broken symmetry, Free energy, Magnetic fields
2019
We investigate the behavior of the rare fluctuations of the free energy in the p-spin spherical model, evaluating the corresponding rate function via the Gärtner-Ellis theorem. This approach requires the knowledge of the analytic continuation of the disorder-averaged replicated partition function to an arbitrary real number of replicas. In zero external magnetic field, we show via a one-step replica symmetry breaking (1RSB) calculation that the rate function is infinite for fluctuations of the free energy above its typical value, corresponding to an anomalous, super-extensive suppression of rare fluctuations. We extend this calculation to non-zero magnetic field, showing that in this case this very large deviation disappears, and we try to motivate this finding in light of a geometrical interpretation of the scaled cumulant generating function.
Perils of Embedding for Sampling Problems
by Rieffel, Eleanor G; Marshall, Jeffrey; Andrea Di Gioacchino
in Annealing, Embedding, Ground state
2020
Advances in techniques for thermal sampling in classical and quantum systems would deepen understanding of the underlying physics. Unfortunately, one often has to rely solely on inexact numerical simulation, due to the intractability of computing the partition function in many systems of interest. Emerging hardware, such as quantum annealers, provides novel tools for such investigations, but it is well known that studying general, non-native systems on such devices requires graph minor embedding, at the expense of introducing additional variables. The effect of embedding for sampling is more pronounced than for optimization; for optimization one is just concerned with the ground state physics, whereas for sampling one needs to consider states at all energies. We argue that as the system size or the embedding size grows, the chance of a sample being in the subspace of interest (the logical subspace) can be exponentially suppressed. Though the severity of this scaling can be lessened through favorable parameter choices, certain physical constraints (such as a fixed temperature and range of couplings) provide hard limits on what is currently feasible. Furthermore, we show that up to some practical and reasonable assumptions, any type of post-processing to project samples back into the logical subspace will bias the resulting statistics. We introduce a new such technique, based on resampling, that substantially outperforms majority vote, which is shown to fail quite dramatically at preserving distribution properties.
Selberg integrals in 1D random Euclidean optimization problems
by Caracciolo, Sergio; Andrea Di Gioacchino; Molinari, Luca G
in Euclidean geometry, Integrals, Operations research
2019
We consider a set of Euclidean optimization problems in one dimension, where the cost function associated with the pair of points \(x\) and \(y\) is the Euclidean distance between them raised to an arbitrary power \(p\ge1\), and the points are chosen at random with flat measure. We derive the exact average cost for the random assignment problem, for any number of points, by using Selberg's integrals. Some variants of these integrals also allow us to derive the exact average cost for the bipartite travelling salesman problem.
The heterogeneous landscape and early evolution of pathogen-associated CpG dinucleotides in SARS-CoV-2
by Andrea Di Gioacchino; Komarova, Anastassia V; Šulc, Petr
in Amino acid sequence, Antiviral agents, Codon bias
2020
COVID-19 can lead to acute respiratory syndrome, which can be due to dysregulated immune signaling. We analyze the distribution of CpG dinucleotides, a pathogen-associated molecular pattern, in the SARS-CoV-2 genome. We find that the CpG content, which we characterize by a force parameter that accounts for statistical constraints acting on the genome at the nucleotide and amino-acid levels, is, on average, low compared to other pathogenic betacoronaviruses. However, the CpG force widely fluctuates along the genome, with a particularly low value, comparable to the circulating seasonal HKU1, in the spike coding region and a greater value, comparable to SARS and MERS, in the highly expressed nucleocapsid coding region (N ORF), whose transcripts are relatively abundant in the cytoplasm of infected cells and present in the 3’UTRs of all subgenomic RNA. This dual nature of CpG content could confer to SARS-CoV-2 the ability to avoid triggering pattern recognition receptors upon entry, while eliciting a stronger response during replication. We then investigate the evolution of synonymous mutations since the outbreak of the COVID-19 pandemic, finding a signature of CpG loss in regions with a greater CpG force. Sequence motifs preceding the CpG-loss-associated loci in the N ORF match recently identified binding patterns of the Zinc finger Anti-viral Protein. Using a model of the viral gene evolution under human host pressure, we find that synonymous mutations seem driven in the SARS-CoV-2 genome, and particularly in the N ORF, by the viral codon bias, the transition-transversion bias and the pressure to lower CpG content. Competing Interest Statement: The authors have declared no competing interest. Revised text; data analysis updated to October 2020.