Catalogue Search | MBRL
191 result(s) for "Vu, Mai Ha"
Unconstrained generation of synthetic antibody–antigen structures to guide machine learning methodology for antibody specificity prediction
2022
Machine learning (ML) is a key technology for accurate prediction of antibody-antigen binding. Two orthogonal problems hinder the application of ML to antibody-specificity prediction and the benchmarking thereof: the lack of a unified ML formalization of immunological antibody-specificity prediction problems and the unavailability of large-scale synthetic datasets to benchmark real-world relevant ML methods and dataset design. Here we developed the Absolut! software suite that enables parameter-based unconstrained generation of synthetic lattice-based three-dimensional antibody-antigen-binding structures with ground-truth access to conformational paratope, epitope and affinity. We formalized common immunological antibody-specificity prediction problems as ML tasks and confirmed that for both sequence- and structure-based tasks, accuracy-based rankings of ML methods trained on experimental data hold for ML methods trained on Absolut!-generated data. The Absolut! framework has the potential to enable real-world relevant development and benchmarking of ML strategies for biotherapeutics design.
Journal Article
A New Pyrrolidone Alkaloid and Other Constituents from Rourea oligophlebia Stems
2021
Phytochemical study of Rourea oligophlebia stems led to the isolation of a new 2-pyrrolidone alkaloid (R,S)-N-(5-hydroxyl-pyrrolidin-2-one-1-yl)acetamide (1), together with 14 known compounds including friedelin (2), friedanol (3), taraxerol (4), vanillin (5), coniferyl aldehyde (6), apigenin (7), 7α-hydroxy-3β-sitosterol (8), coniferyl alcohol (9), scopoletin (10), emodin (11), protocatechuic acid (12), catechin (13), procyanidin A1 (14), and (E)-2,3,5,4’-tetrahydroxystilbene-2-β-D-glucoside (15). Several isolated compounds were evaluated for cytotoxicity and antimicrobial activity. Compound 11 exhibited good antimicrobial activity on Gram (+) strains and moderate cytotoxicity against KB, Hep-G2, and LU cancer cell lines. Compounds 6 and 8–10 showed selective activity on HepG-2 and MCF-7 over KB and LU cancer cell lines, while compound 7 exhibited similar effects on KB, HepG-2, and MCF-7 cell lines with IC50 values of 36.46 ± 0.81, 32.00 ± 0.58, and 32.03 ± 0.61 µg/mL, respectively.
Journal Article
Linguistically inspired roadmap for building biologically reliable protein language models
by Greiff, Victor; Haug, Dag Trygve Truslew; Sandve, Geir Kjetil
in 4007/4009, 631/114/2397, 631/114/2410
2023
Deep neural-network-based language models (LMs) are increasingly applied to large-scale protein sequence data to predict protein function. However, being largely black-box models and thus challenging to interpret, current protein LM approaches do not contribute to a fundamental understanding of sequence–function mappings, hindering rule-based biotherapeutic drug development. We argue that guidance drawn from linguistics, a field specialized in analytical rule extraction from natural language data, can aid with building more interpretable protein LMs that are more likely to learn relevant domain-specific rules. Differences between protein sequence data and linguistic sequence data require the integration of more domain-specific knowledge in protein LMs compared with natural language LMs. Here, we provide a linguistics-based roadmap for protein LM pipeline choices with regard to training data, tokenization, token embedding, sequence embedding and model interpretation. Incorporating linguistic ideas into protein LMs enables the development of next-generation interpretable machine learning models with the potential of uncovering the biological mechanisms underlying sequence–function relationships.
Language models trained on proteins can help to predict functions from sequences but provide little insight into the underlying mechanisms. Vu and colleagues explain how extracting the underlying rules from a protein language model can make them interpretable and help explain biological mechanisms.
Journal Article
Linguistically inspired roadmap for building biologically reliable protein language models
2023
Deep neural-network-based language models (LMs) are increasingly applied to large-scale protein sequence data to predict protein function. However, being largely black-box models and thus challenging to interpret, current protein LM approaches do not contribute to a fundamental understanding of sequence-function mappings, hindering rule-based biotherapeutic drug development. We argue that guidance drawn from linguistics, a field specialized in analytical rule extraction from natural language data, can aid with building more interpretable protein LMs that are more likely to learn relevant domain-specific rules. Differences between protein sequence data and linguistic sequence data require the integration of more domain-specific knowledge in protein LMs compared to natural language LMs. Here, we provide a linguistics-based roadmap for protein LM pipeline choices with regard to training data, tokenization, token embedding, sequence embedding, and model interpretation. Incorporating linguistic ideas into protein LMs enables the development of next-generation interpretable machine-learning models with the potential of uncovering the biological mechanisms underlying sequence-function relationships.
ImmunoLingo: Linguistics-based formalization of the antibody language
2022
Apparent parallels between natural language and biological sequence have led to a recent surge in the application of deep language models (LMs) to the analysis of antibody and other biological sequences. However, a lack of a rigorous linguistic formalization of biological sequence languages, which would define basic components, such as lexicon (i.e., the discrete units of the language) and grammar (i.e., the rules that link sequence well-formedness, structure, and meaning) has led to largely domain-unspecific applications of LMs, which do not take into account the underlying structure of the biological sequences studied. A linguistic formalization, on the other hand, establishes linguistically-informed and thus domain-adapted components for LM applications. It would facilitate a better understanding of how differences and similarities between natural language and biological sequences influence the quality of LMs, which is crucial for the design of interpretable models with extractable sequence-functions relationship rules, such as the ones underlying the antibody specificity prediction problem. Deciphering the rules of antibody specificity is crucial to accelerating rational and in silico biotherapeutic drug design. Here, we formalize the properties of the antibody language and thereby establish not only a foundation for the application of linguistic tools in adaptive immune receptor analysis but also for the systematic immunolinguistic studies of immune receptor specificity in general.
Linguistics-based formalization of the antibody language as a basis for antibody language models
by Haug, Dag Trygve Truslew; Greiff, Victor; Sandve, Geir Kjetil
in Amino acids, Antibodies, Antigens
2024
Apparent parallels between natural language and antibody sequences have led to a surge in deep language models applied to antibody sequences for predicting cognate antigen recognition. However, a linguistic formal definition of antibody language does not exist, and insight into how antibody language models capture antibody-specific binding features remains largely uninterpretable. Here we describe how a linguistic formalization of the antibody language, by characterizing its tokens and grammar, could address current challenges in antibody language model rule mining.
Journal Article
A Quantifier-Based Approach to NPI-Licensing Typology: Empirical and Computational Investigations
by Vu, Mai Ha
in Linguistics
2020
This thesis examines the quantifier-based approach to NPI-licensing (as proposed in (Giannakidou, 2000)) from empirical and computational perspectives. This approach argues that all NPIs can be categorized as either existentially or universally quantified items, and that this difference drives cross-linguistically divergent NPI-behaviors. After providing the necessary background and assumptions, in the first half of the thesis I show that English any-NPIs are existentially quantified, whereas Hungarian se-NPIs are universally quantified. I also demonstrate how this approach can help understand the behavior of NPIs in other languages and language families such as Slavic, Mandarin Chinese, Turkish, and Romance languages. In the second half of the thesis, I analyze the quantifier-based NPI-licensing constraints for computational complexity. I find that except for the constraints that rely on derived c-command, all other constraints can be described with Input-local Tier-based Strictly Local (I-TSL) or Multiple Input-local Tier-based Strictly Local (MITSL) restrictions, which means that tree-languages that satisfy NPI-licensing constraints for the most part fit into a fairly restrictive subregular class of tree-languages. Taken together, this thesis argues that a theoretically informed approach to linguistic phenomena can significantly affect results on their computational complexity.
Dissertation
Unconstrained generation of synthetic antibody-antigen structures to guide machine learning methodology for real-world antibody specificity prediction
2022
Machine learning (ML) is a key technology for accurate prediction of antibody-antigen binding. Two orthogonal problems hinder the application of ML to antibody-specificity prediction and the benchmarking thereof: The lack of a unified ML formalization of immunological antibody specificity prediction problems and the unavailability of large-scale synthetic benchmarking datasets of real-world relevance. Here, we developed the Absolut! software suite that enables parameter-based unconstrained generation of synthetic lattice-based 3D-antibody-antigen binding structures with ground-truth access to conformational paratope, epitope, and affinity. We formalized common immunological antibody specificity prediction problems as ML tasks and confirmed that for both sequence and structure-based tasks, accuracy-based rankings of ML methods trained on experimental data hold for ML methods trained on Absolut!-generated data. The Absolut! framework thus enables real-world relevant development and benchmarking of ML strategies for biotherapeutics design.
The software framework Absolut! enables (A,B) the generation of virtually arbitrarily large numbers of synthetic 3D-antibody-antigen structures, (C,D) the formalization of antibody specificity as machine learning (ML) tasks as well as the exploration of ML strategies for real-world antibody-antigen binding or paratope-epitope prediction.
Software framework Absolut! to generate an arbitrarily large number of synthetic 3D-antibody-antigen structures that contain biological layers of antibody-antigen binding complexity that render ML predictions challenging
Immunological antibody specificity prediction problems formalized as machine learning tasks for which the in silico complexes are immediately usable as benchmark datasets
Exploration of machine learning prediction accuracy as a function of architecture, dataset size, choice of negatives, and sequence-structure encoding
Relative ML performance learnt on Absolut! datasets transfers to experimental datasets
One billion synthetic 3D-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction
by Lund-Johansen, Fridtjof; Hochreiter, Sepp; Prósz, Aurél
in Antibodies, Antigens, Computer applications
2021
Machine learning (ML) is a key technology to enable accurate prediction of antibody-antigen binding, a prerequisite for in silico vaccine and antibody design. Two orthogonal problems hinder the current application of ML to antibody-specificity prediction and the benchmarking thereof: (i) The lack of a unified formalized mapping of immunological antibody specificity prediction problems into ML notation and (ii) the unavailability of large-scale training datasets. Here, we developed the Absolut! software suite that allows the parameter-based unconstrained generation of synthetic lattice-based 3D-antibody-antigen binding structures with ground-truth access to conformational paratope, epitope, and affinity. We show that Absolut!-generated datasets recapitulate critical biological sequence and structural features that render antibody-antigen binding prediction challenging. To demonstrate the immediate, high-throughput, and large-scale applicability of Absolut!, we have created an online database of 1 billion antibody-antigen structures, the extension of which is only constrained by moderate computational resources. We translated immunological antibody specificity prediction problems into ML tasks and used our database to investigate paratope-epitope binding prediction accuracy as a function of structural information encoding, dataset size, and ML method, which is unfeasible with existing experimental data. Furthermore, we found that in silico investigated conditions, predicted to increase antibody specificity prediction accuracy, align with and extend conclusions drawn from experimental antibody-antigen structural data. In summary, the Absolut! framework enables the development and benchmarking of ML strategies for biotherapeutics discovery and design.
Competing Interest Statement: E.M. declares holding shares in aiNET GmbH. V.G. declares advisory board positions in aiNET GmbH and Enpicom B.V. VG is a consultant for Roche/Genentech.
Footnotes
* Linking the present findings with the back-to-back twin paper https://www.biorxiv.org/content/10.1101/2021.07.08.451480v1
* https://github.com/csi-greifflab/Absolut
Safety and immunogenicity of an egg-based inactivated Newcastle disease virus vaccine expressing SARS-CoV-2 spike: Interim results of a randomized, placebo-controlled, phase 1/2 trial in Vietnam
by Krammer, Florian; Mai Nguyen, Huong; Raghunandan, Rama
in adjuvants; Adjuvants, Immunologic; Adolescent
2022
Production of affordable coronavirus disease 2019 (COVID-19) vaccines in low- and middle-income countries is needed. NDV-HXP-S is an inactivated egg-based Newcastle disease virus (NDV) vaccine expressing the spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Wuhan-Hu-1. The spike protein was stabilized and incorporated into NDV virions by removing the polybasic furin cleavage site, introducing the transmembrane domain and cytoplasmic tail of the fusion protein of NDV, and introducing six prolines for stabilization in the prefusion state. Vaccine production and clinical development was initiated in Vietnam, Thailand, and Brazil. Here the interim results from the first stage of the randomized, dose-escalation, observer-blind, placebo-controlled, phase 1/2 trial conducted at the Hanoi Medical University (Vietnam) are presented. Healthy adults aged 18–59 years, non-pregnant, and with self-reported negative history for SARS-CoV-2 infection were eligible. Participants were randomized to receive one of five treatments by intramuscular injection twice, 28 days apart: 1 μg +/- CpG1018 (a toll-like receptor 9 agonist), 3 μg alone, 10 μg alone, or placebo. Participants and personnel assessing outcomes were masked to treatment. The primary outcomes were solicited adverse events (AEs) during 7 days and subject-reported AEs during 28 days after each vaccination. Investigators further reviewed subject-reported AEs. Secondary outcomes were immunogenicity measures (anti-spike immunoglobulin G [IgG] and pseudotyped virus neutralization). This interim analysis assessed safety 56 days after first vaccination (day 57) in treatment-exposed individuals and immunogenicity through 14 days after second vaccination (day 43) per protocol. Between March 15 and April 23, 2021, 224 individuals were screened and 120 were enrolled (25 per group for active vaccination and 20 for placebo). All subjects received two doses. 
The most common solicited AEs among those receiving active vaccine or placebo were all predominantly mild and included injection site pain or tenderness (<58%), fatigue or malaise (<22%), headache (<21%), and myalgia (<14%). No higher proportion of the solicited AEs were observed for any group of active vaccine. The proportion reporting vaccine-related AEs during the 28 days after either vaccination ranged from 4% to 8% among vaccine groups and was 5% in controls. No vaccine-related serious adverse event occurred. The immune response in the 10 μg formulation group was highest, followed by 1 μg + CpG1018, 3 μg, and 1 μg formulations. Fourteen days after the second vaccination, the geometric mean concentrations (GMC) of 50% neutralizing antibody against the homologous Wuhan-Hu-1 pseudovirus ranged from 56.07 IU/mL (1 μg, 95% CI 37.01, 84.94) to 246.19 IU/mL (10 μg, 95% CI 151.97, 398.82), with 84% to 96% of vaccine groups attaining a ≥ 4-fold increase over baseline. This was compared to a panel of human convalescent sera (N = 29, 72.93 95% CI 33.00–161.14). Live virus neutralization to the B.1.617.2 (Delta) variant of concern was reduced but in line with observations for vaccines currently in use. Since the adjuvant has shown modest benefit, GMC ratio of 2.56 (95% CI, 1.4–4.6) for 1 μg +/- CpG1018, a decision was made not to continue studying it with this vaccine. NDV-HXP-S had an acceptable safety profile and potent immunogenicity. The 3 μg dose was advanced to phase 2 along with a 6 μg dose. The 10 μg dose was not selected for evaluation in phase 2 due to potential impact on manufacturing capacity. ClinicalTrials.gov NCT04830800.
Journal Article