Catalogue Search | MBRL
18 result(s) for "Gussow, Ayal B."
Genomic determinants of pathogenicity in SARS-CoV-2 and other human coronaviruses
by Gussow, Ayal B.; Wolf, Yuri I.; Koonin, Eugene V.
in Animals; Betacoronavirus - classification; Betacoronavirus - genetics
2020
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses an immediate, major threat to public health across the globe. Here we report an in-depth molecular analysis to reconstruct the evolutionary origins of the enhanced pathogenicity of SARS-CoV-2 and other coronaviruses that are severe human pathogens. Using integrated comparative genomics and machine learning techniques, we identify key genomic features that differentiate SARS-CoV-2 and the viruses behind the two previous deadly coronavirus outbreaks, SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), from less pathogenic coronaviruses. These features include enhancement of the nuclear localization signals in the nucleocapsid protein and distinct inserts in the spike glycoprotein that appear to be associated with the high case fatality rates of these coronaviruses as well as with the host switch from animals to humans. The identified features could be crucial contributors to coronavirus pathogenicity and possible targets for diagnostics, prognostication, and interventions.
Journal Article
The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity
by Gussow, Ayal B.; Weir, William H.; Goldstein, David B.
in Acquired immune deficiency syndrome; AIDS; Disease
2015
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease.
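The residual idea behind RVIS-style scores (comparing observed standing variation with the level predicted from overall variation) can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares fit of common-variant counts on total variant counts per region and internally studentized residuals; the function name and inputs are hypothetical, and the published ncRVIS model may use different variant classes, thresholds, and covariates.

```python
from math import sqrt


def residual_intolerance_scores(total_counts, common_counts):
    """Studentized residuals from an OLS fit of common ~ total (hypothetical helper).

    Negative scores mean a region carries fewer common variants than predicted
    from its total amount of variation, i.e. it looks intolerant.
    """
    n = len(total_counts)
    if n != len(common_counts) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(total_counts) / n
    mean_y = sum(common_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in total_counts)
    if sxx == 0:
        raise ValueError("total counts must vary across regions")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(total_counts, common_counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(total_counts, common_counts)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = sqrt(sum(r * r for r in residuals) / (n - 2))
    if s == 0:
        raise ValueError("perfect fit: residuals are all zero")
    scores = []
    for x, r in zip(total_counts, residuals):
        # Leverage of observation x in a simple linear regression.
        leverage = 1 / n + (x - mean_x) ** 2 / sxx
        if leverage >= 1:
            scores.append(float("nan"))  # studentized residual undefined
        else:
            scores.append(r / (s * sqrt(1 - leverage)))
    return scores
```

For example, `residual_intolerance_scores([10, 20, 30, 40, 50], [5, 11, 14, 21, 24])` gives the third region the lowest score (about -1.02), flagging it as the most depleted of common variation relative to the fitted trend.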
Journal Article
Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning
by Gussow, Ayal B.; Vitsios, Dimitrios; Petrovski, Slavé
in 631/114/1305; 631/114/2415; 631/208/212
2021
Elucidating functionality in non-coding regions is a key challenge in human genomics. It has been shown that intolerance to variation of coding and proximal non-coding sequence is a strong predictor of human disease relevance. Here, we integrate intolerance to variation, functional genomic annotations and primary genomic sequence to build JARVIS: a comprehensive deep learning model to prioritize non-coding regions, outperforming other human lineage-specific scores. Despite being agnostic to evolutionary conservation, JARVIS performs comparably to or better than conservation-based scores in classifying pathogenic single-nucleotide and structural variants. In constructing JARVIS, we introduce the genome-wide residual variation intolerance score (gwRVIS), applying a sliding-window approach to whole genome sequencing data from 62,784 individuals. gwRVIS distinguishes Mendelian disease genes from more tolerant CCDS regions and highlights ultra-conserved non-coding elements as the most intolerant regions in the human genome. Both JARVIS and gwRVIS capture previously inaccessible human-lineage constraint information and will enhance our understanding of the non-coding genome.
Intolerance to variation is a strong indicator of disease relevance for coding regions of the human genome. Here, the authors present JARVIS, a deep learning method integrating intolerance to variation in non-coding regions and sequence-specific annotations to infer non-coding variant pathogenicity.
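The sliding-window step that gwRVIS applies to whole-genome data can be sketched as below: tile a region with fixed-length windows and, for each window, count all observed variants and the common ones. This is a minimal sketch under stated assumptions; the function name, the window and step sizes, and the default minor-allele-frequency cutoff are hypothetical illustration values, not the parameters used for gwRVIS.

```python
from bisect import bisect_left


def sliding_window_counts(variants, region_start, region_end, window, step, common_maf=0.001):
    """Count total and common variants in windows tiling [region_start, region_end).

    variants: iterable of (position, minor_allele_frequency) tuples.
    common_maf: hypothetical cutoff; a variant with MAF >= common_maf counts as common.
    Returns a list of (window_start, total_count, common_count).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    ordered = sorted(variants)
    positions = [pos for pos, _ in ordered]
    results = []
    start = region_start
    # Only full-length windows that fit inside the region are emitted.
    while start + window <= region_end:
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        in_window = ordered[lo:hi]
        total = len(in_window)
        common = sum(1 for _, maf in in_window if maf >= common_maf)
        results.append((start, total, common))
        start += step
    return results
```

The per-window (total, common) pairs are the kind of input an RVIS-style regression would then score; a step smaller than the window gives overlapping windows and a smoother intolerance profile at higher computational cost.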
Journal Article
Thousands of previously unknown phages discovered in whole-community human gut metagenomes
2021
Background
Double-stranded DNA bacteriophages (dsDNA phages) play pivotal roles in structuring human gut microbiomes; yet, the gut virome is far from being fully characterized, and additional groups of phages, including highly abundant ones, continue to be discovered by metagenome mining. A multilevel framework for taxonomic classification of viruses was recently adopted, facilitating the classification of phages into evolutionarily informative taxonomic units based on hallmark genes. Together with advanced approaches for sequence assembly and powerful methods of sequence analysis, this revised framework offers the opportunity to discover and classify unknown phage taxa in the human gut.
Results
A search of human gut metagenomes for circular contigs encoding phage hallmark genes resulted in the identification of 3738 apparently complete phage genomes that represent 451 putative genera. Several of these phage genera are only distantly related to previously identified phages and are likely to found new families. Two of the candidate families, "Flandersviridae" and "Quimbyviridae", include some of the most common and abundant members of the human gut virome that infect Bacteroides, Parabacteroides, and Prevotella. The third proposed family, "Gratiaviridae," consists of less abundant phages that are distantly related to the families Autographiviridae, Drexlerviridae, and Chaseviridae. Analysis of CRISPR spacers indicates that phages of all three putative families infect bacteria of the phylum Bacteroidetes. Comparative genomic analysis of the three candidate phage families revealed features without precedent in phage genomes. Some "Quimbyviridae" phages possess Diversity-Generating Retroelements (DGRs) that generate hypervariable target genes nested within defense-related genes, whereas the previously known targets of phage-encoded DGRs are structural genes. Several "Flandersviridae" phages encode enzymes of the isoprenoid pathway, a lipid biosynthesis pathway that so far has not been known to be manipulated by phages. The "Gratiaviridae" phages encode a HipA-family protein kinase and glycosyltransferase, suggesting these phages modify the host cell wall, preventing superinfection by other phages. Hundreds of phages in these three and other families are shown to encode catalases and iron-sequestering enzymes that can be predicted to enhance cellular tolerance to reactive oxygen species.
Conclusions
Analysis of phage genomes identified in whole-community human gut metagenomes resulted in the delineation of at least three new candidate families of Caudovirales and revealed diverse putative mechanisms underlying phage-host interactions in the human gut. Addition of these phylogenetically classified, diverse, and distinct phages to public databases will facilitate taxonomic decomposition and functional characterization of human gut viromes.
Journal Article
The intolerance to functional genetic variation of protein domains predicts the localization of pathogenic mutations within genes
by Gussow, Ayal B.; Goldstein, David B.; Petrovski, Slavé
in Animal Genetics and Genomics; Autism; Bioinformatics
2016
Ranking human genes based on their tolerance to functional genetic variation can greatly facilitate patient genome interpretation. It is well established, however, that different parts of proteins can have different functions, suggesting that it will ultimately be more informative to focus attention on functionally distinct portions of genes. Here we evaluate the intolerance of genic sub-regions using two biological sub-region classifications. We show that the intolerance scores of these sub-regions significantly correlate with reported pathogenic mutations. This observation extends the utility of intolerance scores to indicating where pathogenic mutations are most likely to fall within genes.
Journal Article
Machine-learning approach expands the repertoire of anti-CRISPR protein families
by Gussow, Ayal B.; Wolf, Yuri I.; Koonin, Eugene V.
in 631/114/2785; 631/326/4041; Adaptive immunity
2020
CRISPR-Cas systems are adaptive immunity systems of bacteria and archaea that have been harnessed for the development of powerful genome editing and engineering tools. In the incessant host-parasite arms race, viruses evolved multiple anti-defense mechanisms, including diverse anti-CRISPR proteins (Acrs) that specifically inhibit CRISPR-Cas and therefore have enormous potential for application as modulators of genome editing tools. Most Acrs are small and highly variable proteins, which makes their bioinformatic prediction a formidable task. We present a machine-learning approach for comprehensive Acr prediction. The model shows high predictive power when tested against an unseen test set and was employed to predict 2,500 candidate Acr families. Experimental validation of top candidates revealed two previously unknown Acrs (AcrIC9 and AcrIC10), and three other top candidates were coincidentally identified and found to possess anti-CRISPR activity. These results substantially expand the repertoire of predicted Acrs and provide a resource for experimental Acr discovery.
CRISPR-Cas is a host adaptive immunity system, and viruses harbor diverse anti-CRISPR proteins (Acrs). Here, the authors develop a random forest machine-learning approach to predict Acrs, identifying 2500 candidate Acr families, which expand the current repertoire of predicted Acrs by two orders of magnitude.
Journal Article
Prediction of the incubation period for COVID-19 and future virus disease outbreaks
by Gussow, Ayal B.; Wolf, Yuri I.; Koonin, Eugene V.
in Analysis; Biomedical and Life Sciences; Computer Simulation
2020
Background
A crucial factor in mitigating respiratory viral outbreaks is early determination of the duration of the incubation period and, accordingly, the required quarantine time for potentially exposed individuals. During the COVID-19 pandemic, optimization of quarantine regimes has become paramount for public health, societal well-being, and the global economy. However, biological factors that determine the duration of the virus incubation period remain poorly understood.
Results
We demonstrate a strong positive correlation between the length of the incubation period and disease severity for a wide range of human pathogenic viruses. Using a machine learning approach, we develop a predictive model that accurately estimates the incubation time ranges for diverse human pathogenic RNA viruses, including SARS-CoV-2, solely from several virus genome features, in particular the number of protein-coding genes and the GC content. The predictive approach described here can directly help in establishing the appropriate quarantine durations and thus facilitate controlling future outbreaks.
Conclusions
The length of the incubation period in viral diseases strongly correlates with disease severity, emphasizing the biological and epidemiological importance of the incubation period. Perhaps surprisingly, incubation times of pathogenic RNA viruses can be accurately predicted solely from generic features of virus genomes. Elucidation of the biological underpinnings of the connections between these features and disease progression can be expected to reveal key aspects of virus pathogenesis.
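The two genome features named in the Results (the number of protein-coding genes and the GC content) could be assembled per virus as sketched below. This is a minimal sketch, assuming the gene count comes from an existing annotation and that ambiguous bases are ignored when computing GC content; the function names are hypothetical, and the study's full feature set and learning algorithm are not reproduced here.

```python
def gc_content(sequence):
    """Fraction of G+C among unambiguous bases (A, C, G, T or U); other symbols are ignored."""
    seq = sequence.upper()
    unambiguous = sum(seq.count(base) for base in "ACGTU")
    if unambiguous == 0:
        raise ValueError("sequence has no unambiguous nucleotides")
    return (seq.count("G") + seq.count("C")) / unambiguous


def genome_features(sequence, n_protein_coding_genes):
    """Feature vector [number of protein-coding genes, GC content] for one genome.

    The gene count is taken as an input (e.g. from an annotation); it is not
    inferred from the sequence here.
    """
    return [float(n_protein_coding_genes), gc_content(sequence)]
```

Handling RNA genomes (U instead of T) in the same function keeps the feature comparable across the RNA viruses the model covers; rows built this way would be paired with known incubation ranges to train a regressor.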
Journal Article
Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics
2017
There is broad agreement that genetic mutations occurring outside of the protein-coding regions play a key role in human disease. Despite this consensus, we are not yet capable of discerning which portions of non-coding sequence are important in the context of human disease. Here, we present Orion, an approach that detects regions of the non-coding genome that are depleted of variation, suggesting that the regions are intolerant of mutations and subject to purifying selection in the human lineage. We show that Orion is highly correlated with known intolerant regions as well as regions that harbor putatively pathogenic variation. This approach provides a mechanism to identify pathogenic variation in the human non-coding genome and will have immediate utility in the diagnostic interpretation of patient genomes and in large case-control studies using whole-genome sequences.
Journal Article
Incorporating Machine Learning into Established Bioinformatics Frameworks
by Auslander, Noam; Gussow, Ayal B.; Koonin, Eugene V.
in Algorithms; Computational Biology - trends; Databases, Factual - trends
2021
The exponential growth of biomedical data in recent years has prompted the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling automatic feature extraction and selection and the generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the most interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Journal Article
Correction: Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics
by Gussow, Ayal B.; Goldstein, David B.; Petrovski, Slavé
in Genetic diversity; Genetics; Genomes
2018
[This corrects the article DOI: 10.1371/journal.pone.0181604.].
Journal Article