Catalogue Search | MBRL
27 result(s) for "Sulovari, Arvis"
Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads
2021
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1, 2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
Assembly of haplotype-resolved human genomes is achieved by combining short and long reads.
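Contig N50, quoted above as > 23 Mbp, is the length L such that contigs of length ≥ L together cover at least half of the total assembly length. A minimal sketch of that standard definition follows; the function name `contig_n50` and the toy lengths are illustrative only and are not taken from the paper's workflow.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths (standard definition)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


print(contig_n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Comparing `2 * running` with `total` avoids rounding problems when the total length is odd.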
Journal Article
Recurrent de novo mutations in neurodevelopmental disorders: properties and clinical implications
by Wilfert, Amy B.; Sulovari, Arvis; Eichler, Evan E.
in Autism; Autism spectrum disorder; Bioinformatics
2017
Next-generation sequencing (NGS) is now more accessible to clinicians and researchers. As a result, our understanding of the genetics of neurodevelopmental disorders (NDDs) has rapidly advanced over the past few years. NGS has led to the discovery of new NDD genes with an excess of recurrent de novo mutations (DNMs) when compared to controls. Development of large-scale databases of normal and disease variation has given rise to metrics exploring the relative tolerance of individual genes to human mutation. Genetic etiology and diagnosis rates have improved, which have led to the discovery of new pathways and tissue types relevant to NDDs. In this review, we highlight several key findings based on the discovery of recurrent DNMs ranging from copy number variants to point mutations. We explore biases and patterns of DNM enrichment and the role of mosaicism and secondary mutations in variable expressivity. We discuss the benefit of whole-genome sequencing (WGS) over whole-exome sequencing (WES) to understand more complex, multifactorial cases of NDD and explain how this improved understanding aids diagnosis and management of these disorders. Comprehensive assessment of the DNM landscape across the genome using WGS and other technologies will lead to the development of novel functional and bioinformatics approaches to interpret DNMs and drive new insights into NDD biology.
Journal Article
An evolutionary driver of interspersed segmental duplications in primates
by Sorensen, Melanie; Sulovari, Arvis; Welch, AnneMarie E.
in Animal Genetics and Genomics; Animal models; APE gene
2020
Background
The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human–ape gene families, nuclear pore interacting protein (NPIP).
Results
Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis.
Conclusions
LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.
Journal Article
Characterization of Hepatitis B Virus Integrations Identified in Hepatocellular Carcinoma Genomes
2021
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality. Almost half of HCC cases are associated with hepatitis B virus (HBV) infections, which often lead to HBV sequence integrations in the human genome. Accurate identification of HBV integration sites at single-nucleotide resolution is critical for developing a better understanding of the cancer genome landscape and of the disease itself. Here, we performed further analyses and characterization of HBV integrations identified by our recently reported VIcaller platform in recurrent or known HCC genes (such as TERT, MLL4, and CCNE1) as well as non-recurrent cancer-related genes (such as CSMD2, NKD2, and RHOU). Our pathway enrichment analysis revealed multiple pathways involving the alcohol dehydrogenase 4 gene, such as the metabolism pathways of retinol, tyrosine, and fatty acid. Further analysis of the HBV integration sites revealed distinct patterns involving the integration upper breakpoints, integrated genome lengths, and integration allele fractions between tumor and normal tissues. Our analysis also implies that the VIcaller method has diagnostic potential through discovering novel clonal integrations in cancer-related genes. In conclusion, although VIcaller is a hypothesis-free, virome-wide approach, it can still be applied to accurately identify genome-wide integration events of a specific candidate virus and their integration allele fractions.
Journal Article
Neurodevelopmental disease genes implicated by de novo mutation and copy number variation morbidity
2019
We combined de novo mutation (DNM) data from 10,927 individuals with developmental delay and autism to identify 253 candidate neurodevelopmental disease genes with an excess of missense and/or likely gene-disruptive (LGD) mutations. Of these genes, 124 reach exome-wide significance (P < 5 × 10⁻⁷) for DNM. Intersecting these results with copy number variation (CNV) morbidity data shows an enrichment for genomic disorder regions (30/253, likelihood ratio (LR) +1.85, P = 0.0017). We identify genes with an excess of missense DNMs overlapping deletion syndromes (for example, KIF1A and the 2q37 deletion) as well as duplication syndromes, such as recurrent MAPK3 missense mutations within the chromosome 16p11.2 duplication, recurrent CHD4 missense DNMs in the 12p13 duplication region, and recurrent WDFY4 missense DNMs in the 10q11.23 duplication region. Network analyses of genes showing an excess of DNMs highlight functional networks, including cell-specific enrichments in the D1+ and D2+ spiny neurons of the striatum.
Analysis of ~10,000 cases of developmental delay and autism identifies 253 candidate neurodevelopmental disease genes. Network analysis highlights cell-specific enrichments of disease-related genes in the D1+ and D2+ spiny neurons of the striatum.
Journal Article
GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies
by Li, Dawei; Sulovari, Arvis
in Alleles; Animal Genetics and Genomics; Biomedical and Life Sciences
2014
Background
Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build across individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in the use of different genome builds and allele definitions. Incorrectly assuming an identical allele definition among combined GWAS leads to a large proportion of discarded genotypes or to incorrect association findings. There is no published tool that predicts and converts among all major allele definitions.
Results
In this study, we have developed a tool, GACT (Genome build and Allele definition Conversion Tool), that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality; our results indicated that inclusion of singletons in the reference had detrimental effects, while ambiguous SNPs had no measurable effect. Unexpectedly, excluding genotypes with a missing rate > 0.001 (40% of study SNPs) produced no significant decrease in imputation quality (quality was even significantly higher than with singletons included in the reference), especially for rare SNPs.
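One conversion that any allele-definition tool must handle is strand: a SNP reported as A/G on one strand is T/C on the other. A/T and C/G SNPs are the "ambiguous" case mentioned above, because complementing them yields the same allele pair. The sketch below is a hypothetical illustration of forward/reverse strand harmonization only; it is not GACT's implementation, and the function names are invented for this example.

```python
# Hypothetical helper names; assumes single-base A/C/G/T alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(alleles):
    """Complement a pair of SNP alleles, e.g. ('A', 'G') -> ('T', 'C')."""
    return tuple(COMPLEMENT[a] for a in alleles)


def is_ambiguous(alleles):
    """True for A/T and C/G SNPs, whose complement is the same allele pair."""
    return set(flip_strand(alleles)) == set(alleles)


def harmonize(study_alleles, reference_alleles):
    """Express study alleles on the reference strand.

    Returns the (possibly complemented) alleles, or None when the SNP is
    strand-ambiguous or matches the reference on neither strand.
    """
    if is_ambiguous(study_alleles):
        return None  # cannot be resolved from allele letters alone
    if set(study_alleles) == set(reference_alleles):
        return tuple(study_alleles)
    flipped = flip_strand(study_alleles)
    if set(flipped) == set(reference_alleles):
        return flipped
    return None


print(harmonize(("A", "G"), ("T", "C")))  # prints ('T', 'C')
```

This sketch simply sets ambiguous SNPs aside; resolving them would need extra information such as allele frequencies.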
Conclusion
GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep-sequencing, particularly for data from the dbGaP and other public databases.
GACT software: http://www.uvm.edu/genomics/software/gact
Journal Article
Association of Gamma-Aminobutyric Acid A Receptor alpha 2 Gene (GABRA2) with Alcohol Use Disorder
2014
Gamma-aminobutyric acid (GABA) is a major inhibitory neurotransmitter in the mammalian brain. GABA receptors are involved in a number of complex disorders, including substance abuse. None of the variants in the commonly studied GABA receptor genes that have been associated with substance dependence has been shown to be functional or pathogenic. To reconcile the conflicting associations with substance dependence traits, we performed a meta-analysis of variants in the GABA-A receptor genes (GABRB2, GABRA6, GABRA1, and GABRG2 on chromosome 5q and GABRA2 on chromosome 4p12) using genotype data from 4,739 cases of alcohol, opioid, or methamphetamine dependence and 4,924 controls. We then combined the data from candidate gene association studies in the literature with two alcohol dependence (AD) samples: 1,691 cases and 1,712 controls from the Study of Addiction: Genetics and Environment (SAGE), and 2,644 cases and 494 controls from our own study. Using a Bonferroni-corrected threshold of 0.007, we found strong associations between GABRA2 and AD (rs567926: P = 9 × 10⁻⁶, odds ratio (OR) (95% confidence interval (CI)) = 1.27 (1.15, 1.4); rs279858: P = 4 × 10⁻⁵, OR = 1.21 (1.1, 1.32)), and between GABRG2 and both alcohol dependence and heroin dependence (rs211014: P = 0.0005, OR = 1.22 (1.09, 1.37)). Significant association was also observed between GABRA6 rs3219151 and AD. The GABRA2 rs279858 association was observed in the SAGE data sets with a combined P of 9 × 10⁻⁶ (OR = 1.17 (1.09, 1.26)). When all of these data sets, including our samples, were meta-analyzed, associations of both GABRA2 single-nucleotide polymorphisms remained: for rs567926, P = 7 × 10⁻⁵ (OR = 1.18 (1.09, 1.29)) in all the studies and P = 8 × 10⁻⁶ (OR = 1.25 (1.13, 1.38)) in subjects of European ancestry; for rs279858, P = 5 × 10⁻⁶ (OR = 1.18 (1.1, 1.26)) in subjects of European ancestry.
Findings from this extensive meta-analysis of five GABA-A receptor genes and substance abuse support their involvement (with the best evidence for GABRA2) in the pathogenesis of AD. Further replications with larger samples are warranted.
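The results above are reported as an odds ratio with a 95% confidence interval. For a single case-control sample, a common way to obtain these from allele counts is the Woolf (log-scale) interval, sketched below. This is a generic illustration with hypothetical counts; the abstract does not say which estimator the study's meta-analysis used, and pooling across studies would require an additional step.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts from a 2x2 table; z=1.96 gives a ~95% CI.
    """
    counts = (case_alt, case_ref, ctrl_alt, ctrl_ref)
    if min(counts) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR) is the square root of the summed reciprocals.
    se_log = math.sqrt(sum(1 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not data from this study.
print(odds_ratio_ci(20, 80, 10, 90)[0])  # prints 2.25
```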
Journal Article
Recent ultra-rare inherited variants implicate new autism candidate risk genes
by Hoekzema, Kendra; Sulovari, Arvis; Zody, Michael C.
in 631/208; 631/208/366; 631/208/366/1373
2021
Autism is a highly heritable complex disorder in which de novo mutation (DNM) variation contributes significantly to risk. Using whole-genome sequencing data from 3,474 families, we investigate another source of large-effect risk variation, ultra-rare variants. We report and replicate a transmission disequilibrium of private, likely gene-disruptive (LGD) variants in probands but find that 95% of this burden resides outside of known DNM-enriched genes. This variant class more strongly affects multiplex family probands and supports a multi-hit model for autism. Candidate genes with private LGD variants preferentially transmitted to probands converge on the E3 ubiquitin–protein ligase complex, intracellular transport and Erb signaling protein networks. We estimate that these variants are approximately 2.5 generations old and significantly younger than other variants of similar type and frequency in siblings. Overall, private LGD variants are under strong purifying selection and appear to act on a distinct set of genes not yet associated with autism.
Analysis of whole-genome sequence data from 3,474 families finds an excess of private, likely gene-disrupting variants in individuals with autism. These variants are under purifying selection and suggest candidate genes not previously associated with autism.
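A transmission disequilibrium asks whether variants carried by a heterozygous parent are passed to affected offspring more often than the 50% expected by chance. The classical transmission disequilibrium test (TDT) compares transmitted (b) and untransmitted (c) counts with the McNemar statistic χ² = (b − c)² / (b + c) on 1 degree of freedom. The sketch below implements that classical form; the abstract does not specify the exact test used in the study, and the counts in the example are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT (McNemar) statistic and two-sided P value, 1 d.f."""
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # For 1 d.f., P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 60 transmissions to probands vs 40 non-transmissions.
print(tdt(60, 40))
```

Using `math.erfc` keeps the sketch dependent on the standard library alone; for 1 degree of freedom it gives the same tail probability as a chi-square survival function.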
Journal Article
Recurrent inversion toggling and great ape genome evolution
2020
Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. Using single-cell DNA template strand and long-read sequencing, we increased the number of previously reported great ape inversions sixfold (n = 1,069). We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.
Genome-wide detection of inversions in great ape genomes by using long-read sequencing and single-cell DNA template strand sequencing (Strand-seq) expands the number of known ape inversions and identifies several regions that have recurrently toggled between a direct and an inverted state during primate evolution.
Journal Article
Quantitative assessment reveals the dominance of duplicated sequences in germline-derived extrachromosomal circular DNA
by Osia, Beth; Sulovari, Arvis; Mouakkad-Montoya, Lila
in Amplification; Animals; Biological Sciences
2021
Extrachromosomal circular DNA (eccDNA) originates from linear chromosomal DNA in various human tissues under physiological and disease conditions. The genomic origins of eccDNA have largely been investigated using in vitro–amplified DNA. However, in vitro amplification obscures quantitative information by skewing the total population stoichiometry. In addition, the analyses have focused on eccDNA stemming from single-copy genomic regions, leaving eccDNA from multicopy regions unexamined. To address these issues, we isolated eccDNA without in vitro amplification (naïve small circular DNA, nscDNA) and assessed the populations quantitatively by integrated genomic, molecular, and cytogenetic approaches. nscDNA of up to tens of kilobases were successfully enriched by our approach and were predominantly derived from multicopy genomic regions including segmental duplications (SDs). SDs, which account for 5% of the human genome and are hotspots for copy number variations, were significantly overrepresented in sperm nscDNA, with three times more sequencing reads derived from SDs than from the entire single-copy regions. SDs were also overrepresented in mouse sperm nscDNA, which we estimated to comprise 0.2% of nuclear DNA. Considering that eccDNA can be integrated into chromosomes, germline-derived nscDNA may be a mediator of genome diversity.
Journal Article