Catalogue Search | MBRL
244,044 result(s) for "Song, S."
Deep Learning for Population Genetic Inference
2016
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and for learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography and the regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Journal Article
Robust and scalable inference of population history from hundreds of unphased whole genomes
2017
Yun Song and colleagues present SMC++, a statistical method for population history inference capable of analyzing unphased whole genomes and sample sizes much larger than can be analyzed by current methods. The authors apply SMC++ to sequence data from human, Drosophila and finch populations.
It has recently been demonstrated that inference methods based on genealogical processes with recombination can uncover past population history in unprecedented detail. However, these methods scale poorly with sample size, limiting resolution in the recent past, and they require phased genomes, which contain switch errors that can catastrophically distort the inferred history. Here we present SMC++, a new statistical tool capable of analyzing orders of magnitude more samples than existing methods while requiring only unphased genomes (its results are independent of phasing). SMC++ can jointly infer population size histories and split times in diverged populations, and it employs a novel spline regularization scheme that greatly reduces estimation error. We apply SMC++ to analyze sequence data from over a thousand human genomes in Africa and Eurasia, hundreds of genomes from a Drosophila melanogaster population in Africa, and tens of genomes from zebra finch and long-tailed finch populations in Australia.
Journal Article
The impact of ribosomal interference, codon usage, and exit tunnel interactions on translation elongation rate variation
2018
Previous studies have shown that translation elongation is regulated by multiple factors, but the observed heterogeneity remains only partially explained. To dissect quantitatively the different determinants of elongation speed, we use probabilistic modeling to estimate initiation and local elongation rates from ribosome profiling data. This model-based approach allows us to quantify the extent of interference between ribosomes on the same transcript. We show that neither interference nor the distribution of slow codons is sufficient to explain the observed heterogeneity. Instead, we find that electrostatic interactions between the ribosomal exit tunnel and specific parts of the nascent polypeptide govern the elongation rate variation as the polypeptide makes its initial pass through the tunnel. Once the N-terminus has escaped the tunnel, the hydropathy of the nascent polypeptide within the ribosome plays a major role in modulating the speed. We show that our results are consistent with the biophysical properties of the tunnel.
Journal Article
Genotype and SNP calling from next-generation sequencing data
by Albrechtsen, Anders; Song, Yun S.; Nielsen, Rasmus
in 631/208/514/2254; Agriculture; Algorithms
2011
Key Points
Converting next-generation sequencing (NGS) image files into a set of called SNPs involves a number of steps including image analysis, alignment and assembly, SNP calling and genotype calling.
Genotype probabilities for a single individual can be calculated from alignments using recalibrated quality scores.
SNP calling and genotype calling are best done using information from multiple individuals simultaneously. The pattern of linkage disequilibrium should be used to call SNPs and genotypes when possible.
Analyses of low coverage data can proceed by taking uncertainty in the genotype calls into account, rather than assuming any particular genotype call is correct.
The methods used for calling SNPs and for taking uncertainty in SNP genotypes into account can have a strong effect on downstream analyses, including association mapping analyses.
An overview of the steps required in converting next-generation sequencing (NGS) data into accurate called SNPs and genotypes, a process that is crucial for the many downstream analyses of NGS data.
Meaningful analysis of next-generation sequencing (NGS) data, which are produced extensively by genetics and genomics studies, relies crucially on the accurate calling of SNPs and genotypes. Recently developed statistical methods both improve and quantify the considerable uncertainty associated with genotype calling, and will especially benefit the growing number of studies using low- to medium-coverage data. We review these methods and provide a guide for their use in NGS studies.
Journal Article
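A minimal sketch of the single-individual genotype-probability calculation named in the Key Points, assuming the common textbook error model in which a base with Phred quality Q is wrong with probability 10^(-Q/10) and errors are spread evenly over the three other bases. The function names, the flat prior and the assumption that reads are independent are illustrative choices, not the review's own code; multi-individual and linkage-disequilibrium-aware calling are not shown.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred-scaled base quality Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def base_likelihood(read_base, allele, error):
    """P(observed base | true allele); errors are spread evenly over the other 3 bases."""
    return 1.0 - error if read_base == allele else error / 3.0


def genotype_log_likelihoods(reads):
    """Log-likelihood of each unordered diploid genotype given (base, quality) pairs.

    Assumes reads are independent and each read comes from either chromosome
    with probability 1/2.
    """
    result = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for base, q in reads:
            e = phred_to_error(q)
            p = 0.5 * base_likelihood(base, a1, e) + 0.5 * base_likelihood(base, a2, e)
            ll += math.log(p)
        result[a1 + a2] = ll
    return result


def genotype_posteriors(log_liks, prior=None):
    """Turn log-likelihoods into posterior probabilities (flat prior by default)."""
    if prior is None:
        prior = {g: 1.0 / len(log_liks) for g in log_liks}
    top = max(log_liks.values())  # subtract the max before exp() to avoid underflow
    unnorm = {g: math.exp(ll - top) * prior[g] for g, ll in log_liks.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


# Two A reads and two G reads: the heterozygote AG should dominate.
reads = [("A", 30), ("A", 30), ("G", 30), ("G", 20)]
post = genotype_posteriors(genotype_log_likelihoods(reads))
print(max(post, key=post.get))  # AG
```

Keeping the full posterior, rather than only the most probable genotype, is what lets low-coverage analyses carry genotype uncertainty forward as the Key Points recommend; swapping the flat prior for one built from population allele frequencies is one natural way to bring in information from other individuals.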
A fast machine-learning-guided primer design pipeline for selective whole genome amplification
by Brisson, Dustin; Song, Yun S.; Dwivedi-Yu, Jane A.
in Algorithms; Amplification; Binding sites
2023
Addressing many of the major outstanding questions in the fields of microbial evolution and pathogenesis will require analyses of populations of microbial genomes. Although population genomic studies provide the analytical resolution to investigate evolutionary and mechanistic processes at fine spatial and temporal scales (precisely the scales at which these processes occur), microbial population genomic research is currently hindered by the practicalities of obtaining sufficient quantities of the relatively pure microbial genomic DNA necessary for next-generation sequencing. Here we present swga2.0, an optimized and parallelized pipeline to design selective whole genome amplification (SWGA) primer sets. Unlike previous methods, swga2.0 incorporates active and machine learning methods to evaluate the amplification efficacy of individual primers and primer sets. Additionally, swga2.0 optimizes primer set search and evaluation strategies, including parallelization at each stage of the pipeline, to dramatically decrease program runtime. Here we describe the swga2.0 pipeline, including the empirical data used to identify primer and primer set characteristics that improve amplification performance. Additionally, we evaluate the novel swga2.0 pipeline by designing primer sets that successfully amplify Prevotella melaninogenica, an important component of the lung microbiome in cystic fibrosis patients, from samples dominated by human DNA.
Journal Article
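Selective whole genome amplification relies on primers that bind frequently in the target genome and rarely in the background genome, so the target is amplified preferentially. A minimal sketch of that filtering idea follows; it is a hypothetical scoring rule (exact matches on both strands, a ratio of binding-site densities and an arbitrary `min_ratio` cutoff), not swga2.0's active- and machine-learning evaluation of primers and primer sets.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def count_sites(primer, genome):
    """Count positions where the primer or its reverse complement matches exactly."""
    k = len(primer)
    rc = reverse_complement(primer)
    return sum(
        1
        for i in range(len(genome) - k + 1)
        if genome[i:i + k] in (primer, rc)
    )


def selectivity(primer, target, background):
    """Binding-site density in the target genome divided by that in the background."""
    fg = count_sites(primer, target) / len(target)
    if fg == 0:
        return 0.0  # never binds the target, so useless for enrichment
    bg = count_sites(primer, background) / len(background)
    return float("inf") if bg == 0 else fg / bg


def rank_primers(candidates, target, background, min_ratio=2.0):
    """Keep primers whose selectivity is at least min_ratio, best first (hypothetical rule)."""
    scored = [(p, selectivity(p, target, background)) for p in candidates]
    kept = [(p, r) for p, r in scored if r >= min_ratio]
    return sorted(kept, key=lambda pr: pr[1], reverse=True)


# Toy sequences, invented for illustration.
target = "ACGTACGTAAAA"
background = "TTTTGGGGCCCC"
print(rank_primers(["ACG", "AAA", "GGG"], target, background))  # [('ACG', inf)]
```

Practical SWGA designs typically also weigh properties such as primer melting temperature, primer-dimer formation and how evenly binding sites are spread across the target; those are omitted here to keep the selectivity idea visible.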
Testican-1-mediated epithelial–mesenchymal transition signaling confers acquired resistance to lapatinib in HER2-positive gastric cancer
2014
Human epidermal growth factor receptor 2 (HER2)-directed treatment using trastuzumab has shown clinical benefit in HER2-positive gastric cancer. Clinical trials using lapatinib in HER2-positive gastric cancer are also currently underway. As with other molecularly targeted agents, the emergence of acquired resistance to HER2-directed treatment is an imminent therapeutic problem for HER2-positive gastric cancer. In order to investigate the mechanisms of acquired resistance to HER2-directed treatment in gastric cancer, we generated lapatinib-resistant gastric cancer cell lines (SNU216 LR) in vitro by chronic exposure of a HER2-positive gastric cancer cell line (SNU216) to lapatinib. The resultant SNU216 LR cells were also resistant to gefitinib, cetuximab, trastuzumab, afatinib and dacomitinib. Interestingly, SNU216 LR cells displayed an epithelial–mesenchymal transition (EMT) phenotype and maintained the activation of MET, HER3, Stat3, Akt and mitogen-activated protein kinase signaling in the presence of lapatinib. Using gene expression arrays, we identified the upregulation of a variety of EMT-related genes and extracellular matrix molecules, such as Testican-1, in SNU216 LR cells. We showed that the inhibition of Testican-1 by small interfering RNA decreased Testican-1-induced, MET-dependent, downstream signaling, and restored sensitivity to lapatinib in these cells. Furthermore, treatment with XAV939 selectively inhibited β-catenin-mediated transcription and Testican-1-induced EMT signaling, leading to G1 arrest. Taken together, these data support the potential role of EMT in acquired resistance to HER2-directed treatment in HER2-positive gastric cancer, and provide insights into strategies for preventing and/or overcoming this resistance in patients.
Journal Article
Cross-protein transfer learning substantially improves disease variant prediction
by Ye, Chengzhong; Koehl, Antoine; Ioannidis, Nilah
in Amino Acid Sequence; amino acid sequences; Animal Genetics and Genomics
2023
Background
Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity.
Results
We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes.
Conclusions
Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.
Journal Article
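The "specificity at 95% sensitivity" figures quoted in the Results can be computed from any predictor's scores on labelled variants. A minimal sketch, assuming higher scores mean "more likely pathogenic" and that a variant is called pathogenic when its score is at or above the threshold; the function name and the tie handling are illustrative choices, not the authors' evaluation code.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Specificity at the highest score threshold whose sensitivity reaches the target.

    scores: higher means more likely pathogenic.
    labels: 1 for pathogenic (e.g. disease) variants, 0 for benign ones.
    A variant is called pathogenic when its score is >= the threshold.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    # Smallest number of pathogenic variants that must be called; the tiny
    # epsilon guards against floating-point error in target * n.
    k = max(1, math.ceil(target_sensitivity * len(pos) - 1e-9))
    threshold = pos[k - 1]  # k-th highest pathogenic score
    # Benign variants scoring at or above the threshold are false positives.
    true_negatives = sum(1 for s in neg if s < threshold)
    return true_negatives / len(neg)


# Toy scores, invented for illustration.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(specificity_at_sensitivity(scores, labels))  # 0.75
```

Fixing sensitivity and reporting specificity matches the clinical framing in the Results: the threshold is pinned by the pathogenic variants, and predictors are then compared by how many benign variants they wrongly flag at that operating point.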
Seasonal overturn and stratification changes drive deep-water warming in one of Earth’s largest lakes
2021
Most of Earth’s fresh surface water is consolidated in just a few of its largest lakes, and because of their unique response to environmental conditions, lakes have been identified as climate change sentinels. While the response of lake surface water temperatures to climate change is well documented from satellite and summer in situ measurements, our understanding of how water temperatures in large lakes are responding at depth is limited, as few large lakes have detailed long-term subsurface observations. We present an analysis of three decades of high frequency (3-hourly and hourly) subsurface water temperature data from Lake Michigan. This unique data set reveals that deep water temperatures are rising in the winter and provides precise measurements of the timing of fall overturn, the point of minimum temperature, and the duration of the winter cooling period. Relationships from the data show that a shortened winter season results in higher subsurface temperatures and earlier onset of summer stratification. Shifts in the thermal regimes of large lakes will have profound impacts on the ecosystems of the world’s surface freshwater.
This study presents hourly data from a thermistor string in Lake Michigan, examining the lake's response at depth to surface warming. Based on the data, the study suggests that bottom-water temperatures respond to changes in turnover and re-stratification, with the ultimate possibility of the lake shifting from dimictic to monomictic.
Journal Article
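A minimal sketch of how overturn timing and the length of the cooling period could be read off a thermistor string, assuming paired surface and bottom temperature series at a fixed sampling interval. The 0.5 °C isothermal threshold and the definition of the cooling period as the time from fall overturn to the following bottom-temperature minimum are hypothetical choices for illustration, not the study's published criteria.

```python
def fall_overturn_index(surface, bottom, threshold=0.5):
    """First sample index where the column becomes isothermal (|surface - bottom| <
    threshold, in degrees C) after having been stratified; None if never."""
    stratified_seen = False
    for i, (s, b) in enumerate(zip(surface, bottom)):
        if abs(s - b) >= threshold:
            stratified_seen = True
        elif stratified_seen:
            return i
    return None


def winter_cooling_hours(surface, bottom, hours_per_sample, threshold=0.5):
    """Hours from fall overturn to the subsequent bottom-temperature minimum
    (hypothetical definition of the winter cooling period)."""
    start = fall_overturn_index(surface, bottom, threshold)
    if start is None:
        return None
    after = bottom[start:]
    end = start + min(range(len(after)), key=after.__getitem__)
    return (end - start) * hours_per_sample


# Toy 3-hourly series (degrees C, invented values) running from late summer into winter.
surface = [20.0, 15.0, 8.0, 6.0, 4.5, 3.0, 2.5]
bottom = [5.0, 5.0, 5.5, 5.8, 4.4, 3.2, 2.9]
print(fall_overturn_index(surface, bottom))      # 3
print(winter_cooling_hours(surface, bottom, 3))  # 9
```

Requiring a stratified sample before the isothermal one keeps the detector from firing on an already-mixed column at the start of the record; with real multi-year data the search would be restarted after each summer.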