Catalogue Search | MBRL
Explore the vast range of titles available.
2,256 result(s) for "Alkan, Can"
Limitations of next-generation genome sequence assembly
2011
High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to construct de novo genome assemblies in a few months, there has been relatively little attention to what is lost by sole application of short sequence reads. We compared the recent de novo assemblies using the short oligonucleotide analysis package (SOAP), generated from the genomes of a Han Chinese individual and a Yoruban individual, to experimentally validated genomic features. We found that de novo assemblies were 16.2% shorter than the reference genome and that 420.2 megabase pairs of common repeats and 99.1% of validated duplicated sequences were missing from the genome. Consequently, over 2,377 coding exons were completely missing. We conclude that high-quality sequencing approaches must be considered in conjunction with high-throughput sequencing for comparative genomics analyses and studies of genome evolution.
Journal Article
Population-level gene copy number variations reveal distinct genetic properties of different Malus species
by Alkan, Can; Khan, Awais; Sakina, Aafreen
in Adaptation, Analysis, Animal Genetics and Genomics
2025
Background
Copy number variations (CNVs) are crucial in plant evolution, adaptation, and domestication. In this study, we explored how CNVs contribute to genetic diversity, evolution, and adaptation during apple domestication. We examined the genome-wide CNV profiles and segmental duplications (SDs) in 116 Malus accessions, including domesticated apple (Malus domestica) and its primary progenitor species (M. sieversii and M. sylvestris).
Results
On average, two accessions of the same species showed differences in at least 7,000 genes with varying copy number (CN) profiles. In contrast, accessions from different species had at least 20,000 genes with differing CN profiles. Notably, 700 genes exhibited distinct CN profiles between M. domestica and M. sieversii, with an enrichment in defense response genes. Genes related to fruit ripening, flavor, and anthocyanin biosynthesis had higher copy numbers in M. domestica. Additionally, 360 genes showed differential CN profiles between M. domestica and M. sylvestris, with enrichment in polygalacturonase activity, which may influence differences in fruit flavor. The study also identified 3,000 genes with significant CN differentiation (V_ST > 0.28) between M. domestica rootstock and scion cultivars enriched in lignin metabolic pathways, underscoring their role in stress resistance and mechanical support. Segmental duplications were particularly enriched in genes related to sorbitol metabolism, fruit development, and fruit quality traits, highlighting their evolutionary importance in defining apple morphology and physiology.
Conclusions
These findings offer valuable insights into the evolutionary mechanisms driving apple domestication and adaptation and provide a comprehensive resource for future research and apple breeding.
Journal Article
Technology dictates algorithms: recent developments in read alignment
by Alkan, Can; Xue, Victor; Mangul, Serghei
in Accuracy, Algorithms, Animal Genetics and Genomics
2021
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today’s diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
Journal Article
GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies
by Senol Cali, Damla; Alkan, Can; Lee, Donghyuk
in 3D-stacked DRAM, Algorithms, Animal Genetics and Genomics
2018
Background
Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments.
Results
We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x–6.41x, and 2) provides an end-to-end read mapper speedup of 1.81x–3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm.
Conclusion
GRIM-Filter exploits 3D-stacked memory, which enables the efficient use of processing-in-memory, to overcome the memory bandwidth bottleneck in seed location filtering. We show that GRIM-Filter significantly improves the performance of a state-of-the-art read mapper. GRIM-Filter is a universal seed location filter that can be applied to any read mapper. We hope that our results provide inspiration for new works to design other bioinformatics algorithms that take advantage of emerging technologies and new processing paradigms, such as processing-in-memory using 3D-stacked memory devices.
Journal Article
A robust benchmark for detection of germline large deletions and insertions
2020
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
Detection of structural variants in the human genome is facilitated by a benchmark set of large deletions and insertions.
Journal Article
An integrated map of structural variation in 2,504 human genomes
by Mills, Ryan E.; Cerveira, Eliza; Kashin, Seva
in 631/208/212, 631/208/726/649/2157, Algorithms
2015
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Structural variation mapped in over 2,500 human genomes
The Structural Variation Analysis Group of The 1000 Genomes Project reports an integrated structural variation map based on discovery and genotyping of eight major structural variation classes in genomes for 2,504 unrelated individuals from across 26 populations. They characterize structural variation within and between populations and quantify its functional effect. The authors further create a phased reference panel that will be valuable for population genetic and disease association studies.
Journal Article
A High-Coverage Genome Sequence from an Archaic Denisovan Individual
2012
We present a DNA library preparation method that has allowed us to reconstruct a high-coverage (30×) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of "missing evolution" in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans.
Journal Article
ECOLE: Learning to call copy number variants on whole exome sequencing data
2024
Copy number variants (CNV) are shown to contribute to the etiology of several genetic disorders. Accurate detection of CNVs on whole exome sequencing (WES) data has been a long sought-after goal for use in clinics. This was not possible despite recent improvements in performance because algorithms mostly suffer from low precision and even lower recall on expert-curated gold standard call sets. Here, we present a deep learning-based somatic and germline CNV caller for WES data, named ECOLE. Based on a variant of the transformer architecture, the model learns to call CNVs per exon, using high-confidence calls made on matched WGS samples. We further train and fine-tune the model with a small set of expert calls via transfer learning. We show that ECOLE achieves high performance on human expert labelled data for the first time with 68.7% precision and 49.6% recall. This corresponds to precision and recall improvements of 18.7% and 30.8% over the next best-performing methods, respectively. We also show that the same fine-tuning strategy using tumor samples enables ECOLE to detect RT-qPCR-validated variations in bladder cancer samples without the need for a control sample. ECOLE is available at https://github.com/ciceklab/ECOLE.
Copy number variants (CNV) are shown to contribute to the etiology of various genetic disorders. Here, authors present ECOLE, a deep learning-based somatic and germline CNV caller for WES data. Utilising a variant of the transformer architecture, the model is trained to call CNVs per exon.
Journal Article
Accelerating read mapping with FastHASH
by Alkan, Can; Lee, Donghyuk; Hormozdiari, Farhad
in Algorithms, Animal Genetics and Genomics, Biomedical and Life Sciences
2013
With the introduction of next-generation sequencing (NGS) technologies, we are facing an exponential increase in the amount of genomic sequence data. The success of all medical and genetic applications of next-generation sequencing critically depends on the existence of computational techniques that can process and analyze the enormous amount of sequence data quickly and accurately. Unfortunately, the current read mapping algorithms have difficulties in coping with the massive amounts of data generated by NGS.
We propose a new algorithm, FastHASH, which drastically improves the performance of the seed-and-extend type hash table based read mapping algorithms, while maintaining the high sensitivity and comprehensiveness of such methods. FastHASH is a generic algorithm compatible with all seed-and-extend class read mapping algorithms. It introduces two main techniques, namely Adjacency Filtering and Cheap K-mer Selection.
We implemented FastHASH and merged it into the codebase of the popular read mapping program, mrFAST. Depending on the edit distance cutoffs, we observed up to 19-fold speedup while still maintaining 100% sensitivity and high comprehensiveness.
Journal Article
Genetic history of an archaic hominin group from Denisova Cave in Siberia
by Kircher, Martin; Shunkov, Michael V.; Viola, Bence
in 631/181/19/27, 631/208/212/748, Mitochondrial DNA
2010
Using DNA extracted from a finger bone found in Denisova Cave in southern Siberia, we have sequenced the genome of an archaic hominin to about 1.9-fold coverage. This individual is from a group that shares a common origin with Neanderthals. This population was not involved in the putative gene flow from Neanderthals into Eurasians; however, the data suggest that it contributed 4–6% of its genetic material to the genomes of present-day Melanesians. We designate this hominin population ‘Denisovans’ and suggest that it may have been widespread in Asia during the Late Pleistocene epoch. A tooth found in Denisova Cave carries a mitochondrial genome highly similar to that of the finger bone. This tooth shares no derived morphological features with Neanderthals or modern humans, further indicating that Denisovans have an evolutionary history distinct from Neanderthals and modern humans.
A digital record of an early Siberian
Anatomically modern humans were in Africa from some point after 200,000 years ago and reached Eurasia rather later. Meanwhile, archaic hominins — including the Neanderthals — had been in Eurasia from at least 230,000 years ago and disappear from the fossil record only about 30,000 years ago. The genome of a female archaic hominin from Denisova Cave in southern Siberia has now been sequenced from DNA extracted from a finger bone. The group to which this 'Denisovan' individual belonged shares a common origin with Neanderthals and, although it was not involved in the putative gene flow from Neanderthals into Eurasians, it contributed 4–6% of the genomes of present-day Melanesians. In addition, the morphology of a tooth with a mitochondrial genome very similar to that of the finger bone suggests that these hominins are evolutionarily distinct from both Neanderthals and modern humans.
Using DNA from a finger bone, the genome of an archaic hominin from southern Siberia has been sequenced to about 1.9-fold coverage. The group to which this individual belonged shares a common origin with Neanderthals, and although it was not involved in the putative gene flow from Neanderthals into Eurasians, it contributed 4–6% of its genetic material to the genomes of present-day Melanesians. A tooth whose mitochondrial genome is very similar to that of the finger bone further suggests that these hominins are evolutionarily distinct from Neanderthals and modern humans.
Journal Article