4 result(s) for "Abramovs, Nikita"
GeVIR is a continuous gene-level metric that uses variant distribution patterns to prioritize disease candidate genes
With large-scale population sequencing projects gathering pace, there is a need for strategies that advance disease gene prioritization [1, 2]. Metrics that provide information about a gene and its ability to tolerate protein-altering variation can aid in clinical interpretation of human genomes and can advance disease gene discovery [1–4]. Previously reported methods analyzed the total variant load in a gene [1–4], but did not analyze the distribution pattern of variants within a gene. Using data from 138,632 exome and genome sequences [2], we developed gene variation intolerance rank (GeVIR), a continuous gene-level metric for 19,361 genes that is able to prioritize both dominant and recessive Mendelian disease genes [5], that outperforms missense constraint metrics [3] and that is comparable, but complementary, to loss-of-function (LOF) constraint metrics [2]. GeVIR is also able to prioritize short genes, for which LOF constraint cannot be estimated with confidence [2]. The majority of the most intolerant genes identified here have no defined phenotype and are candidates for severe dominant disorders. GeVIR is a continuous gene-level metric that uses variant distribution patterns to prioritize both dominant and recessive Mendelian disease genes. GeVIR outperforms missense constraint metrics and complements loss-of-function constraint metrics.
Developing Metrics for Prioritisation of Candidate Disease Genes Using Genetic Variation Databases
Each human exome contains thousands of protein-altering variants located in more than 19,000 genes. Humans typically have two copies of a gene, and variants that affect one or both gene copies are called heterozygous and homozygous, respectively. If one gene copy is affected by deleterious heterozygous variation and cannot produce normal protein, this could result in a dominant disease. However, some genes can tolerate disruption of one copy, but deleterious homozygous or two heterozygous variants in different copies could still result in a recessive disease. Finally, humans can tolerate the inactivation or deletion of both copies of some genes without developing diseases. Because the inheritance patterns of studied diseases are frequently known (e.g. if one of the parents and a child both have a disease, the inheritance pattern is likely to be dominant), clinical researchers want to know the inheritance pattern of a candidate disease-causing variant in order to prioritise candidate disease genes for laboratory validation. Although inheritance pattern is a property of disease-causing variants, it can be predicted using gene-level properties. The aim of this study was to develop gene-level computational metrics that can be used for this task, and recently created large variant population databases such as the Genome Aggregation Database (gnomAD, >137,000 individual exomes/genomes) provided novel data for such studies. This thesis is written in the journal format and consists of three paper-style result chapters. In the first paper, we analysed deviations from Hardy-Weinberg Equilibrium of rare variants in gnomAD to detect potential disease-causing and heterozygous advantageous variants based on homozygous deficiency in the healthy populations. The second paper developed a gene variation intolerance ranking (GeVIR) system by measuring how unevenly variants in gnomAD were distributed in a gene relative to other genes.
Finally, in the third paper, we developed multiple supervised machine learning models based on various gene properties (including GeVIR) and combined them into a single continuous gene ranking metric that can be used to measure gene predisposition to disease inheritance patterns (DIP). In conclusion, this thesis contributed to the understanding of variant population data and the application of supervised ML methods to classify candidate disease genes in the context of disease inheritance patterns. The primary outcome of this research was the development of two continuous gene metrics, GeVIR and DIP (available for 19,361 and 15,794 protein-coding genes, respectively), both of which can be used to distinguish dominant, recessive and non-disease genes. We anticipate that these metrics will aid clinical researchers in the prioritisation of candidate disease genes.
Developing Sequence Analysis Pipelines to Characterise Human Genome Variation
The latest genome sequencing platforms generate large catalogues of genomic variants, with individual genomes containing about four to five million variants. Large general population studies also estimate that individuals carry up to 100 loss of function (LoF) variants with ~20 genes (mostly participating in the immune system) completely inactivated. Deciding which variants are important in disease is a difficult task, and a crucial step in disease candidate gene prioritisation is comparison of variants in affected and healthy individuals. The purpose of this study is to characterise genes based on variant data in large apparently healthy populations, and create datasets which can be integrated into other variant studies, sequence analysis pipelines, or used independently. There are about 18,000-20,000 protein coding genes in humans, all of which are present in two copies (alleles), except for sex chromosome genes. One or both alleles can be affected by deleterious variants and result in dominant or recessive disease respectively. Genes which require both alleles to maintain their functions are called haploinsufficient, but their proportion in all protein coding genes is still unknown. It is also hypothesized that many more genes are nonessential for human survival, and loss of both alleles of these genes can be tolerated. Variant biallelic distribution within the genes was analysed on 2504 individual genomes from the 1000 Genomes Project Phase 3 dataset, and a custom NoSQL database was created from the VCF files. This can be reused in studies which involve whole genome variant analysis at the individual level, as this information is not publicly available in other variant databases.
A dataset of 76,254 rare variant pairs, which affected both gene alleles in some individuals, was produced and can be used for candidate gene prioritisation. Overall load of variants within 18,225 genes was analysed on 60,706 exomes from the Exome Aggregation Consortium (ExAC) database, to create a dataset of gene haploinsufficiency scores. The scores were calculated by several models based on supervised machine learning algorithms which were trained and evaluated on known dominant and recessive genes from the Online Mendelian Inheritance in Man (OMIM) database. The scores were called Gene Variant Haploinsufficiency Scores (GVHS), as they were based on six different types of variant statistical data. This approach is different from existing methods used by ExAC or DECIPHER (DatabasE of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources) to calculate gene haploinsufficiency scores. ExAC used an unsupervised learning algorithm and considered only splicing and nonsense variants, whereas DECIPHER used gene biological properties and ignored variant data completely. Evaluation performed in this study showed that, on average, the performance metrics of the GVHS models were similar to those of ExAC, and both had better haploinsufficiency predictions than DECIPHER. However, one of the GVHS models was ~4.5% more precise in detecting haploinsufficient genes and produced more interpretable probabilities, which can be useful for candidate gene prioritisation in disease sequencing studies.
Hardy-Weinberg Equilibrium in the Large Scale Genomic Sequencing Era
Hardy-Weinberg Equilibrium (HWE) is used to estimate the number of homozygous and heterozygous carriers of a variant based on its allele frequency in populations that are not evolving. Previously, deviations from HWE in large population databases were investigated to detect genotyping errors, which can result in extreme heterozygote excess (HetExc). However, HetExc might also be a sign of natural selection since recessive disease-causing variants are expected to occur less frequently in a homozygous state in the general population, but might reach high allele frequency, especially if they are advantageous, in a heterozygote state. We developed a filtering strategy to detect these variants and applied it to genome data from 137,842 individuals. We found that the main limitations of this approach were quality of genotype calls and insufficient population sizes, whereas population structure and high level of inbreeding could reduce sensitivity, but not precision, in certain populations. Nevertheless, we identified 365 HetExc variants in 326 genes, most of which were specific to African/African American populations (~84.7%). Although the majority of them were not associated with known diseases, or were classified as "benign", they were enriched in genes associated with autosomal recessive diseases. The resulting dataset also contained two known recessive disease-causing variants with evidence of heterozygote advantage in the genes HBB and CFTR. Finally, we provide in silico evidence of a novel heterozygote advantageous variant in the CHD6 gene (involved in influenza virus replication). We anticipate that our approach will allow the detection of rare recessive disease-causing variants in the future.
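The HWE expectation described in this abstract can be illustrated with a minimal Python sketch. It shows only the textbook calculation (expected genotype counts p², 2pq and q² scaled by sample size) and a simple observed-to-expected heterozygote ratio. It is not the authors' filtering strategy, which the abstract does not describe in detail. The function names and the example counts are hypothetical and chosen for illustration only.

```python
def allele_freq_from_genotypes(n_het, n_hom_alt, n_individuals):
    """Alternate allele frequency q from observed genotype counts (diploid)."""
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


def hwe_expected_counts(n_individuals, alt_allele_freq):
    """Expected genotype counts under Hardy-Weinberg Equilibrium."""
    q = alt_allele_freq
    p = 1.0 - q
    return {
        "hom_ref": p * p * n_individuals,
        "het": 2 * p * q * n_individuals,
        "hom_alt": q * q * n_individuals,
    }


def het_excess_ratio(n_het, n_hom_alt, n_individuals):
    """Observed / expected heterozygote count; values above 1 suggest excess."""
    q = allele_freq_from_genotypes(n_het, n_hom_alt, n_individuals)
    expected_het = hwe_expected_counts(n_individuals, q)["het"]
    if expected_het == 0:
        raise ValueError("variant is absent or fixed; ratio is undefined")
    return n_het / expected_het


# Hypothetical example: 1,000 individuals, 200 heterozygous carriers,
# and no homozygous carriers of the alternate allele.
q = allele_freq_from_genotypes(n_het=200, n_hom_alt=0, n_individuals=1000)
expected = hwe_expected_counts(1000, q)
ratio = het_excess_ratio(n_het=200, n_hom_alt=0, n_individuals=1000)
```

In this hypothetical case the allele frequency is 0.1, so HWE predicts about 180 heterozygotes and about 10 homozygotes. Observing 200 heterozygotes and no homozygotes is the kind of homozygous deficiency with heterozygote excess that the abstract refers to. A real analysis would also need a significance test and the genotype-quality and population-structure checks that the abstract names as limitations.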