Catalogue Search | MBRL

PyToxo: a Python tool for calculating penetrance tables of high-order epistasis models

by Martín, María J. , González-Domínguez, Jorge , Ponte-Fernández, Christian in Algorithms , Batch processing , Bioinformatics

2022

Background Epistasis is the interaction between different genes when expressing a certain phenotype. When epistasis involves more than two loci, it is called high-order epistasis. High-order epistasis is an area of active research because it could be the cause of many complex traits. The most common way to specify an epistatic interaction is through a penetrance table. Results This paper presents PyToxo, a Python tool for generating penetrance tables from epistasis models of any order. Unlike other tools available in the literature, PyToxo is able to work with high-order models and realistic penetrance and heritability values, achieving high-precision results in a short time. In addition, PyToxo is distributed as open-source software and includes several interfaces to ease its use. Conclusions PyToxo provides the scientific community with a useful tool for evaluating algorithms and methods that detect high-order epistasis, helping to advance the discovery of the causes behind complex diseases.

Journal Article
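
A penetrance table for a k-locus model has one cell per combination of genotypes. With biallelic SNPs that means 3^k cells. The sketch below is a hypothetical helper and not the PyToxo API. It enumerates those cells together with their population frequencies. It assumes Hardy-Weinberg equilibrium at each locus and independence between loci, with genotypes coded as minor-allele counts 0, 1, 2.

```python
from itertools import product


def hwe_genotype_freqs(maf):
    """Frequencies of genotypes 0, 1, 2 (minor-allele count) under
    Hardy-Weinberg equilibrium, where maf is the minor-allele frequency."""
    p = 1.0 - maf
    return (p * p, 2.0 * p * maf, maf * maf)


def genotype_combinations(mafs):
    """Yield (genotype tuple, population frequency) for every cell of a
    k-locus penetrance table (assumption: loci are independent)."""
    per_locus = [hwe_genotype_freqs(m) for m in mafs]
    for genotype in product(range(3), repeat=len(mafs)):
        freq = 1.0
        for locus, g in enumerate(genotype):
            freq *= per_locus[locus][g]
        yield genotype, freq


cells = list(genotype_combinations([0.3, 0.3, 0.3]))
print(len(cells))  # 27 cells = 3**3
```

The table grows exponentially with the interaction order. This growth is the main reason high-order models are harder to handle than two-locus ones.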

Considerations in the search for epistasis

by Browning, Brian L. , Byrne, Ross P. , Alhathli, Elham in Animal Genetics and Genomics , Bioinformatics , Biomedical and Life Sciences

2024

Epistasis refers to changes in the effect on phenotype of a unit of genetic information, such as a single nucleotide polymorphism or a gene, dependent on the context of other genetic units. Such interactions are both biologically plausible and good candidates to explain observations which are not fully explained by an additive heritability model. However, the search for epistasis has so far largely failed to recover this missing heritability. We identify key challenges and propose that future works need to leverage idealized systems, known biology and even previously identified epistatic interactions, in order to guide the search for new interactions.

Journal Article

Toxo: a library for calculating penetrance tables of high-order epistasis models

by Martín, María J. , González-Domínguez, Jorge , Ponte-Fernández, Christian in Algorithms , Analysis and modelling of complex systems , Bioinformatics

2020

Background Epistasis is defined as the interaction between different genes when expressing a specific phenotype. The most common way to characterize an epistatic relationship is using a penetrance table, which contains the probability of expressing the phenotype under study given a particular allele combination. Available simulators can only create penetrance tables for well-known epistasis models involving a small number of genes, and they are subject to many limitations. Results Toxo is a MATLAB library designed to calculate penetrance tables of epistasis models of any interaction order that more closely resemble real data. The user specifies the desired heritability (or prevalence) and the program maximizes the table’s prevalence (or heritability) according to the input epistatic model boundaries. Conclusions Toxo extends the capabilities of existing simulators that define epistasis using penetrance tables. These tables can be used directly as input for software simulators such as GAMETES, allowing them to generate data samples with higher-order interactions and more realistic prevalences/heritabilities.

Journal Article
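
The Toxo abstract links a penetrance table to two summary values: prevalence and heritability. The sketch below uses the definitions common in penetrance-based simulation, which are assumed here rather than taken from the abstract. Prevalence is the genotype-frequency-weighted mean penetrance. Heritability is the weighted variance of the penetrances divided by K(1 - K). The two-locus table is a hypothetical example.

```python
from itertools import product


def hwe_genotype_freqs(maf):
    """Genotype frequencies (0, 1, 2 minor alleles) under Hardy-Weinberg equilibrium."""
    p = 1.0 - maf
    return (p * p, 2.0 * p * maf, maf * maf)


def prevalence_and_heritability(penetrance, mafs):
    """penetrance maps each genotype tuple to P(disease | genotype).
    Assumes independent loci; returns (prevalence K, heritability h2)."""
    per_locus = [hwe_genotype_freqs(m) for m in mafs]
    cells = []
    for genotype in product(range(3), repeat=len(mafs)):
        freq = 1.0
        for locus, g in enumerate(genotype):
            freq *= per_locus[locus][g]
        cells.append((freq, penetrance[genotype]))
    prevalence = sum(freq * f for freq, f in cells)
    if not 0.0 < prevalence < 1.0:
        raise ValueError("heritability is undefined for prevalence 0 or 1")
    variance = sum(freq * (f - prevalence) ** 2 for freq, f in cells)
    return prevalence, variance / (prevalence * (1.0 - prevalence))


# Hypothetical table: risk 0.1 only when both loci carry at least one minor allele.
table = {g: (0.1 if g[0] > 0 and g[1] > 0 else 0.0)
         for g in product(range(3), repeat=2)}
K, h2 = prevalence_and_heritability(table, [0.5, 0.5])
```

This is only the forward check. As the abstract describes, Toxo works in the other direction. It takes a model and a target heritability (or prevalence) and searches for the penetrance values that satisfy it.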

HSRA: Hadoop-based spliced read aligner for RNA sequencing data

by Touriño, Juan , González-Domínguez, Jorge , Expósito, Roberto R. in Alignment , Big Data , Bioinformatics

2018

Nowadays, the analysis of transcriptome sequencing (RNA-seq) data has become the standard method for quantifying the levels of gene expression. In RNA-seq experiments, the mapping of short reads to a reference genome or transcriptome is considered a crucial step that remains one of the most time-consuming. With the steady development of Next Generation Sequencing (NGS) technologies, unprecedented amounts of genomic data introduce significant challenges in terms of storage, processing and downstream analysis. As cost and throughput continue to improve, there is a growing need for new software solutions that minimize the impact of increasing data volume on RNA read alignment. In this work we introduce HSRA, a Big Data tool that takes advantage of the MapReduce programming model to extend the multithreading capabilities of a state-of-the-art spliced read aligner for RNA-seq data (HISAT2) to distributed memory systems such as multi-core clusters or cloud platforms. HSRA has been built upon the Hadoop MapReduce framework and supports both single- and paired-end reads from FASTQ/FASTA datasets, providing output alignments in SAM format. The design of HSRA has been carefully optimized to avoid the main limitations and major causes of inefficiency found in previous Big Data mapping tools, which cannot fully exploit the raw performance of the underlying aligner. On a 16-node multi-core cluster, HSRA is on average 2.3 times faster than previous Hadoop-based tools. Source code in Java as well as a user's guide are publicly available for download at http://hsra.dec.udc.es.

Journal Article

ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems

by González-Domínguez, Jorge , Expósito, Roberto R. in Algorithms , Big Data , Bioinformatics

2018

Biclustering techniques are gaining attention in the analysis of large-scale datasets because they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, a parallel tool to accelerate the search for interesting biclusters in binary datasets, which are very popular in fields such as genetics, marketing or text mining. It is based on the state-of-the-art sequential Java tool BiBit, which several studies have shown to be accurate, especially in scenarios that yield many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and provides the same results. Nevertheless, our tool significantly improves performance thanks to an efficient C++11 implementation that supports threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which provide several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/parbibit/.

Journal Article
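
The abstract describes BiBit's approach as "grouping the binary information into patterns". The sketch below is a simplified, sequential reading of that pattern-based idea and is not ParBiBit code. Each row is stored as a bitset. Each pair of rows is ANDed to obtain a seed pattern of shared columns. Every row that contains the pattern becomes a member of the bicluster. The `min_rows`/`min_cols` thresholds are illustrative parameters.

```python
from itertools import combinations


def bibit(rows, min_rows=2, min_cols=2):
    """rows: list of ints where bit c set means column c is 1 in that row.
    Returns {pattern: tuple of member row indices}."""
    biclusters = {}
    for i, j in combinations(range(len(rows)), 2):
        pattern = rows[i] & rows[j]  # columns shared by the seed pair of rows
        if bin(pattern).count("1") < min_cols or pattern in biclusters:
            continue  # too few columns, or this pattern was already expanded
        # Every row that has all the pattern's columns set joins the bicluster.
        members = tuple(r for r, bits in enumerate(rows)
                        if (bits & pattern) == pattern)
        if len(members) >= min_rows:
            biclusters[pattern] = members
    return biclusters


print(bibit([0b1110, 0b0110, 0b1111, 0b0001]))  # {6: (0, 1, 2), 14: (0, 2)}
```

The pair loop is quadratic in the number of rows, and each iteration is independent of the others. That makes it a natural place to divide work among threads or MPI processes. The abstract does not state exactly how ParBiBit partitions this loop.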

Speed and accuracy improvement of higher-order epistasis detection on CUDA-enabled GPUs

by Domínguez, Jorge González , Hundt, Christian , Jünger, Daniel in Accuracy , Algorithms , Candidates

2017

The discovery of higher-order epistatic interactions is an important task in the field of genome-wide association studies, as it allows the identification of complex interaction patterns between multiple genetic markers. Some existing brute-force approaches explore the whole space of k-interactions exhaustively, resulting in almost intractable execution times. Computational cost can be reduced drastically by restricting the search space with suitable preprocessing filters that prune unpromising candidates. Other approaches mitigate the execution time by employing massively parallel accelerators in order to benefit from the vast computational resources of these architectures. In this paper, we combine a novel preprocessing filter, namely SingleMI, with massively parallel computation on modern GPUs to further accelerate epistasis discovery. Our implementation improves both the runtime and accuracy when compared to a previous GPU counterpart that employs mutual information clustering for prefiltering. SingleMI is open source software and publicly available at: https://github.com/sleeepyjack/singlemi/ .

Journal Article
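
The SingleMI abstract does not give the exact filtering criterion. Based only on its name, the sketch below assumes a simple version of the idea: score each SNP by its mutual information with the phenotype, keep the top-ranked SNPs, and enumerate k-way candidates only among the survivors. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(genotypes, phenotype):
    """I(G; Y) in nats, estimated from the empirical joint distribution."""
    n = len(phenotype)
    joint = Counter(zip(genotypes, phenotype))
    px, py = Counter(genotypes), Counter(phenotype)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def prefilter(snps, phenotype, keep):
    """Indices of the `keep` SNPs with the highest single-SNP MI."""
    scores = sorted(((mutual_information(g, phenotype), idx)
                     for idx, g in enumerate(snps)), reverse=True)
    return sorted(idx for _, idx in scores[:keep])


def candidate_interactions(kept, k):
    """Exhaustive k-way combinations, restricted to the surviving SNPs."""
    return combinations(kept, k)
```

The pruning pays off because the candidate count grows combinatorially. For an illustrative 1,000 SNPs, an exhaustive 3-way search tests `math.comb(1000, 3)` = 166,167,000 combinations. Keeping only 100 SNPs reduces that to `math.comb(100, 3)` = 161,700. The cost is that interactions with weak marginal signal may be filtered out.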

PARamrfinder: detecting allele-specific DNA methylation on multicore clusters

by Martín, María J. , González-Domínguez, Jorge , Fernández-Fraga, Alejandro in Algorithms , Clusters , Compilers

2024

The discovery of Allele-Specific Methylation (ASM) is an important research field in biology, as ASM regulates genomic imprinting, which has been identified as the cause of some genetic diseases. Nevertheless, the high computational cost of the bioinformatic tools developed for this purpose prevents their application to large-scale datasets. Hence, much faster tools are required to further progress in this research field. In this work we present PARamrfinder, a parallel tool that applies a statistical model to identify ASM in data from high-throughput short-read bisulfite sequencing. It is based on the state-of-the-art sequential tool amrfinder, which is able to detect ASM at the regional level from Bisulfite Sequencing (BS-Seq) experiments in the absence of Single Nucleotide Polymorphism information. PARamrfinder provides the same Allelically Methylated Regions as amrfinder but at a significantly reduced runtime, thanks to exploiting the compute capabilities of common multicore CPU clusters and MPI RMA operations to achieve efficient dynamic load balancing. As an example, our tool is up to 567 times faster for real data experiments on a cluster with 8 nodes, each containing two 16-core processors. The source code of PARamrfinder, as well as a reference manual, is available at https://github.com/UDC-GAC/PARamrfinder .

Journal Article
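
The abstract says MPI RMA operations are used for dynamic workload balance but gives no further detail. A common pattern for this is a shared work counter. Each process atomically fetches and increments the counter to claim its next chunk of regions. In MPI this is typically done with `MPI_Fetch_and_op` on a window. The sketch below emulates that pattern with Python threads and a lock. `SharedCounter` and `run_dynamic` are hypothetical names and not part of PARamrfinder.

```python
import threading


class SharedCounter:
    """Stand-in for a counter exposed in an MPI RMA window and updated with an
    atomic fetch-and-add (e.g. MPI_Fetch_and_op with MPI_SUM)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, inc):
        with self._lock:
            old = self._value
            self._value += inc
            return old


def run_dynamic(num_regions, num_workers, chunk, process):
    """Workers repeatedly claim the next `chunk` regions until none remain."""
    counter = SharedCounter()
    results = [None] * num_regions

    def worker():
        while True:
            start = counter.fetch_and_add(chunk)  # claim the next chunk
            if start >= num_regions:
                break
            for r in range(start, min(start + chunk, num_regions)):
                results[r] = process(r)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Workers that draw cheap regions simply claim more chunks. This balances the load without a central master process. The chunk size trades contention on the counter against load imbalance at the end of the run.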

Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++

by González-Domínguez, Jorge , Schmidt, Bertil , Liu, Yongchao in Advantages , Algorithms , Alignment

2016

The growth of next-generation sequencing (NGS) datasets poses a challenge to the alignment of reads to reference genomes in terms of alignment quality and execution speed. Some available aligners have been shown to obtain high-quality mappings at the expense of long execution times. Finding fast yet accurate software solutions is of high importance to research, since the availability and size of NGS datasets continue to increase. In this work we present an efficient parallelization approach for NGS short-read alignment on multi-core clusters. Our approach takes advantage of a distributed shared memory programming model based on the new UPC++ language. Experimental results using the CUSHAW3 aligner show that our implementation based on dynamic scheduling obtains good scalability on multi-core clusters. Through our evaluation, we are able to complete the single-end and paired-end alignments of 246 million reads of 150 base pairs in 11.54 and 16.64 minutes, respectively, using 32 nodes with four AMD Opteron 6272 16-core CPUs per node. In contrast, the original multi-threaded tool needs 2.77 and 5.54 hours to perform the same alignments on the 64 cores of one node. The source code of our parallel implementation is publicly available at the CUSHAW3 homepage (http://cushaw3.sourceforge.net).

Journal Article
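
The runtimes in the UPC++ abstract can be turned into speedups directly. The baseline is the original multithreaded tool on one 64-core node. The parallel run uses 32 such nodes (2,048 cores). The "efficiency" below is therefore speedup divided by the 32x increase in hardware. It is a rough indicator and not a strict parallel efficiency of the UPC++ code measured against itself.

```python
def speedup(baseline_hours, parallel_minutes):
    """Ratio of baseline runtime to parallel runtime, both in minutes."""
    return baseline_hours * 60.0 / parallel_minutes


cores_one_node = 4 * 16               # four 16-core Opteron 6272 CPUs
cores_cluster = 32 * cores_one_node   # 32 nodes -> 2048 cores

for label, base_h, par_min in [("single-end", 2.77, 11.54),
                               ("paired-end", 5.54, 16.64)]:
    s = speedup(base_h, par_min)
    eff = s / (cores_cluster / cores_one_node)
    print(f"{label}: speedup {s:.1f}x, efficiency vs. one node {eff:.2f}")
# single-end: speedup 14.4x, efficiency vs. one node 0.45
# paired-end: speedup 20.0x, efficiency vs. one node 0.62
```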

ScalaParBiBit: scaling the binary biclustering in distributed-memory systems

by Fraguela, Basilio B. , González-Domínguez, Jorge , Andrade, Diego in Algorithms , Applications programs , Binary data

2021

Biclustering is a data mining technique that allows us to find groups of rows and columns that are highly correlated in a 2D dataset. Although several software applications exist to perform biclustering, most of them suffer from a high computational complexity that prevents their use on large datasets. In this work we present ScalaParBiBit, a parallel tool to find biclusters in binary data, which is quite common in many research fields such as text mining, marketing or bioinformatics. ScalaParBiBit takes advantage of the special characteristics of these binary datasets, as well as of an efficient parallel implementation and algorithm, to accelerate the biclustering procedure in distributed-memory systems. The experimental evaluation proves that our tool is significantly faster and more scalable than the state-of-the-art tool ParBiBit on a cluster with 32 nodes and 768 cores. Our tool, together with its reference manual, is freely available at https://github.com/fraguela/ScalaParBiBit .

Journal Article

Author Correction: Considerations in the search for epistasis

by Browning, Brian L. , Byrne, Ross P. , Alhathli, Elham in Animal Genetics and Genomics , Author , Author Correction

2025

Journal Article