Catalogue Search | MBRL

kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes

by Moseley, Hunter N. B. , Huckvale, Erik in Accessibility , Algorithms , Application programming interface

2023

Background The Kyoto Encyclopedia of Genes and Genomes (KEGG) provides organized genomic, biomolecular, and metabolic information and knowledge that is reasonably current and highly useful for a wide range of analyses and modeling. KEGG follows the principles of data stewardship to be findable, accessible, interoperable, and reusable (FAIR) by providing RESTful access to their database entries via their web-accessible KEGG API. However, the overall FAIRness of KEGG is often limited by the library and software package support available in a given programming language. While R library support for KEGG is fairly strong, Python library support has been lacking. Moreover, there is no software that provides extensive command line level support for KEGG access and utilization. Results We present kegg_pull, a package implemented in the Python programming language that provides better KEGG access and utilization functionality than previous libraries and software packages. Not only does kegg_pull include an application programming interface (API) for Python programming, it also provides a command line interface (CLI) that enables utilization of KEGG for a wide range of shell scripting and data analysis pipeline use-cases. As kegg_pull’s name implies, both the API and CLI provide versatile options for pulling (downloading and saving) an arbitrary (user defined) number of database entries from the KEGG API. Moreover, this functionality is implemented to efficiently utilize multiple central processing unit cores as demonstrated in several performance tests. Many options are provided to optimize fault-tolerant performance across a single or multiple processes, with recommendations provided based on extensive testing and practical network considerations. Conclusions The new kegg_pull package enables new flexible KEGG retrieval use cases not available in previous software packages. 
The most notable new feature that kegg_pull provides is its ability to robustly pull an arbitrary number of KEGG entries with a single API method or CLI command, including pulling an entire KEGG database. We provide recommendations to users for the most effective use of kegg_pull according to their network and computational circumstances.
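The idea of pulling an arbitrary number of entries in a fault-tolerant way can be illustrated with a minimal sketch. This is not kegg_pull's API or implementation. It assumes the public KEGG REST endpoint `https://rest.kegg.jp`, the documented limit of 10 entries per `get` request, and a caller-supplied `fetch` function; the names `chunk`, `build_get_urls`, and `pull` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

KEGG_REST_BASE = "https://rest.kegg.jp"  # assumption: public KEGG REST endpoint
MAX_ENTRIES_PER_GET = 10  # assumption: KEGG's documented per-request limit for `get`


def chunk(entry_ids: Iterable[str], size: int = MAX_ENTRIES_PER_GET) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` entry IDs."""
    batch: List[str] = []
    for entry_id in entry_ids:
        batch.append(entry_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_get_urls(entry_ids: Iterable[str]) -> List[str]:
    """Build one KEGG `get` URL per batch, joining entry IDs with '+'."""
    return [f"{KEGG_REST_BASE}/get/{'+'.join(batch)}" for batch in chunk(entry_ids)]


def pull(
    entry_ids: Iterable[str],
    fetch: Callable[[str], str],
    max_tries: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch every batch URL, retrying on OSError; return (results, failed URLs).

    `fetch` is a hypothetical callable supplied by the caller, for example a
    thin wrapper around urllib.request.urlopen (whose URLError is an OSError).
    """
    results: Dict[str, str] = {}
    failed: List[str] = []
    for url in build_get_urls(entry_ids):
        for _ in range(max_tries):
            try:
                results[url] = fetch(url)
                break
            except OSError:
                continue
        else:
            # All attempts failed for this batch; record it and move on.
            failed.append(url)
    return results, failed
```

Passing `fetch` in as an argument keeps the batching and retry logic testable without network access; a real tool would also need rate limiting and parallel workers, which this sketch omits.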

Journal Article

TreeViewer: Flexible, modular software to visualise and manipulate phylogenetic trees

by Bianchini, Giorgio , Sánchez‐Baracaldo, Patricia in Accessibility , Customization , Design

2024

Phylogenetic trees illustrate evolutionary relationships between taxa or genes. Tree figures are crucial when presenting results and data, and by creating clear and effective plots, researchers can describe many kinds of evolutionary patterns. However, producing tree plots can be a time‐consuming task, especially as multiple different programs are often needed to adjust and illustrate all data associated with a tree. We present TreeViewer, a new software to draw phylogenetic trees. TreeViewer is flexible, modular, and user‐friendly. Plots are produced as the result of a user‐defined pipeline, which can be finely customised and easily applied to different trees. Every feature of the program is documented and easily accessible, either in the online manual or within the program's interface. We show how TreeViewer can be used to produce publication‐ready figures, saving time by not requiring additional graphical post‐processing tools. TreeViewer is freely available for Windows, macOS, and Linux operating systems and distributed under an AGPLv3 licence from https://treeviewer.org. It has a graphical user interface (GUI), as well as a command‐line interface, which is useful to work with very large trees and for automated pipelines. A detailed user manual with examples and tutorials is also available. TreeViewer is mainly aimed at users wishing to produce highly customised, publication‐quality tree figures using a single GUI software tool. Compared to other GUI tools, TreeViewer offers a richer feature set and a finer degree of customisation. Compared to command‐line‐based tools and software libraries, TreeViewer's graphical interface is more accessible. The flexibility of TreeViewer's approach to phylogenetic tree plotting enables the program to produce a wide variety of publication‐ready figures. Users are encouraged to create their own custom modules to expand the functionalities of the program. 
This sets the scene for an ever‐expanding and ever‐adapting software framework that can easily adjust to respond to new challenges.

Journal Article

Cooltools: Enabling high-resolution Hi-C analysis in Python

by Galitsyna, Aleksandra A. , Oksuz, Betul A. , Imakaev, Maxim in Application programming interface , Biology and Life Sciences , Cell cycle

2024

Chromosome conformation capture (3C) technologies reveal the incredible complexity of genome organization. Maps of increasing size, depth, and resolution are now used to probe genome architecture across cell states, types, and organisms. Larger datasets add challenges at each step of computational analysis, from storage and memory constraints to researchers’ time; however, analysis tools that meet these increased resource demands have not kept pace. Furthermore, existing tools offer limited support for customizing analysis for specific use cases or new biology. Here we introduce cooltools ( https://github.com/open2c/cooltools ), a suite of computational tools that enables flexible, scalable, and reproducible analysis of high-resolution contact frequency data. Cooltools leverages the widely-adopted cooler format which handles storage and access for high-resolution datasets. Cooltools provides a paired command line interface (CLI) and Python application programming interface (API), which respectively facilitate workflows on high-performance computing clusters and in interactive analysis environments. In short, cooltools enables the effective use of the latest and largest genome folding datasets.

Journal Article

TEQC: The Multi-Purpose Toolkit for GPS/GLONASS Data

by Estey, Louis H. , Meertens, Charles M. in Format , Freeware , GLONASS

1999

For many common GPS/GLONASS native receiver formats, a single freeware program called TEQC now allows the user to translate from the binary receiver format to the standard Receiver Independent Exchange (RINEX) format, to edit existing RINEX files, and to quality-check the data before postprocessing. TEQC is 100% noninteractive and has a command line interface modeled after common UNIX commands. This combined with TEQC's extensive documentation makes it simple to use for new and experienced users and in automated processing scripts. © 1999 John Wiley & Sons, Inc.

Journal Article

Easy and accurate protein structure prediction using ColabFold

by Moriwaki, Yoshitaka , Kim, Gyuri , Ovchinnikov, Sergey in 631/114/2411 , 631/114/794 , 631/1647/48

2025

Since its public release in 2021, AlphaFold2 (AF2) has made investigating biological questions, by using predicted protein structures of single monomers or full complexes, a common practice. ColabFold-AF2 is an open-source Jupyter Notebook inside Google Colaboratory and a command-line tool that makes it easy to use AF2 while exposing its advanced options. ColabFold-AF2 shortens turnaround times of experiments because of its optimized usage of AF2’s models. In this protocol, we guide the reader through ColabFold best practices by using three scenarios: (i) monomer prediction, (ii) complex prediction and (iii) conformation sampling. The first two scenarios cover classic static structure prediction and are demonstrated on the human glycosylphosphatidylinositol transamidase protein. The third scenario demonstrates an alternative use case of the AF2 models by predicting two conformations of the human alanine serine transporter 2. Users can run the protocol without computational expertise via Google Colaboratory or in a command-line environment for advanced users. Using Google Colaboratory, it takes <2 h to run each procedure. The data and code for this protocol are available at https://protocol.colabfold.com . Key points We present an outline of how to use ColabFold to perform structure prediction of monomers, complexes and alternative conformations and guidance on interpreting the results through appropriate confidence metrics and visualizations. Integrating MMseqs2’s quick homology search, ColabFold enables accelerated structure prediction compared with AlphaFold2 at similar accuracy, while exposing many advanced parameters. ColabFold can be accessed through a Google Colaboratory notebook for beginners and a command-line interface for advanced users. 
We describe the use of ColabFold to perform structure prediction of monomers, complexes and alternative conformations, either on the web or locally, and provide guidance on interpreting the results through confidence metrics and visualizations.

Journal Article

Pairtools: From sequencing data to chromosome contacts

by Galitsyna, Aleksandra A. , Imakaev, Maxim , Flyamer, Ilya M. in Analysis , Benchmarks , Biology and Life Sciences

2024

The field of 3D genome organization produces large amounts of sequencing data from Hi-C and a rapidly expanding set of other chromosome conformation protocols (3C+). Massive and heterogeneous 3C+ data require high-performance and flexible processing of sequenced reads into contact pairs. To meet these challenges, we present pairtools, a flexible suite of tools for contact extraction from sequencing data. Pairtools provides modular command-line interface (CLI) tools that can be flexibly chained into data processing pipelines. The core operations provided by pairtools are parsing of .sam alignments into Hi-C pairs, sorting, and removal of PCR duplicates. In addition, pairtools provides auxiliary tools for building feature-rich 3C+ pipelines, including contact pair manipulation, filtration, and quality control. Benchmarking pairtools against popular 3C+ data pipelines shows advantages of pairtools for high-performance and flexible 3C+ analysis. Finally, pairtools provides protocol-specific tools for restriction-based protocols, haplotype-resolved contacts, and single-cell Hi-C. The combination of CLI tools and tight integration with Python data analysis libraries makes pairtools a versatile foundation for a broad range of 3C+ pipelines.
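The sorting and duplicate-removal steps named in the abstract can be sketched in a few lines. This is an illustration of the general idea only, not pairtools' algorithm or API: it assumes contacts are already parsed into records of chromosome, position, and strand for each side, and it treats only exact matches as duplicates (real tools may tolerate small positional offsets). The names `Pair`, `flip`, and `sort_and_dedup` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Pair(NamedTuple):
    """One contact between two aligned read ends (hypothetical record layout)."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def flip(pair: Pair) -> Pair:
    """Order the two sides so that side 1 never sorts after side 2."""
    if (pair.chrom1, pair.pos1) > (pair.chrom2, pair.pos2):
        return Pair(pair.chrom2, pair.pos2, pair.strand2,
                    pair.chrom1, pair.pos1, pair.strand1)
    return pair


def sort_and_dedup(pairs: Iterable[Pair]) -> List[Pair]:
    """Sort flipped pairs, then drop exact repeats of the preceding pair."""
    ordered = sorted(flip(p) for p in pairs)
    unique: List[Pair] = []
    for p in ordered:
        # After sorting, identical pairs are adjacent, so one comparison suffices.
        if not unique or p != unique[-1]:
            unique.append(p)
    return unique
```

Sorting first makes duplicates adjacent, so deduplication becomes a single linear pass; chromosomes here sort lexicographically, whereas a real pipeline would typically follow a reference chromosome order.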

Journal Article

Easy353: A Tool to Get Angiosperms353 Genes for Phylogenomic Research

by Zhou, Wenbin , Yu, Yan , Xie, Pulin in Assembly , Computers , Gene sequencing

2022

The Angiosperms353 gene set (AGS) consists of 353 universal low-copy nuclear genes that were selected by examining more than 600 angiosperm species. These genes can be used for phylogenetic studies and population genetics at multiple taxonomic scales. However, current pipelines are not able to recover Angiosperms353 genes efficiently and accurately from high-throughput sequences. Here, we developed Easy353, a reference-guided assembly tool to recover the AGS from high-throughput sequencing (HTS) data (including genome skimming, RNA-seq, and target enrichment). Easy353 is an open-source, user-friendly assembler for diverse types of high-throughput data. It has a graphical user interface and a command-line interface that is compatible with all widely used computer systems. Evaluations, based on both simulated and empirical data, suggest that Easy353 yields low rates of assembly errors.

Journal Article

CircPrimer 2.0: a software for annotating circRNAs and predicting translation potential of circRNAs

by Feng, Jifeng , Zhong, Shanliang in Accuracy , Algorithms , Applications software

2022

Background Some circular RNAs (circRNAs) can be translated into functional peptides by small open reading frames (ORFs) in a cap-independent manner. Internal ribosomal entry sites (IRESs) and N6-methyladenosine (m6A) were reported to drive translation of circRNAs. Experimental methods for confirming the presence of IRESs and m6A sites are time-consuming and labor-intensive. The lack of computational tools to predict ORFs, IRESs and m6A sites for circRNAs makes this harder. Results In this report, we present circPrimer 2.0, a Java-based software for annotating circRNAs and predicting ORFs, IRESs, and m6A sites of circRNAs. circPrimer 2.0 has a graphical and a command-line interface, which enables the tool to be embedded into an analysis pipeline. Conclusions circPrimer 2.0 is an easy-to-use software for annotating circRNAs and predicting the translation potential of circRNAs, and is freely available at www.bio-inf.cn .

Journal Article

Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud

by Makunin, Igor , Afgan, Enis , Gladman, Simon in Animals , Best practice , Bioinformatics

2015

Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. 
We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.

Journal Article

Neurodesk: an accessible, flexible and portable data analysis environment for reproducible neuroimaging

by Renton, Angela I. , White, David J. , Ribeiro, Fernanda L. in 631/378 , 706/648/496 , 706/703/559

2024

Neuroimaging research requires purpose-built analysis software, which is challenging to install and may produce different results across computing environments. The community-oriented, open-source Neurodesk platform ( https://www.neurodesk.org/ ) harnesses a comprehensive and growing suite of neuroimaging software containers. Neurodesk includes a browser-accessible virtual desktop, command-line interface and computational notebook compatibility, allowing for accessible, flexible, portable and fully reproducible neuroimaging analysis on personal workstations, high-performance computers and the cloud. Neurodesk is a platform for analyzing human neuroimaging data, which provides numerous tools in a containerized form, thereby ensuring reproducibility and portability.

Journal Article
