Catalogue Search | MBRL

PhANNs, a fast and accurate tool and web server to classify phage structural proteins

by Salamon, Peter; Segall, Anca M.; Salamon, David in Bacteriophages - metabolism, Biology and Life Sciences, Computer and Information Sciences

2020

For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50–90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an “other” category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as “other,” providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally.

Journal Article
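The abstract does not describe its clustering method in detail, but the principle behind a homology-free split can be sketched. The Python example below is a minimal illustration, not PhANNs code. It assumes that an external homology-clustering step has already labelled each sequence with a cluster (the hypothetical `cluster_of` mapping). It then assigns whole clusters to eleven folds, so two homologous sequences never end up in different folds.

```python
from collections import defaultdict


def cluster_aware_folds(cluster_of, n_folds=11):
    """Split sequences into folds without ever splitting a homology cluster.

    cluster_of: hypothetical mapping {sequence_id: cluster_id}, assumed to come
    from an external homology clustering step (not the PhANNs method itself).
    """
    # Group sequence IDs by their cluster label.
    members = defaultdict(list)
    for seq_id, cluster_id in cluster_of.items():
        members[cluster_id].append(seq_id)

    folds = [[] for _ in range(n_folds)]
    # Place the largest clusters first, each into the currently smallest fold,
    # so fold sizes stay roughly balanced (a simple greedy heuristic).
    for cluster_id in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(sorted(members[cluster_id]))
    return folds


if __name__ == "__main__":
    toy = {"p1": "A", "p2": "A", "p3": "B", "p4": "C", "p5": "C", "p6": "C"}
    for i, fold in enumerate(cluster_aware_folds(toy, n_folds=2)):
        print(i, fold)
```

Ten of the resulting folds could then serve for cross-validation and the remaining one for testing. The largest-cluster-first rule is only one convenient way to keep fold sizes roughly even.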

Coxsackievirus B Exits the Host Cell in Shed Microvesicles Displaying Autophagosomal Markers

by Segall, Anca M.; Tsueng, Ginger; Mangale, Vrushali in Animals, Biology and Life Sciences, Cell-Derived Microparticles - genetics

2014

Coxsackievirus B3 (CVB3), a member of the picornavirus family and enterovirus genus, causes viral myocarditis, aseptic meningitis, and pancreatitis in humans. We genetically engineered a unique molecular marker, "fluorescent timer" protein, within our infectious CVB3 clone and isolated a high-titer recombinant viral stock (Timer-CVB3) following transfection in HeLa cells. "Fluorescent timer" protein undergoes slow conversion of fluorescence from green to red over time, and Timer-CVB3 can be utilized to track virus infection and dissemination in real time. Upon infection with Timer-CVB3, HeLa cells, neural progenitor and stem cells (NPSCs), and C2C12 myoblast cells slowly changed fluorescence from green to red over 72 hours as determined by fluorescence microscopy or flow cytometric analysis. The conversion of "fluorescent timer" protein in HeLa cells infected with Timer-CVB3 could be interrupted by fixation, suggesting that the fluorophore was stabilized by formaldehyde cross-linking reactions. Induction of a type I interferon response or ribavirin treatment reduced the progression of cell-to-cell virus spread in HeLa cells or NPSCs infected with Timer-CVB3. Time lapse photography of partially differentiated NPSCs infected with Timer-CVB3 revealed substantial intracellular membrane remodeling and the assembly of discrete virus replication organelles, which changed fluorescence color in an asynchronous fashion within the cell. "Fluorescent timer" protein colocalized closely with viral 3A protein within virus replication organelles. Intriguingly, infection of partially differentiated NPSCs or C2C12 myoblast cells induced the release of abundant extracellular microvesicles (EMVs) containing matured "fluorescent timer" protein and infectious virus, representing a novel route of virus dissemination.
CVB3 virions were readily observed within purified EMVs by transmission electron microscopy, and infectious virus was identified within low-density isopycnic iodixanol gradient fractions consistent with membrane association. The preferential detection of the lipidated form of LC3 protein (LC3 II) in released EMVs harboring infectious virus suggests that the autophagy pathway plays a crucial role in microvesicle shedding and virus release, similar to a process previously described as autophagosome-mediated exit without lysis (AWOL) observed during poliovirus replication. Through the use of this novel recombinant virus which provides more dynamic information from static fluorescent images, we hope to gain a better understanding of CVB3 tropism, intracellular membrane reorganization, and virus-associated microvesicle dissemination within the host.

Journal Article

PRFect: a tool to predict programmed ribosomal frameshifts in prokaryotic and viral genomes

by McNair, Katelyn; Salamon, Peter; Segall, Anca M. in Algorithms, Amino Acids, Anopheles

2024

Background One of the stranger phenomena that can occur during gene translation is when, as a ribosome reads along the mRNA, various cellular and molecular properties contribute to stalling the ribosome on a slippery sequence and shifting it into one of the two alternate reading frames. The alternate frame has different codons, so different amino acids are added to the peptide chain. More importantly, the original stop codon is no longer in-frame, so the ribosome can bypass it and continue to translate the codons beyond it. This produces a longer version of the protein: a fusion of the original in-frame amino acids followed by the alternate-frame amino acids. There is currently no automated software to predict the occurrence of these programmed ribosomal frameshifts (PRFs); they are identified only by manual curation. Results Here we present PRFect, an innovative machine-learning method for the detection and prediction of PRFs in coding genes of various types. PRFect combines advanced machine learning techniques with the integration of multiple complex cellular properties, such as secondary structure, codon usage, ribosomal binding site interference, direction, and slippery site motif. Calculating and incorporating these diverse properties posed significant challenges, but through extensive research and development we have achieved a user-friendly approach. The code for PRFect is freely available, open-source, and can be easily installed via a single command in the terminal. Our comprehensive evaluations on diverse organisms, including bacteria, archaea, and phages, demonstrate PRFect’s strong performance, achieving high sensitivity, high specificity, and an accuracy exceeding 90%.
Conclusion PRFect represents a significant advancement in the field of PRF detection and prediction, offering a powerful tool for researchers and scientists to unravel the intricacies of programmed ribosomal frameshifting in coding genes.

Journal Article
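A short Python sketch can show how a frameshift changes the protein product. It is not PRFect. The slip position and direction (`shift_at`, `shift`) are hypothetical values supplied by hand rather than predicted. The toy sequence is invented so that the original frame stops early while the -1 frame reads past that stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, shift_at=None, shift=0):
    """Translate from position 0 until a stop codon or the end of seq.

    If shift_at is given, the reading position jumps by `shift` nucleotides
    (e.g. -1 or +1) once translation reaches nucleotide index shift_at.
    Both parameters are hypothetical inputs chosen by hand here.
    """
    protein = []
    i = 0
    shifted = False
    while i + 3 <= len(seq):
        if shift_at is not None and not shifted and i >= shift_at:
            i += shift  # slip into the alternate reading frame (only once)
            shifted = True
            continue
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # a stop codon in the current frame ends translation
            break
        protein.append(aa)
        i += 3
    return "".join(protein)


# Invented toy sequence: frame 0 hits TAG early; the -1 frame reads past it.
SEQ = "ATGAAAAAATAGCTGGATAA"

if __name__ == "__main__":
    print(translate(SEQ))                        # MKK
    print(translate(SEQ, shift_at=6, shift=-1))  # MKKIAG
```

In the shifted run, the original TAG is skipped and translation continues until a stop codon appears in the new frame (TAA). The product is therefore the in-frame prefix fused to the alternate-frame residues, as the abstract describes.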

Discovery of several thousand highly diverse circular DNA viruses

by Whited, Jessica L; Tisza, Michael J; Buck, Christopher B in Analysis, Animals, Artificial neural networks

2020

Although millions of distinct virus species likely exist, only approximately 9000 are catalogued in GenBank's RefSeq database. We selectively enriched for the genomes of circular DNA viruses in over 70 animal samples, ranging from nematodes to human tissue specimens. A bioinformatics pipeline, Cenote-Taker, was developed to automatically annotate over 2500 complete genomes in a GenBank-compliant format. The new genomes belong to dozens of established and emerging viral families. Some appear to be the result of previously undescribed recombination events between ssDNA and ssRNA viruses. In addition, hundreds of circular DNA elements that do not encode any discernable similarities to previously characterized sequences were identified. To characterize these ‘dark matter’ sequences, we used an artificial neural network to identify candidate viral capsid proteins, several of which formed virus-like particles when expressed in culture. These data further the understanding of viral sequence diversity and allow for high throughput documentation of the virosphere. When scientists hunt for new DNA sequences, sometimes they get a lot more than they bargained for. Such is the case in metagenomic surveys, which analyze not just DNA of a particular organism, but all the DNA in an environment at large. A vexing problem with these surveys is the overwhelming number of DNA sequences detected that are so different from any known microbe that they cannot be classified using traditional approaches. However, some of these “known unknowns” are undoubtedly viral sequences, because only a fraction of the enormous diversity of viruses has been characterized. This “viral dark matter” is a major obstacle for those studying viruses. This led Tisza et al. to attempt to classify some of the unknown viral sequences in their metagenomic surveys. The search, which specifically focused on viruses with circular DNA genomes, detected over 2,500 circular viral genomes. 
Intensive analysis revealed that many of these genomes had similar makeup to previously discovered viruses, but hundreds of them were totally different from any known virus, based on typical methods of comparison. Computational analysis of genes that were conserved among some of these brand-new circular sequences often revealed virus-like features. Experiments on a few of these genes showed that they encoded proteins capable of forming particles reminiscent of characteristic viral shells, implying that these new sequences are indeed viruses. Tisza et al. have added the 2,500 newly characterized viral sequences to the publicly accessible GenBank database, and the sequences are being considered for the more authoritative RefSeq database, which currently contains around 9,000 complete viral genomes. The expanded databases will hopefully now better equip scientists to explore the enormous diversity of viruses and help medics and veterinarians to detect disease-causing viruses in humans and other animals.

Journal Article

Genomic Analysis of Uncultured Marine Viral Communities

by Salamon, Peter; Segall, Anca M.; Mead, David in Aquatic life, Bacteriophages, Bacteriophages - classification

2002

Viruses are the most common biological entities in the oceans by an order of magnitude. However, very little is known about their diversity. Here we report a genomic analysis of two uncultured marine viral communities. Over 65% of the sequences were not significantly similar to previously reported sequences, suggesting that much of the diversity is previously uncharacterized. The most common significant hits among the known sequences were to viruses. The viral hits included sequences from all of the major families of dsDNA tailed phages, as well as some algal viruses. Several independent mathematical models based on the observed number of contigs predicted that the most abundant viral genome comprised 2-3% of the total population in both communities, which was estimated to contain between 374 and 7,114 viral types. Overall, diversity of the viral communities was extremely high. The results also showed that it would be possible to sequence the entire genome of an uncultured marine viral community.

Journal Article

Artificial Neural Networks Trained to Detect Viral and Phage Structural Proteins

by Burgin, Alex B.; Salamon, Peter; Segall, Anca M. in Amino acids, Bacteriology, Bacteriophages

2012

Phages play critical roles in the survival and pathogenicity of their hosts, via lysogenic conversion factors, and in nutrient redistribution, via cell lysis. Analyses of phage- and viral-encoded genes in environmental samples provide insights into the physiological impact of viruses on microbial communities and human health. However, phage ORFs are extremely diverse: over 70% of them are dissimilar to any genes with annotated functions in GenBank. Better identification of viruses would also aid in better detection and diagnosis of disease, in vaccine development, and generally in better understanding the physiological potential of any environment. In contrast to enzymes, viral structural protein function can be much more challenging to detect from sequence data because of low sequence conservation, few known conserved catalytic sites or sequence domains, and relatively limited experimental data. We have designed a method of predicting phage structural protein sequences that uses Artificial Neural Networks (ANNs). First, we trained ANNs to classify viral structural proteins using amino acid frequency; these correctly classify a large fraction of test cases with a high degree of specificity and sensitivity. Subsequently, we added estimates of protein isoelectric points as a feature to ANNs that classify specialized families of proteins, namely major capsid and tail proteins. As expected, these more specialized ANNs are more accurate than the structural ANNs. To experimentally validate the ANN predictions, several ORFs with no significant similarities to known sequences that are ANN-predicted structural proteins were examined by transmission electron microscopy. Some of these self-assembled into structures strongly resembling virion structures. Thus, our ANNs are new tools for identifying phage and potential prophage structural proteins that are difficult or impossible to detect by other bioinformatic analysis.
The networks will be valuable when sequence is available but in vitro propagation of the phage may not be practical or possible.

Journal Article
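The two kinds of input features mentioned above, amino acid frequency and isoelectric point, can be computed in a few lines of Python. This is a sketch under stated assumptions, not the authors' feature pipeline:

- the pKa table is one commonly used set of approximate values, assumed here;
- the isoelectric point is found by simple bisection on the Henderson-Hasselbalch net charge;
- the neural network itself is omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Approximate terminal and side-chain pKa values; one common set, assumed here
# for illustration (published tools use slightly different tables).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def aa_frequencies(protein):
    """Fraction of each of the 20 standard residues; other letters are skipped."""
    protein = protein.upper()
    counts = [protein.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def net_charge(protein, ph):
    """Henderson-Hasselbalch estimate of net charge at the given pH."""
    groups = {aa: protein.count(aa) for aa in "KRHDECY"}
    groups["Nterm"] = groups["Cterm"] = 1
    positive = sum(groups[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(groups[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(protein, lo=0.0, hi=14.0, tol=1e-4):
    """Bisect for the pH where net charge crosses zero (charge falls as pH rises)."""
    protein = protein.upper()
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def feature_vector(protein):
    """20 composition features plus an estimated isoelectric point (21 values)."""
    return aa_frequencies(protein) + [isoelectric_point(protein)]
```

The composition features alone correspond to the general structural-protein networks described above. The extra isoelectric-point value corresponds to the more specialized capsid and tail networks.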

Compounding Achromobacter Phages for Therapeutic Applications

by Villela, Helena; Octavio, Jessica Claire; McNair, Katelyn in Achromobacter, Achromobacter - genetics, Achromobacter denitrificans - genetics

2023

Achromobacter species colonization of cystic fibrosis (CF) respiratory airways is an increasing concern. Two adult patients with CF colonized by Achromobacter xylosoxidans CF418 or Achromobacter ruhlandii CF116 experienced fatal exacerbations. Achromobacter spp. are naturally resistant to several antibiotics. Therefore, phages could be valuable as therapeutics for the control of Achromobacter. In this study, thirteen lytic phages were isolated and characterized at the morphological and genomic levels for potential future use in phage therapy. They are presented here as the Achromobacter Kumeyaay phage collection. Six distinct Achromobacter phage genome clusters were identified based on a comprehensive phylogenetic analysis of the Kumeyaay collection as well as publicly available Achromobacter phages. The infectivity of all phages in the Kumeyaay collection was tested against 23 Achromobacter clinical isolates; 78% of these isolates were lysed by at least one phage. A cryptic prophage was induced in Achromobacter xylosoxidans CF418 when infected with some of the lytic phages. This prophage genome was characterized and is presented as Achromobacter phage CF418-P1. Prophage induction during lytic phage preparation for therapy interventions requires further exploration. Large-scale production of phages and removal of endotoxins using an octanol-based procedure resulted in a phage concentrate of 1 × 10⁹ plaque-forming units per milliliter with an endotoxin concentration of 65 endotoxin units per milliliter, which is below the Food and Drug Administration recommended maximum threshold for human administration. This study provides a comprehensive framework for the isolation, bioinformatic characterization, and safe production of phages to kill Achromobacter spp. in order to potentially manage CF pulmonary infections.

Journal Article

In retrospect: A century of phage lessons

by Segall, Anca M; Rohwer, Forest in Bacteriophages - genetics, Bacteriophages - immunology, Bacteriophages - pathogenicity

2015

Journal Article

Classification Confidence in Exploratory Learning: A User’s Guide

by Salamon, Peter; Segall, Anca M.; Perry, Tyler in Accuracy, Algorithms, Bioinformatics

2023

This paper investigates the post-hoc calibration of confidence for “exploratory” machine learning classification problems. The difficulty in these problems stems from the continuing desire to push the boundaries of which categories have enough examples to generalize from when curating datasets, and confusion regarding the validity of those categories. We argue that for such problems the “one-versus-all” approach (top-label calibration) must be used rather than the “calibrate-the-full-response-matrix” approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation using only the test set and the final model. Chief among these methods is the use of kernel density ratios for confidence calibration including a novel algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, may be performed using only the test dataset, and should be sanity-checked visually.

Journal Article
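Below is a minimal Python sketch of kernel-density-ratio calibration for a single predicted class. It is an illustration under assumptions, not the paper's algorithm. It uses plain Gaussian kernels with a fixed, hand-picked `bandwidth` (the paper's bandwidth-selection method is not reproduced). Confidence at a score s is estimated as n_c f_c(s) / (n_c f_c(s) + n_w f_w(s)). Here f_c and f_w are density estimates over the test-set scores of correct and incorrect predictions, and n_c and n_w are their counts. In the one-versus-all (top-label) setting, one such calibrator would be fitted per predicted class, using only that class's test-set predictions.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate over `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def kde_ratio_calibrator(scores, correct, bandwidth=0.05):
    """Fit P(correct | score) for ONE predicted class from its test-set predictions.

    scores: model scores of test items predicted as this class.
    correct: matching booleans (was the prediction right?).
    bandwidth: fixed by hand here (an assumption, not a selection algorithm).
    """
    hits = [s for s, ok in zip(scores, correct) if ok]
    misses = [s for s, ok in zip(scores, correct) if not ok]
    if not hits or not misses:
        raise ValueError("need both correct and incorrect predictions")
    f_hit, f_miss = gaussian_kde(hits, bandwidth), gaussian_kde(misses, bandwidth)
    n_hit, n_miss = len(hits), len(misses)

    def confidence(score):
        # Ratio of count-weighted densities: an estimate of P(correct | score).
        a, b = n_hit * f_hit(score), n_miss * f_miss(score)
        if a + b == 0.0:  # far from every sample: fall back to the base rate
            return n_hit / (n_hit + n_miss)
        return a / (a + b)

    return confidence
```

Because the calibrator only needs the final model's scores and correctness labels on the test set, it fits the post-hoc setting described above. Plotting `confidence` against the raw score is one easy visual sanity check.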

Tumor Cell Death Mediated by Peptides That Recognize Branched Intermediates of DNA Replication and Repair

by Su, Leo Y.; Segall, Anca M.; Dey, Mamon in Accumulation, Amino acids, Antineoplastic Agents - pharmacology

2013

Effective treatments for cancer are still needed, both for cancers that do not respond well to current therapeutics and for cancers that become resistant to available treatments. Herein we investigated the effect of a structure-selective d-amino acid peptide, wrwycr, that binds replication fork mimics and Holliday junction (HJ) intermediates of homologous recombination (HR) in vitro and inhibits their resolution by HJ-processing enzymes. We predicted that treating cells with HJ-binding compounds would lead to accumulation of DNA damage. As cells repair endogenous or exogenous DNA damage, collapsed replication forks and HJ intermediates will accumulate and serve as targets for the HJ-binding peptides. Inhibiting junction resolution will lead to further accumulation of DNA breaks, eventually amplifying the damage and causing cell death. Both peptide wrwycr and the related peptide wrwyrggrywrw entered cancer cells and reduced cell survival in a dose- and time-dependent manner. Early markers of DNA damage, γH2AX foci and 53BP1 foci, increased with dose and/or time of exposure to the peptides. DNA breaks persisted for at least 48 h, and both checkpoint proteins Chk1 and Chk2 were activated. The passage of the cells from S to G2/M was blocked even after 72 h. Apoptosis, however, was not induced in either HeLa or PC3 cells. Based on colony-forming assays, about 35% of the peptide-induced cytotoxicity was irreversible. Finally, sublethal doses of peptide wrwycr (50-100 µM) in conjunction with sublethal doses of several DNA damaging agents (etoposide, doxorubicin, and HU) reduced cell survival at least additively and sometimes synergistically. Taken together, the results suggest that the peptides merit further investigation as proof-of-principle molecules for a new class of anti-cancer therapeutics, in particular in combination with other DNA damaging therapies.

Journal Article
