Catalogue Search | MBRL

Text mining of CHO bioprocess bibliome: Topic modeling and document classification

by Wang, Qinghua , Olshin, Jonathan , Wu, Cathy H. in Accuracy , Analysis , Animals

2023

Chinese hamster ovary (CHO) cells are widely used for mass production of therapeutic proteins in the pharmaceutical industry. With the growing need to optimize the performance of producer CHO cell lines, research on CHO cell line development and bioprocessing has continued to increase in recent decades. Bibliographic mapping and classification of relevant research studies will be essential for identifying research gaps and trends in the literature. To qualitatively and quantitatively understand the CHO literature, we conducted topic modeling using a CHO bioprocess bibliome manually compiled in 2016, and compared the topics uncovered by the Latent Dirichlet Allocation (LDA) models with the human labels of the CHO bibliome. The results show a significant overlap between the manually selected categories and computationally generated topics, and reveal the machine-generated topic-specific characteristics. To identify relevant CHO bioprocessing papers from new scientific literature, we developed supervised models using Logistic Regression to identify specific article topics and evaluated the results using three CHO bibliome datasets: the Bioprocessing set, the Glycosylation set, and the Phenotype set. The use of top terms as features supports the explainability of document classification results and yields insights on new CHO bioprocessing papers.

Journal Article

Leaders, States, and Reputations

by Wolford, Scott , Wu, Cathy X. in Ambiguity , Bias , Conflict resolution

2018

Reputational incentives are ubiquitous explanations for war, yet consistent evidence of their effects is elusive for two reasons. First, most work searches for the payment of reputational costs, yet strategic censoring systematically biases observational data against revealing them. Second, the locus of reputation is often ambiguous, yet the choice of leader or state as unit of observation has inferential consequences. Our research design (a) focuses on observable implications of reputational theories in appropriate samples and (b) considers two competing sources of reputational incentives: changes in national leaders and in political institutions. Consistent with our expectations, leadership turnover and regime change are each associated with initially high probabilities that militarized disputes escalate to the use of force before declining over time in the presence of a reasonable expectation of future disputes. Reputations are in evidence, but analysts must look for them in the right place.

Journal Article

Evolutionary analysis and interaction prediction for protein-protein interaction network in geometric space

by Wu, Cathy H. , Liao, Li , Huang, Lei in Algorithms , Artificial intelligence , Bioinformatics

2017

Prediction of protein-protein interaction (PPI) remains a central task in systems biology. With more PPIs identified, forming PPI networks, it has become feasible and also imperative to study PPIs at the network level, such as evolutionary analysis of the networks, for better understanding of PPI networks and for more accurate prediction of pairwise PPIs by leveraging the information gained at the network level. In this work we developed a novel method that enables us to incorporate evolutionary information into geometric space to improve PPI prediction, which in turn can be used to select and evaluate various evolutionary models. The method is tested with cross-validation using human PPI network and yeast PPI network data. The results show that the accuracy of PPI prediction measured by ROC score is increased by up to 14.6%, as compared to a baseline without using evolutionary information. The results also indicate that our modified evolutionary model DANEOsf, which combines a gene duplication/neofunctionalization model and a scale-free model, has better fitness and prediction efficacy for these two PPI networks. The improved PPI prediction performance may suggest that our DANEOsf evolutionary model can uncover the underlying evolutionary mechanism for these two PPI networks better than other tested models. Of particular importance, our method offers an effective way to select evolutionary models that best capture the underlying evolutionary mechanisms, evaluating the fitness of evolutionary models from the perspective of PPI prediction on real PPI networks.

Journal Article

A crowdsourcing open platform for literature curation in UniProt

by Wu, Cathy H. , Wang, Yuqi , Arighi, Cecilia N. in Amino acid sequence , Amino Acid Sequence - genetics , Annotations

2021

The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.

Journal Article

RNA-Seq Analysis of Abdominal Fat in Genetically Fat and Lean Chickens Highlights a Divergence in Expression of Genes Controlling Adiposity, Hemostasis, and Lipid Metabolism

by Wu, Cathy H. , Simon, Jean , Resnyk, Christopher W. in Abdomen , Abdominal Fat - metabolism , Adipokines - genetics

2015

Genetic selection for enhanced growth rate in meat-type chickens (Gallus domesticus) is usually accompanied by excessive adiposity, which has negative impacts on both feed efficiency and carcass quality. Enhanced visceral fatness and several unique features of avian metabolism (i.e., fasting hyperglycemia and insulin insensitivity) mimic overt symptoms of obesity and related metabolic disorders in humans. Elucidation of the genetic and endocrine factors that contribute to excessive visceral fatness in chickens could also advance our understanding of human metabolic diseases. Here, RNA sequencing was used to examine differential gene expression in abdominal fat of genetically fat and lean chickens, which exhibit a 2.8-fold divergence in visceral fatness at 7 wk. Ingenuity Pathway Analysis revealed that many of the 1687 differentially expressed genes are associated with hemostasis, endocrine function and metabolic syndrome in mammals. Among the highest expressed genes in abdominal fat, across both genotypes, were 25 differentially expressed genes associated with de novo synthesis and metabolism of lipids. Over-expression of numerous adipogenic and lipogenic genes in the FL chickens suggests that in situ lipogenesis in chickens could make a more substantial contribution to expansion of visceral fat mass than previously recognized. Distinguishing features of the abdominal fat transcriptome in lean chickens were high abundance of multiple hemostatic and vasoactive factors, transporters, and ectopic expression of several hormones/receptors, which could control local vasomotor tone and proteolytic processing of adipokines, hemostatic factors and novel endocrine factors. Over-expression of several thrombogenic genes in abdominal fat of lean chickens is quite opposite to the pro-thrombotic state found in obese humans. Clearly, divergent genetic selection for an extreme (2.5-2.8-fold) difference in visceral fatness provokes a number of novel regulatory responses that govern growth and metabolism of visceral fat in this unique avian model of juvenile-onset obesity and glucose-insulin imbalance.

Journal Article

miRTex: A Text Mining System for miRNA-Gene Relation Extraction

by Wu, Cathy H. , Arighi, Cecilia N. , Li, Gang in Classification , Computational Biology - methods , Data Mining - methods

2015

MicroRNAs (miRNAs) regulate a wide range of cellular and developmental processes through gene expression suppression or mRNA degradation. Experimentally validated miRNA gene targets are often reported in the literature. In this paper, we describe miRTex, a text mining system that extracts miRNA-target relations, as well as miRNA-gene and gene-miRNA regulation relations. The system achieves good precision and recall when evaluated on a literature corpus of 150 abstracts with F-scores close to 0.90 on the three different types of relations. We conducted full-scale text mining using miRTex to process all the Medline abstracts and all the full-length articles in the PubMed Central Open Access Subset. The results for all the Medline abstracts are stored in a database for interactive query and file download via the website at http://proteininformationresource.org/mirtex. Using miRTex, we identified genes potentially regulated by miRNAs in Triple Negative Breast Cancer, as well as miRNA-gene relations that, in conjunction with kinase-substrate relations, regulate the response to abiotic stress in Arabidopsis thaliana. These two use cases demonstrate the usefulness of miRTex text mining in the analysis of miRNA-regulated biological processes.

Journal Article

Software for pre-processing Illumina next-generation sequencing short read sequences

by Wu, Cathy H , Huang, Hongzhan , Khaleel, Sari S in Algorithms , Bioinformatics , Biomedical and Life Sciences

2014

Background: When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods: We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit, and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157:H7. Results: Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacterial genomic short read sequences with the focus on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance. In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly as measured by assembly contiguity and correctness. Conclusions: Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves the memory efficiency when dealing with large datasets. We recommend combining sequencing artifact removal with quality-score-based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects.

Journal Article

TnCentral: a Prokaryotic Transposable Element Database and Web Portal for Transposon Analysis

by Wu, Cathy , Zhang, Jian , Snesrud, Erik in Animal husbandry , Antibiotic resistance , Antibiotics

2021

The ability of bacteria to undergo rapid evolution and adapt to changing environmental circumstances drives the public health crisis of multiple antibiotic resistance, as well as outbreaks of disease in economically important agricultural crops and animal husbandry. Prokaryotic transposable elements (TE) play a critical role in this. We describe here the structure and organization of TnCentral (https://tncentral.proteininformationresource.org/ or the mirror link at https://tncentral.ncc.unesp.br/), a web resource for prokaryotic transposable elements (TE). TnCentral currently contains ∼400 carefully annotated TE, including transposons from the Tn3, Tn7, Tn402, and Tn554 families; compound transposons; integrons; and associated insertion sequences (IS). These TE carry passenger genes, including genes conferring resistance to over 25 classes of antibiotics and nine types of heavy metal, as well as genes responsible for pathogenesis in plants, toxin/antitoxin gene pairs, transcription factors, and genes involved in metabolism. Each TE has its own entry page, providing details about its transposition genes, passenger genes, and other sequence features required for transposition, as well as a graphical map of all features. TnCentral content can be browsed and queried through text- and sequence-based searches with a graphic output. We describe three use cases, which illustrate how the search interface, results tables, and entry pages can be used to explore and compare TE. TnCentral also includes downloadable software to facilitate user-driven identification, with manual annotation, of certain types of TE in genomic sequences. Through the TnCentral homepage, users can also access TnPedia, which provides comprehensive reviews of the major TE families, including an extensive general section and specialized sections with descriptions of insertion sequence and transposon families. TnCentral and TnPedia are intuitive resources that can be used by clinicians and scientists to assess TE diversity in clinical, veterinary, and environmental samples. Importance: The ability of bacteria to undergo rapid evolution and adapt to changing environmental circumstances drives the public health crisis of multiple antibiotic resistance, as well as outbreaks of disease in economically important agricultural crops and animal husbandry. Prokaryotic transposable elements (TE) play a critical role in this. Many carry “passenger genes” (not required for the transposition process) conferring resistance to antibiotics or heavy metals or causing disease in plants and animals. Passenger genes are spread by normal TE transposition activities and by insertion into plasmids, which then spread via conjugation within and across bacterial populations. Thus, an understanding of TE composition and transposition mechanisms is key to developing strategies to combat bacterial pathogenesis. Toward this end, we have developed TnCentral, a bioinformatics resource dedicated to describing and exploring the structural and functional features of prokaryotic TE whose use is intuitive and accessible to users with or without bioinformatics expertise.

Journal Article

Completing sparse and disconnected protein-protein network by deep learning

by Wu, Cathy H. , Liao, Li , Huang, Lei in Algorithms , Bioinformatics , Biomedical and Life Sciences

2018

Background: Protein-protein interaction (PPI) prediction remains a central task in systems biology to achieve a better and holistic understanding of cellular and intracellular processes. Recently, an increasing number of computational methods have shifted from pair-wise prediction to network level prediction. Many of the existing network level methods predict PPIs under the assumption that the training network should be connected. However, this assumption greatly affects the prediction power and limits the application area because the current gold standard PPI networks are usually very sparse and disconnected. Therefore, how to effectively predict PPIs based on a training network that is sparse and disconnected remains a challenge. Results: In this work, we developed a novel PPI prediction method based on a deep learning neural network and a regularized Laplacian kernel. We use a neural network with an autoencoder-like architecture to implicitly simulate the evolutionary processes of a PPI network. Neurons of the output layer correspond to proteins and are labeled with values (1 for interaction and 0 otherwise) from the adjacency matrix of a sparse disconnected training PPI network. Unlike an autoencoder, neurons at the input layer are given all-zero input, reflecting an assumption of no a priori knowledge about PPIs, and hidden layers of smaller sizes mimic ancient interactomes at different times during evolution. After the training step, an evolved PPI network whose rows are outputs of the neural network can be obtained. We then predict PPIs by applying the regularized Laplacian kernel to the transition matrix that is built upon the evolved PPI network. The results from cross-validation experiments show that the PPI prediction accuracies for yeast data and human data measured as AUC are increased by up to 8.4% and 14.9%, respectively, as compared to the baseline. Moreover, the evolved PPI network can also help us leverage complementary information from the disconnected training network and multiple heterogeneous data sources. Tested on the yeast data with six heterogeneous feature kernels, the results show our method can further improve the prediction performance by up to 2%, which is very close to an upper bound that is obtained by an Approximate Bayesian Computation based sampling method. Conclusions: The proposed evolution deep neural network, coupled with the regularized Laplacian kernel, is an effective tool in completing sparse and disconnected PPI networks and in facilitating integration of heterogeneous data sources.

Journal Article

Oncogenic fusion protein EWS-FLI1 is a network hub that regulates alternative splicing

by Paulsen, Michelle T. , Selvanathan, Saravana P. , Ljungman, Mats E. in alternative splicing , Alternative Splicing - drug effects , Alternative Splicing - genetics

2015

Significance: Alternative splicing of RNA allows a limited number of coding regions in the human genome to produce proteins with diverse functionality. Alternative splicing has also been implicated as an oncogenic process. Identifying aspects of cancer cells that differentiate them from noncancer cells remains an ongoing challenge, and our research suggests that alternatively spliced mRNA and subsequent protein isoforms will provide new anticancer targets. We determined that the key oncoprotein of Ewing sarcoma (ES), EWS-FLI1, regulates alternative splicing in multiple cell line models. These experiments establish oncogenic aspects of splicing that are specific to cancer cells and thereby illuminate potentially oncogenic splicing shifts as well as provide a useful stratification mechanism for ES patients. The synthesis and processing of mRNA, from transcription to translation initiation, often requires splicing of intragenic material. The final mRNA composition varies based on proteins that modulate splice site selection. EWS-FLI1 is an Ewing sarcoma (ES) oncoprotein with an interactome that we demonstrate to have multiple partners in spliceosomal complexes. We evaluate the effect of EWS-FLI1 on posttranscriptional gene regulation using both exon array and RNA-seq. Genes that potentially regulate oncogenesis, including CLK1, CASP3, PPFIBP1, and TERT, validate as alternatively spliced by EWS-FLI1. In a CLIP-seq experiment, we find that EWS-FLI1 RNA-binding motifs most frequently occur adjacent to intron–exon boundaries. EWS-FLI1 also alters splicing by directly binding to known splicing factors including DDX5, hnRNP K, and PRPF6. Reduction of EWS-FLI1 produces an isoform of γ-TERT that has increased telomerase activity compared with wild-type (WT) TERT. The small molecule YK-4-279 is an inhibitor of EWS-FLI1 oncogenic function that disrupts specific protein interactions, including helicases DDX5 and RNA helicase A (RHA) that alters RNA-splicing ratios. As such, YK-4-279 validates the splicing mechanism of EWS-FLI1, showing alternatively spliced gene patterns that significantly overlap with EWS-FLI1 reduction and WT human mesenchymal stem cells (hMSC). Exon array analysis of 75 ES patient samples shows similar isoform expression patterns to cell line models expressing EWS-FLI1, supporting the clinical relevance of our findings. These experiments establish systemic alternative splicing as an oncogenic process modulated by EWS-FLI1. EWS-FLI1 modulation of mRNA splicing may provide insight into the contribution of splicing toward oncogenesis, and, reciprocally, EWS-FLI1 interactions with splicing proteins may inform the splicing code.

Journal Article
