9 result(s) for "Imker, Heidi J"
A machine learning-enabled open biodata resource inventory from the scientific literature
Modern biological research depends on data resources. These resources archive difficult-to-reproduce data and provide added-value aggregation, curation, and analyses. Collectively, they constitute a global infrastructure of biodata resources. While the organic proliferation of biodata resources has enabled incredible research, sustained support for the individual resources that make up this distributed infrastructure is a challenge. The Global Biodata Coalition (GBC) was established by research funders in part to aid in developing sustainable funding strategies for biodata resources. An important component of this work is understanding the scope of the resource infrastructure: how many biodata resources there are, where they are, and how they are supported. Existing registries require self-registration and/or extensive curation, and we sought to develop a method for assembling a global inventory of biodata resources that could be periodically updated with minimal human intervention. The approach we developed identifies biodata resources using open data from the scientific literature. Specifically, we used a machine learning-enabled natural language processing approach to identify biodata resources from titles and abstracts of life sciences publications contained in Europe PMC. Pretrained BERT (Bidirectional Encoder Representations from Transformers) models were fine-tuned to classify publications as describing a biodata resource or not and to predict the resource name using named entity recognition. To improve the quality of the resulting inventory, low-confidence predictions and potential duplicates were manually reviewed. Further information about the resources was then obtained using article metadata, such as funder and geolocation information. These efforts yielded an inventory of 3112 unique biodata resources based on articles published from 2011 to 2021. The code was developed to facilitate reuse and includes automated pipelines.
All products of this effort are released under permissive licensing, including the biodata resource inventory itself (CC0) and all associated code (BSD/MIT).
Homology models guide discovery of diverse enzyme specificities among dipeptide epimerases in the enolase superfamily
The rapid advance in genome sequencing presents substantial challenges for protein functional assignment, with half or more of new protein sequences inferred from these genomes having uncertain assignments. The assignment of enzyme function in functionally diverse superfamilies represents a particular challenge, which we address through a combination of computational predictions, enzymology, and structural biology. Here we describe the results of a focused investigation of a group of enzymes in the enolase superfamily that are involved in epimerizing dipeptides. The first members of this group to be functionally characterized were Ala-Glu epimerases in Escherichia coli and Bacillus subtilis, based on the operon context and enzymological studies; these enzymes are presumed to be involved in peptidoglycan recycling. We have subsequently studied more than 65 related enzymes by computational methods, including homology modeling and metabolite docking, which suggested that many would have divergent specificities, i.e., they are likely to have different (unknown) biological roles. In addition to the Ala-Phe epimerase specificity reported previously, we describe the prediction and experimental verification of: (i) a new group of presumed Ala-Glu epimerases; (ii) several enzymes with specificity for hydrophobic dipeptides, including one from Cytophaga hutchinsonii that epimerizes D-Ala-D-Ala; and (iii) a small group of enzymes that epimerize cationic dipeptides. Crystal structures for certain of these enzymes further elucidate the structural basis of the specificities. The results highlight the potential of computational methods to guide experimental characterization of enzymes in an automated, large-scale fashion.
Prediction and assignment of function for a divergent N-succinyl amino acid racemase
The protein databases contain many proteins with unknown function. A computational approach for predicting ligand specificity that requires only the sequence of the unknown protein would be valuable for directing experiment-based assignment of function. We focused on a family of unknown proteins in the mechanistically diverse enolase superfamily and used two approaches to assign function: (i) enzymatic assays using libraries of potential substrates, and (ii) in silico docking of the same libraries using a homology model based on the most similar (35% sequence identity) characterized protein. The results matched closely; an experimentally determined structure confirmed the predicted structure of the substrate-liganded complex. We assigned the N-succinyl arginine/lysine racemase function to the family, correcting the annotation (L-Ala-D/L-Glu epimerase) based on the function of the most similar characterized homolog. These studies establish that ligand docking to a homology model can facilitate functional assignment of unknown proteins by restricting the identities of the possible substrates that must be experimentally tested.
A RubisCO-like protein links SAM metabolism with isoprenoid biosynthesis
Combined omics techniques lead to the functional assignment of four enzymes involved in a new methionine salvage pathway linking polyamine metabolism with isoprenoid biosynthesis. This reaction sequence involves a homolog of nature's most abundant protein, the CO2-fixing enzyme RubisCO. Functional assignment of uncharacterized proteins is a challenge in the era of large-scale genome sequencing. Here, we combine in extracto NMR, proteomics and transcriptomics with a newly developed (knock-out) metabolomics platform to determine a potential physiological role for a ribulose-1,5-bisphosphate carboxylase/oxygenase (RubisCO)-like protein from Rhodospirillum rubrum. Our studies unraveled an unexpected link in bacterial central carbon metabolism between S-adenosylmethionine–dependent polyamine metabolism and isoprenoid biosynthesis and also provide an alternative approach to assign enzyme function at the organismic level.
From complex histories to cohesive data, a long-term agricultural dataset from the Morrow Plots
Long-term agricultural experiments are essential to measure the impacts of farming practices on crop yields, soil fertility and biogeochemical processes. However, these impacts often only manifest at decadal timescales, requiring committed and consistent data collection that exceeds the timelines for most experiments. The second-oldest agricultural experiment in the world, the Morrow Plots at the University of Illinois Urbana-Champaign (USA), has examined the impact of crop rotation and fertility treatments on maize (Zea mays L.) yields since 1876. While results have been widely reported since 1888, the publicly available longitudinal dataset described here now allows for validation of those past results, as well as new analyses and investigations. A multi-disciplinary team identified, collected, and aggregated multiple historical data sources into one comprehensive and FAIR (Findable, Accessible, Interoperable and Reusable) dataset that synthesizes yield data and management practices from 1888 to 2021. Updated versions of the dataset will continue to be published as additional data from this ongoing experiment are made available and as new historical data sources are uncovered.
Who Bears the Burden of Long-Lived Molecular Biology Databases?
In the early 1990s the life sciences quickly adopted online databases to facilitate widespread dissemination and use of scientific data. Since 1991, the journal Nucleic Acids Research has published an annual Database Issue dedicated to articles describing molecular biology databases. Analysis of these articles reveals a set of long-lived databases which have now remained available for more than 15 years. Given the pervasive challenge of sustaining community resources, these databases provide an opportunity to examine what factors contribute to persistence by addressing two questions: 1) which organizations fund these long-lived databases? and 2) which organizations maintain these long-lived databases? Funding and operating organizations for 67 databases were determined through review of Database Issue articles. The results reveal a diverse set of contributing organizations with financial and operational support spread across six categories: academic, consortium/collective, government, industry, philanthropic, and society/association. The majority of databases reported support from more than one funding organization, of which government organizations were the most common source of funds. Operational responsibilities were more distributed, with academic organizations serving as the most common hosts. Although overall there is evidence of diversification, the most acknowledged funding and operating organizations contribute to disproportionately large percentages of the long-lived databases investigated here. Footnotes * https://doi.org/10.13012/B2IDB-3993338_V1
25 Years of Molecular Biology Databases: A Study of Proliferation, Impact, and Maintenance
Online resources enable unfettered access to and analysis of scientific data and are considered crucial for the advancement of modern science. Despite the clear power of online data resources, including web-available databases, proliferation can be problematic due to challenges in sustainability and long-term persistence. As areas of research become increasingly dependent on access to collections of data, an understanding of the scientific community's capacity to develop and maintain such resources is needed. The advent of the Internet coincided with expanding adoption of database technologies in the early 1990s, and the molecular biology community was at the forefront of using online databases to broadly disseminate data. The journal Nucleic Acids Research has long published articles dedicated to the description of online databases, as either debut or update articles. Snapshots throughout the entire history of online databases can be found in the pages of Nucleic Acids Research's Database Issue. Given the prominence of the Database Issue in the molecular biology and bioinformatics communities and the relative rarity of consistent historical documentation, database articles published in Database Issues provide a unique opportunity for longitudinal analysis. To take advantage of this opportunity, the study presented here first identifies each unique database described in 3055 Nucleic Acids Research Database Issue articles published between 1991 and 2016 to gather a rich dataset of databases debuted during this time frame, regardless of current availability. In total, 1727 unique databases were identified and associated descriptive statistics were gathered for each, including year debuted in a Database Issue and the number of all associated Database Issue publications and accompanying citation counts. Additionally, each database identified was assessed for current availability through testing of all associated URLs published.
Finally, to assess maintenance, database websites were inspected to determine the last recorded update. The resulting work allows for an examination of the overall historical trends, such as the rate of database proliferation and attrition as well as an evaluation of citation metrics and on-going database maintenance.
Assignment of enzyme function through characterization of the RuBisCO and enolase superfamilies
The issue of functional assignment is now considered the rate-limiting step in understanding biological systems in detail. One method developed to address this challenge is to study enzyme superfamilies. By understanding the diverse chemistry and substrate specificity that exists within a group of enzymes that share homology (i.e., a superfamily), one can hope to assign function within a superfamily based solely on sequence and structural similarities. Two superfamilies have been under investigation in this work. The first is the D-ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCO) superfamily. While RuBisCO is the canonical member, some superfamily members, referred to as RuBisCO-Like-Proteins (RLPs), are unable to perform CO2 fixation. Two approaches were employed to characterize and define the RuBisCO superfamily: (1) direct assaying of RLPs of unknown function to identify new activity and (2) mechanistic characterization of RLPs of known function. To address the first method, unknown RLPs from a variety of bacteria were targeted for study. Subsequently, an RLP from R. rubrum was identified as catalyzing an unusual isomerization reaction on a methionine salvage pathway intermediate. To address the second method, the RLP that functions in a different step of the methionine salvage pathway in Bacilli organisms was mechanistically and structurally characterized. These studies revealed a novel catalytic base and suggested that the plasticity of the RuBisCO scaffold allows for facile evolution of new functions. The second superfamily explored in this work is that of the enolase superfamily. Genomic context and primary amino acid sequence make it clear that although the enzymes in this superfamily are structurally and mechanistically related, the chemistry and substrates are varied. 
Two highly divergent enzymes from Thermotoga maritima and Enterococcus faecalis were targeted for characterization through a multi-disciplinary effort that included structural, computational, and bioinformatic analysis as well as classical enzymology. The T. maritima and E. faecalis enzymes were subsequently confirmed as dipeptide epimerases with unique specificity for hydrophobic dipeptides through screening of dipeptide libraries by mass spectrometry followed by full kinetic characterization of individual dipeptide substrates. These results have provided additional evidence for the utility of multi-disciplinary approaches to functional assignment through use of well-characterized enzyme superfamilies.
An Integrated Data Management Plan Instructional Program
Much has been written about researchers' data management and data sharing practices and needs. The published studies show that researchers have an awareness of the data sharing mandates and policies of federal grant agencies and journal publishers and there is a growing acceptance of the intrinsic value of data sharing, albeit with some concerns and caveats. However, establishing an effective and consistent data management service presents challenges for libraries, given the known disciplinary differences in data management needs and the fact that faculty have not yet significantly changed their data management practices to conform to federal agency and publisher mandates. After conducting in-depth interviews with twenty-one engineering and atmospheric science faculty at the University of Illinois at Urbana-Champaign, it became clear that scientists and engineers view the research lifecycle as a holistic endeavor and treat data as one of many necessary elements in the scholarly communication workflow. The generation, usage, storage, and sharing of data are part of the integrated scholarly workflow, and are not necessarily wholly separate processes. Building on these interviews, the authors have developed an instructional and training program that focuses on integrating data management activities into research and scholarly communication processes. The goal of our project was to examine data management practices in the context of researcher scholarly workflow needs and behaviors and develop and implement an instructional program that addresses researcher data needs. The development and assessment of this program are underway.