Catalogue Search | MBRL
Explore the vast range of titles available.
62 result(s) for "Bandrowski Anita"
PreprintMatch: A tool for preprint to publication detection shows global inequities in scientific publication
2023
Preprints, versions of scientific manuscripts that precede peer review, are growing in popularity. They offer an opportunity to democratize and accelerate research, as they have no publication costs or a lengthy peer review process. Preprints are often later published in peer-reviewed venues, but these publications and the original preprints are frequently not linked in any way. To this end, we developed a tool, PreprintMatch, to find matches between preprints and their corresponding published papers, if they exist. This tool outperforms existing techniques for matching preprints and papers, in both matching performance and speed. PreprintMatch was applied to search for matches between preprints (from bioRxiv and medRxiv) and PubMed. The preliminary nature of preprints offers a unique perspective into scientific projects at a relatively early stage, and with better matching between preprint and paper, we explored questions related to research inequity. We found that preprints from low-income countries are published as peer-reviewed papers at a lower rate than those from high-income countries (39.6% and 61.1%, respectively), and our data are consistent with previous work that cites a lack of resources, lack of stability, and policy choices to explain this discrepancy. Preprints from low-income countries were also found to be published more quickly (178 vs 203 days) and with less title, abstract, and author similarity to the published version compared to high-income countries. Low-income countries add more authors from the preprint to the published version than high-income countries (0.42 vs 0.32 authors, respectively), a practice that is significantly more frequent in China compared to similar countries. Finally, we find that some publishers publish work with authors from lower-income countries more frequently than others.
Journal Article
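The abstract above does not spell out PreprintMatch's matching heuristics, so the following is only a minimal sketch of the general approach it describes: score candidate published records against a preprint by title and author similarity, and keep the best match above a threshold. The field names, weights, and 0.75 threshold are hypothetical, not the tool's.

```python
# Illustrative sketch only, not PreprintMatch itself: rank candidate
# published records against a preprint by combined title/author
# similarity. Weights and the 0.75 threshold are assumptions.
from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    """Normalized edit-based similarity of two titles, 0.0-1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def author_overlap(preprint_authors, paper_authors) -> float:
    """Jaccard overlap of author surname sets."""
    pre = {n.split(",")[0].strip().lower() for n in preprint_authors}
    pub = {n.split(",")[0].strip().lower() for n in paper_authors}
    return len(pre & pub) / len(pre | pub) if pre | pub else 0.0

def best_match(preprint: dict, candidates: list[dict], threshold: float = 0.75):
    """Return the highest-scoring candidate, or None if nothing clears the bar."""
    if not candidates:
        return None
    scored = [
        (0.5 * title_similarity(preprint["title"], c["title"])
         + 0.5 * author_overlap(preprint["authors"], c["authors"]), c)
        for c in candidates
    ]
    score, match = max(scored, key=lambda pair: pair[0])
    return match if score >= threshold else None

preprint = {"title": "Neural correlates of example behavior",
            "authors": ["Doe, Jane", "Roe, Richard"]}
candidates = [{"title": "Neural correlates of example behaviour",
               "authors": ["Doe, Jane", "Roe, Richard", "Poe, Edna"]}]
print(best_match(preprint, candidates))
```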
Open Science 2.0: Towards a truly collaborative research ecosystem
by Drude, Natascha I.; Thibault, Robert T.; Argolo, Felipe
in Access to information, Biology and Life Sciences, Careers
2023
Conversations about open science have reached the mainstream, yet many open science practices such as data sharing remain uncommon. Our efforts towards openness therefore need to increase in scale and aim for a more ambitious target. We need an ecosystem not only where research outputs are openly shared but also in which transparency permeates the research process from the start and lends itself to more rigorous and collaborative research. To support this vision, this Essay provides an overview of a selection of open science initiatives from the past 2 decades, focusing on methods transparency, scholarly communication, team science, and research culture, and speculates about what the future of open science could look like. It then draws on these examples to provide recommendations for how funders, institutions, journals, regulators, and other stakeholders can create an environment that is ripe for improvement.
Journal Article
A proposal for validation of antibodies
2016
We convened an ad hoc International Working Group for Antibody Validation in order to formulate the best approaches for validating antibodies used in common research applications and to provide guidelines that ensure antibody reproducibility. We recommend five conceptual 'pillars' for antibody validation to be used in an application-specific manner.
Journal Article
Scaling of an antibody validation procedure enables quantification of antibody performance in major research applications
by McPherson, Peter; Ayoubi, Riham; Southern, Kathleen
in Amyotrophic lateral sclerosis, Antibodies, Antibodies - chemistry
2023
Antibodies are critical reagents to detect and characterize proteins. It is commonly understood that many commercial antibodies do not recognize their intended targets, but information on the scope of the problem remains largely anecdotal, and as such, the feasibility of the goal of at least one potent and specific antibody targeting each protein in a proteome cannot be assessed. Focusing on antibodies for human proteins, we have scaled a standardized characterization approach using parental and knockout cell lines (Laflamme et al., 2019) to assess the performance of 614 commercial antibodies for 65 neuroscience-related proteins. Side-by-side comparisons of all antibodies against each target, obtained from multiple commercial partners, demonstrated that: (i) more than 50% of all antibodies failed in one or more applications; (ii) nonetheless, ~50–75% of the protein set was covered by at least one high-performing antibody, depending on the application, suggesting that coverage of human proteins by commercial antibodies is significant; and (iii) recombinant antibodies performed better than monoclonal or polyclonal antibodies. The hundreds of underperforming antibodies identified in this study were found to have been used in a large number of published articles, which should raise alarm. Encouragingly, more than half of the underperforming commercial antibodies were reassessed by the manufacturers, and many had alterations to their recommended usage or were removed from the market. This first study helps demonstrate the scale of the antibody specificity problem but also suggests an efficient strategy toward achieving coverage of the human proteome: mine the existing commercial antibody repertoire, and use the data to focus new renewable antibody generation efforts.
Commercially produced antibodies are essential research tools. Investigators at universities and pharmaceutical companies use them to study human proteins, which carry out all the functions of the cell. Scientists usually buy antibodies from commercial manufacturers, who produce more than 6 million antibody products altogether. Yet many commercial antibodies do not work as advertised: they do not recognize their intended protein target or may flag untargeted proteins. Both can skew research results and make it challenging to reproduce scientific studies, which is vital to scientific integrity. Using ineffective commercial antibodies likely wastes $1 billion in research funding each year. Large-scale validation of commercial antibodies by an independent third party could reduce the waste and misinformation associated with using ineffective commercial antibodies. Previous research testing an antibody validation pipeline showed that a commercial antibody widely used in studies to detect a protein involved in amyotrophic lateral sclerosis did not work. Meanwhile, the best-performing commercial antibodies were not used in research. Testing commercial antibodies and making the resulting data available would help scientists identify the best study tools and improve research reliability. Ayoubi et al. collaborated with antibody manufacturers and organizations that produce genetic knockout cell lines to develop a system for validating the effectiveness of commercial antibodies. In the experiments, Ayoubi et al. tested 614 commercial antibodies intended to detect 65 proteins involved in neurologic diseases. An effective antibody was available for about two thirds of the 65 proteins. Yet hundreds of the antibodies, including many used widely in studies, were ineffective. Manufacturers removed some underperforming antibodies from the market or altered their recommended uses based on these data. Ayoubi et al. shared the resulting data on Zenodo, a publicly available open repository. The experiments suggest that 20–30% of protein studies use ineffective antibodies, indicating a substantial need for independent assessment of commercial antibodies. Ayoubi et al. demonstrated that their side-by-side antibody comparison methods were an effective and efficient way of validating commercial antibodies. Using this approach to test commercial antibodies against all human proteins would cost about $50 million. But it could save much of the $1 billion wasted each year on research involving ineffective antibodies. Independent validation of commercial antibodies could also reduce wasted efforts by scientists using ineffective antibodies and improve the reliability of research results. It would also enable faster, more reliable research that may help scientists understand diseases and develop new therapies to improve patients' lives.
Journal Article
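As a back-of-the-envelope check on the cost figures quoted in the abstract above, the sketch below scales the study's testing effort to the full human proteome. The ~20,000-protein proteome size and the linear-scaling assumption are illustrative additions, not the authors' calculations.

```python
# Back-of-the-envelope arithmetic from figures quoted in the abstract
# above. The ~20,000-protein proteome size and linear cost scaling are
# assumptions for illustration, not the authors' method.
proteins_tested = 65
antibodies_tested = 614
quoted_total_cost = 50_000_000   # USD, authors' estimate for the full proteome
human_proteome = 20_000          # approximate protein-coding genes (assumption)

antibodies_per_protein = antibodies_tested / proteins_tested
cost_per_protein = quoted_total_cost / human_proteome

print(f"~{antibodies_per_protein:.1f} antibodies screened per protein")   # ~9.4
print(f"~${cost_per_protein:,.0f} per protein at the quoted $50M total")  # ~$2,500
# Compare with the estimated $1 billion wasted per year on ineffective antibodies.
```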
Automatic detection and extraction of key resources from tables in biomedical papers
2025
Background
Tables are useful information artifacts that allow easy detection of missing data and have been deployed by several publishers to improve the amount of information present for key resources and reagents such as antibodies, cell lines, and other tools that constitute the inputs to a study. STAR*Methods key resource tables have increased the “findability” of these key resources, improving transparency of the paper by warning authors (before publication) about any problems, such as key resources that cannot be uniquely identified or those that are known to be problematic, but they have not been commonly available outside of the Cell Press journal family. We believe that processing preprints and adding these “resource table candidates” automatically will improve the availability of structured and linked information about research resources in a broader swath of the scientific literature. However, if the authors have already added a key resource table, that table must be detected, and each entity must be correctly identified and faithfully restructured into a standard format.
Methods
We introduce four end-to-end table extraction pipelines to extract and faithfully reconstruct key resource tables from biomedical papers in PDF format. The pipelines employ machine learning approaches for key resource table page identification, “Table Transformer” models for table detection, and table structure recognition. We also introduce a character-level generative pre-trained transformer (GPT) language model for scientific tables, pre-trained on over 11 million scientific tables. We fine-tuned our table-specific language model with synthetic training data generated with a novel approach to alleviate row over-segmentation, significantly improving key resource extraction performance.
Results
The extraction of key resource tables in PDF files by the popular GROBID tool resulted in a Grid Table Similarity (GriTS) score of 0.12. All of our pipelines outperformed GROBID by a large margin. Our best pipeline, with a table-specific language model-based row merger, achieved a GriTS score of 0.90.
Conclusions
Our pipelines allow the detection and extraction of key resources from tables with much higher accuracy, enabling the deployment of automated research resource extraction tools on bioRxiv to help authors correct unidentifiable key resources detected in their articles and improve the reproducibility of their findings. The code, table-specific language model, and annotated training and evaluation data are publicly available.
Journal Article
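The paper's pipelines combine page identification, table detection, structure recognition, and a table-specific language model; none of that code appears in this record. As a minimal sketch of the detection step alone, the following uses the publicly available Table Transformer checkpoint on Hugging Face, which is not necessarily the exact model the authors deployed.

```python
# Minimal sketch of the table-detection step only, using the public
# Table Transformer checkpoint; the paper's full pipelines also do
# page identification, structure recognition, and row merging.
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

image = Image.open("paper_page.png").convert("RGB")  # one rendered PDF page
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw logits/boxes into thresholded detections in pixel coordinates.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=target_sizes)[0]

for score, label, box in zip(detections["scores"], detections["labels"], detections["boxes"]):
    print(model.config.id2label[label.item()], f"{score:.2f}", [round(v) for v in box.tolist()])
```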
Big data from small data: data-sharing in the 'long tail' of neuroscience
by Bandrowski, Anita E; Nielson, Jessica L; Cragin, Melissa H
in 631/378/2650, Alzheimer's disease, Animal Genetics and Genomics
2014
In this Commentary, Martone and colleagues discuss the potential benefits of sharing small datasets, also called “long-tail” data, in the neuroscience community. They introduce the pros and cons associated with data sharing, describe existing attitudes toward such initiatives, introduce best practices, and offer their views on why and how the field should establish a credit system for sharing “long-tail” data.
The launch of the US BRAIN and European Human Brain Projects coincides with growing international efforts toward transparency and increased access to publicly funded research in the neurosciences. The need for data-sharing standards and neuroinformatics infrastructure is more pressing than ever. However, 'big science' efforts are not the only drivers of data-sharing needs, as neuroscientists across the full spectrum of research grapple with the overwhelming volume of data being generated daily and a scientific environment that is increasingly focused on collaboration. In this commentary, we consider the issue of sharing of the richly diverse and heterogeneous small data sets produced by individual neuroscientists, so-called long-tail data. We consider the utility of these data, the diversity of repositories and options available for sharing such data, and emerging best practices. We provide use cases in which aggregating and mining diverse long-tail data convert numerous small data sources into big data for improved knowledge about neuroscience-related disorders.
Journal Article
Ten simple rules for being a co-author on a many-author non-empirical paper
by Kohrs, Friederike E.; Bandrowski, Anita; Weissgerber, Tracey L.
in Authorship - standards, Biology and Life Sciences, Collaboration
2025
Many-author non-empirical papers include “how to” articles, recommendations or consensus statements, roadmaps for future research, catalogs of ideas, or calls to action. These papers benefit the research community and broader academic ecosystem by addressing unmet needs or introducing new perspectives and approaches. Large, diverse authorship teams that examine an issue from many different perspectives can create valuable resources that individual co-authors could not develop independently, or in smaller groups. Realizing the potential of many-author non-empirical papers, however, requires very different strategies than researchers would typically use to write papers with fewer authors. In our process, a core team of lead writers typically works together to lead the content generation and writing processes, while many co-authors collaboratively create content and provide feedback on outlines and drafts. Challenges for co-authors may include learning to write a different type of paper, adapting to high-volume feedback, and understanding the very diverse perspectives shared by fellow co-authors. This paper outlines ten simple rules for being a co-author on a many-author non-empirical paper. Although the rules were developed for papers with at least 30 authors, some rules may be useful for many-author research papers or for non-empirical papers with fewer authors. Co-authors may also want to consult our companion paper on ten simple rules for leading a many-author non-empirical paper, as understanding the challenges faced by lead writers will help co-authors to contribute more efficiently and effectively.
Journal Article
Resource Disambiguator for the Web: Extracting Biomedical Resources and Their Citations from the Scientific Literature
by Martone, Maryann E.; Grethe, Jeffrey S.; Ozyurt, Ibrahim Burak
in Algorithms, Artificial intelligence, Automation
2016
The NIF Registry, developed and maintained by the Neuroscience Information Framework, is a cooperative project aimed at cataloging research resources, e.g., software tools, databases, and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research, and it currently lists over 13,000 research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate, and utilization of these resources. Our experience shows that research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid curation efforts to keep content up to date, including when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed the Resource Disambiguator for the Web (RDW). RDW is designed to help in the upkeep and curation of the registry as well as in enhancing its content by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor for papers, a resource candidate screener, a resource URL change tracker, and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis. The complete tool suite is used to enhance and maintain the resource registry and to track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking.
Journal Article
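RDW's own code is not shown in this record. As a toy illustration of the first stage the abstract names, a URL extractor over paper text, the sketch below pulls and counts candidate resource URLs with a regular expression; the pattern and the mention-count heuristic are assumptions, not the toolkit's implementation.

```python
# Toy illustration of RDW's first stage (URL extraction from paper
# text); the real toolkit adds candidate screening, change tracking,
# and learned models. The regex and counting heuristic are assumptions.
import re
from collections import Counter

URL_PATTERN = re.compile(r'https?://[^\s<>")\]]+', re.IGNORECASE)

def extract_resource_candidates(text: str) -> Counter:
    """Find URLs in running text and count mentions; repeatedly
    cited URLs make stronger research-resource candidates."""
    urls = (u.rstrip(".,;") for u in URL_PATTERN.findall(text))
    return Counter(urls)

sample = ("Images were processed in ImageJ (https://imagej.net) and "
          "deposited at https://example-repo.org/record/123. "
          "Plugins are listed at https://imagej.net.")
for url, count in extract_resource_candidates(sample).most_common():
    print(count, url)
```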
Establishing Institutional Scores With the Rigor and Transparency Index: Large-scale Analysis of Scientific Reporting Quality
2022
Improving rigor and transparency measures should lead to improvements in reproducibility across the scientific literature; however, the assessment of measures of transparency tends to be very difficult if performed manually.
This study addresses the enhancement of the Rigor and Transparency Index (RTI, version 2.0), which attempts to automatically assess the rigor and transparency of journals, institutions, and countries using manuscripts scored on criteria found in reproducibility guidelines (e.g., the Materials Design, Analysis, and Reporting checklist criteria).
The RTI tracks 27 entity types using natural language processing techniques such as Bidirectional Long Short-term Memory Conditional Random Field-based models and regular expressions; this allowed us to assess over 2 million papers accessed through PubMed Central.
Between 1997 and 2020 (where data were readily available in our data set), rigor and transparency measures showed general improvement (RTI 2.29 to 4.13), suggesting that authors are taking the need for improved reporting seriously. The top-scoring journals in 2020 were the Journal of Neurochemistry (6.23), British Journal of Pharmacology (6.07), and Nature Neuroscience (5.93). We extracted the institution and country of origin from the author affiliations to expand our analysis beyond journals. Among institutions publishing >1000 papers in 2020 (in the PubMed Central open access set), Capital Medical University (4.75), Yonsei University (4.58), and University of Copenhagen (4.53) were the top performers in terms of RTI. In country-level performance, we found that Ethiopia and Norway consistently topped the RTI charts of countries with 100 or more papers per year. In addition, we tested our assumption that the RTI may serve as a reliable proxy for scientific replicability (ie, a high RTI represents papers containing sufficient information for replication efforts). Using work by the Reproducibility Project: Cancer Biology, we determined that replication papers (RTI 7.61, SD 0.78) scored significantly higher (P<.001) than the original papers (RTI 3.39, SD 1.12), which according to the project required additional information from authors to begin replication efforts.
These results align with our view that the RTI may serve as a reliable proxy for scientific replicability. Unfortunately, RTI measures for journals, institutions, and countries fall short of the average for replication papers. If we consider the RTI of these replication studies as a target for future manuscripts, more work will be needed to ensure that the average manuscript contains sufficient information for replication attempts.
Journal Article
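The published RTI is computed with trained BiLSTM-CRF entity taggers over 27 entity types, which are not reproduced here. As a toy sketch of the shape of that computation, the following scores each paper by how many of a few rigor criteria a regex detector can find, then averages across a corpus; the criteria, patterns, and equal weighting are hypothetical simplifications.

```python
# Toy sketch only: the real RTI uses BiLSTM-CRF taggers over 27 entity
# types; this version substitutes a few regex detectors to show the
# shape of the computation. Criteria and weighting are hypothetical.
import re
from statistics import mean

DETECTORS = {
    "blinding":       re.compile(r"\bblind(?:ed|ing)\b", re.I),
    "randomization":  re.compile(r"\brandomi[sz](?:ed|ation)\b", re.I),
    "power_analysis": re.compile(r"\bpower (?:analysis|calculation)\b", re.I),
    "rrid":           re.compile(r"\bRRID:\s*\w+", re.I),
}

def paper_score(text: str) -> float:
    """Number of rigor criteria detected in one manuscript (0-4 here)."""
    return float(sum(bool(p.search(text)) for p in DETECTORS.values()))

def corpus_rti(papers: list[str]) -> float:
    """Average per-paper score across a corpus, RTI-style."""
    return mean(paper_score(t) for t in papers)

corpus = ["Mice were randomized and observers were blinded. RRID:AB_000001.",
          "We performed a power analysis before enrollment."]
print(corpus_rti(corpus))  # (3 + 1) / 2 = 2.0
```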