206 result(s) for "Python package"
Software tools for conducting bibliometric analysis in science: An up-to-date review
Bibliometrics has become an essential tool for assessing and analyzing the output of scientists, cooperation between universities, the effect of state-owned science funding on national research and development performance, and educational efficiency, among other applications. Professionals and scientists therefore need a range of theoretical and practical tools to measure experimental data. This article provides an up-to-date review of the tools available for conducting bibliometric and scientometric analyses, covering sources of data acquisition, performance analysis, and visualization tools. The included tools were divided into three categories: general bibliometric and performance analysis, science mapping analysis, and libraries; a description of each is provided. A comparative analysis of database source support, pre-processing capabilities, and analysis and visualization options is also provided to facilitate understanding. Although there are numerous bibliometric databases from which to obtain data for bibliometric and scientometric analysis, each was developed for a different purpose. The number of exportable records ranges between 500 and 50,000, and coverage of the different science fields is unequal across databases. Among the analyzed tools, Bibliometrix contains the most extensive set of techniques and is suitable for practitioners through Biblioshiny. VOSviewer offers excellent visualization and can load and export information from many sources. SciMAT is the tool with the most powerful pre-processing and export capabilities. In view of this variability of features, users need to decide on the desired analysis output and choose the option that best fits their aims.
Scikit-Dimension: A Python Package for Intrinsic Dimension Estimation
Dealing with uncertainty in applications of machine learning to real-life data critically depends on knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for estimating ID, but no standard Python package existed to easily apply them one by one or all at once. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation. The scikit-dimension package provides a uniform implementation of most of the known ID estimators, based on the scikit-learn application programming interface, to evaluate global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tooling for code-quality assessment, test coverage, unit testing, and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation on real-life and synthetic data.
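As an illustration of what such estimators compute, here is a minimal NumPy sketch of one classic ID estimator, TwoNN, which uses only the ratio of each point's two nearest-neighbour distances. This is an independent sketch of the method, not scikit-dimension's API; the function name `twonn_id` and the toy dataset are ours.

```python
import numpy as np

def twonn_id(X):
    """Estimate intrinsic dimension with the TwoNN method: for each point,
    take the ratio mu = r2/r1 of its two nearest-neighbour distances; the
    maximum-likelihood estimate of the dimension is N / sum(log(mu))."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Full pairwise distance matrix (fine for a sketch; use a KD-tree for large n)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)
    r1, r2 = d[:, 1], d[:, 2]   # column 0 is the zero self-distance
    return n / np.log(r2 / r1).sum()

rng = np.random.default_rng(0)
# A 2-D Gaussian cloud linearly embedded in 5-D: true intrinsic dimension is 2
pts = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5))
print(round(twonn_id(pts), 1))  # typically close to 2
```

Packages like scikit-dimension wrap many such estimators behind one interface, since different estimators trade bias against variance differently.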
VaRaPS: a Python package for estimating SARS-CoV-2 lineage proportions from pooled sequencing data (ANRS0160)
Background: Wastewater-based epidemiology has been investigated as a very effective way of monitoring SARS-CoV-2 variants. This can be achieved through accurate lineage deconvolution of wastewater sequencing data. Variants Ratios from Pooled Sequencing (VaRaPS) is a Python package designed for this purpose, using pooled sequencing data and lineage mutation profiles to estimate lineage proportions. Results: VaRaPS re-implements core algorithms from the literature, achieving significant improvements in computational speed and efficiency. Comparative analyses with simulated and synthetic datasets demonstrate its superior performance in lineage prevalence estimation, underscored by its user-oriented design for broader accessibility. Conclusions: By improving speed and accuracy in SARS-CoV-2 variant analysis, VaRaPS offers valuable insights into viral evolution, supporting ongoing surveillance efforts in the post-pandemic landscape.
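To make the deconvolution idea concrete, here is a toy sketch: given a mutation-by-lineage profile matrix and observed per-mutation frequencies in the pooled sample, lineage proportions can be recovered by least squares with a projection back onto the simplex. VaRaPS's actual algorithms are more sophisticated; all matrices and numbers below are illustrative, not real lineage data.

```python
import numpy as np

# Hypothetical profile matrix: rows = mutations, columns = lineages.
# A 1 means that lineage carries that mutation (illustrative values only).
profiles = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
], dtype=float)

true_props = np.array([0.5, 0.3, 0.2])
# Observed per-mutation allele frequencies in the pooled sample
observed = profiles @ true_props

# Least-squares deconvolution, then clip and renormalise so the
# estimated proportions are nonnegative and sum to one
est, *_ = np.linalg.lstsq(profiles, observed, rcond=None)
est = np.clip(est, 0.0, None)
est /= est.sum()
print(np.round(est, 2))  # recovers [0.5, 0.3, 0.2]
```

Real wastewater reads are noisy and mutation profiles overlap heavily between lineages, which is why dedicated tools add constraints and error models on top of this core step.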
irreversibility: A Python Package for Assessing and Manipulating the Time Irreversibility of Real-World Time Series
Time irreversibility refers to the property of some dynamical systems and time series of being statistically different when observed backward in time. While the theoretical foundations of irreversibility date back to the origin of statistical physics, the analysis of such property in real-world time series has only recently gained momentum. We present irreversibility, an open-source Python (version ≥ 3.11) package aimed at providing a large set of irreversibility metrics and tests and at facilitating their use. Besides the tests themselves, it includes a set of utilities, like functions to downsample and manipulate the time series, and to optimise the parameters of the metrics. By providing a unified software package, irreversibility simplifies the analysis of real data, allowing the researcher to compare multiple tests and obtain a better and more reproducible view of the underlying system. In this contribution we explore the features of the package and provide examples of its use.
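To illustrate the kind of statistic such a package collects, here is a minimal sketch of one simple irreversibility measure, the skewness of one-step increments, which is zero in expectation for reversible series and flips sign exactly under time reversal. This is an independent sketch, not the package's API; `diff_skewness` is a name we introduce.

```python
import numpy as np

def diff_skewness(x):
    """A simple time-irreversibility statistic: the skewness of the
    one-step increments. It is zero in expectation for time-reversible
    series and changes sign exactly when the series is reversed."""
    d = np.diff(np.asarray(x, dtype=float))
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(1)
noise = rng.normal(size=20000)          # white noise is reversible

# A sawtooth rises slowly and drops sharply: strongly irreversible
sawtooth = np.tile(np.append(np.linspace(0.0, 1.0, 9), 0.0), 2000)

print(abs(diff_skewness(noise)) < 0.1)   # True: statistic near zero
print(diff_skewness(sawtooth) < -1.0)    # True: clear asymmetry
```

A practical package adds many such metrics plus significance tests (e.g. against time-shuffled surrogates), which is exactly the comparison across tests the authors aim to simplify.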
SOAPy: a Python package to dissect spatial architecture, dynamics, and communication
Advances in spatial omics enable deeper insights into tissue microenvironments while posing computational challenges. We therefore developed SOAPy, a comprehensive tool for analyzing spatial omics data that offers methods for spatial domain identification, spatial expression tendency, spatiotemporal expression patterns, cellular co-localization, multi-cellular niches, cell–cell communication, and more. SOAPy can be applied to diverse spatial omics technologies and to multiple physiological and pathological contexts, such as tumor biology and developmental biology. Its versatility and robust performance make it a universal platform for spatial omics analysis, providing diverse insights into the dynamics and architecture of tissue microenvironments.
A computational framework to systematize uncertainty analysis in the sediment fingerprinting approach using least square methods
Simulating sediment transfer processes in catchments has contributed significantly to solving environmental problems, given its importance in the silting of rivers and reservoirs and in controlling the pollution of water bodies. Among the methods used to improve data collection and modelling, the "sediment fingerprinting approach" uses tracers reflecting the composition of eroded soils and sediments in multivariate statistical analyses and in mathematical models for optimizing equation systems. Based on the generalized least squares (GLS) method and the Mahalanobis distance, this study presents a computational framework to solve over-determined systems applied to sediment tracing and to systematize uncertainty analysis and sample-number optimization. This approach accounts for collinearity among the chemical variables that compose the tracer set through the variance-covariance matrix. A dataset from the Arvorezinha experimental catchment in southern Brazil was used to validate the modeling, and our findings confirmed the assumption that uncertainty increases as the number of source or eroded-sediment samples decreases. Sharing the code files as PySASF (Python package for Source Apportionment with Sediment Fingerprinting) contributes to improving the technique, as it allows other researchers to systematically refine the number of samples required based on uncertainty analysis.
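The GLS step described above can be sketched in a few lines: weighting the normal equations by the inverse of the variance-covariance matrix is precisely how collinearity among tracers enters the solution. This is a generic GLS sketch, not PySASF's code; the tracer matrix and covariance below are illustrative, not data from the study.

```python
import numpy as np

def gls_fit(A, b, cov):
    """Generalized least squares for the over-determined system A p ~ b:
    p = (A' W A)^-1 A' W b with W the inverse variance-covariance matrix,
    so correlated (collinear) tracers are not double-counted."""
    W = np.linalg.inv(cov)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

# Hypothetical example: 5 tracers, 2 sediment sources (illustrative numbers)
A = np.array([[1.0, 0.2],
              [0.8, 0.4],
              [0.3, 0.9],
              [0.1, 1.1],
              [0.5, 0.5]])
true_p = np.array([0.6, 0.4])          # true source proportions
cov = np.diag([0.1, 0.1, 0.2, 0.2, 0.1])
b = A @ true_p                          # noiseless tracer concentrations
print(np.round(gls_fit(A, b, cov), 2))  # recovers [0.6, 0.4]
```

With noisy tracer data the estimate is no longer exact, and repeating the fit over resampled subsets is one way to obtain the uncertainty-versus-sample-number curves the study systematizes.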
Reproducible MS/MS library cleaning pipeline in matchms
Mass spectral libraries have proven essential for mass spectrum annotation, both for library matching and for training new machine learning algorithms. A key requirement for training machine learning models is the availability of high-quality training data. Public libraries of mass spectrometry data that are open to user submission often suffer from limited metadata curation and harmonization. The resulting variability in data quality makes training machine learning models challenging. Here we present a library cleaning pipeline designed for cleaning tandem mass spectrometry library data. The pipeline is designed with ease of use, flexibility, and reproducibility as leading principles. Scientific contribution: This pipeline will result in cleaner public mass spectral libraries that improve library searching and the quality of machine-learning training datasets in mass spectrometry. It builds on previous work by adding new functionality for curating and correcting annotated libraries and by validating structure annotations. Due to the quality of our software, its reproducibility, and improved logging, we believe the new pipeline has the potential to become the standard in the field for cleaning tandem mass spectrometry libraries.
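To illustrate the filter-chain design such a cleaning pipeline follows, here is a minimal sketch in plain Python: each filter either fixes metadata or drops a record, and the ordered list of filters is the pipeline. This deliberately does not use matchms's actual API; all function names, fields, and records below are illustrative.

```python
# Minimal sketch of a filter-chain cleaning pipeline (not matchms's API)

def normalize_name(spec):
    """Harmonize the compound name: trim whitespace, lowercase."""
    spec = dict(spec)
    spec["compound_name"] = spec["compound_name"].strip().lower()
    return spec

def require_inchikey(spec):
    """Drop spectra whose structure annotation is missing."""
    return spec if spec.get("inchikey") else None

def run_pipeline(spectra, filters):
    """Apply each filter in order; a filter may correct metadata or
    return None to discard the spectrum entirely."""
    for f in filters:
        spectra = [s for s in (f(s) for s in spectra) if s is not None]
    return spectra

# Illustrative toy library entries (metadata dicts only, no peak data)
library = [
    {"compound_name": "  Caffeine ", "inchikey": "RYYVLZVUVIJVGH"},
    {"compound_name": "unknown", "inchikey": ""},
]
clean = run_pipeline(library, [normalize_name, require_inchikey])
print(len(clean), clean[0]["compound_name"])  # 1 caffeine
```

Keeping every filter a small pure function makes the pipeline easy to reorder, log, and rerun, which is how such pipelines achieve the reproducibility the abstract emphasizes.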
PyAGH: a python package to fast construct kinship matrices based on different levels of omic data
Background: Construction of kinship matrices among individuals is an important step for both association and prediction studies based on different levels of omic data. Methods for constructing kinship matrices are becoming diverse, and different methods suit different scenarios. However, software that can comprehensively calculate kinship matrices for a variety of scenarios is still in urgent demand. Results: In this study, we developed an efficient and user-friendly Python module, PyAGH, that can accomplish (1) conventional additive kinship matrix construction based on pedigree, genotypes, or abundance data from the transcriptome or microbiome; (2) genomic kinship matrix construction in combined populations; (3) dominant and epistatic effect kinship matrix construction; (4) pedigree selection, tracing, detection, and visualization; and (5) visualization of clustering, heatmap, and PCA analyses based on kinship matrices. The output from PyAGH can be easily integrated into other mainstream software depending on users' purposes. Compared with other software, PyAGH integrates multiple methods for calculating the kinship matrix and has advantages in speed and supported data size. PyAGH is developed in Python and C++ and can be easily installed with the pip tool. Installation instructions and a manual are freely available from https://github.com/zhaow-01/PyAGH . Conclusion: PyAGH is a fast and user-friendly Python package for calculating kinship matrices using pedigree, genotype, microbiome, and transcriptome data, as well as for processing, analyzing, and visualizing data and results. This package makes it easier to perform prediction and association studies based on different levels of omic data.
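As a concrete example of genotype-based kinship construction, here is a NumPy sketch of the widely used VanRaden (method 1) genomic relationship matrix. PyAGH's implementations and options may differ; the tiny genotype matrix below is illustrative only.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic kinship, VanRaden method 1: rows = individuals, columns =
    SNPs coded 0/1/2. Center each SNP by twice its allele frequency and
    scale the cross-product by 2 * sum(p * (1 - p))."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0            # allele frequency per SNP
    Z = M - 2.0 * p                     # centered genotypes
    return (Z @ Z.T) / (2.0 * np.sum(p * (1.0 - p)))

# Illustrative toy data: 3 individuals, 4 SNPs
geno = np.array([[0, 1, 2, 1],
                 [1, 1, 0, 2],
                 [2, 0, 1, 1]])
G = vanraden_grm(geno)
print(G.shape)  # (3, 3) symmetric kinship matrix
```

Pedigree-, dominance-, and omic-abundance-based kinships follow the same pattern: build a centered feature matrix per individual, then scale its cross-product.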
md_harmonize: A Python Package for Atom-Level Harmonization of Public Metabolic Databases
A major challenge to integrating public metabolic resources is the use of different nomenclatures by individual databases. This paper presents md_harmonize, an open-source Python package for harmonizing compounds and metabolic reactions across metabolic databases. The md_harmonize package uses a neighborhood-specific graph coloring method to generate a unique identifier for each compound from atom identifiers derived from the compound's chemical structure. The resulting harmonized compounds and reactions can be used for various downstream analyses, including the construction of atom-resolved metabolic networks and models for metabolic flux analysis. Parts of the md_harmonize package have been optimized using a variety of computational techniques so that certain NP-complete problems handled by the software remain tractable for these specific use cases. The software is available on GitHub and through the Python Package Index, with end-user documentation hosted on GitHub Pages.
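The neighborhood-specific coloring idea can be sketched as a Weisfeiler-Lehman-style refinement: each atom's identifier is iteratively re-hashed from its own label plus its neighbours' identifiers, so structurally equivalent atoms converge to the same color. This is an independent sketch of the general technique, not md_harmonize's algorithm in detail; the toy molecule and function names are ours.

```python
import hashlib

def atom_colors(adjacency, labels, rounds=3):
    """Iterative neighbourhood colouring: each atom's colour is re-hashed
    from its current colour plus the sorted colours of its neighbours.
    After enough rounds, symmetric atoms share the same identifier."""
    colors = [hashlib.sha256(l.encode()).hexdigest()[:8] for l in labels]
    for _ in range(rounds):
        new = []
        for i, c in enumerate(colors):
            neigh = sorted(colors[j] for j in adjacency[i])
            new.append(hashlib.sha256((c + "|".join(neigh)).encode()).hexdigest()[:8])
        colors = new
    return colors

# Toy 3-atom chain C-O-C (atom indices 0-1-2)
adj = {0: [1], 1: [0, 2], 2: [1]}
cols = atom_colors(adj, ["C", "O", "C"])
print(cols[0] == cols[2], cols[0] != cols[1])  # True True: the carbons match
```

Hashing the full colored graph then yields a structure-derived compound identifier that is independent of any one database's nomenclature.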
CSESpy: A Unified Framework for Data Analysis of the Payloads on Board the CSES Satellite
The China Seismo-Electromagnetic Satellite (CSES) mission provides in situ measurements of the electromagnetic field, plasma, and charged particles in the topside ionosphere. Each CSES spacecraft carries several scientific payloads delivering a wealth of information about ionospheric plasma dynamics and properties, as well as measurements of energetic particles precipitating in the ionosphere. In this work, we introduce CSESpy, a Python package designed to provide an interface to CSES data products, with the aim of easing the pathway for scientists to carry out analyses of CSES data. Beyond simply being an interface to the data, CSESpy aims to provide higher-level analysis and visualization tools, as well as methods for combining concurrent measurements from different instruments, so as to allow multipayload studies in a unified framework. Moreover, CSESpy is designed to be highly flexible; as such, it can be extended to interface with datasets from other sources and can be embedded in wider software ecosystems. We highlight several applications, demonstrating that CSESpy is also a powerful visualization tool for investigating complex events involving variations across multiple physical observables.