Catalogue Search | MBRL

Computational pathology for musculoskeletal conditions using machine learning: advances, trends, and challenges

by Brendel, Matthew , Otero, Miguel , Wang, Fei in Care and treatment , Computational pathology , Convolutional neural network

2022

Histopathology is widely used to analyze clinical biopsy specimens and tissues from pre-clinical models of a variety of musculoskeletal conditions. Histological assessment relies on scoring systems that require expertise, time, and resources, which can lead to an analysis bottleneck. Recent advancements in digital imaging and image processing provide an opportunity to automate histological analyses by implementing advanced statistical models such as machine learning and deep learning, which would greatly benefit the musculoskeletal field. This review provides a high-level overview of machine learning applications, outlines a general pipeline from tissue collection to model selection, and highlights the development of image analysis methods, including some machine learning applications, to solve musculoskeletal problems. We discuss the optimization steps for tissue processing, sectioning, staining, and imaging that are critical for the successful generalizability of an automated image analysis model. We also comment on the considerations that should be taken into account during model selection and on the considerable advances in the field of computer vision outside of histopathology, which can be leveraged for image analysis. Finally, we provide a historic perspective of the previously used histopathological image analysis applications for musculoskeletal diseases, and we contrast it with the advantages of implementing state-of-the-art computational pathology approaches. While some deep learning approaches have been used, there is a significant opportunity to expand the use of such approaches to solve musculoskeletal problems.

Journal Article

Frequentmers - a novel way to look at metagenomic next generation sequencing data and an application in detecting liver cirrhosis

by Mouratidis, Ioannis , Moeckel, Camille , Konnaris, Maxwell A. in Algorithms , Analysis , Animal Genetics and Genomics

2023

Early detection of human disease is associated with improved clinical outcomes. However, many diseases are often detected at an advanced, symptomatic stage, when patients are past the window for efficacious treatment, which can result in less favorable outcomes. Therefore, methods that can accurately detect human disease at a presymptomatic stage are urgently needed. Here, we introduce “frequentmers”: short sequences that are specific and recurrently observed in either patient or healthy control samples, but not in both. We showcase the utility of frequentmers for the detection of liver cirrhosis using metagenomic Next Generation Sequencing data from stool samples of patients and controls. We develop classification models for the detection of liver cirrhosis and achieve an AUC score of 0.91 using ten-fold cross-validation. A small subset of 200 frequentmers can achieve comparable results in detecting liver cirrhosis. Finally, we identify the microbial organisms in liver cirrhosis samples, which are associated with the most predictive frequentmer biomarkers.
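The frequentmer definition in the abstract above (sequences recurrent in one group and absent from the other) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the fixed k-mer length `k`, the `min_samples` recurrence threshold, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in one sample's sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recurrence(samples, k):
    """Count, for each k-mer, how many samples contain it at least once."""
    counts = Counter()
    for seq in samples:
        counts.update(kmers(seq, k))
    return counts


def frequentmers(patients, controls, k, min_samples):
    """Hypothetical frequentmer finder: k-mers seen in at least
    `min_samples` samples of one group and in no sample of the other."""
    p = recurrence(patients, k)
    c = recurrence(controls, k)
    patient_specific = {m for m, n in p.items() if n >= min_samples and m not in c}
    control_specific = {m for m, n in c.items() if n >= min_samples and m not in p}
    return patient_specific, control_specific


# Toy data: each string stands in for the reads of one sample.
patient_fm, control_fm = frequentmers(
    patients=["ACGTAC", "TACGTT"],
    controls=["GGGACG", "AGGGTT"],
    k=3,
    min_samples=2,
)
```

In this toy input, `ACG` recurs in both patient samples but is excluded because it also appears in a control sample; only group-exclusive, recurrent k-mers survive.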

Journal Article

Increased Comorbidity Burden Among Hip Fracture Patients During the COVID-19 Pandemic in New York City

by Goldwyn, Elan M. , Ricci, William M. , Mendias, Christopher L in Comorbidity , Coronaviruses , COVID-19

2021

Background: The coronavirus disease 19 (COVID-19) pandemic had a devastating effect on New York City in the spring of 2020. Several global reports suggested worse early outcomes among COVID-positive patients with hip fractures. However, there is limited data comparing baseline comorbidities among patients treated during the pandemic relative to those treated in non-pandemic conditions. Materials and Methods: A multicenter retrospective cohort study was performed at two Level 1 Trauma centers and one orthopedic specialty hospital to assess demographics, comorbidities, and outcomes among 67 hip fracture patients treated (OTA/AO 31, 32.1) during the peak of the COVID-19 pandemic in New York City (March 20, 2020 to April 24, 2020), including 9 who were diagnosed with COVID-19. These patients were compared to a cohort of 76 hip fracture patients treated 1 year prior (March 20, 2019 to April 24, 2019). Baseline demographics, comorbidities, treatment characteristics, and respiratory symptomatology were evaluated. The primary outcome was inpatient mortality. Results: Relative to patients treated in 2019, patients with hip fractures during the pandemic had worse Charlson Comorbidity Indices (median 5.0 vs 6.0, P = .03) and American Society of Anesthesiologists (ASA) scores (mean 2.4 vs 2.7, P = .04). Patients during the COVID-19 pandemic were more likely to have decreased ambulatory status (P<.01) and a smoking history (P = .04). Patients in 2020 had longer inpatient stays (median 5 vs 7 days, P = .01), and were more likely to be discharged home (61% vs 9%, P<.01). Inpatient mortality was significantly increased during the COVID-19 pandemic (12% vs 0%, P = .002). Conclusions: Patients with hip fractures during the COVID-19 pandemic had worse comorbidity profiles and decreased functional status compared to patients treated the year prior. This information may be relevant in negotiations regarding reimbursement for cost of care of hip fracture patients with COVID-19, as these patients may require more expensive care.

Journal Article

The determinants of the rarity of nucleic and peptide short sequences in nature

by Mouratidis, Ioannis , Montgomery, Austin , Patsakis, Michail in Genomes , Oligonucleotides , Peptides

2024

The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. We examined 45 785 reference genomes and 21 871 reference proteomes, spanning archaea, bacteria, eukaryotes and viruses to calculate the rarity of short sequences in them. To capture this, we developed a metric of the rarity of each sequence in nature, the rarity index. We find that the frequency of certain dipeptides in rare oligopeptide sequences is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate predictive regression models that infer the rarity of nucleic and proteomic sequences across nature or within each domain of life and viruses separately. When examining each of the three domains of life and viruses separately, the R² performance of the model predicting rarity for 5-mer peptides from mono- and dipeptides ranged between 0.814 and 0.932. A separate model predicting rarity for 10-mer oligonucleotides from mono- and dinucleotides achieved R² performance between 0.408 and 0.606. Our results indicate that the mono- and dinucleotide composition of nucleic sequences and the mono- and dipeptide composition of peptide sequences can explain a significant proportion of the variance in their frequencies in nature.

Journal Article

Uncertainty Modeling Outperforms Machine Learning for Microbiome Data Analysis

by Lazar, Nicole , Konnaris, Maxwell A , Saxena, Manan in Bioinformatics

2025

Microbiome sequencing measures relative rather than absolute abundances, providing no direct information about total microbial load. Normalization methods attempt to compensate, but rely on strong, often untestable assumptions that can bias inference. Experimental measurements of load (e.g., qPCR, flow cytometry) offer a solution, but remain costly and uncommon. A recent high-profile study proposed that machine learning could bypass this limitation by predicting microbial load from sequencing data alone. To evaluate this claim, we assembled the largest public database of paired sequencing and load measurements, spanning 35 studies and over 15,000 samples. Using this database, we show that published machine learning models fail to generalize: on average they perform worse than a naive baseline that always predicts the training set mean. These failures stem from covariate shift: limited shared taxa between studies, differences in community composition, and differences in preprocessing pipelines, all of which silently derail model inputs. In contrast, Bayesian partially identified models do not attempt to impute microbial load, but instead propagate scale uncertainty through downstream analyses. Across 30 benchmark datasets, Bayesian partially identified models consistently outperformed normalization and machine learning approaches, providing a principled and reproducible foundation for microbiome inference.
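The naive baseline mentioned in the abstract above (always predicting the training set mean) is simple enough to sketch. The abstract does not state which error metric was used, so mean squared error is an assumption here, as are the function names and the toy numbers.

```python
def mse(predicted, observed):
    """Mean squared error between two equal-length sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def mean_baseline_mse(train_y, test_y):
    """Error of a constant predictor that always outputs the training mean."""
    train_mean = sum(train_y) / len(train_y)
    return mse([train_mean] * len(test_y), test_y)


def beats_baseline(predicted, train_y, test_y):
    """True if the model's held-out error is lower than the constant baseline's."""
    return mse(predicted, test_y) < mean_baseline_mse(train_y, test_y)


# Toy numbers (not from the study): loads seen in training vs. a held-out study.
train_load = [1.0, 2.0, 3.0]
test_load = [2.0, 4.0]
poor_model = [5.0, 7.0]   # systematically shifted predictions
good_model = [2.0, 3.0]   # close to the held-out values
```

A model that does not beat this constant predictor on data from an unseen study has, by this criterion, learned nothing transferable about the target.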

Journal Article

MPRAbase: A Massively Parallel Reporter Assay Database

by Zhao, Jingjing , Mouratidis, Ioannis , Liu, Zhe in Genomics

2023

Massively parallel reporter assays (MPRAs) represent a set of high-throughput technologies that measure the functional effects of thousands of sequences/variants on gene regulatory activity. There are several different variations of MPRA technology and they are used for numerous applications, including regulatory element discovery, variant effect measurement, saturation mutagenesis, synthetic regulatory element generation or characterization of evolutionary gene regulatory differences. Despite their many designs and uses, there is no comprehensive database that incorporates the results of these experiments. To address this, we developed MPRAbase, a manually curated database that currently harbors 129 experiments, encompassing 17,718,677 elements tested across 35 cell types and 4 organisms. The MPRAbase web interface (http://www.mprabase.com) serves as a centralized user-friendly repository to download existing MPRA data for independent analysis and is designed with the ability to allow researchers to share their published data for rapid dissemination to the community.

Journal Article

Automated multi-scale computational pathotyping (AMSCP) of inflamed synovial tissue

by Peter K. Gregersen , Deepak A. Rao , Josh Keegan in 631/114/1305 , 631/250/2503 , 692/699/1670/498

2024

Rheumatoid arthritis (RA) is a complex immune-mediated inflammatory disorder in which patients suffer from inflammatory-erosive arthritis. Recent advances in characterizing the histopathological heterogeneity of RA synovial tissue revealed three distinct phenotypes based on cellular composition (pauci-immune, diffuse and lymphoid), suggesting that distinct etiologies warrant specific targeted therapy. This motivates a need for cost-effective phenotyping tools in preclinical and clinical settings. To this end, we developed an automated multi-scale computational pathotyping (AMSCP) pipeline for both human and mouse synovial tissue with two distinct components that can be leveraged together or independently: (1) segmentation of different tissue types to characterize tissue-level changes, and (2) cell type classification within each tissue compartment that assesses change across disease states. Here, we demonstrate the efficacy, efficiency, and robustness of the AMSCP pipeline as well as the ability to discover novel phenotypes. Taken together, we find AMSCP to be a valuable cost-effective method for both pre-clinical and clinical research. Automated pathotyping of synovial tissue in arthritis is a major unmet need. Here, the authors develop and demonstrate the efficacy of an automated tissue and cellular pathotyping tool for inflammatory arthritis.

Journal Article

Nucleic Quasi-Primes: Identification of the Shortest Unique Oligonucleotide Sequences in a Species

by Mouratidis, Ioannis , Pavlopoulos, Georgios , Chartoumpekis, Dionysios V in Astrocytes , Cell activation , Cognitive ability

2023

Despite the exponential increase in sequencing information driven by massively parallel DNA sequencing technologies, universal and succinct genomic fingerprints for each organism are still missing. Identifying the shortest species-specific nucleic sequences offers insights into species evolution and holds potential practical applications in agriculture, wildlife conservation, and healthcare. We propose a new method for sequence analysis termed nucleic quasi-primes, the shortest occurring sequences in each of 45,785 organismal reference genomes, present in one genome and absent from every other examined genome. In the human genome, we find that the genomic loci of nucleic quasi-primes are most enriched for genes associated with brain development and cognitive function. In a single-cell case study focusing on the human primary motor cortex, nucleic quasi-prime genes account for a significantly larger proportion of the variation based on average gene expression. Non-neuronal cell-types, including astrocytes, endothelial cells, microglia perivascular-macrophages, oligodendrocytes, and vascular and leptomeningeal cells, exhibited significant activation of quasi-prime containing gene associations related to cancer, while simultaneously suppressing quasi-prime containing genes associated with cognitive, mental, and developmental disorders. We also show that human disease-causing variants, eQTLs, mQTLs and sQTLs are 4.43-fold, 4.34-fold, 4.29-fold and 4.21-fold enriched at human quasi-prime loci, respectively. These findings indicate that nucleic quasi-primes are genomic loci linked to the evolution of species-specific traits and in humans they provide insights into the development of cognitive traits and human diseases, including neurodevelopmental disorders. Competing Interest Statement: The authors have declared no competing interest.
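The quasi-prime definition in the abstract above (the shortest sequences present in one genome and absent from every other examined genome) can be sketched as a brute-force search over increasing k-mer lengths. This is an illustrative toy, not the authors' method: the function names, the `max_k` cutoff, and the tiny example "genomes" are assumptions, and reverse complements are ignored for simplicity.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(genomes, k):
    """For each genome, the k-mers that occur in that genome and no other."""
    per_genome = {name: kmers(seq, k) for name, seq in genomes.items()}
    # Count in how many genomes each k-mer appears.
    owners = Counter(m for s in per_genome.values() for m in s)
    return {name: {m for m in s if owners[m] == 1} for name, s in per_genome.items()}


def shortest_quasi_primes(genomes, max_k):
    """Hypothetical search: for each genome, the smallest k (up to max_k)
    at which it has genome-specific k-mers, together with those k-mers."""
    result = {}
    for k in range(1, max_k + 1):
        for name, found in unique_kmers(genomes, k).items():
            if found and name not in result:
                result[name] = (k, found)
    return result


# Toy genomes sharing the same nucleotides but no dinucleotides.
toy = {"a": "ACGT", "b": "TGCA"}
quasi = shortest_quasi_primes(toy, max_k=3)
```

At k = 1 both toy genomes contain the same four nucleotides, so nothing is genome-specific; at k = 2 every dinucleotide is exclusive to one genome, which makes 2 the shortest quasi-prime length in this example.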

Paper

kmerDB: A Database Encompassing the Set of Genomic and Proteomic Sequence Information for Each Species

by Chan, Candace , Mouratidis, Ioannis , Pavlopoulos, Georgios in Genomes , Genomics , Population decline

2023

The rapid decline in sequencing cost has enabled the generation of reference genomes and proteomes for a growing number of organisms. However, at the present time, there is no established repository that provides information about organism-specific genomic and proteomic sequences of certain lengths, also known as kmers, that are either present or absent in each genome or proteome. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 45,785 and 22,386 reference genomes and proteomes, respectively, as well as 14,658,776 and 149,264,442 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences that are absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com. Competing Interest Statement: The authors have declared no competing interest.

Paper

The determinants of the rarity of nucleic and peptide short sequences in nature

by Montgomery, Austin , Mouratidis, Ioannis , Mareboina, Manvita in Genomics

2023

The prevalence of nucleic and peptide short sequences across organismal genomes and proteomes has not been thoroughly investigated. Here we examined 45,785 reference genomes and 21,871 reference proteomes, spanning archaea, bacteria, viruses and eukaryotes to calculate the rarity of short sequences in them. To capture this, we developed a metric of the rarity of each sequence in nature, the Anti-Kardashian index. We find that the frequency of certain dipeptides in rare oligopeptide sequences is hundreds of times lower than expected, which is not the case for any dinucleotides. We also generate predictive regression models that infer the rarity of nucleic and proteomic sequences in nature. For six-mer peptide kmers the R² performance of the regression models based on amino acid and dipeptide content is 0.816, whereas models based on physicochemical features achieve an R² of 0.788. For twelve-mer nucleic kmers the R² performance of our models based on mono- and dinucleotides is 0.481. Our results indicate that the mono- and dinucleotide composition of nucleic sequences and the amino acids, dipeptides and physicochemical properties of peptide sequences can explain a significant proportion of the variance in their frequencies between organisms in nature.

Paper
