Catalogue Search | MBRL
209 result(s) for "Ceccarelli, Michele"
Deep learning predicts short non-coding RNA functions from only raw sequence data
by Cerulo, Luigi; Ceccarelli, Francesco; Noviello, Teresa Maria Rosaria
in Accuracy, Artificial neural networks, Binding sites
2020
Small non-coding RNAs (ncRNAs) are short non-coding sequences involved in gene regulation in many biological processes and diseases. The lack of a complete understanding of their biological functionality, especially in a genome-wide scenario, has demanded new computational approaches to annotate their roles. It is widely known that secondary structure is a key determinant of RNA function, and machine learning based approaches have successfully predicted RNA function from secondary structure information. Here we show that RNA function can be predicted with good accuracy from a lightweight representation of sequence information, without computing secondary structure features, which is computationally expensive. This finding appears to go against the dogma of secondary structure being a key determinant of function in RNA. Compared to recent secondary structure based methods, the proposed solution is more robust to sequence boundary noise and drastically reduces the computational cost, allowing large data volumes to be annotated. Scripts and datasets to reproduce the results of the experiments proposed in this study are available at: https://github.com/bioinformatics-sannio/ncrna-deep.
Journal Article
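The abstract does not spell out the "lightweight representation of sequence information". A common sequence-only input for convolutional or recurrent networks is a one-hot matrix padded to a fixed length; the sketch below assumes that encoding. The `ACGU` alphabet, the T-to-U mapping, truncation and zero-padding are illustrative choices, not necessarily what the ncrna-deep scripts do.

```python
from typing import List

# Assumed alphabet; the study's actual encoding may differ.
ALPHABET = "ACGU"


def one_hot(seq: str, max_len: int) -> List[List[int]]:
    """Encode an RNA sequence as a max_len x 4 one-hot matrix.

    The sequence is upper-cased, T is mapped to U, sequences longer than
    max_len are truncated and shorter ones are right-padded with all-zero
    rows. Unknown symbols (e.g. N) also become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")[:max_len]
    matrix = []
    for base in seq:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        matrix.append(row)
    # Pad with fresh zero rows so every sequence has the same shape for batching.
    matrix.extend([0] * len(ALPHABET) for _ in range(max_len - len(matrix)))
    return matrix
```

For example, `one_hot("ACGN", 6)` yields one-hot rows for A, C and G followed by three all-zero rows (one for the unknown N and two for padding). No secondary-structure folding is needed, which is where the computational savings described above come from.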
Machine learning prediction of oncology drug targets based on protein and network properties
by Dezső, Zoltán; Ceccarelli, Michele
in Algorithms, Analysis, Antineoplastic Agents - pharmacology
2020
Background
The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing amount of large-scale human genomics and proteomics data to perform in-silico target identification, reducing the cost and the time needed.
Results
We developed a machine learning approach that scores proteins to generate a druggability score for novel targets. In our model we incorporated 70 protein features, which included properties derived from the sequence, features characterizing protein functions, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: even less studied proteins with limited information about their function can score well, as most of the features are independent of the accumulated literature. We built models on a training set consisting of targets with approved drugs and a negative set of non-drug targets. The machine learning techniques help to identify the most important combination of features differentiating validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy, with an Area Under the Curve (AUC) of 0.89. Our most predictive features included biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates.
Conclusions
We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset consisting of 277 clinical trial drug targets, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Our predictions were based on oncology targets and cancer-relevant biological functions, resulting in significantly higher scores for targets of oncology clinical trial drugs than for targets of trial drugs for other indications. Our approach can be used to make indication-specific drug-target predictions by combining generic druggability features with indication-specific biological functions.
Journal Article
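The validation result above is summarized as an AUC of 0.89. That number equals the probability that a randomly chosen positive (here, a clinical-trial drug target) receives a higher druggability score than a randomly chosen negative, with ties counting one half. The hypothetical helper below computes it directly from two score lists; the abstract does not describe the authors' evaluation code or the exact negative set used for validation.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win (the Mann-Whitney U formulation). This simple
    version is O(n*m), fine for a few hundred targets; a rank-based version
    scales better to large candidate sets.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For instance, with positives scored `[0.9, 0.3]` and negatives `[0.5, 0.1]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75.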
TimeDelay-ARACNE: Reverse engineering of gene networks from time-course data by an information theoretic approach
by Zoppoli, Pietro; Morganella, Sandro; Ceccarelli, Michele
in Algorithms, Bioinformatics, Biomedical and Life Sciences
2010
Background
One of the main aims of molecular biology is to gain knowledge about how molecular components interact with each other and to understand how gene function is regulated. Using microarray technology, it is possible to measure the expression of thousands of genes in a single analysis step, obtaining a picture of the cell's gene expression. Several methods have been developed to infer gene networks from steady-state data, but much less literature addresses time-course data, so the development of algorithms to infer gene networks from time-series measurements is a current challenge in bioinformatics research. In order to detect dependencies between genes at different time delays, we propose an approach to infer gene regulatory networks from time-series measurements, starting from a well-known algorithm based on information theory.
Results
In this paper we show how the ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) algorithm can be used for gene regulatory network inference in the case of time-course expression profiles. The resulting method is called TimeDelay-ARACNE. It extracts dependencies between two genes at different time delays, providing a measure of these dependencies in terms of mutual information. The basic idea of the proposed algorithm is to detect time-delayed dependencies between the expression profiles by assuming a stationary Markov Random Field as the underlying probabilistic model. Less informative dependencies are filtered out using an automatically calculated threshold, retaining the most reliable connections. TimeDelay-ARACNE can infer small local networks of time-regulated gene-gene interactions, detecting their direction and also discovering cyclic interactions, even when only a medium-to-small number of measurements is available. We test the algorithm both on synthetic networks and on microarray expression profiles. The microarray measurements concern the S. cerevisiae cell cycle, the E. coli SOS pathway, and a recently developed network for in vivo assessment of reverse engineering algorithms. Our results are compared with those of ARACNE itself and of two previously published algorithms, Dynamic Bayesian Networks and systems of ODEs, showing that TimeDelay-ARACNE has good accuracy, recall and F-score for the network reconstruction task.
Conclusions
Here we report the adaptation of the ARACNE algorithm to infer gene regulatory networks from time-course data, so that the resulting network is represented as a directed graph. The proposed algorithm is expected to be useful in the reconstruction of small directed biological networks from time-course data.
Journal Article
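The central quantity in TimeDelay-ARACNE is the mutual information between one gene's profile and another gene's profile shifted by a delay k. The sketch below estimates it with equal-width binning and a plug-in estimator. Both are assumptions: the abstract does not state the estimator, and the sketch omits the automatically calculated threshold and the ARACNE-style pruning of indirect edges. The delay with the largest value hints at the direction of the dependency (x influencing y after k time points).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def discretize(values: Sequence[float], bins: int) -> List[int]:
    """Map values to 0..bins-1 with equal-width bins (an assumed estimator)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against constant profiles
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def delayed_mi(x: Sequence[float], y: Sequence[float],
               max_delay: int, bins: int = 3) -> Dict[int, float]:
    """MI between x(t) and y(t + k) for every delay k = 1..max_delay."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    return {
        k: mutual_information(dx[:-k], dy[k:])
        for k in range(1, max_delay + 1)
    }
```

If y is simply x delayed by two time points, the values returned by `delayed_mi(x, y, 3)` will typically peak at k = 2, where the mutual information equals the entropy of the binned x window. With few time points, plug-in estimates are biased upward, which is one reason a significance threshold is needed before keeping an edge.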
A variational algorithm to detect the clonal copy number substructure of tumors from scRNA-seq data
2023
Single-cell RNA sequencing is the reference technology to characterize the composition of the tumor microenvironment and to study tumor heterogeneity at high resolution. Here we report Single CEll Variational ANeuploidy analysis (SCEVAN), a fast variational algorithm for the deconvolution of the clonal substructure of tumors from single-cell RNA-seq data. It uses a multichannel segmentation algorithm exploiting the assumption that all the cells in a given copy number clone share the same breakpoints. Thus, the smoothed expression profile of every individual cell constitutes part of the evidence of the copy number profile in each subclone. SCEVAN can automatically and accurately discriminate between malignant and non-malignant cells, resulting in a practical framework to analyze tumors and their microenvironment. We apply SCEVAN to datasets encompassing 106 samples and 93,322 cells from different tumor types and technologies. We demonstrate its application to characterize the intratumor heterogeneity and geographic evolution of malignant brain tumors.
The inference of clonal architectures in cancer using single-cell RNA-seq data remains challenging. Here, the authors develop SCEVAN, a variational algorithm for copy number-based clonal structure inference in single-cell RNA-seq data that can characterise evolution and heterogeneity in the tumour and the microenvironment.
Journal Article
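SCEVAN's key modelling assumption is that all cells in a copy number clone share the same breakpoints, so each cell's smoothed expression profile contributes evidence to one common segmentation. The sketch below illustrates only that assumption, using a different and much simpler technique than SCEVAN's variational algorithm: penalized least-squares segmentation solved exactly by dynamic programming (optimal partitioning), with the squared error summed over all cells. The function name and the `penalty` parameter are hypothetical.

```python
from typing import List, Sequence


def shared_breakpoints(profiles: Sequence[Sequence[float]],
                       penalty: float) -> List[int]:
    """Jointly segment several cells' profiles with shared breakpoints.

    Each segment is fitted by its per-cell mean; the cost of a segmentation is
    the squared error summed over cells plus `penalty` per segment. Returns
    the start index of every segment except the first.
    """
    n = len(profiles[0])
    # Per-cell prefix sums of x and x^2 give O(1) squared error per segment.
    pre, pre2 = [], []
    for prof in profiles:
        s, s2 = [0.0], [0.0]
        for v in prof:
            s.append(s[-1] + v)
            s2.append(s2[-1] + v * v)
        pre.append(s)
        pre2.append(s2)

    def cost(i: int, j: int) -> float:
        """Squared error of segment [i, j) summed over all cells."""
        total = 0.0
        for s, s2 in zip(pre, pre2):
            seg_sum = s[j] - s[i]
            total += (s2[j] - s2[i]) - seg_sum * seg_sum / (j - i)
        return total

    # best[j]: optimal cost of positions [0, j); last[j]: start of final segment.
    best = [0.0] + [float("inf")] * n
    last = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j], last[j] = c, i

    # Backtrack from the end to recover the shared breakpoints.
    cps, j = [], n
    while j > 0:
        i = last[j]
        if i > 0:
            cps.append(i)
        j = i
    return sorted(cps)
```

Runtime grows quadratically with profile length (times the number of cells), so this exact approach is only practical for short profiles or coarse genomic bins; it is meant to show how pooling cells sharpens a common breakpoint, not to reproduce SCEVAN.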
Pan-cancer onco-signatures reveal a novel mitochondrial subtype of luminal breast cancer with specific regulators
2023
Background
Somatic alterations in cancer cause dysregulation of signaling pathways that control cell-cycle progression, apoptosis, and cell growth. The effect of individual alterations in these pathways differs between individual tumors and tumor types. Recognizing driver events is a complex task requiring integrating multiple molecular data, including genomics, epigenomics, and functional genomics. A common hypothesis is that these driver events share similar effects on the hallmarks of cancer. The availability of large-scale multi-omics studies allows for inferring these common effects from data. Once these effects are known, one can then deconvolve in every individual patient whether a given genomics alteration is a driver event.
Methods
Here, we develop a novel data-driven approach to identify shared oncogenic expression signatures among tumors. We aim to identify gene onco-signatures for classifying tumor patients into homogeneous subclasses with distinct prognoses and specific genomic alterations. We derive pan-cancer expression onco-signatures from TCGA gene expression data using a discovery set of 9107 primary pan-tumor samples, together with their matched mutational data and a list of known cancer-related genes from the COSMIC database.
Results
We use the derived onco-signatures to assess their prognostic significance and apply them to the TCGA breast cancer dataset as a proof of principle of our approach. We uncover a “mitochondrial” subgroup of Luminal patients characterized by distinctive biological features and regulated by specific genetic modulators. Collectively, our results demonstrate the effectiveness of onco-signature-based methodologies, and they also contribute to a comprehensive understanding of the metabolic heterogeneity of Luminal tumors.
Conclusions
These findings provide novel genomics evidence for developing personalized breast cancer patient treatments. The onco-signature approach, demonstrated here on breast cancer, is general and can be applied to other cancer types.
Journal Article
Learning gene regulatory networks from only positive and unlabeled data
by Elkan, Charles; Cerulo, Luigi; Ceccarelli, Michele
in Algorithms, Artificial Intelligence, Bioinformatics
2010
Background
Recently, supervised learning methods have been exploited to reconstruct gene regulatory networks from gene expression data. The reconstruction of a network is modeled as a binary classification problem for each pair of genes. A statistical classifier is trained to recognize the relationships between the activation profiles of gene pairs. This approach has been shown to outperform previous unsupervised methods. However, the supervised approach raises open questions. In particular, although known regulatory connections can safely be assumed to be positive training examples, obtaining negative examples is not straightforward, because definite knowledge that a given pair of genes does not interact is typically not available.
Results
A recent advance in data mining research is a method capable of learning a classifier from only positive and unlabeled examples, without needing labeled negative examples. We show that, applied to the reconstruction of gene regulatory networks, this method significantly outperforms the current state-of-the-art machine learning methods. We assess the new method using both simulated and experimental data, and obtain a major performance improvement.
Conclusions
Compared to unsupervised methods for gene network inference, supervised methods are potentially more accurate, but for training they need a complete set of known regulatory connections. A supervised method that can be trained using only positive and unlabeled data, as presented in this paper, is especially beneficial for the task of inferring gene regulatory networks, because only an incomplete set of known regulatory connections is available in public databases such as RegulonDB, TRRD, KEGG, Transfac, and IPA.
Journal Article
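The positive-unlabeled method referred to above is, we assume, the estimator of Elkan and Noto (Charles Elkan is a coauthor). Its core idea: train any probabilistic classifier g to separate labeled positives (here, known regulatory pairs) from unlabeled pairs, estimate the constant c = P(labeled | positive) as the average of g on held-out labeled positives, and rescale g by 1/c. The sketch assumes g's scores are already available and that labeled positives are selected completely at random from all positives; the function names are illustrative.

```python
from typing import List, Sequence


def estimate_c(g_scores_on_labeled: Sequence[float]) -> float:
    """Estimate c = P(labeled | positive) as the mean output of the
    labeled-vs-unlabeled classifier g on held-out labeled positives."""
    if not g_scores_on_labeled:
        raise ValueError("need held-out labeled positives")
    return sum(g_scores_on_labeled) / len(g_scores_on_labeled)


def adjust_to_positive_prob(g_scores: Sequence[float], c: float) -> List[float]:
    """Convert g(x) = P(labeled | x) into P(positive | x) = g(x) / c.

    Values are capped at 1 because the estimate of c is noisy.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    return [min(g / c, 1.0) for g in g_scores]
```

For example, if g averages 0.4 on held-out known interactions, an unlabeled gene pair with g = 0.2 would be assigned an interaction probability of 0.5. No negative examples are needed at any step, which is the property that matters when only incomplete lists of known regulatory connections exist.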
Differential methylation of circulating free DNA assessed through cfMeDiP as a new tool for breast cancer diagnosis and detection of BRCA1/2 mutation
by Giuffrida, Raffaella; Addeo, Raffaele; Tufano, Rossella
in Adult, Biomedical and Life Sciences, Biomedicine
2024
Background
Recent studies have highlighted the importance of the cell-free DNA (cfDNA) methylation profile in detecting breast cancer (BC) and its different subtypes. We investigated whether plasma cfDNA methylation, using cell-free Methylated DNA Immunoprecipitation and High-Throughput Sequencing (cfMeDIP-seq), may be informative in characterizing breast cancer in patients with BRCA1/2 germline mutations for early cancer detection and response to therapy.
Methods
We enrolled 23 BC patients with germline mutations of the BRCA1 and BRCA2 genes, 19 healthy controls without BRCA1/2 mutations, and two healthy individuals who carried BRCA1/2 mutations. Blood samples were collected from all study subjects at diagnosis, and plasma was isolated by centrifugation. Cell-free DNA was extracted from 1 mL of plasma, and cfMeDIP-seq was performed for each sample. Shallow whole genome sequencing was performed on the immunoprecipitated samples. Then, the differentially methylated 300-bp regions (DMRs) between the 25 BRCA germline mutation carriers and the 19 non-carriers were identified. DMRs were compared with tumor-specific regions from public datasets to perform an unbiased analysis. Finally, two statistical classifiers, based on GLMnet and random forest models, were trained to evaluate whether the identified DMRs could discriminate BRCA-positive from healthy samples.
Results
We identified 7,095 hypermethylated and 212 hypomethylated regions in 25 BRCA germline mutation carriers compared to 19 controls. These regions discriminate tumors from healthy samples with high accuracy and sensitivity. We show that the circulating tumor DNA of BRCA1/2 mutant breast cancers is characterized by the hypomethylation of genes involved in DNA repair and cell cycle. We uncovered the transcription factors (TFs) associated with these DMRs and found that proteins of the Erythroblast Transformation Specific (ETS) family are particularly active in the hypermethylated regions. Finally, we found that these regions could discriminate BRCA-positive from healthy samples with an AUC of 0.95, a sensitivity of 88%, and a specificity of 94.74%.
Conclusions
Our study emphasizes the importance of tumor cell-derived DNA methylation in BC, reporting a different methylation profile between patients carrying mutations in BRCA1, BRCA2, and wild-type controls. Our minimally invasive approach could allow early cancer diagnosis, assessment of minimal residual disease, and monitoring of response to therapy.
Journal Article
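Sensitivity and specificity are simple ratios, and the values reported above are consistent with the cohort sizes given in Methods: 22 of 25 carriers gives exactly 88%, and 18 of 19 controls gives 94.74%. The counts below are hypothetical, chosen only to illustrate this arithmetic; the abstract does not report the confusion matrix or state whether the evaluation used cross-validation.

```python
from fractions import Fraction


def sensitivity(tp: int, fn: int) -> Fraction:
    """True-positive rate: fraction of BRCA-positive samples called positive."""
    return Fraction(tp, tp + fn)


def specificity(tn: int, fp: int) -> Fraction:
    """True-negative rate: fraction of healthy controls called negative."""
    return Fraction(tn, tn + fp)


# Hypothetical counts matching only the cohort sizes (25 carriers, 19 controls).
sens = sensitivity(tp=22, fn=3)
spec = specificity(tn=18, fp=1)
print(f"sensitivity = {float(sens):.2%}, specificity = {float(spec):.2%}")
# prints: sensitivity = 88.00%, specificity = 94.74%
```

Using `Fraction` keeps the ratios exact, which makes it easy to check which integer counts a reported percentage can correspond to in a small cohort.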
A metabolic function of FGFR3-TACC3 gene fusions in cancer
2018
Oncogenic FGFR3–TACC3 gene fusions signal through phosphorylated PIN4 to trigger biogenesis of peroxisomes and synthesis of new proteins, enabling mitochondrial respiration and tumour growth.
Fusion gene stimulates tumour metabolism
FGFR3–TACC3 gene fusions are oncogenic and have been found in many cancer types, but how they drive tumour growth is unknown. Antonio Iavarone and colleagues show that the fusion protein activates mitochondrial metabolism, which promotes protein synthesis and thereby stimulates tumour growth. The team suggest that this reliance on mitochondrial respiration could open up a new therapeutic route for treating tumours that carry the FGFR3–TACC3 fusion gene.
Chromosomal translocations that generate in-frame oncogenic gene fusions are notable examples of the success of targeted cancer therapies [1–3]. We have previously described gene fusions of FGFR3–TACC3 (F3–T3) in 3% of human glioblastoma cases [4]. Subsequent studies have reported similar frequencies of F3–T3 in many other cancers, indicating that F3–T3 is a commonly occurring fusion across all tumour types [5,6]. F3–T3 fusions are potent oncogenes that confer sensitivity to FGFR inhibitors, but the downstream oncogenic signalling pathways remain unknown [2,4–6]. Here we show that human tumours with F3–T3 fusions cluster within transcriptional subgroups that are characterized by the activation of mitochondrial functions. F3–T3 activates oxidative phosphorylation and mitochondrial biogenesis and induces sensitivity to inhibitors of oxidative metabolism. Phosphorylation of the phosphopeptide PIN4 is an intermediate step in the signalling pathway of the activation of mitochondrial metabolism. The F3–T3–PIN4 axis triggers the biogenesis of peroxisomes and the synthesis of new proteins. The anabolic response converges on the PGC1α coactivator through the production of intracellular reactive oxygen species, which enables mitochondrial respiration and tumour growth. These data illustrate the oncogenic circuit engaged by F3–T3 and show that F3–T3-positive tumours rely on mitochondrial respiration, highlighting this pathway as a therapeutic opportunity for the treatment of tumours with F3–T3 fusions. We also provide insights into the genetic alterations that initiate the chain of metabolic responses that drive mitochondrial metabolism in cancer.
Journal Article
Single-cell transcriptome analysis of lineage diversity in high-grade glioma
by Canoll, Peter; Lasorella, Anna; Samanamud, Jorge
in Animal models, Astrocytes, Bioinformatics
2018
Background
Despite extensive molecular characterization, we lack a comprehensive understanding of lineage identity, differentiation, and proliferation in high-grade gliomas (HGGs).
Methods
We sampled the cellular milieu of HGGs by profiling dissociated human surgical specimens with a high-density microwell system for massively parallel single-cell RNA-Seq. We analyzed the resulting profiles to identify subpopulations of both HGG and microenvironmental cells and applied graph-based methods to infer structural features of the malignantly transformed populations.
Results
While HGG cells can resemble glia or even immature neurons and form branched lineage structures, mesenchymal transformation results in unstructured populations. Glioma cells in a subset of mesenchymal tumors lose their neural lineage identity, express inflammatory genes, and co-exist with marked myeloid infiltration, reminiscent of molecular interactions between glioma and immune cells established in animal models. Additionally, we discovered a tight coupling between lineage resemblance and proliferation among malignantly transformed cells. Glioma cells that resemble oligodendrocyte progenitors, which proliferate in the brain, are often found in the cell cycle. Conversely, glioma cells that resemble astrocytes, neuroblasts, and oligodendrocytes, which are non-proliferative in the brain, are generally non-cycling in tumors.
Conclusions
These studies reveal a relationship between cellular identity and proliferation in HGG, and distinct population structures that reflect the extent of neural and non-neural lineage resemblance among malignantly transformed cells.
Journal Article
TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages version 2; peer review: 1 approved, 2 approved with reservations
by D'Angelo, Fulvio; Colaprico, Antonio; Silva, Tiago C
in Bioinformatics, Brain cancer, Genomics
2016
Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics), and there is no single comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. The need to integrate these different analyses was recently highlighted. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and, by harnessing several key Bioconductor packages, how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiforme or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox, and TCGAbiolinks.
Journal Article