Catalogue Search | MBRL
73 result(s) for "Piccolo, Stephen R"
The molecular landscape of pediatric acute myeloid leukemia reveals recurrent structural alterations and age-specific mutational interactions
2018
A comprehensive molecular analysis of almost 1,000 pediatric subjects with acute myeloid leukemia (AML) uncovers widespread differences in pediatric AML as compared to adult AML, including a higher frequency of structural variants and different mutational patterns and epigenetic signatures. Future studies are needed to characterize the functional relevance of these alterations and to explore age-tailored therapies to improve disease control in younger patients.
We present the molecular landscape of pediatric acute myeloid leukemia (AML) and characterize nearly 1,000 participants in Children's Oncology Group (COG) AML trials. The COG–National Cancer Institute (NCI) TARGET AML initiative assessed cases by whole-genome, targeted DNA, mRNA and microRNA sequencing and CpG methylation profiling. Validated DNA variants corresponded to diverse, infrequent mutations, with fewer than 40 genes mutated in >2% of cases. In contrast, somatic structural variants, including new gene fusions and focal deletions of MBNL1, ZEB2 and ELF1, were disproportionately prevalent in young individuals as compared to adults. Conversely, mutations in DNMT3A and TP53, which were common in adults, were conspicuously absent from virtually all pediatric cases. New mutations in GATA2, FLT3 and CBL and recurrent mutations in MYC-ITD, NRAS, KRAS and WT1 were frequent in pediatric AML. Deletions, mutations and promoter DNA hypermethylation convergently impacted Wnt signaling, Polycomb repression, innate immune cell interactions and a cluster of zinc finger–encoding genes associated with KMT2A rearrangements. These results highlight the need for and facilitate the development of age-tailored targeted therapies for the treatment of pediatric AML.
Journal Article
The ability to classify patients based on gene-expression data varies by algorithm and performance metric
by Piccolo, Stephen R.; Johnson, Jérémie L.; Golightly, Nathan P.
in Accuracy; Algorithms; Biology and Life Sciences
2022
By classifying patients into subgroups, clinicians can provide more effective care than using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist—and most support diverse hyperparameters—so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
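The nested cross validation mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual pipeline: the function names (`k_folds`, `nested_cv`), the interleaved fold assignment, and the caller-supplied `score` callback are all assumptions made for the example. The inner loop picks a hyperparameter using only the outer-training samples, and the outer loop then scores that choice on samples the inner loop never saw.

```python
from statistics import mean
from typing import Callable, List, Sequence


def k_folds(n: int, k: int) -> List[List[int]]:
    """Split positions 0..n-1 into k interleaved folds (hypothetical helper)."""
    return [list(range(i, n, k)) for i in range(k)]


def nested_cv(
    n: int,
    params: Sequence,
    score: Callable[[List[int], List[int], object], float],
    k_outer: int = 5,
    k_inner: int = 3,
) -> float:
    """Estimate performance with hyperparameter selection kept inside each outer fold.

    score(train_idx, test_idx, param) must fit on train_idx, evaluate on
    test_idx, and return a value where higher is better.
    """
    outer_scores = []
    for test in k_folds(n, k_outer):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]

        def inner_score(param) -> float:
            # Inner loop: uses outer-training samples only.
            values = []
            for fold in k_folds(len(train), k_inner):
                val = [train[j] for j in fold]
                val_set = set(val)
                fit = [i for i in train if i not in val_set]
                values.append(score(fit, val, param))
            return mean(values)

        best = max(params, key=inner_score)
        # Outer loop: evaluate the selected hyperparameter on held-out samples.
        outer_scores.append(score(train, test, best))
    return mean(outer_scores)
```

Because selection happens inside each outer fold, the outer score is not inflated by having tuned on the same samples it is measured on.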
Journal Article
trioPhaser: using Mendelian inheritance logic to improve genomic phasing of trios
2021
Background
When analyzing DNA sequence data of an individual, knowing which nucleotide was inherited from each parent can be beneficial when trying to identify certain types of DNA variants. Mendelian inheritance logic can be used to accurately phase (haplotype) the majority (67–83%) of an individual's heterozygous nucleotide positions when genotypes are available for both parents (trio). However, when all members of a trio are heterozygous at a position, Mendelian inheritance logic cannot be used to phase. For such positions, a computational phasing algorithm can be used. Existing phasing algorithms use a haplotype reference panel, sequencing reads, and/or parental genotypes to phase an individual; however, they are limited in that they can only phase certain types of variants, require a specific genotype build, require large amounts of storage capacity, and/or require long run times. We created trioPhaser to address these challenges.
Results
trioPhaser uses gVCF files from an individual and their parents as initial input, and then outputs a phased VCF file. Input trio data are first phased using Mendelian inheritance logic. Then, the positions that cannot be phased using inheritance information alone are phased by the SHAPEIT4 phasing algorithm. Using whole-genome sequencing data of 52 trios, we show that trioPhaser, on average, increases the total number of phased positions by 21.0% and 10.5%, respectively, when compared to the number of positions that SHAPEIT4 or Mendelian inheritance logic can phase when either is used alone. In addition, we show that the accuracy of the phased calls output by trioPhaser is similar to that of linked-read and read-backed phasing.
Conclusion
trioPhaser is a containerized software tool that uses both Mendelian inheritance logic and SHAPEIT4 to phase trios when gVCF files are available. By implementing both phasing methods, more variant positions are phased compared to what either method is able to phase alone.
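The Mendelian inheritance logic described in this abstract can be illustrated for a single biallelic position. The sketch below is not trioPhaser's code; the function name `phase_by_inheritance` and the tuple-based genotype representation are assumptions for the example. It returns the child's alleles ordered as (paternal, maternal) when exactly one assignment is consistent with the parents, and `None` otherwise, which covers the case where all three members of the trio are heterozygous.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles, e.g. ("A", "G")


def phase_by_inheritance(
    child: Genotype, father: Genotype, mother: Genotype
) -> Optional[Tuple[str, str]]:
    """Return (paternal_allele, maternal_allele), or None if not uniquely determined."""
    first, second = child
    candidates = set()
    # Try both ways of assigning the child's two alleles to the two parents.
    for paternal, maternal in ((first, second), (second, first)):
        if paternal in father and maternal in mother:
            candidates.add((paternal, maternal))
    if len(candidates) == 1:
        return candidates.pop()
    # Zero candidates: genotypes are inconsistent with inheritance.
    # Two candidates: ambiguous, e.g. every trio member is heterozygous.
    return None
```

For example, a heterozygous child `("A", "G")` with father `("A", "A")` and mother `("A", "G")` must have received `A` from the father and `G` from the mother. Positions where the function returns `None` are the ones that would be handed to a statistical phasing algorithm.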
Journal Article
Evaluating a large language model’s ability to solve programming exercises from an introductory bioinformatics course
by Piccolo, Stephen R.; Payne, Samuel H.; Ridge, Perry G.
in Analysis; Artificial intelligence; Bioinformatics
2023
Computer programming is a fundamental tool for life scientists, allowing them to carry out essential research tasks. However, despite various educational efforts, learning to write code can be a challenging endeavor for students and researchers in life-sciences disciplines. Recent advances in artificial intelligence have made it possible to translate human-language prompts to functional code, raising questions about whether these technologies can aid (or replace) life scientists’ efforts to write code. Using 184 programming exercises from an introductory-bioinformatics course, we evaluated the extent to which one such tool—OpenAI’s ChatGPT—could successfully complete programming tasks. ChatGPT solved 139 (75.5%) of the exercises on its first attempt. For the remaining exercises, we provided natural-language feedback to the model, prompting it to try different approaches. Within 7 or fewer attempts, ChatGPT solved 179 (97.3%) of the exercises. These findings have implications for life-sciences education and research. Instructors may need to adapt their pedagogical approaches and assessment techniques to account for these new capabilities that are available to the general public. For some programming tasks, researchers may be able to work in collaboration with machine-learning models to produce functional code.
Journal Article
A cloud-based workflow to quantify transcript-expression levels in public cancer compendia
2016
Public compendia of sequencing data are now measured in petabytes. Accordingly, it is infeasible for researchers to transfer these data to local computers. Recently, the National Cancer Institute began exploring opportunities to work with molecular data in cloud-computing environments. With this approach, it becomes possible for scientists to take their tools to the data and thereby avoid large data transfers. It also becomes feasible to scale computing resources to the needs of a given analysis. We quantified transcript-expression levels for 12,307 RNA-Sequencing samples from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas. We used two cloud-based configurations and examined the performance and cost profiles of each configuration. Using preemptible virtual machines, we processed the samples for as little as $0.09 (USD) per sample. As the samples were processed, we collected performance metrics, which helped us track the duration of each processing step and quantified computational resources used at different stages of sample processing. Although the computational demands of reference alignment and expression quantification have decreased considerably, there remains a critical need for researchers to optimize preprocessing steps. We have stored the software, scripts, and processed data in a publicly accessible repository (https://osf.io/gqrz9).
Journal Article
Predicting drug sensitivity of cancer cells based on DNA methylation levels
by Miranda, Sofia P.; Piccolo, Stephen R.; Baião, Fernanda A.
in Algorithms; Analysis; Antineoplastic Agents - pharmacology
2021
Cancer cell lines, which are cell cultures derived from tumor samples, represent one of the least expensive and most studied preclinical models for drug development. Accurately predicting drug responses for a given cell line based on molecular features may help to optimize drug-development pipelines and explain mechanisms behind treatment responses. In this study, we focus on DNA methylation profiles as one type of molecular feature that is known to drive tumorigenesis and modulate treatment responses. Using genome-wide, DNA methylation profiles from 987 cell lines in the Genomics of Drug Sensitivity in Cancer database, we used machine-learning algorithms to evaluate the potential to predict cytotoxic responses for eight anti-cancer drugs. We compared the performance of five classification algorithms and four regression algorithms representing diverse methodologies, including tree-, probability-, kernel-, ensemble-, and distance-based approaches. We artificially subsampled the data to varying degrees, aiming to understand whether training based on relatively extreme outcomes would yield improved performance. When using classification or regression algorithms to predict discrete or continuous responses, respectively, we consistently observed excellent predictive performance when the training and test sets consisted of cell-line data. Classification algorithms performed best when we trained the models using cell lines with relatively extreme drug-response values, attaining area-under-the-receiver-operating-characteristic-curve values as high as 0.97. The regression algorithms performed best when we trained the models using the full range of drug-response values, although this depended on the performance metrics we used. Finally, we used patient data from The Cancer Genome Atlas to evaluate the feasibility of classifying clinical responses for human tumors based on models derived from cell lines. 
Generally, the algorithms were unable to identify patterns that predicted patient responses reliably; however, predictions by the Random Forests algorithm were significantly correlated with Temozolomide responses for low-grade gliomas.
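The area under the receiver operating characteristic curve reported in this abstract can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the 0/1 label encoding are assumptions for the example, and this is not the study's evaluation code. The quadratic loop is fine for small inputs but would be replaced by a rank-based computation on large datasets.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: counts as half a correct ranking
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.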
Journal Article
Toward a methodology for evaluating DNA variants in nuclear families
2021
The genetic underpinnings of most pediatric-cancer cases are unknown. Population-based studies use large sample sizes but have accounted for only a small proportion of the estimated heritability of pediatric cancers. Pedigree-based studies are infeasible for most human populations. One alternative is to collect genetic data from a single nuclear family and use inheritance patterns within the family to filter candidate variants. This approach can be applied to common and rare variants, including those that are private to a given family or to an affected individual. We evaluated this approach using genetic data from three nuclear families with 5, 4, and 7 children, respectively. Only one child in each nuclear family had been diagnosed with cancer, and neither parent had been affected. Diagnoses for the affected children were benign low-grade astrocytoma, Wilms tumor (stage 2), and Burkitt’s lymphoma, respectively. We used whole-genome sequencing to profile normal cells from each family member and a linked-read technology for genomic phasing. For initial variant filtering, we used global minor allele frequencies, deleteriousness scores, and functional-impact annotations. Next, we used genetic variation in the unaffected siblings as a guide to filter the remaining variants. As a way to evaluate our ability to detect variant(s) that may be relevant to disease status, the corresponding author blinded the primary author to affected status; the primary author then assigned a risk score to each child. Based on this evidence, the primary author predicted which child had been affected in each family. The primary author’s prediction was correct for the child who had been diagnosed with a Wilms tumor; the child with Burkitt’s lymphoma had the second-highest risk score among the seven children in that family. This study demonstrates a methodology for filtering and evaluating candidate genomic variants and genes within nuclear families that may merit further exploration.
Journal Article
Combating subclonal evolution of resistant cancer phenotypes
by Boltax, Jonathan P.; Cohen, Adam L.; Reddy, Chakravarthy B.
in 631/67/1059/2326; 631/67/1347; 631/67/2329
2017
Metastatic breast cancer remains challenging to treat, and most patients ultimately progress on therapy. This acquired drug resistance is largely due to drug-refractory sub-populations (subclones) within heterogeneous tumors. Here, we track the genetic and phenotypic subclonal evolution of four breast cancers through years of treatment to better understand how breast cancers become drug-resistant. Recurrently appearing post-chemotherapy mutations are rare. However, bulk and single-cell RNA sequencing reveal acquisition of malignant phenotypes after treatment, including enhanced mesenchymal and growth factor signaling, which may promote drug resistance, and decreased antigen presentation and TNF-α signaling, which may enable immune system avoidance. Some of these phenotypes pre-exist in pre-treatment subclones that become dominant after chemotherapy, indicating selection for resistance phenotypes. Post-chemotherapy cancer cells are effectively treated with drugs targeting acquired phenotypes. These findings highlight cancer’s ability to evolve phenotypically and suggest a phenotype-targeted treatment strategy that adapts to cancer as it evolves.
In metastatic breast cancer, subclonal evolution can drive drug resistance. Here, the authors genetically and transcriptionally follow the evolution of four breast cancers over time and treatment, and suggest a phenotype-targeted treatment strategy to adapt to cancer as it evolves.
Journal Article
Effects of germline and somatic events in candidate BRCA-like genes on breast-tumor signatures
2020
Mutations in BRCA1 and BRCA2 cause deficiencies in homologous recombination repair (HR), resulting in repair of DNA double-strand breaks by the alternative non-homologous end-joining pathway, which is more error prone. HR deficiency of breast tumors is important because it is associated with better responses to platinum salt therapies and PARP inhibitors. Among other consequences of HR deficiency are characteristic somatic-mutation signatures and gene-expression patterns. The term "BRCA-like" (or "BRCAness") describes tumors that harbor an HR defect but have no detectable germline mutation in BRCA1 or BRCA2. A better understanding of the genes and molecular events associated with tumors being BRCA-like could provide mechanistic insights and guide development of targeted treatments. Using data from The Cancer Genome Atlas (TCGA) for 1101 breast-cancer patients, we identified individuals with a germline mutation, somatic mutation, homozygous deletion, and/or hypermethylation event in BRCA1, BRCA2, and 59 other cancer-predisposition genes. Based on the assumption that BRCA-like events would have similar downstream effects on tumor biology as BRCA1/BRCA2 germline mutations, we quantified these effects based on somatic-mutation signatures and gene-expression profiles. We reduced the dimensionality of the somatic-mutation signatures and expression data and used a statistical resampling approach to quantify similarities among patients who had a BRCA1/BRCA2 germline mutation, another type of aberration in BRCA1 or BRCA2, or any type of aberration in one of the other genes. Somatic-mutation signatures of tumors having a non-germline aberration in BRCA1/BRCA2 (n = 80) were generally similar to each other and to tumors from BRCA1/BRCA2 germline carriers (n = 44). Additionally, somatic-mutation signatures of tumors with germline or somatic events in ATR (n = 16) and BARD1 (n = 8) showed high similarity to tumors from BRCA1/BRCA2 carriers. Other genes (CDKN2A, CTNNA1, PALB2, PALLD, PRSS1, SDHC) also showed high similarity but only for a small number of events or for a single event type. Tumors with germline mutations or hypermethylation of BRCA1 had relatively similar gene-expression profiles and overlapped considerably with the Basal-like subtype; but the transcriptional effects of the other events lacked consistency. Our findings confirm previously known relationships between molecular signatures and germline or somatic events in BRCA1/BRCA2. Our methodology represents an objective way to identify genes that have similar downstream effects on molecular signatures when mutated, deleted, or hypermethylated.
Journal Article
Identifying images in the biology literature that are problematic for people with a color-vision deficiency
by Oakley, Arwen F; Winegar, Carly V; Stevens, Harlan P
in accessibility; Algorithms; Automation
2024
To help maximize the impact of scientific journal articles, authors must ensure that article figures are accessible to people with color-vision deficiencies (CVDs), which affect up to 8% of males and 0.5% of females. We evaluated images published in biology- and medicine-oriented research articles between 2012 and 2022. Most included at least one color contrast that could be problematic for people with deuteranopia (‘deuteranopes’), the most common form of CVD. However, spatial distances and within-image labels frequently mitigated potential problems. Initially, we reviewed 4964 images from eLife, comparing each against a simulated version that approximated how it might appear to deuteranopes. We identified 636 (12.8%) images that we determined would be difficult for deuteranopes to interpret. Our findings suggest that the frequency of this problem has decreased over time and that articles from cell-oriented disciplines were most often problematic. We used machine learning to automate the identification of problematic images. For a hold-out test set from eLife (n=879), a convolutional neural network classified the images with an area under the precision-recall curve of 0.75. The same network classified images from PubMed Central (n=1191) with an area under the precision-recall curve of 0.39. We created a Web application (https://bioapps.byu.edu/colorblind_image_tester); users can upload images, view simulated versions, and obtain predictions. Our findings shed new light on the frequency and nature of scientific images that may be problematic for deuteranopes and motivate additional efforts to increase accessibility.
Journal Article