Catalogue Search | MBRL

Discovery and Validation of a Prostate Cancer Genomic Classifier that Predicts Early Metastasis Following Radical Prostatectomy

by Vergara, Ismael A. , Triche, Timothy J. , Fink, Stephanie in Aged , Androgens , Biological activity

2013

Clinicopathologic features and biochemical recurrence are sensitive, but not specific, predictors of metastatic disease and lethal prostate cancer. We hypothesize that a genomic expression signature detected in the primary tumor represents the true biological potential of aggressive disease and provides improved prediction of early prostate cancer metastasis. A nested case-control design was used to select 639 patients from the Mayo Clinic tumor registry who underwent radical prostatectomy between 1987 and 2001. A genomic classifier (GC) was developed by modeling differential RNA expression using 1.4 million feature high-density expression arrays of men enriched for rising PSA after prostatectomy, including 213 who experienced early clinical metastasis after biochemical recurrence. A training set was used to develop a random forest classifier of 22 markers to predict cases, defined as men with early clinical metastasis after rising PSA. Performance of GC was compared to prognostic factors such as Gleason score and previous gene expression signatures in a withheld validation set. Expression profiles were generated from 545 unique patient samples, with median follow-up of 16.9 years. GC achieved an area under the receiver operating characteristic curve of 0.75 (0.67-0.83) in validation, outperforming clinical variables and gene signatures. GC was the only significant prognostic factor in multivariable analyses. Within Gleason score groups, cases with high GC scores experienced earlier death from prostate cancer and reduced overall survival. The markers in the classifier were found to be associated with a number of key biological processes in prostate cancer metastatic disease progression. A genomic classifier was developed and validated in a large patient cohort enriched for men with a rising PSA who went on to experience metastatic disease. This early metastasis prediction model based on genomic expression in the primary tumor may be useful for identification of aggressive prostate cancer.
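The area under the ROC curve reported above can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. The sketch below illustrates that reading in plain Python. The function name `auc_from_scores` and the example scores are hypothetical and are not taken from the study.

```python
def auc_from_scores(case_scores, control_scores):
    """Estimate AUC as the fraction of (case, control) pairs in which
    the case scores higher; tied pairs count as half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up classifier scores, for illustration only.
cases = [0.9, 0.6]
controls = [0.7, 0.2]
print(auc_from_scores(cases, controls))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.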

Journal Article

The Importance of Data Visualization in Combating a Pandemic

by Crisan, Anamaria in Coronaviruses , COVID-19 , COVID-19 - epidemiology

2022

An unprecedented volume and variety of data have been produced to analyze and monitor the changing dynamics of the COVID-19 pandemic. Against the backdrop of this data deluge, the use of data visualization, and especially dashboards, served as a core component of the public health response to disseminate and distill key indicators of community spread and to contextualize mitigating actions. Individuals also began to create dashboards of their own on personal websites or public platforms, such as PowerBI or Tableau Public, adding a personal interpretation to the broader public health narrative. At no other point in history has there been such an extensive collection of data, visualizations, and dashboards of a single virus and the disease it causes. As the pandemic continues into its third year, it is important to reflect on the effectiveness of these dashboard creation strategies and what we can learn from them.

Journal Article

Evidence-based design and evaluation of a whole genome sequencing clinical report for the reference microbiology laboratory

by Crisan, Anamaria , McKee, Geoffrey , Gardy, Jennifer L. in Analysis , Antimicrobial agents , Archives & records

2018

Microbial genome sequencing is now being routinely used in many clinical and public health laboratories. Understanding how to report complex genomic test results to stakeholders who may have varying familiarity with genomics (including clinicians, laboratorians, epidemiologists, and researchers) is critical to the successful and sustainable implementation of this new technology; however, there are no evidence-based guidelines for designing such a report in the pathogen genomics domain. Here, we describe an iterative, human-centered approach to creating a report template for communicating tuberculosis (TB) genomic test results. We used Design Study Methodology, a human-centered approach drawn from the information visualization domain, to redesign an existing clinical report. We used expert consults and an online questionnaire to discover various stakeholders' needs around the types of data and tasks related to TB that they encounter in their daily workflow. We also evaluated their perceptions of and familiarity with genomic data, as well as its utility at various clinical decision points. These data shaped the design of multiple prototype reports that were compared against the existing report through a second online survey, with the resulting qualitative and quantitative data informing the final, redesigned report. We recruited 78 participants, 65 of whom were clinicians, nurses, laboratorians, researchers, and epidemiologists involved in TB diagnosis, treatment, and/or surveillance. Our first survey indicated that participants were largely enthusiastic about genomic data, with the majority agreeing on its utility for certain TB diagnosis and treatment tasks and many reporting some confidence in their ability to interpret this type of data (between 58.8% and 94.1%, depending on the specific data type). When we compared our four prototype reports against the existing design, we found that for the majority (86.7%) of design comparisons, participants preferred the alternative prototype designs over the existing version, and that both clinicians and non-clinicians expressed similar design preferences. Participants showed clearer design preferences when asked to compare individual design elements versus entire reports. Both the quantitative and qualitative data informed the design of a revised report, available online as a LaTeX template. We show how a human-centered design approach integrating quantitative and qualitative feedback can be used to design an alternative report for representing complex microbial genomic data. We suggest experimental and design guidelines to inform future design studies in the bioinformatics and microbial genomics domains, and suggest that this type of mixed-methods study is important to facilitate the successful translation of pathogen genomics in the clinic, not only for clinical reports but also for more complex bioinformatics data visualization software.

Journal Article

Mutation Discovery in Regions of Segmental Cancer Genome Amplifications with CoNAn-SNV: A Mixture Model for Next Generation Sequencing of Tumors

by Ha, Gavin , Oloumi, Arusha , Tse, Kane in Algorithms , Alterations , Analysis

2012

Next generation sequencing has now enabled a cost-effective enumeration of the full mutational complement of a tumor genome, in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of SNVs that overlap copy number alterations. The method is based on modelling the notion that genomic regions of segmental duplication and amplification induce an extended genotype space where a subset of genotypes will exhibit heavily skewed allelic distributions in SNVs (and therefore render them undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models where the number of mixtures for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it is able to discover 21 experimentally revalidated somatic non-synonymous mutations that were not detected using copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain where increased sensitivity was conferred. Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.
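The idea of an extended genotype space can be illustrated with a small sketch. It is not the CoNAn-SNV implementation: it assumes a uniform prior over genotypes, a fixed sequencing error rate, and hypothetical function names, and it only shows why a variant present on one of five copies can be missed when diploidy is assumed.

```python
from math import comb


def genotype_posteriors(variant_reads, depth, copy_number, error_rate=0.01):
    """Posterior over the number of variant-carrying copies (0..copy_number),
    using a Binomial likelihood per genotype and a uniform prior (assumption)."""
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    likelihoods = []
    for k in range(copy_number + 1):
        # Expected variant allele fraction for k of copy_number copies,
        # shrunk away from 0 and 1 by the assumed error rate.
        p = error_rate + (1 - 2 * error_rate) * k / copy_number
        likelihoods.append(
            comb(depth, variant_reads)
            * p ** variant_reads
            * (1 - p) ** (depth - variant_reads)
        )
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def most_likely_variant_copies(variant_reads, depth, copy_number):
    """Index of the genotype with the highest posterior."""
    posteriors = genotype_posteriors(variant_reads, depth, copy_number)
    return max(range(len(posteriors)), key=posteriors.__getitem__)


# Illustrative counts: 4 variant reads out of 40 at one locus.
print(most_likely_variant_copies(4, 40, copy_number=2))  # diploid assumption
print(most_likely_variant_copies(4, 40, copy_number=5))  # amplified region
```

With these illustrative counts, the diploid model favours the no-variant genotype, while the five-copy model favours a variant on a single copy, which matches the skewed allelic distributions the abstract describes.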

Journal Article

Genomic Analysis of a Serotype 5 Streptococcus pneumoniae Outbreak in British Columbia, Canada, 2005–2009

by Jones, Steven J. M. , Brinkman, Fiona S. L. , Stefanovic, Aleksandra in Analysis , Bacteria , Bioinformatics

2016

Background. Streptococcus pneumoniae can cause a wide spectrum of disease, including invasive pneumococcal disease (IPD). From 2005 to 2009 an outbreak of IPD occurred in Western Canada, caused by a S. pneumoniae strain with multilocus sequence type (MLST) 289 and serotype 5. We sought to investigate the incidence of IPD due to this S. pneumoniae strain and to characterize the outbreak in British Columbia using whole-genome sequencing. Methods. IPD was defined according to Public Health Agency of Canada guidelines. Two isolates representing the beginning and end of the outbreak were whole-genome sequenced. The sequences were analyzed for single nucleotide variants (SNVs) and putative genomic islands. Results. The peak of the outbreak in British Columbia was in 2006, when 57% of invasive S. pneumoniae isolates were serotype 5. Comparison of two whole-genome sequenced strains showed only 10 SNVs between them. A 15.5 kb genomic island was identified in outbreak strains, allowing the design of a PCR assay to track the spread of the outbreak strain. Discussion. We show that the serotype 5 MLST 289 strain contains a distinguishing genomic island, which remained genetically consistent over time. Whole-genome sequencing holds great promise for real-time characterization of outbreaks in the future and may allow responses tailored to characteristics identified in the genome.
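Counting SNVs between two genomes reduces, in the simplest case, to counting mismatched positions between aligned sequences. The sketch below assumes the two sequences are already aligned and of equal length, and skips positions with ambiguous or gap characters. The function name and the toy sequences are hypothetical and unrelated to the study's data or pipeline.

```python
def count_snvs(seq_a, seq_b):
    """Count positions where two pre-aligned sequences differ,
    ignoring positions where either base is not A, C, G or T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    count = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in valid and base_b in valid and base_a != base_b:
            count += 1
    return count


# Toy aligned fragments, for illustration only.
print(count_snvs("ACGTAC", "ACCTAN"))
```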

Journal Article

Linting is People! Exploring the Potential of Human Computation as a Sociotechnical Linter of Data Visualizations

by Crisan, Anamaria , McNutt, Andrew M in Artificial intelligence , Computation

2025

Traditionally, linters are code analysis tools that help developers by flagging potential issues, from syntax and logic errors to violations of syntactical and stylistic conventions. Recently, linting has been taken as an interface metaphor, allowing it to be extended to more complex inputs, such as visualizations, which demand a broader perspective and alternative approach to evaluation. We explore a further extended consideration of linting inputs, and modes of evaluation, across the puritanical, neutral, and rebellious dimensions. We specifically investigate the potential for leveraging human computation in linting operations through Community Notes, crowd-sourced contextual text snippets aimed at checking and critiquing potentially accurate or misleading content on social media. We demonstrate that human-powered assessments not only identify misleading or error-prone visualizations but that integrating human computation enhances traditional linting by offering social insights. As is required these days, we consider the implications of building linters powered by Artificial Intelligence.

Paper

Probing the Visualization Literacy of Vision Language Models: the Good, the Bad, and the Ugly

by Crisan, Anamaria , Dong, Lianghan in Charts , Cognition & reasoning , Literacy

2025

Vision Language Models (VLMs) demonstrate promising chart comprehension capabilities. Yet, prior explorations of their visualization literacy have been limited to assessing their response correctness and fail to explore their internal reasoning. To address this gap, we adapted attention-guided class activation maps (AG-CAM) for VLMs, to visualize the influence and importance of input features (image and text) on model responses. Using this approach, we conducted an examination of four open-source (ChartGemma, Janus 1B and 7B, and LLaVA) and two closed-source (GPT-4o, Gemini) models comparing their performance and, for the open-source models, their AG-CAM results. Overall, we found that ChartGemma, a 3B parameter VLM fine-tuned for chart question-answering (QA), outperformed other open-source models and exhibited performance on par with significantly larger closed-source VLMs. We also found that VLMs exhibit spatial reasoning by accurately localizing key chart features, and semantic reasoning by associating visual elements with corresponding data values and query tokens. Our approach is the first to demonstrate the use of AG-CAM on early fusion VLM architectures, which are widely used, and for chart QA. We also show preliminary evidence that these results can align with human reasoning. Our promising results with open-source VLMs pave the way for transparent and reproducible research in AI visualization literacy.

Paper

AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data

by Crisan, Anamaria , Shakiba Amirshahi , Abolnejadian, Mohammad in Datasets , Decision making , Embedding

2025

In decision-making conversations, experts must navigate complex choices and make on-the-spot decisions while engaged in conversation. Although extensive historical data often exists, the real-time nature of these scenarios makes it infeasible for decision-makers to review and leverage relevant information. This raises an interesting question: What if experts could utilize relevant past data in real-time decision-making through insights derived from past data? To explore this, we implemented a conversational user interface, taking doctor-patient interactions as an example use case. Our system continuously listens to the conversation, identifies patient problems and doctor-suggested solutions, and retrieves related data from an embedded dataset, generating concise insights using a pipeline built around a retrieval-based Large Language Model (LLM) agent. We evaluated the prototype by embedding Health Canada datasets into a vector database and conducting simulated studies using sample doctor-patient dialogues, showing effectiveness but also challenges, setting directions for the next steps of our work.
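The retrieval step described above, in which related records are fetched from an embedded dataset, is commonly done by ranking stored vectors by cosine similarity to a query vector. The sketch below shows that ranking in plain Python. It is not the AInsight pipeline: the three-dimensional vectors, the record labels, and the function names are made up for illustration, and a real system would obtain embeddings from a model and store them in a vector database.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vector, index, k=2):
    """Return the labels of the k stored records most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical embedded records: (label, embedding).
INDEX = [
    ("record A", [1.0, 0.0, 0.0]),
    ("record B", [0.0, 1.0, 0.0]),
    ("record C", [0.7, 0.7, 0.0]),
]

print(top_k([0.9, 0.1, 0.0], INDEX, k=2))
```

The retrieved records would then be passed to the language model as context for generating an insight.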

Paper

Conversational AI Threads for Visualizing Multidimensional Datasets

by Crisan, Anamaria , Matt-Heun Hong in Chatbots , Conversational artificial intelligence , Data analysis

2023

Generative Large Language Models (LLMs) show potential in data analysis, yet their full capabilities remain uncharted. Our work explores the capabilities of LLMs for creating and refining visualizations via conversational interfaces. We used an LLM to conduct a re-analysis of a prior Wizard-of-Oz study examining the use of chatbots for conducting visual analysis. We surfaced the strengths and weaknesses of LLM-driven analytic chatbots, finding that they fell short in supporting progressive visualization refinements. From these findings, we developed AI Threads, a multi-threaded analytic chatbot that enables analysts to proactively manage conversational context and improve the efficacy of its outputs. We evaluate its usability through a crowdsourced study (n=40) and in-depth interviews with expert analysts (n=10). We further demonstrate the capabilities of AI Threads on a dataset outside the LLM's training corpus. Our findings show the potential of LLMs while also surfacing challenges and fruitful avenues for future research.

Paper

Tracing and Visualizing Human-ML/AI Collaborative Processes through Artifacts of Data Work

by Rogers, Jennifer , Crisan, Anamaria in Artificial intelligence , Machine learning , Scientific visualization

2023

Automated Machine Learning (AutoML) technology can lower barriers in data work yet still requires human intervention to be functional. However, the complex and collaborative process resulting from humans and machines trading off work makes it difficult to trace what was done, by whom (or what), and when. In this research, we construct a taxonomy of data work artifacts that captures AutoML and human processes. We present a rigorous methodology for its creation and discuss its transferability to the visual design process. We operationalize the taxonomy through the development of AutoMLTrace, a visual interactive sketch showing both the context and temporality of human-ML/AI collaboration in data work. Finally, we demonstrate the utility of our approach via a usage scenario with an enterprise software development team. Collectively, our research process and findings explore challenges and fruitful avenues for developing data visualization tools that interrogate the sociotechnical relationships in automated data work.

Paper
