Catalogue Search | MBRL

by Jones, Michael N., 1975-, editor in Cognitive science Research Data processing; Data mining; Big data

Book

by Sim, Ida; Hekler, Eric B.; Lewis, Dana in Analysis; Artificial intelligence; Beyond Big Data to new Biomedical and Health Data Science: moving to next century precision health

2019

Background: There is great interest in and excitement about the concept of personalized or precision medicine and, in particular, advancing this vision via various ‘big data’ efforts. While these methods are necessary, they are insufficient to achieve the full personalized medicine promise. A rigorous, complementary ‘small data’ paradigm that can function both autonomously from and in collaboration with big data is also needed. By ‘small data’ we build on Estrin’s formulation and refer to the rigorous use of data by and for a specific N-of-1 unit (i.e., a single person, clinic, hospital, healthcare system, community, city, etc.) to facilitate improved individual-level description, prediction and, ultimately, control for that specific unit.

Main body: The purpose of this piece is to articulate why a small data paradigm is needed and is valuable in itself, and to provide initial directions for future work that can advance study designs and data analytic techniques for a small data approach to precision health. Scientifically, the central value of a small data approach is that it can uniquely manage complex, dynamic, multi-causal, idiosyncratically manifesting phenomena, such as chronic diseases, in comparison to big data. Beyond this, a small data approach better aligns the goals of science and practice, which can result in more rapid agile learning with less data. There is also, feasibly, a unique pathway towards transportable knowledge from a small data approach, which is complementary to a big data approach. Future work should (1) further refine appropriate methods for a small data approach; (2) advance strategies for better integrating a small data approach into real-world practices; and (3) advance ways of actively integrating the strengths and limitations from both small and big data approaches into a unified scientific knowledge base that is linked via a robust science of causality.

Conclusion: Small data is valuable in its own right. That said, small and big data paradigms can and should be combined via a foundational science of causality. With these approaches combined, the vision of precision health can be achieved.

Journal Article

Data structures and an introduction to algorithms = دليل التجارب في تراكيب البيانات ومقدمة الخوارزميات [Arabic parallel title: Laboratory guide to data structures and an introduction to algorithms]

by Rasras, Rashad J., author; Abu Zneit, Rushdi S., author in Data structures (Computer science); Algorithms Data processing

2010

Book

Variability in the analysis of a single neuroimaging dataset by many teams

by Dickie, Erin W.; Sanz-Morales, Emilio; Baczkowski, Blazej M. in 59/36; 59/57; 631/378/2649/1409

2020

Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses [1]. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset [2–5]. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed. The results obtained by seventy different teams analysing the same functional magnetic resonance imaging dataset show substantial variation, highlighting the influence of analytical choices and the importance of sharing workflows publicly and performing multiple analyses.

Journal Article

Google BigQuery analytics

by Tigani, Jordan, author; Naidu, Siddartha, author in Google; Data warehousing; Data mining

Book

Artificial Intelligence in mental health and the biases of language based models

by Straw, Isabel; Callison-Burch, Chris in Algorithms; Artificial intelligence; Bias

2020

The rapid integration of Artificial Intelligence (AI) into the healthcare field has occurred with little communication between computer scientists and doctors. The impact of AI on health outcomes and inequalities calls for health professionals and data scientists to make a collaborative effort to ensure historic health disparities are not encoded into the future. We present a study that evaluates bias in existing Natural Language Processing (NLP) models used in psychiatry and discuss how these biases may widen health inequalities. Our approach systematically evaluates each stage of model development to explore how biases arise from a clinical, data science and linguistic perspective. A literature review of the uses of NLP in mental health was carried out across multiple disciplinary databases with defined MeSH terms and keywords. Our primary analysis evaluated biases within 'GloVe' and 'Word2Vec' word embeddings. Euclidean distances were measured to assess relationships between psychiatric terms and demographic labels, and vector similarity functions were used to solve analogy questions relating to mental health. Our primary analysis of mental health terminology in GloVe and Word2Vec embeddings demonstrated significant biases with respect to religion, race, gender, nationality, sexuality and age. Our literature review returned 52 papers, of which none addressed all the areas of possible bias that we identify in model development. In addition, only one article appeared in more than one research database, demonstrating the isolation of research within disciplinary silos and inhibiting cross-disciplinary collaboration or communication. Our findings are relevant to professionals who wish to minimize the health inequalities that may arise as a result of AI and data-driven algorithms. We offer primary research identifying biases within these technologies and provide recommendations for avoiding these harms in the future.

Journal Article
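
The two embedding measurements named in the Straw and Callison-Burch abstract above (Euclidean distance between a term and a label, and analogy solving via vector similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the three-dimensional vectors and the words in `EMBEDDINGS` are hypothetical stand-ins for real GloVe or Word2Vec vectors (which typically have hundreds of dimensions), and the analogy uses the common "b - a + c, then nearest by cosine similarity" formulation.

```python
import math

# Hypothetical toy embeddings; real GloVe/Word2Vec vectors are far larger.
EMBEDDINGS = {
    "man": [1.0, 0.0, 0.2],
    "woman": [0.0, 1.0, 0.2],
    "king": [1.0, 0.1, 0.9],
    "queen": [0.1, 1.0, 0.9],
    "prince": [0.9, 0.0, 0.7],
}


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm(u) * norm(v))


def distance_gap(term, label_a, label_b, emb):
    """Hypothetical helper: how much closer `term` sits to label_a than to label_b.

    A positive value means the term is nearer to label_b.
    """
    return euclidean(emb[term], emb[label_a]) - euclidean(emb[term], emb[label_b])


def analogy(a, b, c, emb):
    """Solve 'a is to b as c is to ?' by cosine similarity to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # Exclude the query words themselves, as is conventional.
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


if __name__ == "__main__":
    print(analogy("man", "woman", "king", EMBEDDINGS))
    print(round(euclidean(EMBEDDINGS["man"], EMBEDDINGS["woman"]), 4))
```

With these toy vectors, `analogy("man", "woman", "king", EMBEDDINGS)` selects `"queen"`. In a bias audit of the kind the abstract describes, the same functions would be applied to psychiatric terms and demographic labels drawn from a pretrained embedding, and systematic asymmetries in `distance_gap` or in analogy answers would be the signal of interest.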

Master data management and data governance

by Berson, Alex; Dubov, Lawrence in Customer relations Data processing; Data warehousing; Data integration (Computer science)

Book

Can language models automate data wrangling?

by Hernández-Orallo, José; Jaimovitch-López, Gonzalo; Martínez-Plumed, Fernando in Artificial Intelligence; Automation; Computer Science

2023

The automation of data science and other data manipulation processes depends on the integration and formatting of ‘messy’ data. Data wrangling is an umbrella term for these tedious and time-consuming tasks. Tasks such as transforming dates, units or names expressed in different formats have been challenging for machine learning because (1) users expect to solve them with short cues or few examples, and (2) the problems depend heavily on domain knowledge. Interestingly, large language models today (1) can infer from very few examples or even a short clue in natural language, and (2) can integrate vast amounts of domain knowledge. It is then an important research question to analyse whether language models are a promising approach for data wrangling, especially as their capabilities continue growing. In this paper we apply different variants of the language model Generative Pre-trained Transformer (GPT) to five batteries covering a wide range of data wrangling problems. We compare the effect of prompts and few-shot regimes on their results and how they compare with specialised data wrangling systems and other tools. Our major finding is that they appear to be a powerful tool for a wide range of data wrangling tasks. We provide some guidelines about how they can be integrated into data processing pipelines, provided the users can take advantage of their flexibility and the diversity of tasks to be addressed. However, reliability is still an important issue to overcome.

Journal Article

Complete guide to open source big data stack

by Frampton, Mike, author in Big data; Computer science; Data structures (Computer science)

"This book describes the creation of an actual generic open source big data stack, which is an integrated stack of big data components, each of which serves a specific function like storage, resource management, or queueing. Each component has a big data heritage and community to support it. It can support big data in that it is able to scale, and it is a distributed and robust system. In the Complete Guide to Open Source Big Data Stack, New Zealand author Mike Frampton begins by creating a private cloud and then by installing and examining Apache Brooklyn. After that he will use each chapter to introduce one piece of the big data stack, sharing how to source the software and then how to install it. He will then show how it works by a simple example. Step by step and chapter by chapter, Frampton will create a real big data stack. The goal of this book is to show how a big data stack might be created and what components might be used. It attempts to do this with currently available Apache full and incubating systems. The aim is to introduce these components by example and show how they might work together. The book concentrates on Apache-based systems and shares detailed examples of cloud storage, release management, resources management, processing, queuing, frameworks, data visualization, and more." (Provided by publisher.)

Book

Opportunities and challenges in long-read sequencing data analysis

by Amarasinghe, Shanika L.; Su, Shian; Dong, Xueyi in Accuracy; Animal Genetics and Genomics; Animals

2020

Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.

Journal Article
