A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation

by Quintana, Daniel S

in Biobehavioral Sciences / Biometry / Confidence intervals / data / Datasets / Datasets as Topic / Disclosure / Disclosure of information / Exploration / Human Biology and Medicine / Information Dissemination - methods / meta-research / Open data / Psychology / Religion / Reproducibility / Retirement benefits / Scientists / Spirituality / statistics / Tools and Resources / Variables

2020

Journal Article

Overview

Open research data provide considerable scientific, societal, and economic benefits. However, disclosure risks can sometimes limit the sharing of open data, especially in datasets that include sensitive details or information from individuals with rare disorders. This article introduces the concept of synthetic datasets, which is an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil as no record in the synthetic dataset represents a real individual. This practical guide with accompanying R script enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy.

It is becoming increasingly common for scientists to share their data with other researchers. This makes it possible to independently verify reported results, which increases trust in research. Sometimes it is not possible to share certain datasets because they include sensitive information about individuals. In psychology and medicine, scientists have tried to remove identifying information from datasets before sharing them by, for example, adding minor artificial errors. But, even when researchers take these steps, it may still be possible to identify individuals, and the introduction of artificial errors can make it harder to verify the original results.

One potential alternative to sharing sensitive data is to create ‘synthetic datasets’. Synthetic datasets mimic original datasets by maintaining the statistical properties of the data but without matching the original recorded values. Synthetic datasets are already being used, for example, to share confidential census data. However, this approach is rarely used in other areas of research.

Now, Daniel S. Quintana demonstrates how synthetic datasets can be used in psychology and medicine. Three different datasets were studied to ensure that synthetic datasets performed well regardless of the type or size of the data. Quintana evaluated freely available software that could generate synthetic versions of these different datasets, which essentially removed any identifying information. The results obtained by analysing the synthetic datasets closely mimicked the original results. These tools could allow researchers to verify each other’s results more easily without jeopardizing the privacy of participants. This could encourage more collaboration, stimulate ideas for future research, and increase data sharing between research groups.
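
The core idea in the overview, a dataset that keeps the statistical properties of the original without repeating any original record, can be illustrated with a small Python sketch. This is not the article's method and not the synthpop package: it assumes a toy two-variable dataset and simply fits a bivariate normal distribution to it. The "original" data, the variable names (age, score) and the function names are all invented for illustration.

```python
import math
import random
import statistics


def pearson(rows):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


def make_original(n, seed):
    """Stand-in for a confidential dataset (hypothetical, made-up values)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.gauss(40, 10)
        score = 0.5 * age + rng.gauss(0, 5)  # score depends on age
        rows.append((age, score))
    return rows


def synthesize(rows, n, seed):
    """Draw n new (x, y) pairs from a bivariate normal fitted to rows.

    Assumption: the data are roughly bivariate normal. Only the means,
    standard deviations and correlation of the original are reused, so no
    synthetic row is a copy of an original row.
    """
    rng = random.Random(seed)
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    rho = pearson(rows)
    # Guard against tiny negative values from floating-point rounding.
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))
    synthetic = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + resid * z2)  # induces correlation rho
        synthetic.append((x, y))
    return synthetic


original = make_original(200, seed=1)
synthetic = synthesize(original, 2000, seed=2)
print("original correlation: ", round(pearson(original), 2))
print("synthetic correlation:", round(pearson(synthetic), 2))
print("shared records:", len(set(original) & set(synthetic)))
```

The two printed correlations should be close, while the two datasets share no records. Real datasets with mixed variable types or non-normal distributions need a more flexible generator, which is the role the overview assigns to the synthpop R package.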

Publisher

eLife Sciences Publications, Ltd

Subject

Biobehavioral Sciences

/ Biometry

/ Confidence intervals

/ data

/ Datasets

/ Datasets as Topic

/ Disclosure