Catalogue Search | MBRL

by Viljoen, Salomé in Data collection, Data entry, Data protection

2021

Data-governance law – the legal regime that regulates how data about people is collected, processed, and used – is the subject of lively theorizing and several proposed legislative reforms. Different theories advance different legal interests in information. Some seek to reassert individual control for data subjects over the terms of their datafication, while others aim to maximize data-subject financial gain. But these proposals share a common conceptual flaw. Put simply, they miss the point of data production in a digital economy: to put people into population-based relations with one another. This relational aspect of data production drives much of the social value and harm of data collection and use in a digital economy. This Feature advances a theoretical account of data as social relations, constituted by both legal and technical systems. It shows how data relations result in supraindividual legal interests. Properly representing and adjudicating among those interests necessitates far more public and collective (i.e., democratic) forms of governing data production. Individualist data-subject rights cannot represent, let alone address, these population-level effects. This account offers two insights for data-governance law. First, it better reflects how and why data collection and use produce economic value as well as social harm in the digital economy. This brings the law governing data flows into line with the economic realities of how data production operates as a key input to the information economy. Second, this account offers an alternative normative argument for what makes datafication – the transformation of information about people into a commodity – wrongful. What makes datafication wrong is not (only) that it erodes the capacity for subject self-formation, but instead that it materializes unjust social relations: data relations that enact or amplify social inequality. 
This account indexes many of the most pressing forms of social informational harm that animate criticism of data extraction but fall outside typical accounts of informational harm. This account also offers a positive theory for socially beneficial data production. Addressing the inegalitarian harms of datafication – and developing socially beneficial alternatives – will require democratizing data social relations: moving from individual data-subject rights to more democratic institutions of data governance.

Journal Article

Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

by Mikhaylov, Slava; Conway, Drew; Lauderdale, Benjamin E. in Character Recognition, Crowds, Crowdsourcing

2016

Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
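The abstract does not state how individual crowd judgments are combined. The sketch below assumes the simplest rule: each sentence is coded several times on a hypothetical -1/0/+1 scale, sentence scores are the mean of their codings, and a document's position is the mean of its sentence scores. The function names and data are illustrative only; the authors' own scaling model may differ.

```python
from statistics import mean

def aggregate_codings(codings: dict[str, list[int]]) -> dict[str, float]:
    """Average the crowd codings for each text unit (hypothetical -1/0/+1 scale)."""
    return {unit: mean(scores) for unit, scores in codings.items() if scores}

def document_position(codings: dict[str, list[int]]) -> float:
    """Place a document by averaging its sentence-level crowd means."""
    return mean(aggregate_codings(codings).values())

# Hypothetical codings: five nonexpert readers per sentence.
example = {
    "s1": [1, 1, 0, 1, 1],
    "s2": [-1, 0, -1, -1, 0],
    "s3": [0, 0, 1, 0, 0],
}
print(aggregate_codings(example))
print(round(document_position(example), 3))
```

Because every step is a deterministic function of the raw codings, rerunning the script on a fresh batch of crowd judgments reproduces or extends the dataset, which is the reproducibility property the abstract emphasizes.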

Journal Article

The Blessings of Multiple Causes

by Wang, Yixin; Blei, David M. in Algorithms, Assumptions, Bias

2019

Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods assume that we observe all confounders, variables that affect both the causal variables and the outcome variables. This assumption is standard but it is also untestable. In this article, we develop the deconfounder, a way to do causal inference with weaker assumptions than the traditional methods require. The deconfounder is designed for problems of multiple causal inference: scientific studies that involve multiple causes whose effects are simultaneously of interest. Specifically, the deconfounder combines unsupervised machine learning and predictive model checking to use the dependencies among multiple causes as indirect evidence for some of the unobserved confounders. We develop the deconfounder algorithm, prove that it is unbiased, and show that it requires weaker assumptions than traditional causal inference. We analyze its performance in three types of studies: semi-simulated data around smoking and lung cancer, semi-simulated data around genome-wide association studies, and a real dataset about actors and movie revenue. The deconfounder is an effective approach to estimating causal effects in problems of multiple causal inference. Supplementary materials for this article are available online.
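To make the core idea concrete (dependence among several causes carries information about a shared unobserved confounder), here is a deliberately simplified sketch. It is not the paper's algorithm: it substitutes the first principal component for a probabilistic factor model, skips predictive model checking, and fits the factor only on the *other* causes so the substitute confounder is not an exact linear function of the cause being studied. The simulated data, effect sizes, and function names are hypothetical.

```python
import numpy as np

def simulate(n=5000, m=10, seed=0):
    """Hypothetical data: m causes that all load on one unobserved confounder z."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                                 # unobserved confounder
    causes = z[:, None] + rng.normal(size=(n, m))          # every cause depends on z
    y = 1.0 * causes[:, 0] + 2.0 * z + rng.normal(size=n)  # true effect of cause 0 is 1.0
    return causes, y

def first_factor_scores(x):
    """One-factor stand-in: first principal-component scores of the centered columns."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def slope_on_first_column(design, y):
    """OLS with an intercept; return the coefficient on the first design column."""
    X = np.column_stack([np.ones(len(y)), design])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

causes, y = simulate()
naive = slope_on_first_column(causes[:, [0]], y)   # ignores the confounder
# Substitute confounder inferred from the remaining causes only.
z_hat = first_factor_scores(causes[:, 1:])
adjusted = slope_on_first_column(np.column_stack([causes[:, 0], z_hat]), y)
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, truth: 1.00")
```

In this setup the naive slope is roughly double the true effect, while the adjusted slope lands much closer to 1.0 but not exactly on it, because a factor estimated from finitely many noisy causes recovers the confounder only partially.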

Journal Article

False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant

by Simmons, Joseph P.; Nelson, Leif D.; Simonsohn, Uri in Adult, Ambiguity, Biological and medical sciences

2011

In this article, we accomplish two things. First, we show that despite empirical psychologists' nominal endorsement of a low rate of false-positive findings (< .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.
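A small simulation in the spirit of the one described can show how a single researcher degree of freedom inflates the false-positive rate. This sketch assumes two outcome measures correlated at r = 0.5, no true group difference, 20 subjects per group, and a z test with known unit variance in place of t tests. All parameter values are hypothetical. The "flexible" analyst reports success if either measure reaches p < .05.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_test(a, b):
    """Difference-in-means z test, assuming known unit variance in each group."""
    se = math.sqrt(1 / len(a) + 1 / len(b))
    return two_sided_p((sum(a) / len(a) - sum(b) / len(b)) / se)

def draw_subject(rng, r):
    """Two unit-variance measures correlated at r; the null (no group effect) is true."""
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    return u, r * u + math.sqrt(1 - r * r) * v

def false_positive_rates(n_per_group=20, r=0.5, sims=5000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    single = flexible = 0
    for _ in range(sims):
        g1 = [draw_subject(rng, r) for _ in range(n_per_group)]
        g2 = [draw_subject(rng, r) for _ in range(n_per_group)]
        p1 = z_test([s[0] for s in g1], [s[0] for s in g2])
        p2 = z_test([s[1] for s in g1], [s[1] for s in g2])
        single += p1 < alpha               # pre-specified: measure 1 only
        flexible += min(p1, p2) < alpha    # report whichever measure "works"
    return single / sims, flexible / sims

single_rate, flexible_rate = false_positive_rates()
print(f"pre-specified: {single_rate:.3f}, flexible: {flexible_rate:.3f}")
```

The pre-specified rate stays near the nominal .05, while choosing between just two correlated measures pushes it noticeably higher. Stacking further choices (optional stopping, covariates, dropping conditions) compounds the effect.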

Journal Article

Determinants of social desirability bias in sensitive surveys: a literature review

by Krumpal, Ivar in Anti-social behaviour, Attitudes, Behavior

2013

Survey questions asking about taboo topics such as sexual activities, illegal behaviour such as social fraud, or unsocial attitudes such as racism, often generate inaccurate survey estimates which are distorted by social desirability bias. Due to self-presentation concerns, survey respondents underreport socially undesirable activities and overreport socially desirable ones. This article reviews theoretical explanations of socially motivated misreporting in sensitive surveys and provides an overview of the empirical evidence on the effectiveness of specific survey methods designed to encourage the respondents to answer more honestly. Besides psychological aspects, like a stable need for social approval and the preference for not getting involved in embarrassing social interactions, aspects of the survey design, the interviewer’s characteristics and the survey situation determine the occurrence and the degree of social desirability bias. The review shows that survey designers could generate more valid data by selecting appropriate data collection strategies that reduce respondents’ discomfort when answering a sensitive question.
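The abstract does not name particular methods. One widely used family of designs that aims to reduce this discomfort is randomized response; the sketch below shows Warner's version as an illustration, not as a summary of the review's findings. A private randomizer decides, with probability p, whether the respondent answers "Do you have trait A?" or "Do you NOT have trait A?", so no single "yes" reveals the respondent's status, yet the prevalence π can still be recovered from P(yes) = pπ + (1 - p)(1 - π). The prevalence, p, and sample size are hypothetical.

```python
import random

def warner_estimate(answers, p):
    """Estimate prevalence pi from yes/no answers under Warner's randomized response design.

    P(yes) = p*pi + (1 - p)*(1 - pi), so pi = (P(yes) - (1 - p)) / (2p - 1).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes pi unidentifiable")
    lam = sum(answers) / len(answers)   # observed share of "yes"
    return (lam - (1 - p)) / (2 * p - 1)

def simulate_answers(n, true_pi, p, seed=0):
    """Hypothetical respondents who answer truthfully to whichever question they draw."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        has_trait = rng.random() < true_pi
        asked_direct = rng.random() < p   # randomizer outcome, hidden from the interviewer
        answers.append(has_trait if asked_direct else not has_trait)
    return answers

answers = simulate_answers(n=20000, true_pi=0.15, p=0.7)
print(round(warner_estimate(answers, p=0.7), 3))
```

The privacy protection is paid for in precision: the sampling variance of the estimate is inflated by a factor of 1/(2p - 1)², so values of p close to 0.5 protect respondents more but require larger samples.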

Journal Article

Balancing data privacy and usability in the federal statistical system

by Spencer, Bruce D.; Manski, Charles F.; Nekipelov, Denis in Computer Security, Confidentiality, Criteria

2022

The federal statistical system is experiencing competing pressures for change. On the one hand, for confidentiality reasons, much socially valuable data currently held by federal agencies is either not made available to researchers at all or only made available under onerous conditions. On the other hand, agencies which release public databases face new challenges in protecting the privacy of the subjects in those databases, which leads them to consider releasing fewer data or masking the data in ways that will reduce their accuracy. In this essay, we argue that the discussion has not given proper consideration to the reduced social benefits of data availability and their usability relative to the value of increased levels of privacy protection. A more balanced benefit–cost framework should be used to assess these trade-offs. We express concerns both with synthetic data methods for disclosure limitation, which will reduce the types of research that can be reliably conducted in unknown ways, and with differential privacy criteria that use what we argue is an inappropriate measure of disclosure risk. We recommend that the measure of disclosure risk used to assess all disclosure protection methods focus on what we believe is the risk that individuals should care about, that more study of the impact of differential privacy criteria and synthetic data methods on data usability for research be conducted before either is put into widespread use, and that more research be conducted on alternative methods of disclosure risk reduction that better balance benefits and costs.
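The essay does not specify a particular mechanism. To make the privacy-versus-accuracy trade-off concrete, the sketch below uses the Laplace mechanism, the textbook way to release a counting query (sensitivity 1) under ε-differential privacy: noise with scale 1/ε is added to the true count. The count and ε values are hypothetical, and agencies may use different mechanisms in practice.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) draw, as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1): noise scale is 1/epsilon."""
    return true_count + laplace_noise(1 / epsilon, rng)

def mean_abs_error(true_count, epsilon, trials=10000, seed=0):
    """Average absolute distortion of the released count over repeated releases."""
    rng = random.Random(seed)
    return sum(abs(private_count(true_count, epsilon, rng) - true_count)
               for _ in range(trials)) / trials

for eps in (0.1, 1.0, 10.0):
    # The expected absolute error of Laplace noise equals its scale, 1/epsilon.
    print(eps, round(mean_abs_error(true_count=1234, epsilon=eps), 3))
```

Smaller ε means a stronger formal privacy guarantee and proportionally larger error, which is the usability cost the authors argue should be weighed explicitly against the benefit of the added protection.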

Journal Article

Big Data in Public Affairs

by Rethemeyer, R. Karl; Mergel, Ines; Isett, Kimberley in Affairs, Behavior Patterns, Big Data

2016

This article offers an overview of the conceptual, substantive, and practical issues surrounding "big data" to provide one perspective on how the field of public affairs can successfully cope with the big data revolution. Big data in public affairs refers to a combination of administrative data collected through traditional means and large-scale data sets created by sensors, computer networks, or individuals as they use the Internet. In public affairs, new opportunities for real-time insights into behavioral patterns are emerging but are bound by safeguards limiting government reach through the restriction of the collection and analysis of these data. To address both the opportunities and challenges of this emerging phenomenon, the authors first review the evolving canon of big data articles across related fields. Second, they derive a working definition of big data in public affairs. Third, they review the methodological and analytic challenges of using big data in public affairs scholarship and practice. The article concludes with implications for public affairs.

Journal Article

Determining an Appropriate Sample Size for Qualitative Interviews to Achieve True and Near Code Saturation: Secondary Analysis of Data

by Giombi, Kristen C; Amoozegar, Jacqueline; Williams, Peyton in Analysis, Biological products, Codes

2024

In-depth interviews are a common method of qualitative data collection, providing rich data on individuals' perceptions and behaviors that would be challenging to collect with quantitative methods. Researchers typically need to decide on sample size a priori. Although studies have assessed when saturation has been achieved, there is no agreement on the minimum number of interviews needed to achieve saturation. To date, most research on saturation has been based on in-person data collection. During the COVID-19 pandemic, web-based data collection became increasingly common, as traditional in-person data collection was not possible. Researchers continue to use web-based data collection methods after the COVID-19 emergency, making it important to assess whether findings around saturation differ for in-person versus web-based interviews. We aimed to identify the number of web-based interviews needed to achieve true code saturation or near code saturation. The analyses for this study were based on data from 5 Food and Drug Administration-funded studies conducted through web-based platforms with patients with underlying medical conditions or with health care providers who provide primary or specialty care to patients. We extracted code- and interview-specific data and examined the data summaries to determine when true saturation or near saturation was reached. The sample size used in the 5 studies ranged from 30 to 70 interviews. True saturation was reached after 91% to 100% (n=30-67) of planned interviews, whereas near saturation was reached after 33% to 60% (n=15-23) of planned interviews. Studies that relied heavily on deductive coding and studies that had a more structured interview guide reached both true saturation and near saturation sooner. We also examined the types of codes applied after near saturation had been reached. In 4 of the 5 studies, most of these codes represented previously established core concepts or themes.
Codes representing newly identified concepts, other or miscellaneous responses (eg, "in general"), uncertainty or confusion (eg, "don't know"), or categorization for analysis (eg, correct as compared with incorrect) were less commonly applied after near saturation had been reached. This study provides support for the view that near saturation may be a sufficient measure to target and that conducting additional interviews after that point may result in diminishing returns. Factors to consider in determining how many interviews to conduct include the structure and type of questions included in the interview guide, the coding structure, and the population under study. Studies with less structured interview guides, studies that rely heavily on inductive coding and analytic techniques, and studies that include populations that may be less knowledgeable about the topics discussed may require a larger sample size to reach an acceptable level of saturation. Our findings also build on previous studies looking at saturation for in-person data collection conducted at a small number of sites.
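The two saturation points can be computed directly from an ordered record of which codes were applied in each interview. In this sketch, true saturation is the interview at which the last new code first appears, and near saturation is the first interview by which a chosen share of all codes has appeared. The abstract does not state the near-saturation threshold, so `near_share` is a hypothetical parameter, as are the example interviews.

```python
import math

def saturation_points(interviews, near_share=0.95):
    """Return (true, near) saturation as 1-based interview counts.

    interviews: list of sets of code labels applied in each interview, in order.
    true: the interview at which the last previously unseen code appears.
    near: the first interview by which at least `near_share` of all codes
          ever applied have appeared (threshold is an assumption).
    """
    all_codes = set().union(*interviews)
    needed = math.ceil(near_share * len(all_codes))
    seen, near, true = set(), None, 0
    for i, codes in enumerate(interviews, start=1):
        if codes - seen:          # this interview introduced at least one new code
            true = i
        seen |= codes
        if near is None and len(seen) >= needed:
            near = i
    return true, near

# Hypothetical coding record for six interviews.
example = [{"a", "b", "c"}, {"b", "d"}, {"a", "e"}, {"c"}, {"f"}, {"a", "b"}]
print(saturation_points(example, near_share=0.8))
```

Here near saturation arrives at interview 3 while true saturation waits until interview 5 for a single late code, mirroring the gap the study reports between the two measures.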

Journal Article

Emerging Methodologies in Engineering Education Research

by Light, Gregory; Case, Jennifer M in Control Groups, Data Analysis, Data collection

2011

Methodology refers to the theoretical arguments that researchers use in order to justify their research methods and design. There is an extensive range of well established methodologies in the educational research literature of which a growing subset is beginning to be used in engineering education research. A more explicit engagement with methodologies, particularly those that are only emerging in engineering education research, is important so that engineering education researchers can broaden the set of research questions they are able to address. Seven methodologies are outlined and for each an exemplar paper is analyzed in order to demonstrate the methodology in operation and to highlight its particular contribution. The methodologies are: Case Study, Grounded Theory, Ethnography, Action Research, Phenomenography, Discourse Analysis, and Narrative Analysis. It is noted that many of the exemplar papers use some of these methodologies in combination. The exemplar papers show that collectively these methodologies might allow the research community to better address questions around key engineering education challenges, such as students' responses to innovative pedagogies, diversity issues in engineering, and the changing requirements for engineering graduates in the twenty-first century.

Journal Article

Survey sampling design in wave 1 of the Global Flourishing Study

by Srinivasan, Rajesh; Han, Ying; Ritter, Zacc in Adult, Adults, Cardiology

2025

The Global Flourishing Study (GFS) is an international collaboration to develop a publicly accessible data resource to promote global research on human flourishing. These data include over 200,000 participants from 22 geographically and culturally diverse countries and one territory, with samples designed to be nationally representative of the adult population. The GFS is intended as a longitudinal panel study with recruitment and empanelment for Wave 1 occurring between April 2022 and December 2023. Future waves of data collection will invite participants to complete a survey annually. The annual survey covers a robust set of measures on well-being, health, social, economic, political, religious, spiritual, psychological and demographic variables. The current paper describes the sampling methodology and weighting approaches used to project the samples to be nationally representative. Details are provided on interviewer training and data collection, probability and non-probability samples, creating weights, design effects, and future data collection stages.
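Weighting a sample toward national representativeness costs precision, and that cost is commonly summarized as a design effect. The sketch below uses Kish's approximation for the design effect due to unequal weights, n·Σw² / (Σw)², and the corresponding effective sample size. Whether the GFS uses exactly this formula is not stated in the abstract, and the weights are hypothetical.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect from unequal weighting: n * sum(w^2) / sum(w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)

def effective_sample_size(weights):
    """Size of an equally weighted sample that would give the same precision."""
    return len(weights) / kish_design_effect(weights)

weights = [0.5, 0.5, 1.0, 1.0, 2.0, 2.0]  # hypothetical respondent weights
print(round(kish_design_effect(weights), 3), round(effective_sample_size(weights), 3))
```

Equal weights give a design effect of exactly 1; the more the weights vary, the larger the design effect and the smaller the effective sample size, which is why weighting schemes usually trim extreme weights.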

Journal Article
