Catalogue Search | MBRL

Designing for human–AI complementarity in K‐12 education

by Aleven, Vincent; Holstein, Kenneth in Artificial intelligence, Classrooms, Complementarity

2022

Recent work has explored how complementary strengths of humans and artificial intelligence (AI) systems might be productively combined. However, successful forms of human–AI partnership have rarely been demonstrated in real‐world settings. We present the iterative design and evaluation of Lumilo, smart glasses that help teachers help their students in AI‐supported classrooms by presenting real‐time analytics about students’ learning, metacognition, and behavior. Results from a field study conducted in K‐12 classrooms indicate that students learn more when teachers and AI tutors work together during class. We discuss implications of this research for the design of human–AI partnerships. We argue for more participatory approaches to research and design in this area, in which practitioners and other stakeholders are deeply, meaningfully involved throughout the process. Furthermore, we advocate for theory‐building and for principled approaches to the study of human–AI decision‐making in real‐world contexts.

Journal Article

AI failure loops in devalued work: The confluence of overconfidence in AI and underconfidence in worker expertise

by Taylor, Jordan; Zhu, Haiyi; Holstein, Kenneth

2026

A growing body of literature has focused on understanding and addressing workplace artificial intelligence (AI) design failures. However, past work has largely overlooked the role of the devaluation of worker expertise in shaping the dynamics of AI development and deployment. In this paper, we examine the case of feminized labor: a class of devalued occupations historically misnomered as “women’s work,” such as social work, K-12 teaching, and home healthcare. Drawing on literature on AI deployments in feminized labor contexts, we conceptualize AI Failure Loops: a set of interwoven, sociotechnical failure modes that help explain how the systemic devaluation of workers’ expertise negatively impacts, and is impacted by, AI design, evaluation, and governance practices. These failures demonstrate how misjudgments on the automatability of workers’ skills can lead to AI deployments that fail to bring value to workers and, instead, further diminish the visibility of workers’ expertise. We discuss research and design implications for workplace AI, especially for devalued occupations.

Journal Article

AI failure loops in devalued work: The confluence of overconfidence in AI and underconfidence in worker expertise

by Taylor, Jordan; Fox, Sarah; Zhu, Haiyi in Artificial intelligence, Deployment, Devaluation

2026

A growing body of literature has focused on understanding and addressing workplace artificial intelligence (AI) design failures. However, past work has largely overlooked the role of the devaluation of worker expertise in shaping the dynamics of AI development and deployment. In this paper, we examine the case of feminized labor: a class of devalued occupations historically misnomered as “women’s work,” such as social work, K-12 teaching, and home healthcare. Drawing on literature on AI deployments in feminized labor contexts, we conceptualize AI Failure Loops: a set of interwoven, sociotechnical failure modes that help explain how the systemic devaluation of workers’ expertise negatively impacts, and is impacted by, AI design, evaluation, and governance practices. These failures demonstrate how misjudgments on the automatability of workers’ skills can lead to AI deployments that fail to bring value to workers and, instead, further diminish the visibility of workers’ expertise. We discuss research and design implications for workplace AI, especially for devalued occupations.

Journal Article

Funding AI for Good: A Call for Meaningful Engagement

by Lin, Hongjin; D'Ignazio, Catherine; Gajos, Krzysztof in Artificial intelligence, Documents, Funding

2026

Artificial Intelligence for Social Good (AI4SG) is a growing area that explores AI's potential to address social issues, such as public health. Yet prior work has shown limited evidence of its tangible benefits for intended communities, and projects frequently face real-world deployment and sustainability challenges. While existing HCI literature on AI4SG initiatives primarily focuses on the mechanisms of funded projects and their outcomes, much less attention has been given to the upstream funding agendas that influence project approaches. In this work, we conducted a reflexive thematic analysis of 35 funding documents, representing about $410 million USD in total investments. We uncovered a spectrum of conceptual framings of AI4SG and the approaches that funding rhetoric promoted: from biasing towards technology capacities (more techno-centric) to emphasizing contextual understanding of the social problems at hand alongside technology capacities (more balanced). Drawing on our findings on how funding documents construct AI4SG, we offer recommendations for funders to embed more balanced approaches in future funding call designs. We further discuss implications for how the HCI community can positively shape AI4SG funding design processes.

Paper

Exploring the Potential of Metacognitive Support Agents for Human-AI Co-Creation

by Luo, Kaitao; Martelaro, Nikolas; Wang, Ye in Design criteria, Generative artificial intelligence, Prototyping

2025

Despite the potential of generative AI (GenAI) design tools to enhance design processes, professionals often struggle to integrate AI into their workflows. Fundamental cognitive challenges include the need to specify all design criteria as distinct parameters upfront (intent formulation) and designers' reduced cognitive involvement in the design process due to cognitive offloading, which can lead to insufficient problem exploration, underspecification, and limited ability to evaluate outcomes. Motivated by these challenges, we envision novel metacognitive support agents that assist designers in working more reflectively with GenAI. To explore this vision, we conducted exploratory prototyping through a Wizard of Oz elicitation study with 20 mechanical designers probing multiple metacognitive support strategies. We found that agent-supported users created more feasible designs than non-supported users, with differing impacts between support strategies. Based on these findings, we discuss opportunities and tradeoffs of metacognitive support agents and considerations for future AI-based design tools.

Paper

The Situate AI Guidebook: Co-Designing a Toolkit to Support Multi-Stakeholder Early-stage Deliberations Around Public Sector AI Proposals

by Heidari, Hoda; Coston, Amanda; Zhu, Haiyi in Co-design, Constraint modelling, Decision making

2024

Public sector agencies are rapidly deploying AI systems to augment or automate critical decisions in real-world contexts like child welfare, criminal justice, and public health. A growing body of work documents how these AI systems often fail to improve services in practice. These failures can often be traced to decisions made during the early stages of AI ideation and design, such as problem formulation. However, today, we lack systematic processes to support effective, early-stage decision-making about whether and under what conditions to move forward with a proposed AI project. To understand how to scaffold such processes in real-world settings, we worked with public sector agency leaders, AI developers, frontline workers, and community advocates across four public sector agencies and three community advocacy groups in the United States. Through an iterative co-design process, we created the Situate AI Guidebook: a structured process centered around a set of deliberation questions to scaffold conversations around (1) goals and intended use of a proposed AI system, (2) societal and legal considerations, (3) data and modeling constraints, and (4) organizational governance factors. We discuss how the guidebook's design is informed by participants' challenges, needs, and desires for improved deliberation processes. We further elaborate on implications for designing responsible AI toolkits in collaboration with public sector agency stakeholders and opportunities for future work to expand upon the guidebook. This design approach can be more broadly adopted to support the co-creation of responsible AI toolkits that scaffold key decision-making processes surrounding the use of AI in the public sector and beyond.

Paper

Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors

by Lam, Michelle S; Hohman, Fred; Moritz, Dominik in Behavior, Cartography, Computer graphics

2025

AI policy sets boundaries on acceptable behavior for AI models, but this is challenging in the context of large language models (LLMs): how do you ensure coverage over a vast behavior space? We introduce policy maps, an approach to AI policy design inspired by the practice of physical mapmaking. Instead of aiming for full coverage, policy maps aid effective navigation through intentional design choices about which aspects to capture and which to abstract away. With Policy Projector, an interactive tool for designing LLM policy maps, an AI practitioner can survey the landscape of model input-output pairs, define custom regions (e.g., "violence"), and navigate these regions with if-then policy rules that can act on LLM outputs (e.g., if output contains "violence" and "graphic details," then rewrite without "graphic details"). Policy Projector supports interactive policy authoring using LLM classification and steering and a map visualization reflecting the AI practitioner's work. In an evaluation with 12 AI safety experts, our system helps policy designers craft policies around problematic model behaviors such as incorrect gender assumptions and handling of immediate physical safety threats.

Paper

Botender: Supporting Communities in Collaboratively Designing AI Agents through Case-Based Provocations

by Zhang, Amy X; Liu, Sophia; Holstein, Kenneth in Agents (artificial intelligence)

2026

AI agents, or bots, serve important roles in online communities. However, they are often designed by outsiders or a few tech-savvy members, leading to bots that may not align with the broader community's needs. How might communities collectively shape the behavior of community bots? We present Botender, a system that enables communities to collaboratively design LLM-powered bots without coding. With Botender, community members can directly propose, iterate on, and deploy custom bot behaviors tailored to community needs. Botender facilitates testing and iteration on bot behavior through case-based provocations: interaction scenarios generated to spark user reflection and discussion around desirable bot behavior. A validation study found these provocations more useful than standard test cases for revealing improvement opportunities and surfacing disagreements. During a five-day deployment across six Discord servers, Botender supported communities in tailoring bot behavior to their specific needs, showcasing the usefulness of case-based provocations in facilitating collaborative bot design.

Paper

Prototyping Multimodal GenAI Real-Time Agents with Counterfactual Replays and Hybrid Wizard-of-Oz

by Martelaro, Nikolas; Holstein, Kenneth; Gmeiner, Frederic in Generative artificial intelligence, Prototyping, Real time

2025

Recent advancements in multimodal generative AI (GenAI) enable the creation of personal context-aware real-time agents that, for example, can augment user workflows by following their on-screen activities and providing contextual assistance. However, prototyping such experiences is challenging, especially when supporting people with domain-specific tasks using real-time inputs such as speech and screen recordings. While prototyping an LLM-based proactive support agent system, we found that existing prototyping and evaluation methods were insufficient to anticipate the nuanced situational complexity and contextual immediacy required. To overcome these challenges, we explored a novel user-centered prototyping approach that combines counterfactual video replay prompting and hybrid Wizard-of-Oz methods to iteratively design and refine agent behaviors. This paper discusses our prototyping experiences, highlighting successes and limitations, and offers a practical guide and an open-source toolkit for UX designers, HCI researchers, and AI toolmakers to build more user-centered and context-aware multimodal agents.

Paper

Counterfactual Prediction Under Outcome Measurement Error

by Guerdan, Luke; Coston, Amanda; Wu, Zhiwei Steven in Decision making, Domains, Employment

2023

Across domains such as medicine, employment, and criminal justice, predictive models often target labels that imperfectly reflect the outcomes of interest to experts and policymakers. For example, clinical risk assessments deployed to inform physician decision-making often predict measures of healthcare utilization (e.g., costs, hospitalization) as a proxy for patient medical need. These proxies can be subject to outcome measurement error when they systematically differ from the target outcome they are intended to measure. However, prior modeling efforts to characterize and mitigate outcome measurement error overlook the fact that the decision being informed by a model often serves as a risk-mitigating intervention that impacts the target outcome of interest and its recorded proxy. Thus, in these settings, addressing measurement error requires counterfactual modeling of treatment effects on outcomes. In this work, we study intersectional threats to model reliability introduced by outcome measurement error, treatment effects, and selection bias from historical decision-making policies. We develop an unbiased risk minimization method which, given knowledge of proxy measurement error properties, corrects for the combined effects of these challenges. We also develop a method for estimating treatment-dependent measurement error parameters when these are unknown in advance. We demonstrate the utility of our approach theoretically and via experiments on real-world data from randomized controlled trials conducted in healthcare and employment domains. As importantly, we demonstrate that models correcting for outcome measurement error or treatment effects alone suffer from considerable reliability limitations. Our work underscores the importance of considering intersectional threats to model validity during the design and evaluation of predictive models for decision support.

Paper
