Catalogue Search | MBRL

Audit Cards: Contextualizing AI Evaluations

by Casper, Stephen; Reuel, Anka; Staufer, Leon in Audits, Best practice, Cards

2025

AI governance frameworks increasingly rely on audits, yet the results of their underlying evaluations require interpretation and context to be meaningfully informative. Even technically rigorous evaluations can offer little useful insight if reported selectively or obscurely. Current literature focuses primarily on technical best practices, but evaluations are an inherently sociotechnical process, and there is little guidance on reporting procedures and context. Through literature review, stakeholder interviews, and analysis of governance frameworks, we propose "audit cards" to make this context explicit. We identify six key types of contextual features to report and justify in audit cards: auditor identity, evaluation scope, methodology, resource access, process integrity, and review mechanisms. Through analysis of existing evaluation reports, we find significant variation in reporting practices, with most reports omitting crucial contextual information such as auditors' backgrounds, conflicts of interest, and the level and type of access to models. We also find that most existing regulations and frameworks lack guidance on rigorous reporting. In response to these shortcomings, we argue that audit cards can provide a structured format for reporting key claims alongside their justifications, enhancing transparency, facilitating proper interpretation, and establishing trust in reporting.

Paper

The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

by Kolt, Noam; Ozisik, A Pinar; Feng, Kevin in Agentic artificial intelligence, Agents (artificial intelligence)

2026

Agentic AI systems are increasingly capable of performing professional and personal tasks with limited human involvement. However, tracking these developments is difficult because the AI agent ecosystem is complex, rapidly evolving, and inconsistently documented, posing obstacles to both researchers and policymakers. To address these challenges, this paper presents the 2025 AI Agent Index. The Index documents information regarding the origins, design, capabilities, ecosystem, and safety features of 30 state-of-the-art AI agents based on publicly available information and email correspondence with developers. In addition to documenting information about individual agents, the Index illuminates broader trends in the development of agents, their capabilities, and the level of transparency of developers. Notably, we find different transparency levels among agent developers and observe that most developers share little information about safety, evaluations, and societal impacts. The 2025 AI Agent Index is available online at https://aiagentindex.mit.edu

Paper

Mapping Industry Practices to the EU AI Act's GPAI Code of Practice Safety and Security Measures

by Chin, Ze Shen; Gil, Ariel; Gipiškis, Rokas in Codes of Practice, Documents, Security

2025

This report provides a detailed comparison between the Safety and Security measures proposed in the EU AI Act's General-Purpose AI (GPAI) Code of Practice (Third Draft) and the current commitments and practices voluntarily adopted by leading AI companies. As the EU moves toward enforcing binding obligations for GPAI model providers, the Code of Practice will be key for bridging legal requirements with concrete technical commitments. Our analysis focuses on the draft's Safety and Security section (Commitments II.1-II.16), documenting excerpts from current public-facing documents that are relevant to each individual measure. We systematically reviewed different document types, such as companies' frontier safety frameworks and model cards, from over a dozen companies, including OpenAI, Anthropic, Google DeepMind, Microsoft, Meta, Amazon, and others. This report is not meant to be an indication of legal compliance, nor does it take any prescriptive viewpoint about the Code of Practice or companies' policies. Instead, it aims to inform the ongoing dialogue between regulators and General-Purpose AI model providers by surfacing evidence of industry precedent for various measures. Nonetheless, we were able to find relevant quotes from at least 5 companies' documents for the majority of the measures in Commitments II.1-II.16.

Paper

DenseAnnotate: Enabling Scalable Dense Caption Collection for Images and 3D Scenes via Spoken Descriptions

by Lama, Monica; Callison-Burch, Chris; Hong, Alan B in Annotations, Datasets, Large language models

2025

With the rapid adoption of multimodal large language models (MLLMs) across diverse applications, there is a pressing need for task-centered, high-quality training data. A key limitation of current training datasets is their reliance on sparse annotations mined from the Internet or entered via manual typing that capture only a fraction of an image's visual content. Dense annotations are more valuable but remain scarce. Traditional text-based annotation pipelines are poorly suited for creating dense annotations: typing limits expressiveness, slows annotation speed, and underrepresents nuanced visual features, especially in specialized areas such as multicultural imagery and 3D asset annotation. In this paper, we present DenseAnnotate, an audio-driven online annotation platform that enables efficient creation of dense, fine-grained annotations for images and 3D assets. Annotators narrate observations aloud while synchronously linking spoken phrases to image regions or 3D scene parts. Our platform incorporates speech-to-text transcription and region-of-attention marking. To demonstrate the effectiveness of DenseAnnotate, we conducted case studies involving over 1,000 annotators across two domains: culturally diverse images and 3D scenes. We curate a human-annotated multi-modal dataset of 3,531 images, 898 3D scenes, and 7,460 3D objects, with audio-aligned dense annotations in 20 languages, including 8,746 image captions, 2,000 scene captions, and 19,000 object captions. Models trained on this dataset exhibit improvements of 5% in multilingual, 47% in cultural alignment, and 54% in 3D spatial capabilities. Our results show that our platform offers a feasible approach for future vision-language research and can be applied to various tasks and diverse types of data.

Paper

The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems

by Ren, Richard; Barrass, Isabelle; Yin, Xuwang in Accuracy, Benchmarks, Honesty

2026

As large language models (LLMs) become more capable and agentic, the requirement for trust in their outputs grows significantly, yet at the same time concerns have been mounting that models may learn to lie in pursuit of their goals. To address these concerns, a body of work has emerged around the notion of "honesty" in LLMs, along with interventions aimed at mitigating deceptive behaviors. However, some benchmarks claiming to measure honesty in fact simply measure accuracy (the correctness of a model's beliefs) in disguise. Moreover, no benchmarks currently exist for directly measuring whether language models lie. In this work, we introduce a large-scale human-collected dataset for directly measuring lying, allowing us to disentangle accuracy from honesty. Across a diverse set of LLMs, we find that while larger models obtain higher accuracy on our benchmark, they do not become more honest. Surprisingly, most frontier LLMs obtain high scores on truthfulness benchmarks yet exhibit a substantial propensity to lie under pressure, resulting in low honesty scores on our benchmark. We find that simple methods, such as representation engineering interventions, can improve honesty. These results underscore the growing need for robust evaluations and effective interventions to ensure LLMs remain trustworthy.

Paper

Relationship of DNA Methylation and Gene Expression in Idiopathic Pulmonary Fibrosis

by Schwarz, Marvin I.; Zhang, Yingze; Schwartz, David A. in Adrenal Cortex Hormones - therapeutic use, Arrays, Cancer

2014

Idiopathic pulmonary fibrosis (IPF) is an untreatable and often fatal lung disease that is increasing in prevalence and is caused by complex interactions between genetic and environmental factors. Epigenetic mechanisms control gene expression and are likely to regulate the IPF transcriptome. Our objective was to identify methylation marks that modify gene expression in IPF lung. We assessed DNA methylation (comprehensive high-throughput arrays for relative methylation [CHARM]) and gene expression (Agilent gene expression arrays) in 94 patients with IPF and 67 control subjects, and performed integrative genomic analyses to define methylation-gene expression relationships in IPF lung. We validated methylation changes by a targeted analysis (Epityper), and performed functional validation of one of the genes identified by our analysis. We identified 2,130 differentially methylated regions (DMRs; <5% false discovery rate), of which 738 are associated with significant changes in gene expression and are enriched for the expected inverse relationship between methylation and expression (P < 2.2 × 10⁻¹⁶). We validated 13 of 15 DMRs by targeted analysis of methylation. Methylation-expression quantitative trait loci (methyl-eQTL) identified methylation marks that control cis and trans gene expression, with an enrichment for cis relationships (P < 2.2 × 10⁻¹⁶). We found five trans methyl-eQTLs where a methylation change at a single DMR is associated with transcriptional changes in a substantial number of genes; four of these DMRs are near transcription factors (castor zinc finger 1 [CASZ1], FOXC1, MXD4, and ZDHHC4). We studied the in vitro effects of changes in CASZ1 expression and validated its role in regulating target genes in the methyl-eQTL. These results suggest that DNA methylation may be involved in the pathogenesis of IPF.

Journal Article

Sotigalimab and/or nivolumab with chemotherapy in first-line metastatic pancreatic cancer: clinical and immunologic analyses from the randomized phase 2 PRINCE trial

by Byrne, Katelyn T.; Fairchild, Justin; LaVallee, Theresa M. in 631/67/1059/2325, 631/67/1504/1713, 631/67/1857

2022

Chemotherapy combined with immunotherapy has improved the treatment of certain solid tumors, but effective regimens remain elusive for pancreatic ductal adenocarcinoma (PDAC). We conducted a randomized phase 2 trial evaluating the efficacy of nivolumab (nivo; anti-PD-1) and/or sotigalimab (sotiga; CD40 agonistic antibody) with gemcitabine/nab-paclitaxel (chemotherapy) in patients with first-line metastatic PDAC (NCT03214250). In 105 patients analyzed for efficacy, the primary endpoint of 1-year overall survival (OS) was met for nivo/chemo (57.7%, P = 0.006 compared to historical 1-year OS of 35%, n = 34) but was not met for sotiga/chemo (48.1%, P = 0.062, n = 36) or sotiga/nivo/chemo (41.3%, P = 0.223, n = 35). Secondary endpoints were progression-free survival, objective response rate, disease control rate, duration of response and safety. Treatment-related adverse event rates were similar across arms. Multi-omic circulating and tumor biomarker analyses identified distinct immune signatures associated with survival for nivo/chemo and sotiga/chemo. Survival after nivo/chemo correlated with a less suppressive tumor microenvironment and higher numbers of activated, antigen-experienced circulating T cells at baseline. Survival after sotiga/chemo correlated with greater intratumoral CD4 T cell infiltration and circulating differentiated CD4 T cells and antigen-presenting cells. A patient subset benefitting from sotiga/nivo/chemo was not identified. Collectively, these analyses suggest potential treatment-specific correlates of efficacy and may enable biomarker-selected patient populations in subsequent PDAC chemoimmunotherapy trials.

In a randomized phase 2 trial, sotigalimab, a CD40 agonist, did not significantly improve overall survival in patients with previously untreated metastatic pancreatic cancer when combined with chemotherapy or with nivolumab and chemotherapy. Multi-omic exploratory analyses provide insights into immunologic features associated with clinical benefit.

Journal Article

A new leafhopper genus and one new species of the genus Edwardsiana of Typhlocybini (Hemiptera, Cicadellidae, Typhlocybinae) from China

by Yan, Bin; Yang, Mao-Fa; Li, Zi-Zhong in Analysis, Asia, China Seas

2025

The mukariine species Mohunia biguttata (Wang & Li, 2003) had been transferred to the subfamily Typhlocybinae, but its tribal placement and generic status remained uncertain. In this study, a new genus, Ommatocyba Yan, Yang & Webb, gen. nov., is erected for Mohunia biguttata as Ommatocyba biguttata (Wang & Li), comb. nov., and placed in the tribe Typhlocybini, new placement, based on wing venation. In addition, a new species of Edwardsiana Zachvatkin, Edwardsiana wanglangensis Yan, Yang & Webb, sp. nov. (Typhlocybini) from Sichuan, China, is described and illustrated, and a key is provided for its separation.

Journal Article

Early loss of mitochondrial complex I and rewiring of glutathione metabolism in renal oncocytoma

by Oliva, Esther; Gopal, Raj K.; Mick, Eran in Adenoma, Oxyphilic - genetics; Adenoma, Oxyphilic - metabolism; Adenoma, Oxyphilic - pathology

2018

Renal oncocytomas are benign tumors characterized by a marked accumulation of mitochondria. We report a combined exome, transcriptome, and metabolome analysis of these tumors. Joint analysis of the nuclear and mitochondrial (mtDNA) genomes reveals loss-of-function mtDNA mutations occurring at high variant allele fractions, consistent with positive selection, in genes encoding complex I as the most frequent genetic events. A subset of these tumors also exhibits chromosome 1 loss and/or cyclin D1 overexpression, suggesting they follow complex I loss. Transcriptome data revealed that many pathways previously reported to be altered in renal oncocytoma were simply differentially expressed in the tumor’s cell of origin, the distal nephron, compared with other nephron segments. Using a heuristic approach to account for cell-of-origin bias, we uncovered strong expression alterations in the gamma-glutamyl cycle, including glutathione synthesis (increased GCLC) and glutathione degradation. Moreover, the most striking changes in metabolite profiling were elevations in oxidized and reduced glutathione as well as γ-glutamyl-cysteine and cysteinyl-glycine, dipeptide intermediates in glutathione biosynthesis and recycling, respectively. Biosynthesis of glutathione appears adaptive, as blockade of GCLC impairs viability in cells cultured with a complex I inhibitor. Our data suggest that loss-of-function mutations in complex I are a candidate driver event in renal oncocytoma that is followed by frequent loss of chromosome 1, cyclin D1 overexpression, and adaptive up-regulation of glutathione biosynthesis.

Journal Article

Improved reference genome of Aedes aegypti informs arbovirus vector control

by Sharma, Atashi; Akbari, Omar S.; Glassford, William J. in 45/23, 45/43, 631/181/457

2018

Female Aedes aegypti mosquitoes infect more than 400 million people each year with dangerous viral pathogens including dengue, yellow fever, Zika and chikungunya. Progress in understanding the biology of mosquitoes and developing the tools to fight them has been slowed by the lack of a high-quality genome assembly. Here we combine diverse technologies to produce the markedly improved, fully re-annotated AaegL5 genome assembly, and demonstrate how it accelerates mosquito science. We anchored physical and cytogenetic maps, doubled the number of known chemosensory ionotropic receptors that guide mosquitoes to human hosts and egg-laying sites, provided further insight into the size and composition of the sex-determining M locus, and revealed copy-number variation among glutathione S-transferase genes that are important for insecticide resistance. Using high-resolution quantitative trait locus and population genomic analyses, we mapped new candidates for dengue vector competence and insecticide resistance. AaegL5 will catalyse new biological insights and intervention strategies to fight this deadly disease vector.

An improved, fully re-annotated Aedes aegypti genome assembly (AaegL5) provides insights into the sex-determining M locus, chemosensory systems that help mosquitoes to hunt humans and loci involved in insecticide resistance and will help to generate intervention strategies to fight this deadly disease vector.

Journal Article
