Catalogue Search | MBRL

Deciphering ocean carbon in a changing world

by Obernosterer, Ingrid , Hess, Nancy J. , Stubbins, Aron in "Earth, Atmospheric, and Planetary Sciences" , analytical chemistry , Biological Sciences

2016

Dissolved organic matter (DOM) in the oceans is one of the largest pools of reduced carbon on Earth, comparable in size to the atmospheric CO₂ reservoir. A vast number of compounds are present in DOM, and they play important roles in all major element cycles, contribute to the storage of atmospheric CO₂ in the ocean, support marine ecosystems, and facilitate interactions between organisms. At the heart of the DOM cycle lie molecular-level relationships between the individual compounds in DOM and the members of the ocean microbiome that produce and consume them. In the past, these connections have eluded clear definition because of the sheer numerical complexity of both DOM molecules and microorganisms. Emerging tools in analytical chemistry, microbiology, and informatics are breaking down the barriers to a fuller appreciation of these connections. Here we highlight questions being addressed using recent methodological and technological developments in those fields and consider how these advances are transforming our understanding of some of the most important reactions of the marine carbon cycle.

Journal Article

Wide-Open: Accelerating public data release by automating detection of overdue datasets

by Howe, Bill , Poon, Hoifung , Grechkin, Maxim in Access to Information , Animals , Application programming interface

2017

Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
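The text-mining step described in this abstract can be illustrated with a minimal sketch. It assumes that GEO series accessions take the form `GSE` plus digits and SRA study accessions take the form `SRP` plus digits; the function names and the `is_public` callable are hypothetical stand-ins for a real repository query, and none of this is taken from the Wide-Open system itself.

```python
import re

# Assumed accession patterns: GEO series ("GSE" + digits) and SRA studies ("SRP" + digits).
ACCESSION_RE = re.compile(r"\b(GSE\d+|SRP\d+)\b")


def find_accessions(text):
    """Return unique dataset accessions mentioned in text, in order of first appearance."""
    seen = []
    for match in ACCESSION_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def overdue(accessions, is_public):
    """Keep accessions cited in a published article that are still reported as private.

    `is_public` is a hypothetical callable standing in for a repository lookup.
    """
    return [acc for acc in accessions if not is_public(acc)]


# Toy article text with made-up accession numbers.
article = "Data are available under GSE12345 and SRP067890; see also GSE12345."
found = find_accessions(article)

# Stubbed lookup: pretend only GSE12345 has been released.
flagged = overdue(found, is_public=lambda acc: acc == "GSE12345")
print(found, flagged)
```

In practice the stubbed lookup would be replaced by a query against the repository, and the pattern list would need to cover whichever accession types a given journal corpus uses.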

Journal Article

Know Your Limits: A Survey of Abstention in Large Language Models

by Yao, Jihan , Feng, Shangbin , Xu, Chenjun

2025

Abstention, the refusal of large language models (LLMs) to provide an answer, is increasingly recognized for its potential to mitigate hallucinations and enhance safety in LLM systems. In this survey, we introduce a framework to examine abstention from three perspectives: the query, the model, and human values. We organize the literature on abstention methods, benchmarks, and evaluation metrics using this framework, and discuss merits and limitations of prior work. We further identify and motivate areas for future research, such as whether abstention can be achieved as a meta-capability that transcends specific tasks or domains, and opportunities to optimize abstention abilities in specific contexts. In doing so, we aim to broaden the scope and impact of abstention methodologies in AI systems.

Journal Article

The principles of tomorrow's university version 1; peer review: 2 approved

by Boettiger, Carl , Reed, Daniel A , Killeen, Timothy L in Bibliometrics , Career Choice , Careers

2018

In the 21st Century, research is increasingly data- and computation-driven. Researchers, funders, and the larger community today emphasize the traits of openness and reproducibility. In March 2017, 13 mostly early-career research leaders who are building their careers around these traits came together with ten university leaders (presidents, vice presidents, and vice provosts), representatives from four funding agencies, and eleven organizers and other stakeholders in an NIH- and NSF-funded one-day, invitation-only workshop titled "Imagining Tomorrow's University." Workshop attendees were charged with launching a new dialog around open research - the current status, opportunities for advancement, and challenges that limit sharing. The workshop examined how the internet-enabled research world has changed, and how universities need to change to adapt commensurately, aiming to understand how universities can and should make themselves competitive and attract the best students, staff, and faculty in this new world. During the workshop, the participants re-imagined scholarship, education, and institutions for an open, networked era, to uncover new opportunities for universities to create value and serve society. They expressed the results of these deliberations as a set of 22 principles of tomorrow's university across six areas: credit and attribution, communities, outreach and engagement, education, preservation and reproducibility, and technologies. Activities that follow on from workshop results take one of three forms. First, since the workshop, a number of workshop authors have further developed and published their white papers to make their reflections and recommendations more concrete. These authors are also conducting efforts to implement these ideas, and to make changes in the university system. Second, we plan to organise a follow-up workshop that focuses on how these principles could be implemented. Third, we believe that the outcomes of this workshop support and are connected with recent theoretical work on the position and future of open knowledge institutions.

Journal Article

Screening and Follow-Up Monitoring for Substance Use in Primary Care: An Exploration of Rural–Urban Variations

by Tieben, Hendrik , Chan, Ya-Fen , Lu, Shou-En in Community involvement , Community participation , Drug abuse

2016

BACKGROUND: Rates of substance use in rural areas are close to those of urban areas. While recent efforts have emphasized integrated care as a promising model for addressing workforce shortages in providing behavioral health services to those living in medically underserved regions, little is known on how substance use problems are addressed in rural primary care settings. OBJECTIVE: To examine rural–urban variations in screening and monitoring primary care-based patients for substance use problems in a state-wide mental health integration program. DESIGN: This was an observational study using patient registry. SUBJECTS: The study included adult enrollees (n = 15,843) with a mental disorder from 133 participating community health clinics. MAIN OUTCOMES: We measured whether a standardized substance use instrument was used to screen patients at treatment entry and to monitor symptoms at follow-up visits. KEY RESULTS: While on average 73.6% of patients were screened for substance use, follow-up on substance use problems after initial screening was low (41.4%); clinics in small/isolated rural settings appeared to be the lowest (13.6%). Patients who were treated for a mental disorder or substance abuse in the past and who showed greater psychiatric complexities were more likely to receive a screening, whereas patients of small, isolated rural clinics and those traveling longer distances to the care facility were least likely to receive follow-up monitoring for their substance use problems. CONCLUSIONS: Despite the prevalent substance misuse among patients with mental disorders, opportunities to screen this high-risk population for substance use and provide a timely follow-up for those identified as at risk remained overlooked in both rural and urban areas. Rural residents continue to bear a disproportionate burden of substance use problems, with rural–urban disparities found to be most salient in providing the continuum of services for patients with substance use problems in primary care.

Journal Article

SARN: Structurally-Aware Recurrent Network for Spatio-Temporal Disaggregation

by Howe, Bill , Han, Bin in Open data , Sabotage , Spatiotemporal data

2024

Open data is frequently released spatially aggregated, usually to comply with privacy policies. But coarse, heterogeneous aggregations complicate learning and integration for downstream AI/ML systems. In this work, we consider models to disaggregate spatio-temporal data from a low-resolution, irregular partition (e.g., census tract) to a high-resolution, irregular partition (e.g., city block). We propose an overarching model named the Structurally-Aware Recurrent Network (SARN), which integrates structurally-aware spatial attention (SASA) layers into the Gated Recurrent Unit (GRU) model. The spatial attention layers capture spatial interactions among regions, while the gated recurrent module captures the temporal dependencies. Each SASA layer calculates both global and structural attention -- global attention facilitates comprehensive interactions between different geographic levels, while structural attention leverages the containment relationship between different geographic levels (e.g., a city block being wholly contained within a census tract) to ensure coherent and consistent results. For scenarios with limited historical training data, we explore transfer learning and show that a model pre-trained on one city variable can be fine-tuned for another city variable using only a few hundred samples. Evaluating these techniques on two mobility datasets, we find that on both datasets, SARN significantly outperforms other neural models (5% and 1%) and typical heuristic methods (40% and 14%), enabling us to generate realistic, high-quality fine-grained data for downstream applications.

Paper

Adapting to Skew: Imputing Spatiotemporal Urban Data with 3D Partial Convolutions and Biased Masking

by Howe, Bill , Han, Bin in Computer vision , Data collection , Data exchange

2023

We adapt image inpainting techniques to impute large, irregular missing regions in urban settings characterized by sparsity, variance in both space and time, and anomalous events. Missing regions in urban data can be caused by sensor or software failures, data quality issues, interference from weather events, incomplete data collection, or varying data use regulations; any missing data can render the entire dataset unusable for downstream applications. To ensure coverage and utility, we adapt computer vision techniques for image inpainting to operate on 3D histograms (2D space + 1D time) commonly used for data exchange in urban settings. Adapting these techniques to the spatiotemporal setting requires handling skew: urban data tend to follow population density patterns (small dense regions surrounded by large sparse areas); these patterns can dominate the learning process and fool the model into ignoring local or transient effects. To combat skew, we 1) train simultaneously in space and time, and 2) focus attention on dense regions by biasing the masks used for training to the skew in the data. We evaluate the core model and these two extensions using the NYC taxi data and the NYC bikeshare data, simulating different conditions for missing data. We show that the core model is effective qualitatively and quantitatively, and that biased masking during training reduces error in a variety of scenarios. We also articulate a tradeoff in varying the number of timesteps per training sample: too few timesteps and the model ignores transient events; too many timesteps and the model is slow to train with limited performance gain.

Paper

Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias

by Caliskan, Aylin , Howe, Bill , Yang, Yiwei in Bias , Body parts , Diffusion rate

2023

Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics, such as emotions, are disregarded and the person is treated as a body. We replicate three experiments in psychology quantifying sexual objectification and show that the phenomena persist in AI. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database, and finds that human characteristics are disassociated from images of objectified women: the model's recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d > 0.80) and sadness (d > 0.50), associating images of fully clothed subjects with emotions. GRAD-CAM saliency maps highlight that CLIP gets distracted from emotional expressions in objectified images. A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women than for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are likely to be associated with sexual descriptions relative to images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP and Stable Diffusion; the corresponding rate for boys never surpasses 9%. The evidence indicates that language-vision AI models trained on web scrapes learn biases of sexual objectification, which propagate to downstream applications.

Paper

Reliable, Routable, and Reproducible: Collection of Pedestrian Pathways at Statewide Scale

by Zhang, Yuxiang , Caspi, Anat , Howe, Bill in Ad hoc networks , Aerial photography , Automation

2024

While advances in mobility technology including autonomous vehicles and multi-modal navigation systems can improve mobility equity for people with disabilities, these technologies depend crucially on accurate, standardized, and complete pedestrian path networks. Ad hoc collection efforts lead to a data record that is sparse, unreliable, and non-interoperable. This paper presents a sociotechnical methodology to collect, manage, serve, and maintain pedestrian path data at a statewide scale. Combining the automation afforded by computer-vision approaches applied to aerial imagery and existing road network data with the quality control afforded by interactive tools, we aim to produce routable pedestrian pathways for the entire State of Washington within approximately two years. We extract paths, crossings, and curb ramps at scale from aerial imagery, integrating multi-input segmentation methods with road topology data to ensure connected, routable networks. We then organize the predictions into project regions selected for their value to the public interest, where each project region is divided into intersection-scale tasks. These tasks are assigned and tracked through an interactive tool that manages concurrency, progress, feedback, and data management. We demonstrate that our automated systems outperform state-of-the-art methods in producing routable pathway networks, which then significantly reduces the time required for human vetting. Our results demonstrate the feasibility of yielding accurate, robust pedestrian pathway networks at the scale of an entire state. This paper intends to inform procedures for national-scale ADA compliance by providing pedestrian equity, safety, and accessibility, and improving urban environments for all users.

Paper

ML-EAT: A Multilevel Embedding Association Test for Interpretable and Transparent Social Science

by Hiniker, Alexis , Wolfe, Robert , Howe, Bill in Bias , Embedding , Empirical analysis

2024

This research introduces the Multilevel Embedding Association Test (ML-EAT), a method designed for interpretable and transparent measurement of intrinsic bias in language technologies. The ML-EAT addresses issues of ambiguity and difficulty in interpreting the traditional EAT measurement by quantifying bias at three levels of increasing granularity: the differential association between two target concepts with two attribute concepts; the individual effect size of each target concept with two attribute concepts; and the association between each individual target concept and each individual attribute concept. Using the ML-EAT, this research defines a taxonomy of EAT patterns describing the nine possible outcomes of an embedding association test, each of which is associated with a unique EAT-Map, a novel four-quadrant visualization for interpreting the ML-EAT. Empirical analysis of static and diachronic word embeddings, GPT-2 language models, and a CLIP language-and-image model shows that EAT patterns add otherwise unobservable information about the component biases that make up an EAT; reveal the effects of prompting in zero-shot models; and can also identify situations when cosine similarity is an ineffective metric, rendering an EAT unreliable. Our work contributes a method for rendering bias more observable and interpretable, improving the transparency of computational investigations into human minds and societies.
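For orientation, the coarsest level named in this abstract, the differential association between two target concepts and two attribute concepts, corresponds to the traditional EAT effect size. The sketch below uses the commonly cited formulation (difference of mean associations divided by the standard deviation over all targets). It is not the ML-EAT itself; the function names, the toy 2-D vectors, and the use of the sample standard deviation are assumptions made for illustration.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """Mean similarity of w to attribute set A minus mean similarity to attribute set B."""
    sim_a = [cosine(w, a) for a in A]
    sim_b = [cosine(w, b) for b in B]
    return statistics.mean(sim_a) - statistics.mean(sim_b)


def eat_effect_size(X, Y, A, B):
    """Traditional EAT effect size for target sets X, Y and attribute sets A, B.

    Assumption: sample standard deviation over all targets; implementations differ.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (statistics.mean(s_x) - statistics.mean(s_y)) / statistics.stdev(s_x + s_y)


# Toy 2-D "embeddings" (hypothetical values, not from any model).
X = [(1.0, 0.0), (0.9, 0.1)]   # target concept 1
Y = [(0.0, 1.0), (0.1, 0.9)]   # target concept 2
A = [(1.0, 0.0)]               # attribute concept 1
B = [(0.0, 1.0)]               # attribute concept 2

print(eat_effect_size(X, Y, A, B))
```

The per-word `association` values are the finer-grained quantities that a single summary effect size hides, which is the kind of ambiguity the abstract says the multilevel approach is meant to expose.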

Paper
