Catalogue Search | MBRL
23 result(s) for "Hecht, Brent"
The effects of remote work on collaboration among information workers
2022
The coronavirus disease 2019 (COVID-19) pandemic caused a rapid shift to full-time remote work for many information workers. Viewing this shift as a natural experiment in which some workers were already working remotely before the pandemic enables us to separate the effects of firm-wide remote work from other pandemic-related confounding factors. Here, we use rich data on the emails, calendars, instant messages, video/audio calls and workweek hours of 61,182 US Microsoft employees over the first six months of 2020 to estimate the causal effects of firm-wide remote work on collaboration and communication. Our results show that firm-wide remote work caused the collaboration network of workers to become more static and siloed, with fewer bridges between disparate parts. Furthermore, there was a decrease in synchronous communication and an increase in asynchronous communication. Together, these effects may make it harder for employees to acquire and share new information across the network.
Using a large dataset of workers’ technology use from before and after the COVID-19 pandemic, Yang et al. find that firm-wide remote work caused the collaboration networks of information workers to become more static and siloed and communication to shift to more asynchronous media.
Journal Article
The Mining and Application of Diverse Cultural Perspectives in User-Generated Content
2013
Wikipedia articles, tweets, and other forms of user-generated content (UGC) play an essential role in the experience of the average Web user. Outside the public eye, UGC has become equally indispensable as a source of world knowledge for systems and algorithms that help us make sense of big data. In this thesis, we demonstrate that UGC reflects the cultural diversity of its contributors to a previously unidentified extent, and that this diversity has important implications for Web users and existing UGC-based technologies. Focusing on Wikipedia, Flickr, and Twitter, we show how UGC diversity can be extracted and measured using techniques from artificial intelligence and geographic information science. Finally, through two novel applications—Omnipedia and Atlasify—we highlight the exciting potential for a new class of technologies enabled by the ability to harvest diverse perspectives from UGC.
Dissertation
A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training
by Vincent, Nicholas; Hecht, Brent; McDonald, Allison
in Coal mines; Coal mining; Jewish Americans
2024
Systemic property dispossession from minority groups has often been carried out in the name of technological progress. In this paper, we identify evidence that the current paradigm of large language models (LLMs) likely continues this long history. Examining common LLM training datasets, we find that a disproportionate amount of content authored by Jewish Americans is used for training without their consent. The degree of over-representation ranges from around 2x to around 6.5x. Given that LLMs may substitute for the paid labor of those who produced their training data, they have the potential to cause even more substantial and disproportionate economic harm to Jewish Americans in the coming years. This paper focuses on Jewish Americans as a case study, but it is probable that other minority communities (e.g., Asian Americans, Hindu Americans) may be similarly affected and, most importantly, the results should likely be interpreted as a "canary in the coal mine" that highlights deep structural concerns about the current LLM paradigm whose harms could soon affect nearly everyone. We discuss the implications of these results for the policymakers thinking about how to regulate LLMs as well as for those in the AI field who are working to advance LLMs. Our findings stress the importance of working together towards alternative LLM paradigms that avoid both disparate impacts and widespread societal harms.
The Dimensions of Data Labor: A Road Map for Researchers, Activists, and Policymakers to Empower Data Producers
2023
Many recent technological advances (e.g. ChatGPT and search engines) are possible only because of massive amounts of user-generated data produced through user interactions with computing systems or scraped from the web (e.g. behavior logs, user-generated content, and artwork). However, data producers have little say in what data is captured, how it is used, or who it benefits. Organizations with the ability to access and process this data, e.g. OpenAI and Google, possess immense power in shaping the technology landscape. By synthesizing related literature that reconceptualizes the production of data for computing as "data labor", we outline opportunities for researchers, policymakers, and activists to empower data producers in their relationship with tech companies, e.g. advocating for transparency about data reuse, creating feedback channels between data producers and companies, and potentially developing mechanisms to share data's revenue more broadly. In doing so, we characterize data labor with six important dimensions - legibility, end-use awareness, collaboration requirement, openness, replaceability, and livelihood overlap - based on the parallels between data labor and various other types of labor in the computing literature.
A Deeper Investigation of the Importance of Wikipedia Links to the Success of Search Engines
2020
A growing body of work has highlighted the important role that Wikipedia's volunteer-created content plays in helping search engines achieve their core goal of addressing the information needs of millions of people. In this paper, we report the results of an investigation into the incidence of Wikipedia links in search engine results pages (SERPs). Our results extend prior work by considering three U.S. search engines, simulating both mobile and desktop devices, and using a spatial analysis approach designed to study modern SERPs that are no longer just "ten blue links". We find that Wikipedia links are extremely common in important search contexts, appearing in 67-84% of all SERPs for common and trending queries, but less often for medical queries. Furthermore, we observe that Wikipedia links often appear in "Knowledge Panel" SERP elements and are in positions visible to users without scrolling, although Wikipedia appears less in prominent positions on mobile devices. Our findings reinforce the complementary notions that (1) Wikipedia content and research has major impact outside of the Wikipedia domain and (2) powerful technologies like search engines are highly reliant on free content created by volunteers.
Behavioral Use Licensing for Responsible AI
by Hecht, Brent; McDuff, Daniel; Contractor, Danish
in Algorithms; Artificial intelligence; Ethical standards
2022
With the growing reliance on artificial intelligence (AI) for many different applications, the sharing of code, data, and models is important to ensure the replicability and democratization of scientific knowledge. Many high-profile academic publishing venues expect code and models to be submitted and released with papers. Furthermore, developers often want to release these assets to encourage development of technology that leverages their frameworks and services. A number of organizations have expressed concerns about the inappropriate or irresponsible use of AI and have proposed ethical guidelines around the application of such systems. While such guidelines can help set norms and shape policy, they are not easily enforceable. In this paper, we advocate the use of licensing to enable legally enforceable behavioral use conditions on software and code and provide several case studies that demonstrate the feasibility of behavioral use licensing. We envision how licensing may be implemented in accordance with existing responsible AI guidelines.
All That's Happening behind the Scenes: Putting the Spotlight on Volunteer Moderator Labor in Reddit
2022
Online volunteers are an uncompensated yet valuable labor force for many social platforms. For example, volunteer content moderators perform a vast amount of labor to maintain online communities. However, as social platforms like Reddit favor revenue generation and user engagement, moderators are under-supported to manage the expansion of online communities. To preserve these online communities, developers and researchers of social platforms must account for and support as much of this labor as possible. In this paper, we quantitatively characterize the publicly visible and invisible actions taken by moderators on Reddit, using a unique dataset of private moderator logs for 126 subreddits and over 900 moderators. Our analysis of this dataset reveals the heterogeneity of moderation work across both communities and moderators. Moreover, we find that analyzing only visible work - the dominant way that moderation work has been studied thus far - drastically underestimates the amount of human moderation labor on a subreddit. We discuss the implications of our results on content moderation research and social platforms.
Measuring the Monetary Value of Online Volunteer Work
2022
Online volunteers are a crucial labor force that keeps many for-profit systems afloat (e.g. social media platforms and online review sites). Despite their substantial role in upholding highly valuable technological systems, online volunteers have no way of knowing the value of their work. This paper uses content moderation as a case study and measures its monetary value to make apparent volunteer labor's value. Using a novel dataset of private logs generated by moderators, we use linear mixed-effect regression and estimate that Reddit moderators worked a minimum of 466 hours per day in 2020. These hours amount to 3.4 million USD a year based on the median hourly wage for comparable content moderation services in the U.S. We discuss how this information may inform pathways to alleviate the one-sided relationship between technology companies and online volunteers.