Source author record

Michael S. Bernstein

Michael S. Bernstein appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Human-Computer Interaction Social and Information Networks Artificial Intelligence Computer Vision cs.CY Machine Learning Applications Computation and Language physics.soc-ph Programming Languages

Catalog footprint

What is connected

18works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A "Distance Matters" Paradox: Facilitating Intra-Team Collaboration Can Harm Inter-Team Collaboration

By identifying the socio-technical conditions required for teams to work effectively remotely, the Distance Matters framework has been influential in CSCW since its introduction in 2000. Advances in collaboration technology and practices have since brought teams increasingly closer to achieving these conditions. This paper presents a ten-month ethnography in a remote organization, where we observed that despite exhibiting excellent remote collaboration, teams paradoxically struggled to collaborate across team boundaries. We extend the Distance Matters framework to account for inter-team collaboration, arguing that challenges analogous to those in the original intra-team framework -- common ground, collaboration readiness, collaboration technology readiness, and coupling of work -- persist but are actualized differently at the inter-team scale. Finally, we identify a fundamental tension between the intra- and inter-team layers: the collaboration technology and practices that help individual teams thrive (e.g., adopting customized collaboration software) can also prompt collaboration challenges in the inter-team layer, and conversely the technology and practices that facilitate inter-team collaboration (e.g., strong centralized IT organizations) can harm practices at the intra-team layer. The addition of the inter-team layer to the Distance Matters framework opens new opportunities for CSCW, where balancing the tension between team and organizational collaboration needs will be a critical technological, operational, and organizational challenge for remote work in the coming decades.

preprint2022arXiv

A Web-Scale Analysis of the Community Origins of Image Memes

Where do the most popular online cultural artifacts such as image memes originate? Media narratives suggest that cultural innovations often originate in peripheral communities and then diffuse to the mainstream core; behavioral science suggests that intermediate network positions that bridge between the periphery and the core are especially likely to originate many influential cultural innovations. Research has yet to fully adjudicate between these predictions because prior work focuses on individual platforms such as Twitter; however, any single platform is only a small, incomplete part of the larger online cultural ecosystem. In this paper, we perform the first analysis of the origins and diffusion of image memes at web scale, via a one-month crawl of all indexible online communities that principally share meme images with English text overlays. Our results suggest that communities at the core of the network originate the most highly diffused image memes: the top 10% of communities by network centrality originate the memes that generate 62% of the image meme diffusion events on the web. A zero-inflated negative binomial regression confirms that memes from core communities are more likely to diffuse than those from peripheral communities even when controlling for community size and activity level. However, a replication analysis that follows the traditional approach of testing the same question only within a single large community, Reddit, finds the regression coefficients reversed -- underscoring the importance of engaging in web-scale, cross-community analyses. The ecosystem-level viewpoint of this work positions the web as a highly centralized generator of cultural artifacts such as image memes.

preprint2022arXiv

Jury Learning: Integrating Dissenting Voices into Machine Learning Models

Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups' labels. We introduce jury learning, a supervised ML approach that resolves these disagreements explicitly through the metaphor of a jury: defining which people or groups, in what proportion, determine the classifier's prediction. For example, a jury learning model for online toxicity might centrally feature women and Black jurors, who are commonly targets of online harassment. To enable jury learning, we contribute a deep learning architecture that models every annotator in a dataset, samples from annotators' models to populate the jury, then runs inference to classify. Our architecture enables juries that dynamically adapt their composition, explore counterfactuals, and visualize dissent.

preprint2022arXiv

Measuring the Prevalence of Anti-Social Behavior in Online Communities

With increasing attention to online anti-social behaviors such as personal attacks and bigotry, it is critical to have an accurate accounting of how widespread anti-social behaviors are. In this paper, we empirically measure the prevalence of anti-social behavior in one of the world's most popular online community platforms. We operationalize this goal as measuring the proportion of unmoderated comments in the 97 most popular communities on Reddit that violate eight widely accepted platform norms. To achieve this goal, we contribute a human-AI pipeline for identifying these violations and a bootstrap sampling method to quantify measurement uncertainty. We find that 6.25% (95% Confidence Interval [5.36%, 7.13%]) of all comments in 2016, and 4.28% (95% CI [2.50%, 6.26%]) in 2020-2021, are violations of these norms. Most anti-social behaviors remain unmoderated: moderators only removed one in twenty violating comments in 2016, and one in ten violating comments in 2020. Personal attacks were the most prevalent category of norm violation; pornography and bigotry were the most likely to be moderated, while politically inflammatory comments and misogyny/vulgarity were the least likely to be moderated. This paper offers a method and set of empirical results for tracking these phenomena as both the social practices (e.g., moderation) and technical practices (e.g., design) evolve.

preprint2022arXiv

On the Opportunities and Risks of Foundation Models

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles(e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities,and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

preprint2022arXiv

Social Simulacra: Creating Populated Prototypes for Social Computing Systems

Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer's description of a community's design -- goal, rules, and member personas -- and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of "what if?" scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models' training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra.

preprint2021arXiv

Not Now, Ask Later: Users Weaken Their Behavior Change Regimen Over Time, But Expect To Re-Strengthen It Imminently

How effectively do we adhere to nudges and interventions that help us control our online browsing habits? If we have a temporary lapse and disable the behavior change system, do we later resume our adherence, or has the dam broken? In this paper, we investigate these questions through log analyses of 8,000+ users on HabitLab, a behavior change platform that helps users reduce their time online. We find that, while users typically begin with high-challenge interventions, over time they allow themselves to slip into easier and easier interventions. Despite this, many still expect to return to the harder interventions imminently: they repeatedly choose to be asked to change difficulty again on the next visit, declining to have the system save their preference for easy interventions.

preprint2020arXiv

PolicyKit: Building Governance in Online Communities

The software behind online community platforms encodes a governance model that represents a strikingly narrow set of governance possibilities focused on moderators and administrators. When online communities desire other forms of government, such as ones that take many members' opinions into account or that distribute power in non-trivial ways, communities must resort to laborious manual effort. In this paper, we present PolicyKit, a software infrastructure that empowers online community members to concisely author a wide range of governance procedures and automatically carry out those procedures on their home platforms. We draw on political science theory to encode community governance into policies, or short imperative functions that specify a procedure for determining whether a user-initiated action can execute. Actions that can be governed by policies encompass everyday activities such as posting or moderating a message, but actions can also encompass changes to the policies themselves, enabling the evolution of governance over time. We demonstrate the expressivity of PolicyKit through implementations of governance models such as a random jury deliberation, a multi-stage caucus, a reputation system, and a promotion procedure inspired by Wikipedia's Request for Adminship (RfA) process.

preprint2016arXiv

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets. As a result, crowd workers spend weeks or months repeating the exact same tasks, making it necessary to understand their behavior over these long periods of time. We utilize three large, longitudinal datasets of nine million annotations collected from Amazon Mechanical Turk to examine claims that workers fatigue or satisfice over these long periods, producing lower quality work. We find that, contrary to these claims, workers are extremely stable in their quality over the entire period. To understand whether workers set their quality based on the task's requirements for acceptance, we then perform an experiment where we vary the required quality for a large crowdsourcing task. Workers did not adjust their quality based on the acceptance threshold: workers who were above the threshold continued working at their usual quality level, and workers below the threshold self-selected themselves out of the task. Capitalizing on this consistency, we demonstrate that it is possible to predict workers' long-term quality using just a glimpse of their quality on the first five tasks.

preprint2016arXiv

Atelier: Repurposing Expert Crowdsourcing Tasks as Micro-internships

Expert crowdsourcing marketplaces have untapped potential to empower workers' career and skill development. Currently, many workers cannot afford to invest the time and sacrifice the earnings required to learn a new skill, and a lack of experience makes it difficult to get job offers even if they do. In this paper, we seek to lower the threshold to skill development by repurposing existing tasks on the marketplace as mentored, paid, real-world work experiences, which we refer to as micro-internships. We instantiate this idea in Atelier, a micro-internship platform that connects crowd interns with crowd mentors. Atelier guides mentor-intern pairs to break down expert crowdsourcing tasks into milestones, review intermediate output, and problem-solve together. We conducted a field experiment comparing Atelier's mentorship model to a non-mentored alternative on a real-world programming crowdsourcing task, finding that Atelier helped interns maintain forward progress and absorb best practices.

preprint2016arXiv

Embracing Error to Enable Rapid Crowdsourcing

Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data. To scale and widen the applicability of crowdsourcing, we present a technique that produces extremely rapid judgments for binary and categorical labels. Rather than punishing all errors, which causes workers to proceed slowly and deliberately, our technique speeds up workers' judgments to the point where errors are acceptable and even expected. We demonstrate that it is possible to rectify these errors by randomizing task order and modeling response latency. We evaluate our technique on a breadth of common labeling tasks such as image verification, word similarity, sentiment analysis and topic classification. Where prior work typically achieves a 0.25x to 1x speedup over fixed majority vote, our approach often achieves an order of magnitude (10x) speedup.

preprint2016arXiv

Mechanical Novel: Crowdsourcing Complex Work through Reflection and Revision

Crowdsourcing systems accomplish large tasks with scale and speed by breaking work down into independent parts. However, many types of complex creative work, such as fiction writing, have remained out of reach for crowds because work is tightly interdependent: changing one part of a story may trigger changes to the overall plot and vice versa. Taking inspiration from how expert authors write, we propose a technique for achieving interdependent complex goals with crowds. With this technique, the crowd loops between reflection, to select a high-level goal, and revision, to decompose that goal into low-level, actionable tasks. We embody this approach in Mechanical Novel, a system that crowdsources short fiction stories on Amazon Mechanical Turk. In a field experiment, Mechanical Novel resulted in higher-quality stories than an iterative crowdsourcing workflow. Our findings suggest that orienting crowd work around high-level goals may enable workers to coordinate their effort to accomplish complex work.

preprint2016arXiv

Mosaic: Designing Online Creative Communities for Sharing Works-in-Progress

Online creative communities allow creators to share their work with a large audience, maximizing opportunities to showcase their work and connect with fans and peers. However, sharing in-progress work can be technically and socially challenging in environments designed for sharing completed pieces. We propose an online creative community where sharing process, rather than showcasing outcomes, is the main method of sharing creative work. Based on this, we present Mosaic---an online community where illustrators share work-in-progress snapshots showing how an artwork was completed from start to finish. In an online deployment and observational study, artists used Mosaic as a vehicle for reflecting on how they can improve their own creative process, developed a social norm of detailed feedback, and became less apprehensive of sharing early versions of artwork. Through Mosaic, we argue that communities oriented around sharing creative process can create a collaborative environment that is beneficial for creative growth.

preprint2016arXiv

Shirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias in an Online Fiction Writing Community

Imagine a princess asleep in a castle, waiting for her prince to slay the dragon and rescue her. Tales like the famous Sleeping Beauty clearly divide up gender roles. But what about more modern stories, borne of a generation increasingly aware of social constructs like sexism and racism? Do these stories tend to reinforce gender stereotypes, or counter them? In this paper, we present a technique that combines natural language processing with a crowdsourced lexicon of stereotypes to capture gender biases in fiction. We apply this technique across 1.8 billion words of fiction from the Wattpad online writing community, investigating gender representation in stories, how male and female characters behave and are described, and how authors' use of gender stereotypes is associated with the community's ratings. We find that male over-representation and traditional gender stereotypes (e.g., dominant men and submissive women) are common throughout nearly every genre in our corpus. However, only some of these stereotypes, like sexual or violent men, are associated with highly rated stories. Finally, despite women often being the target of negative stereotypes, female authors are equally likely to write such stereotypes as men.

preprint2016arXiv

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering. Cognition is core to tasks that involve not just recognizing, but reasoning about our visual world. However, models used to tackle the rich content in images for cognitive tasks are still being trained using the same datasets designed for perceptual tasks. To achieve success at cognitive tasks, models need to understand the interactions and relationships between objects in an image. When asked "What vehicle is the person riding?", computers will need to identify the objects in an image as well as the relationships riding(man, carriage) and pulling(horse, carriage) in order to answer correctly that "the person is riding a horse-drawn carriage". In this paper, we present the Visual Genome dataset to enable the modeling of such relationships. We collect dense annotations of objects, attributes, and relationships within each image to learn these models. Specifically, our dataset contains over 100K images where each image has an average of 21 objects, 18 attributes, and 18 pairwise relationships between objects. We canonicalize the objects, attributes, relationships, and noun phrases in region descriptions and questions answer pairs to WordNet synsets. Together, these annotations represent the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answers.

preprint2015arXiv

SentenceRacer: A Game with a Purpose for Image Sentence Annotation

Recently datasets that contain sentence descriptions of images have enabled models that can automatically generate image captions. However, collecting these datasets are still very expensive. Here, we present SentenceRacer, an online game that gathers and verifies descriptions of images at no cost. Similar to the game hangman, players compete to uncover words in a sentence that ultimately describes an image. SentenceRacer both generates and verifies that the sentences are accurate descriptions. We show that SentenceRacer generates annotations of higher quality than those generated on Amazon Mechanical Turk (AMT).

preprint2014arXiv

Designing and Deploying Online Field Experiments

Online experiments are widely used to compare specific design alternatives, but they can also be used to produce generalizable knowledge and inform strategic decision making. Doing so often requires sophisticated experimental designs, iterative refinement, and careful logging and analysis. Few tools exist that support these needs. We thus introduce a language for online field experiments called PlanOut. PlanOut separates experimental design from application code, allowing the experimenter to concisely describe experimental designs, whether common "A/B tests" and factorial designs, or more complex designs involving conditional logic or multiple experimental units. These latter designs are often useful for understanding causal mechanisms involved in user behaviors. We demonstrate how experiments from the literature can be implemented in PlanOut, and describe two large field experiments conducted on Facebook with PlanOut. For common scenarios in which experiments are run iteratively and in parallel, we introduce a namespaced management system that encourages sound experimental practice.

preprint2012arXiv

Analytic Methods for Optimizing Realtime Crowdsourcing

Realtime crowdsourcing research has demonstrated that it is possible to recruit paid crowds within seconds by managing a small, fast-reacting worker pool. Realtime crowds enable crowd-powered systems that respond at interactive speeds: for example, cameras, robots and instant opinion polls. So far, these techniques have mainly been proof-of-concept prototypes: research has not yet attempted to understand how they might work at large scale or optimize their cost/performance trade-offs. In this paper, we use queueing theory to analyze the retainer model for realtime crowdsourcing, in particular its expected wait time and cost to requesters. We provide an algorithm that allows requesters to minimize their cost subject to performance requirements. We then propose and analyze three techniques to improve performance: push notifications, shared retainer pools, and precruitment, which involves recalling retainer workers before a task actually arrives. An experimental validation finds that precruited workers begin a task 500 milliseconds after it is posted, delivering results below the one-second cognitive threshold for an end-user to stay in flow.

Michael S. Bernstein

What is connected

Connect this record

See the researcher in context

Building this map preview

18 published item(s)

A "Distance Matters" Paradox: Facilitating Intra-Team Collaboration Can Harm Inter-Team Collaboration

A Web-Scale Analysis of the Community Origins of Image Memes

Jury Learning: Integrating Dissenting Voices into Machine Learning Models

Measuring the Prevalence of Anti-Social Behavior in Online Communities

On the Opportunities and Risks of Foundation Models

Social Simulacra: Creating Populated Prototypes for Social Computing Systems

Not Now, Ask Later: Users Weaken Their Behavior Change Regimen Over Time, But Expect To Re-Strengthen It Imminently

PolicyKit: Building Governance in Online Communities

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

Atelier: Repurposing Expert Crowdsourcing Tasks as Micro-internships

Embracing Error to Enable Rapid Crowdsourcing

Mechanical Novel: Crowdsourcing Complex Work through Reflection and Revision

Mosaic: Designing Online Creative Communities for Sharing Works-in-Progress

Shirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias in an Online Fiction Writing Community

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

SentenceRacer: A Game with a Purpose for Image Sentence Annotation

Designing and Deploying Online Field Experiments

Analytic Methods for Optimizing Realtime Crowdsourcing