Researcher profile

Michael S. Bernstein

Michael S. Bernstein contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
8 works
0 followers
5 topics
4 close collaborators

Published work

8 published items

Preprint, 2022, arXiv

A "Distance Matters" Paradox: Facilitating Intra-Team Collaboration Can Harm Inter-Team Collaboration

By identifying the socio-technical conditions required for teams to work effectively remotely, the Distance Matters framework has been influential in CSCW since its introduction in 2000. Advances in collaboration technology and practices have since brought teams increasingly closer to achieving these conditions. This paper presents a ten-month ethnography in a remote organization, where we observed that despite exhibiting excellent remote collaboration, teams paradoxically struggled to collaborate across team boundaries. We extend the Distance Matters framework to account for inter-team collaboration, arguing that challenges analogous to those in the original intra-team framework -- common ground, collaboration readiness, collaboration technology readiness, and coupling of work -- persist but are actualized differently at the inter-team scale. Finally, we identify a fundamental tension between the intra- and inter-team layers: the collaboration technology and practices that help individual teams thrive (e.g., adopting customized collaboration software) can also prompt collaboration challenges in the inter-team layer, and conversely the technology and practices that facilitate inter-team collaboration (e.g., strong centralized IT organizations) can harm practices at the intra-team layer. The addition of the inter-team layer to the Distance Matters framework opens new opportunities for CSCW, where balancing the tension between team and organizational collaboration needs will be a critical technological, operational, and organizational challenge for remote work in the coming decades.

Preprint, 2022, arXiv

A Web-Scale Analysis of the Community Origins of Image Memes

Where do the most popular online cultural artifacts such as image memes originate? Media narratives suggest that cultural innovations often originate in peripheral communities and then diffuse to the mainstream core; behavioral science suggests that intermediate network positions that bridge between the periphery and the core are especially likely to originate many influential cultural innovations. Research has yet to fully adjudicate between these predictions because prior work focuses on individual platforms such as Twitter; however, any single platform is only a small, incomplete part of the larger online cultural ecosystem. In this paper, we perform the first analysis of the origins and diffusion of image memes at web scale, via a one-month crawl of all indexable online communities that principally share meme images with English text overlays. Our results suggest that communities at the core of the network originate the most highly diffused image memes: the top 10% of communities by network centrality originate the memes that generate 62% of the image meme diffusion events on the web. A zero-inflated negative binomial regression confirms that memes from core communities are more likely to diffuse than those from peripheral communities even when controlling for community size and activity level. However, a replication analysis that follows the traditional approach of testing the same question only within a single large community, Reddit, finds the regression coefficients reversed -- underscoring the importance of engaging in web-scale, cross-community analyses. The ecosystem-level viewpoint of this work positions the web as a highly centralized generator of cultural artifacts such as image memes.

Preprint, 2022, arXiv

Jury Learning: Integrating Dissenting Voices into Machine Learning Models

Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups' labels. We introduce jury learning, a supervised ML approach that resolves these disagreements explicitly through the metaphor of a jury: defining which people or groups, in what proportion, determine the classifier's prediction. For example, a jury learning model for online toxicity might centrally feature women and Black jurors, who are commonly targets of online harassment. To enable jury learning, we contribute a deep learning architecture that models every annotator in a dataset, samples from annotators' models to populate the jury, then runs inference to classify. Our architecture enables juries that dynamically adapt their composition, explore counterfactuals, and visualize dissent.
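
The three steps named in this abstract (model each annotator, sample annotators to seat a jury, aggregate the jurors' predictions) can be illustrated with a toy sketch. This is not the paper's architecture: the annotator pool, the group names, the fixed per-annotator scores, and the median aggregation below are all hypothetical stand-ins chosen so the example runs on the Python standard library alone.

```python
import random
from statistics import median

# Hypothetical annotator pool: annotator id -> (group, toy base score).
# The abstract describes a learned model per annotator; a fixed score per
# annotator stands in for that model here.
ANNOTATORS = {
    "a1": ("group_x", 0.2),
    "a2": ("group_x", 0.3),
    "a3": ("group_x", 0.1),
    "a4": ("group_y", 0.8),
    "a5": ("group_y", 0.7),
    "a6": ("group_y", 0.9),
}


def predict(annotator_id, comment):
    """Toy per-annotator prediction: a score in [0, 1] for one comment."""
    _, base = ANNOTATORS[annotator_id]
    bump = 0.1 if "!" in comment else 0.0  # arbitrary illustrative feature
    return min(1.0, base + bump)


def sample_jury(composition, rng):
    """Seat a jury: draw `seats` distinct annotators from each named group."""
    jury = []
    for group, seats in composition.items():
        members = sorted(a for a, (g, _) in ANNOTATORS.items() if g == group)
        jury.extend(rng.sample(members, seats))
    return jury


def jury_verdict(comment, composition, trials=50, seed=0):
    """Median juror score per sampled jury, then the median over all trials."""
    rng = random.Random(seed)
    verdicts = []
    for _ in range(trials):
        jury = sample_jury(composition, rng)
        verdicts.append(median(predict(a, comment) for a in jury))
    return median(verdicts)
```

Changing the composition passed to `jury_verdict` (for example, from two seats for `group_y` to two seats for `group_x`) changes the verdict for the same comment, which is the kind of explicit choice the abstract contrasts with implicit majority vote.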

Preprint, 2022, arXiv

Measuring the Prevalence of Anti-Social Behavior in Online Communities

With increasing attention to online anti-social behaviors such as personal attacks and bigotry, it is critical to have an accurate accounting of how widespread anti-social behaviors are. In this paper, we empirically measure the prevalence of anti-social behavior in one of the world's most popular online community platforms. We operationalize this goal as measuring the proportion of unmoderated comments in the 97 most popular communities on Reddit that violate eight widely accepted platform norms. To achieve this goal, we contribute a human-AI pipeline for identifying these violations and a bootstrap sampling method to quantify measurement uncertainty. We find that 6.25% (95% Confidence Interval [5.36%, 7.13%]) of all comments in 2016, and 4.28% (95% CI [2.50%, 6.26%]) in 2020-2021, are violations of these norms. Most anti-social behaviors remain unmoderated: moderators only removed one in twenty violating comments in 2016, and one in ten violating comments in 2020. Personal attacks were the most prevalent category of norm violation; pornography and bigotry were the most likely to be moderated, while politically inflammatory comments and misogyny/vulgarity were the least likely to be moderated. This paper offers a method and set of empirical results for tracking these phenomena as both the social practices (e.g., moderation) and technical practices (e.g., design) evolve.
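
The abstract mentions a bootstrap sampling method for quantifying measurement uncertainty but does not describe it. As a rough illustration of the general idea only, the sketch below computes a plain percentile bootstrap confidence interval for a proportion. The function name, the resample count, and the input labels are assumptions made for this example; it is not the paper's pipeline and uses no data from it.

```python
import random


def bootstrap_ci(labels, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the share of 1s in a list of 0/1 labels.

    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    for _ in range(n_resamples):
        # Resample n labels with replacement and record the proportion.
        resample = rng.choices(labels, k=n)
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(labels) / n, lower, upper


# Synthetic labels for illustration: 1 marks a violating comment.
synthetic_labels = [1] * 60 + [0] * 940
point, lower, upper = bootstrap_ci(synthetic_labels)
print(f"prevalence {point:.2%}, 95% CI [{lower:.2%}, {upper:.2%}]")
```

A percentile interval is the simplest bootstrap variant; a real measurement would also need to account for classifier error and for how comments were sampled, which this sketch ignores.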

Preprint, 2022, arXiv

On the Opportunities and Risks of Foundation Models

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

Preprint, 2022, arXiv

Social Simulacra: Creating Populated Prototypes for Social Computing Systems

Social computing prototypes probe the social behaviors that may arise in an envisioned system design. This prototyping practice is currently limited to recruiting small groups of people. Unfortunately, many challenges do not arise until a system is populated at a larger scale. Can a designer understand how a social system might behave when populated, and make adjustments to the design before the system falls prey to such challenges? We introduce social simulacra, a prototyping technique that generates a breadth of realistic social interactions that may emerge when a social computing system is populated. Social simulacra take as input the designer's description of a community's design -- goal, rules, and member personas -- and produce as output an instance of that design with simulated behavior, including posts, replies, and anti-social behaviors. We demonstrate that social simulacra shift the behaviors that they generate appropriately in response to design changes, and that they enable exploration of "what if?" scenarios where community members or moderators intervene. To power social simulacra, we contribute techniques for prompting a large language model to generate thousands of distinct community members and their social interactions with each other; these techniques are enabled by the observation that large language models' training data already includes a wide variety of positive and negative behavior on social media platforms. In evaluations, we show that participants are often unable to distinguish social simulacra from actual community behavior and that social computing designers successfully refine their social computing designs when using social simulacra.

Preprint, 2021, arXiv

Not Now, Ask Later: Users Weaken Their Behavior Change Regimen Over Time, But Expect To Re-Strengthen It Imminently

How effectively do we adhere to nudges and interventions that help us control our online browsing habits? If we have a temporary lapse and disable the behavior change system, do we later resume our adherence, or has the dam broken? In this paper, we investigate these questions through log analyses of 8,000+ users on HabitLab, a behavior change platform that helps users reduce their time online. We find that, while users typically begin with high-challenge interventions, over time they allow themselves to slip into easier and easier interventions. Despite this, many still expect to return to the harder interventions imminently: they repeatedly choose to be asked to change difficulty again on the next visit, declining to have the system save their preference for easy interventions.

Preprint, 2020, arXiv

PolicyKit: Building Governance in Online Communities

The software behind online community platforms encodes a governance model that represents a strikingly narrow set of governance possibilities focused on moderators and administrators. When online communities desire other forms of government, such as ones that take many members' opinions into account or that distribute power in non-trivial ways, communities must resort to laborious manual effort. In this paper, we present PolicyKit, a software infrastructure that empowers online community members to concisely author a wide range of governance procedures and automatically carry out those procedures on their home platforms. We draw on political science theory to encode community governance into policies, or short imperative functions that specify a procedure for determining whether a user-initiated action can execute. Actions that can be governed by policies encompass everyday activities such as posting or moderating a message, but actions can also encompass changes to the policies themselves, enabling the evolution of governance over time. We demonstrate the expressivity of PolicyKit through implementations of governance models such as a random jury deliberation, a multi-stage caucus, a reputation system, and a promotion procedure inspired by Wikipedia's Request for Adminship (RfA) process.
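
To make the idea of a policy as a short imperative function concrete, here is a minimal sketch. It does not use PolicyKit's actual API: the `Action` class, the status constants, `majority_vote_policy`, and `govern` are hypothetical names invented for this illustration, and the quorum rule is an arbitrary example of a governance procedure.

```python
from dataclasses import dataclass, field

# Hypothetical outcome labels for a governed action.
PASSED, FAILED, PROPOSED = "passed", "failed", "proposed"


@dataclass
class Action:
    """A user-initiated action awaiting a governance decision."""
    kind: str        # e.g. "post_message" or "change_policy"
    initiator: str
    votes: dict = field(default_factory=dict)  # voter name -> True / False


def majority_vote_policy(action, members, quorum=3):
    """Pass the action once a quorum of members has voted and most said yes."""
    counted = [v for voter, v in action.votes.items() if voter in members]
    if len(counted) < quorum:
        return PROPOSED  # not enough member votes yet; keep waiting
    yes = sum(1 for v in counted if v)
    return PASSED if yes > len(counted) - yes else FAILED


def govern(action, policies, members):
    """Look up the policy for this kind of action and evaluate it."""
    policy = policies.get(action.kind)
    if policy is None:
        return PASSED  # assumption: ungoverned actions execute immediately
    return policy(action, members)


members = {"ana", "ben", "chi", "dev"}
# Changes to the policies themselves are governed like any other action.
policies = {"change_policy": majority_vote_policy}

proposal = Action("change_policy", "ana", {"ana": True, "ben": True})
print(govern(proposal, policies, members))
proposal.votes["chi"] = False
print(govern(proposal, policies, members))
```

Because `"change_policy"` is itself an action kind with a policy attached, the same mechanism that gates everyday actions also gates edits to the rules, which mirrors the abstract's point about governance evolving over time.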