Researcher profile

Donald B. Rubin

Donald B. Rubin contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 15 (Unverified)
Verification L1
Unclaimed author
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

Preprint, 2022, arXiv

Causal inference from treatment-control studies having an additional factor with unknown assignment mechanism

Consider a situation with two treatments, the first of which is randomized but the second is not, and the multifactor version of this. Interest is in treatment effects, defined using standard factorial notation. We define estimators for the treatment effects and explore their properties when there is information about the nonrandomized treatment assignment and when there is no information on the assignment of the nonrandomized treatment. We show when and how hidden treatments can bias estimators and inflate their sampling variances.
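The "standard factorial notation" mentioned in the abstract can be made concrete for the simplest case of two two-level factors. The sketch below is an illustration under assumptions only: it computes the usual 2x2 factorial contrasts from four cell means and is not the estimators defined in the paper. The function name, the -1/+1 level coding, and the example numbers are hypothetical, and the scaling of the interaction varies between textbooks.

```python
def factorial_effects(cell_means):
    """Compute 2x2 factorial contrasts from cell means.

    cell_means maps (a, b), with a and b in {-1, +1}, to the mean
    outcome observed in that cell. Factor A could be the randomized
    treatment and factor B the non-randomized one (an assumption made
    here purely for illustration).
    """
    mm = cell_means[(-1, -1)]
    pm = cell_means[(+1, -1)]
    mp = cell_means[(-1, +1)]
    pp = cell_means[(+1, +1)]
    return {
        # Main effect of A: high-A average minus low-A average.
        "A": (pm + pp) / 2 - (mm + mp) / 2,
        # Main effect of B: high-B average minus low-B average.
        "B": (mp + pp) / 2 - (mm + pm) / 2,
        # Interaction: half the difference between the effect of A
        # at high B and the effect of A at low B.
        "AB": (pp - pm - mp + mm) / 2,
    }


# Made-up cell means, used only to show the arithmetic.
example = {(-1, -1): 1.0, (+1, -1): 3.0, (-1, +1): 2.0, (+1, +1): 6.0}
print(factorial_effects(example))  # {'A': 3.0, 'B': 2.0, 'AB': 1.0}
```

If the second factor is hidden, only the two marginal means for the randomized factor are observable, which is the setting in which the abstract says bias and inflated variance can arise.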

Preprint, 2021, arXiv

Automatic Detection of Influential Actors in Disinformation Networks

The weaponization of digital communications and social media to conduct disinformation campaigns at immense scale, speed, and reach presents new challenges to identify and counter hostile influence operations (IOs). This paper presents an end-to-end framework to automate detection of disinformation narratives, networks, and influential actors. The framework integrates natural language processing, machine learning, graph analytics, and a novel network causal inference approach to quantify the impact of individual actors in spreading IO narratives. We demonstrate its capability on real-world hostile IO campaigns with Twitter datasets collected during the 2017 French presidential elections, and known IO accounts disclosed by Twitter over a broad range of IO campaigns (May 2007 to February 2020), over 50,000 accounts, 17 countries, and different account types including both trolls and bots. Our system detects IO accounts with 96% precision, 79% recall, and 96% area-under-the-PR-curve, maps out salient network communities, and discovers high-impact accounts that escape the lens of traditional impact statistics based on activity counts and network centrality. Results are corroborated with independent sources of known IO accounts from U.S. Congressional reports, investigative journalism, and IO datasets provided by Twitter.

Preprint, 2021, arXiv

PCA Rerandomization

Mahalanobis distance between treatment group and control group covariate means is often adopted as a balance criterion when implementing a rerandomization strategy. However, this criterion may not work well for high-dimensional cases because it balances all orthogonalized covariates equally. Here, we propose leveraging principal component analysis (PCA) to identify proper subspaces in which Mahalanobis distance should be calculated. Not only can PCA effectively reduce the dimensionality for high-dimensional cases while capturing most of the information in the covariates, but it also provides computational simplicity by focusing on the top orthogonal components. We show that our PCA rerandomization scheme has desirable theoretical properties on balancing covariates and thereby on improving the estimation of average treatment effects. We also show that this conclusion is supported by numerical studies using both simulated and real examples.
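The idea in this abstract, computing the Mahalanobis balance criterion on the top principal components rather than on all covariates, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function names, the equal-split assignment, the number of components `k`, and the acceptance `threshold` are all hypothetical choices made for the example.

```python
import numpy as np


def pca_scores(X, k):
    """Project centered covariates onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def mahalanobis_balance(Z, w):
    """Mahalanobis distance between treated and control means of Z."""
    n = len(w)
    n1 = int(w.sum())
    n0 = n - n1
    diff = Z[w == 1].mean(axis=0) - Z[w == 0].mean(axis=0)
    cov = np.atleast_2d(np.cov(Z, rowvar=False))
    return (n1 * n0 / n) * diff @ np.linalg.solve(cov, diff)


def pca_rerandomize(X, k, threshold, rng, max_draws=10_000):
    """Redraw assignments until balance on the top-k components is acceptable.

    Returns the accepted 0/1 assignment vector and its balance value.
    """
    Z = pca_scores(X, k)
    n = X.shape[0]
    base = np.zeros(n, dtype=int)
    base[: n // 2] = 1  # assumption: half of the units are treated
    for _ in range(max_draws):
        w = rng.permutation(base)
        m = mahalanobis_balance(Z, w)
        if m <= threshold:
            return w, m
    raise RuntimeError("no acceptable assignment found within max_draws")
```

A call such as `pca_rerandomize(X, k=3, threshold=1.0, rng=np.random.default_rng(0))` on an `n x p` covariate matrix `X` would return an assignment balanced on the first three components. How to choose `k` and the threshold is addressed in the paper itself and is not reproduced here.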