Researcher profile

Xiaojie Mao

Xiaojie Mao contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified · Verification L1 · Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

preprint · 2022 · arXiv

Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning

Off-policy evaluation and learning (OPE/L) use offline observational data to make better decisions, which is crucial in applications where online experimentation is limited. However, depending entirely on logged data, OPE/L is sensitive to environment distribution shifts -- discrepancies between the data-generating environment and that where policies are deployed. Si et al. (2020) proposed distributionally robust OPE/L (DROPE/L) to address this, but the proposal relies on inverse-propensity weighting, whose estimation error and regret will deteriorate if propensities are nonparametrically estimated and whose variance is suboptimal even if not. For standard, non-robust, OPE/L, this is solved by doubly robust (DR) methods, but they do not naturally extend to the more complex DROPE/L, which involves a worst-case expectation. In this paper, we propose the first DR algorithms for DROPE/L with KL-divergence uncertainty sets. For evaluation, we propose Localized Doubly Robust DROPE (LDR$^2$OPE) and show that it achieves semiparametric efficiency under weak product rate conditions. Thanks to a localization technique, LDR$^2$OPE only requires fitting a small number of regressions, just like DR methods for standard OPE. For learning, we propose Continuum Doubly Robust DROPL (CDR$^2$OPL) and show that, under a product rate condition involving a continuum of regressions, it enjoys a fast regret rate of $\mathcal{O}\left(N^{-1/2}\right)$ even when unknown propensities are nonparametrically estimated. We empirically validate our algorithms in simulations and further extend our results to general $f$-divergence uncertainty sets.
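To make the "worst-case expectation" over a KL-divergence uncertainty set concrete, here is a minimal sketch. It is not the paper's LDR$^2$OPE estimator: it ignores propensities and regressions and only evaluates the standard dual form of the worst-case mean, $\sup_{\alpha>0}\{-\alpha\log \mathbb{E}[e^{-R/\alpha}] - \alpha\delta\}$, on a plain sample of rewards. The function name `kl_worst_case_mean`, the radius `delta`, and the grid of `alphas` are illustrative assumptions.

```python
import numpy as np


def kl_worst_case_mean(rewards, delta, alphas=None):
    """Approximate inf over {Q : KL(Q || P_n) <= delta} of E_Q[R].

    Uses the dual sup_{alpha > 0} -alpha * log E_Pn[exp(-R / alpha)] - alpha * delta,
    maximized over a finite grid of alpha values (an assumption of this sketch).
    """
    r = np.asarray(rewards, dtype=float)
    if alphas is None:
        alphas = np.logspace(-3, 3, 2000)  # hypothetical search grid
    best = -np.inf
    for a in alphas:
        z = -r / a
        m = z.max()
        # Numerically stable log of the empirical mean of exp(-R / alpha).
        log_mgf = m + np.log(np.mean(np.exp(z - m)))
        best = max(best, -a * log_mgf - a * delta)
    return best


# Toy rewards; a larger radius delta can only lower the robust value.
rewards = np.array([0.0, 1.0, 1.0, 2.0])
robust_value = kl_worst_case_mean(rewards, delta=0.1)
print(robust_value, rewards.mean())
```

Because every grid point gives a valid lower bound on the worst-case mean, the grid search can only understate the robust value, never overstate it.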

preprint · 2022 · arXiv

Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond

We consider estimating a low-dimensional parameter in an estimating equation involving high-dimensional nuisances that depend on the parameter. A central example is the efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference, which involves as a nuisance the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated. Debiased machine learning (DML) is a data-splitting approach to estimating high-dimensional nuisances using flexible machine learning methods, but applying it to problems with parameter-dependent nuisances is impractical. For (L)QTE, DML requires we learn the whole covariate-conditional cumulative distribution function. We instead propose localized debiased machine learning (LDML), which avoids this burdensome step and needs only estimate nuisances at a single initial rough guess for the parameter. For (L)QTE, LDML involves learning just two regression functions, a standard task for machine learning methods. We prove that under lax rate conditions our estimator has the same favorable asymptotic behavior as the infeasible estimator that uses the unknown true nuisances. Thus, LDML notably enables practically-feasible and theoretically-grounded efficient estimation of important quantities in causal inference such as (L)QTEs when we must control for many covariates and/or flexible relationships, as we demonstrate in empirical studies.

preprint · 2022 · arXiv

Stochastic Optimization Forests

We study contextual stochastic optimization problems, where we leverage rich auxiliary observations (e.g., product characteristics) to improve decision making with uncertain variables (e.g., demand). We show how to train forest decision policies for this problem by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm. We realize this seemingly computationally intractable problem by developing approximate splitting criteria that utilize optimization perturbation analysis to eschew burdensome re-optimization for every candidate split, so that our method scales to large-scale problems. We prove that our splitting criteria consistently approximate the true risk and that our method achieves asymptotic optimality. We extensively validate our method empirically, demonstrating the value of optimization-aware construction of forests and the success of our efficient approximations. We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
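The "exactly re-optimize for every candidate split" baseline mentioned in the abstract can be sketched as follows. This is only the brute-force criterion that the approximate splitting criteria are meant to avoid, not the paper's method. The newsvendor cost, the parameters `b` (shortage cost) and `h` (holding cost), and all function names are assumptions chosen for illustration.

```python
import numpy as np


def reoptimized_cost(demand, b, h):
    """Minimum total newsvendor cost over a leaf, re-solved from scratch.

    The empirical cost is piecewise linear and convex in the order quantity,
    so some observed demand value attains the minimum; we try each one.
    """
    orders = demand[:, None]
    shortage = np.maximum(demand[None, :] - orders, 0.0)
    overage = np.maximum(orders - demand[None, :], 0.0)
    return (b * shortage + h * overage).sum(axis=1).min()


def best_split_by_reoptimizing(x, demand, b=2.0, h=1.0, min_leaf=5):
    """Scan thresholds on one feature, re-optimizing the decision in each child."""
    idx = np.argsort(x)
    xs, ds = x[idx], demand[idx]
    best_thr, best_cost = None, np.inf
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # no threshold separates equal feature values
        cost = reoptimized_cost(ds[:i], b, h) + reoptimized_cost(ds[i:], b, h)
        if cost < best_cost:
            best_thr, best_cost = 0.5 * (xs[i - 1] + xs[i]), cost
    return best_thr, best_cost


# Toy data: demand jumps from 10 to 20 when the feature crosses 0.5.
x = np.linspace(0.0, 1.0, 40)
demand = np.where(x < 0.5, 10.0, 20.0)
threshold, cost = best_split_by_reoptimizing(x, demand)
print(threshold, cost)
```

Each candidate threshold triggers two fresh optimizations, which is the cost that grows quickly with sample size and motivates an approximation.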

preprint · 2020 · arXiv

Assessing Algorithmic Fairness with Unobserved Protected Class Using Data Combination

The increasing impact of algorithmic decisions on people's lives compels us to scrutinize their fairness and, in particular, the disparate impacts that ostensibly-color-blind algorithms can have on different groups. Examples include credit decisioning, hiring, advertising, criminal justice, personalized medicine, and targeted policymaking, where in some cases legislative or regulatory frameworks for fairness exist and define specific protected classes. In this paper we study a fundamental challenge to assessing disparate impacts in practice: protected class membership is often not observed in the data. This is particularly a problem in lending and healthcare. We consider the use of an auxiliary dataset, such as the US census, to construct models that predict the protected class from proxy variables, such as surname and geolocation. We show that even with such data, a variety of common disparity measures are generally unidentifiable, providing a new perspective on the documented biases of popular proxy-based methods. We provide exact characterizations of the tightest-possible set of all possible true disparities that are consistent with the data (and possibly any assumptions). We further provide optimization-based algorithms for computing and visualizing these sets and statistical tools to assess sampling uncertainty. Together, these enable reliable and robust assessments of disparities -- an important tool when disparity assessment can have far-reaching policy implications. We demonstrate this in two case studies with real data: mortgage lending and personalized medicine dosing.
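A rough illustration of why proxy-based disparity measures are only partially identified: within each proxy stratum, the primary data give the outcome rate and the auxiliary data give the group share, but not their joint distribution. The sketch below applies the classical Fréchet-Hoeffding bounds per stratum to bracket one group's positive-outcome rate. It is a simplified stand-in, not the paper's characterization of disparity sets, and the function and argument names are hypothetical.

```python
import numpy as np


def group_rate_bounds(weights, p_outcome, p_group):
    """Bounds on P(outcome = 1 | group = a) when only marginals per stratum are known.

    weights   : P(stratum), summing to 1
    p_outcome : P(outcome = 1 | stratum), from the primary dataset
    p_group   : P(group = a | stratum), from the auxiliary dataset
    """
    w = np.asarray(weights, dtype=float)
    p = np.asarray(p_outcome, dtype=float)
    q = np.asarray(p_group, dtype=float)
    # Frechet-Hoeffding bounds on P(outcome = 1, group = a | stratum).
    lower_joint = np.maximum(0.0, p + q - 1.0)
    upper_joint = np.minimum(p, q)
    p_a = np.sum(w * q)  # overall share of group a
    return np.sum(w * lower_joint) / p_a, np.sum(w * upper_joint) / p_a


# Two equally sized proxy strata with made-up rates.
lo, hi = group_rate_bounds([0.5, 0.5], [0.8, 0.4], [0.9, 0.2])
print(lo, hi)
```

Any value between the two bounds is consistent with the observed marginals, which is why a single point estimate from the proxy model can be misleading.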

preprint · 2020 · arXiv

Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter $β$. We show how this interpolates between two extremes that were previously studied in isolation: non-differentiable bandits ($β\leq1$), where rate-optimal regret is achieved by running separate non-contextual bandits in different context regions, and parametric-response bandits (satisfying $β=\infty$), where rate-optimal regret can be achieved with minimal or no exploration due to infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and non-differentiable contextual bandit problems and between bandit algorithms that exclusively use global or local information, shedding light on the crucial interplay of complexity and regret in contextual bandits.