Source author record

Amir Najmi

Amir Najmi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Methodology Applications Artificial Intelligence econ.TH

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

The many Shapley values for model explanation

The Shapley value has become a popular method to attribute the prediction of a machine-learning model on an input to its base features. The use of the Shapley value is justified by citing [16] showing that it is the \emph{unique} method that satisfies certain good properties (\emph{axioms}). There are, however, a multiplicity of ways in which the Shapley value is operationalized in the attribution problem. These differ in how they reference the model, the training data, and the explanation context. These give very different results, rendering the uniqueness result meaningless. Furthermore, we find that previously proposed approaches can produce counterintuitive attributions in theory and in practice---for instance, they can assign non-zero attributions to features that are not even referenced by the model. In this paper, we use the axiomatic approach to study the differences between some of the many operationalizations of the Shapley value for attribution, and propose a technique called Baseline Shapley (BShap) that is backed by a proper uniqueness result. We also contrast BShap with Integrated Gradients, another extension of Shapley value to the continuous setting.

preprint2020arXiv

The Penalty Imposed by Ablated Data Augmentation

There is a set of data augmentation techniques that ablate parts of the input at random. These include input dropout, cutout, and random erasing. We term these techniques ablated data augmentation. Though these techniques seems similar in spirit and have shown success in improving model performance in a variety of domains, we do not yet have a mathematical understanding of the differences between these techniques like we do for other regularization techniques like L1 or L2. First, we study a formal model of mean ablated data augmentation and inverted dropout for linear regression. We prove that ablated data augmentation is equivalent to optimizing the ordinary least squares objective along with a penalty that we call the Contribution Covariance Penalty and inverted dropout, a more common implementation than dropout in popular frameworks, is equivalent to optimizing the ordinary least squares objective along with Modified L2. For deep networks, we demonstrate an empirical version of the result if we replace contributions with attributions and coefficients with average gradients, i.e., the Contribution Covariance Penalty and Modified L2 Penalty drop with the increase of the corresponding ablated data augmentation across a variety of networks.

preprint2015arXiv

Second Order Calibration: A Simple Way to Get Approximate Posteriors

Many large-scale machine learning problems involve estimating an unknown parameter $θ_{i}$ for each of many items. For example, a key problem in sponsored search is to estimate the click through rate (CTR) of each of billions of query-ad pairs. Most common methods, though, only give a point estimate of each $θ_{i}$. A posterior distribution for each $θ_{i}$ is usually more useful but harder to get. We present a simple post-processing technique that takes point estimates or scores $t_{i}$ (from any method) and estimates an approximate posterior for each $θ_{i}$. We build on the idea of calibration, a common post-processing technique that estimates $\mathrm{E}\left(θ_{i}\!\!\bigm|\!\! t_{i}\right)$. Our method, second order calibration, uses empirical Bayes methods to estimate the distribution of $θ_{i}\!\!\bigm|\!\! t_{i}$ and uses the estimated distribution as an approximation to the posterior distribution of $θ_{i}$. We show that this can yield improved point estimates and useful accuracy estimates. The method scales to large problems - our motivating example is a CTR estimation problem involving tens of billions of query-ad pairs.

preprint2014arXiv

Feedback Detection for Live Predictors

A predictor that is deployed in a live production system may perturb the features it uses to make predictions. Such a feedback loop can occur, for example, when a model that predicts a certain type of behavior ends up causing the behavior it predicts, thus creating a self-fulfilling prophecy. In this paper we analyze predictor feedback detection as a causal inference problem, and introduce a local randomization scheme that can be used to detect non-linear feedback in real-world problems. We conduct a pilot study for our proposed methodology using a predictive system currently deployed as a part of a search engine.