Source author record

Sanghamitra Dutta

Sanghamitra Dutta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; Artificial Intelligence; cs.CY; Information Theory; math.IT; Distributed, Parallel, and Cluster Computing; Performance

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators

preprint · 2023 · arXiv

A Survey on the Robustness of Feature Importance and Counterfactual Explanations

There exist several methods that aim to address the crucial task of understanding the behaviour of AI/ML models. Arguably, the most popular among them are local explanations, which focus on investigating model behaviour for individual instances. Several methods have been proposed for local analysis, but relatively less effort has gone into understanding whether the explanations are robust and accurately reflect the behaviour of the underlying models. In this work, we present a survey of the works that analysed the robustness of two classes of local explanations (feature importance and counterfactual explanations) that are popularly used in analysing AI/ML models in finance. The survey aims to unify existing definitions of robustness, introduces a taxonomy to classify different robustness approaches, and discusses some interesting results. Finally, the survey introduces some pointers about extending current robustness analysis approaches so as to identify reliable explainability methods.

preprint · 2022 · arXiv

Fairness via In-Processing in the Over-parameterized Regime: A Cautionary Tale

The success of DNNs is driven by the counter-intuitive ability of over-parameterized networks to generalize, even when they perfectly fit the training data. In practice, test error often continues to decrease with increasing over-parameterization, referred to as double descent. This allows practitioners to instantiate large models without having to worry about over-fitting. Despite its benefits, however, prior work has shown that over-parameterization can exacerbate bias against minority subgroups. Several fairness-constrained DNN training methods have been proposed to address this concern. Here, we critically examine MinDiff, a fairness-constrained training procedure implemented within TensorFlow's Responsible AI Toolkit, that aims to achieve Equality of Opportunity. We show that although MinDiff improves fairness for under-parameterized models, it is likely to be ineffective in the over-parameterized regime. This is because an overfit model with zero training loss is trivially group-wise fair on training data, creating an "illusion of fairness," thus turning off the MinDiff optimization. (This applies to any disparity-based measure that depends on errors or accuracy; it does not apply to demographic parity.) Within specified fairness constraints, under-parameterized MinDiff models can even have lower error compared to their over-parameterized counterparts (despite baseline over-parameterized models having lower error). We further show that MinDiff optimization is very sensitive to the choice of batch size in the under-parameterized regime. Thus, fair model training using MinDiff requires time-consuming hyper-parameter searches. Finally, we suggest using previously proposed regularization techniques, viz. L2, early stopping, and flooding, in conjunction with MinDiff to train fair over-parameterized models.
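
The "illusion of fairness" argument in this abstract can be checked on toy data. The sketch below is a minimal illustration and not the paper's code or the MinDiff API: the helper `equal_opportunity_gap`, the toy labels, and the toy group assignment are all hypothetical, and the disparity is assumed to be the absolute difference in true-positive rate between two groups.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that are predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives)


def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute TPR difference between group 0 and group 1 (hypothetical helper)."""
    rates = []
    for g in (0, 1):
        yt = [t for t, a in zip(y_true, group) if a == g]
        yp = [p for p, a in zip(y_pred, group) if a == g]
        rates.append(true_positive_rate(yt, yp))
    return abs(rates[0] - rates[1])


# Toy training set (made up for illustration): labels and a binary protected attribute.
y_train = [1, 1, 0, 1, 0, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

# An interpolating model reproduces every training label exactly.
overfit_pred = list(y_train)

# A model that misses one positive, and only in group 1.
biased_pred = [1, 1, 0, 1, 0, 0, 1, 0]

gap_overfit = equal_opportunity_gap(y_train, overfit_pred, group)
gap_biased = equal_opportunity_gap(y_train, biased_pred, group)
```

With zero training error, both groups have a true-positive rate of 1 on the training set, so the measured gap is zero and an error-based fairness penalty has nothing left to optimize on that data.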

preprint · 2022 · arXiv

Quantifying Feature Contributions to Overall Disparity Using Information Theory

When a machine-learning algorithm makes biased decisions, it can be helpful to understand the sources of disparity to explain why the bias exists. Towards this, we examine the problem of quantifying the contribution of each individual feature to the observed disparity. If we have access to the decision-making model, one potential approach (inspired by intervention-based approaches in the explainability literature) is to vary each individual feature (while keeping the others fixed) and use the resulting change in disparity to quantify its contribution. However, we may not have access to the model or be able to test/audit its outputs for individually varying features. Furthermore, the decision may not always be a deterministic function of the input features (e.g., with human-in-the-loop). For these situations, we might need to explain contributions using purely distributional (i.e., observational) techniques, rather than interventional ones. We ask the question: what is the "potential" contribution of each individual feature to the observed disparity in the decisions when the exact decision-making mechanism is not accessible? We first provide canonical examples (thought experiments) that help illustrate the difference between distributional and interventional approaches to explaining contributions, and when either is better suited. When unable to intervene on the inputs, we quantify the "redundant" statistical dependency about the protected attribute that is present in both the final decision and an individual feature, by leveraging a body of work in information theory called Partial Information Decomposition. We also perform a simple case study to show how this technique could be applied to quantify contributions.

preprint · 2022 · arXiv

Robust Counterfactual Explanations for Tree-Based Ensembles

Counterfactual explanations inform ways to achieve a desired outcome from a machine learning model. However, such explanations are not robust to certain real-world changes in the underlying model (e.g., retraining the model, changing hyperparameters, etc.), questioning their reliability in several applications, e.g., credit lending. In this work, we propose a novel strategy, which we call RobX, to generate robust counterfactuals for tree-based ensembles, e.g., XGBoost. Tree-based ensembles pose additional challenges in robust counterfactual generation, e.g., they have a non-smooth and non-differentiable objective function, and they can change a lot in the parameter space under retraining on very similar data. We first introduce a novel metric, which we call Counterfactual Stability, that attempts to quantify how robust a counterfactual is going to be to model changes under retraining, and comes with desirable theoretical properties. Our proposed strategy RobX works with any counterfactual generation method (base method) and searches for robust counterfactuals by iteratively refining the counterfactual generated by the base method using our metric Counterfactual Stability. We compare the performance of RobX with popular counterfactual generation methods (for tree-based ensembles) across benchmark datasets. The results demonstrate that our strategy generates counterfactuals that are significantly more robust (nearly 100% validity after actual model changes) and also more realistic (in terms of local outlier factor) than existing state-of-the-art methods.

preprint · 2020 · arXiv

Slow and Stale Gradients Can Win the Race

Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error. In this work, we present a novel theoretical characterization of the speedup offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wall-clock time). The main novelty in our work is that our runtime analysis considers random straggling delays, which helps us design and compare distributed SGD algorithms that strike a balance between straggling and staleness. We also provide a new error convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions. Finally, based on our theoretical characterization of the error-runtime trade-off, we propose a method of gradually varying synchronicity in distributed SGD and demonstrate its performance on the CIFAR10 dataset.
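
The straggler cost described in this abstract can be illustrated with a back-of-the-envelope calculation. The sketch below assumes i.i.d. exponential worker delays with rate `mu`; this is an illustrative assumption, not necessarily the delay model analysed in the paper, and the function names are hypothetical. Under that assumption, the expected time until all n workers finish is H_n / mu (H_n is the n-th harmonic number), while the expected time until the first worker finishes is 1 / (n * mu).

```python
def expected_wait_all(n_workers, mu):
    """Expected maximum of n i.i.d. Exp(mu) delays: H_n / mu.

    Models a fully synchronous step that waits for the slowest worker.
    """
    harmonic = sum(1.0 / k for k in range(1, n_workers + 1))
    return harmonic / mu


def expected_wait_first(n_workers, mu):
    """Expected minimum of n i.i.d. Exp(mu) delays: 1 / (n * mu).

    Models a step that proceeds as soon as any one worker finishes.
    """
    return 1.0 / (n_workers * mu)


# Per-step waiting time for a few cluster sizes, with unit-rate delays (assumed).
waits = {n: (expected_wait_all(n, 1.0), expected_wait_first(n, 1.0)) for n in (1, 4, 16)}
```

The synchronous wait grows with the number of workers while the wait for the first finisher shrinks, which is the runtime side of the trade-off; the staleness penalty on convergence error is not modelled here.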

preprint · 2016 · arXiv

Adaptivity provably helps: information-theoretic limits on $l_0$ cost of non-adaptive sensing

The advantages of adaptivity and feedback are of immense interest in signal processing and communication with many positive and negative results. Although it is established that adaptivity does not offer substantial reductions in minimax mean square error for a fixed number of measurements, existing results have shown several advantages of adaptivity in complexity of reconstruction, accuracy of support detection, and gain in signal-to-noise ratio, under constraints on sensing energy. Sensing energy has often been measured in terms of the Frobenius norm of the sensing matrix. This paper uses a different metric, which we call the $l_0$ cost of a sensing matrix, to quantify the complexity of sensing. Thus sparse sensing matrices have a lower cost. We derive information-theoretic lower bounds on the $l_0$ cost that hold for any non-adaptive sensing strategy. We establish that any non-adaptive sensing strategy must incur an $l_0$ cost of $\Theta\left(N \log_2(N)\right)$ to reconstruct an $N$-dimensional, one-sparse signal when the number of measurements is limited to $\Theta\left(\log_2(N)\right)$. In comparison, bisection-type adaptive strategies only require an $l_0$ cost of at most $\mathcal{O}(N)$ for an equal number of measurements. The problem has an interesting interpretation as a sphere packing problem in a multidimensional space, such that all the sphere centres have minimum non-zero co-ordinates. We also discuss the variation in $l_0$ cost as the number of measurements increases from $\Theta\left(\log_2(N)\right)$ to $\Theta\left(N\right)$.
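
The counting behind the bisection claim can be sketched in a few lines. The example below is a noiseless toy with 0/1 sensing vectors and N a power of two, all of which are simplifying assumptions; the function names are hypothetical. It compares the number of nonzero sensing entries used by adaptive bisection with that of one particular non-adaptive design (a binary-code matrix). It does not demonstrate the lower bound, which the paper derives under its own sensing model.

```python
import math


def adaptive_bisection(signal):
    """Locate the single nonzero entry of a one-sparse signal by bisection.

    Returns (index, number_of_measurements, l0_cost), where l0_cost counts
    the nonzero entries of all sensing vectors used.
    """
    lo, hi = 0, len(signal)
    measurements = 0
    l0_cost = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # The sensing vector has ones on [lo, mid) and zeros elsewhere.
        y = sum(signal[lo:mid])
        measurements += 1
        l0_cost += mid - lo
        if y != 0:
            hi = mid  # the nonzero entry lies in the measured half
        else:
            lo = mid  # otherwise it lies in the other half
    return lo, measurements, l0_cost


def binary_code_l0_cost(n):
    """Nonzero count of the log2(n) x n matrix whose row j marks indices with bit j set."""
    n_rows = int(math.log2(n))
    return sum(1 for j in range(n_rows) for i in range(n) if (i >> j) & 1)


# Toy one-sparse signal of length 16 with its nonzero entry at index 11.
n = 16
signal = [0.0] * n
signal[11] = 3.0

index, n_meas, adaptive_cost = adaptive_bisection(signal)
nonadaptive_cost = binary_code_l0_cost(n)
```

For N = 16 both schemes use log2(N) = 4 measurements, but bisection touches N/2 + N/4 + ... + 1 = N - 1 entries, whereas each row of the binary-code matrix has N/2 nonzeros, giving (N/2) log2(N) in total.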

preprint · 2014 · arXiv

LAMP: A Locally Adapting Matching Pursuit Framework for Group Sparse Signatures in Ultra-Wide Band Radar Imaging

It has been found that radar returns of extended targets are not only sparse but also exhibit a tendency to cluster into randomly located, variable-sized groups. However, the standard techniques of Compressive Sensing as applied in radar imaging rarely take the clustering tendency into account while reconstructing the image from the compressed measurements. If the group sparsity is taken into account, it is intuitive that one might obtain better results both in terms of accuracy and time complexity as compared to conventional recovery techniques like Orthogonal Matching Pursuit (OMP). In order to remedy this, techniques like Block OMP have been used in the existing literature. An alternative approach is to reconstruct the signal after transforming it into the Hough Transform domain, where it becomes point-wise sparse. However, these techniques essentially assume a specific size and structure of the groups and are not always effective if the exact characteristics of the groups are not known prior to reconstruction. In this manuscript, a novel framework that we call locally adapting matching pursuit (LAMP) is proposed for efficient reconstruction of group sparse signals from compressed measurements without assuming any specific size, location, or structure of the groups. The recovery guarantee of LAMP and its superiority over existing algorithms have been established with respect to accuracy, time complexity, and flexibility in group size. LAMP has been successfully used on a real-world, experimental data set.
