Source author record

Sahil Singla

Sahil Singla appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Data Structures and Algorithms, Computer Science and Game Theory, Discrete Mathematics, Machine Learning, Computational Geometry, Computer Vision, math.CO, Artificial Intelligence, cs.CY, math.MG, math.PR, Networking and Internet Architecture, Performance

Catalog footprint

What is connected

18 works

13 topics

4 close collaborators

preprint, 2022, arXiv

Core Risk Minimization using Salient ImageNet

Deep neural networks can be unreliable in the real world, especially when they heavily use spurious features for their predictions. Recently, Singla & Feizi (2022) introduced the Salient Imagenet dataset by annotating and localizing core and spurious features of ~52k samples from 232 classes of Imagenet. While this dataset is useful for evaluating the reliance of pretrained models on spurious features, its small size limits its usefulness for training models. In this work, we first introduce the Salient Imagenet-1M dataset with more than 1 million soft masks localizing core and spurious features for all 1000 Imagenet classes. Using this dataset, we evaluate the reliance of several Imagenet pretrained models (42 total) on spurious features and observe that: (i) transformers are more sensitive to spurious features compared to Convnets, (ii) zero-shot CLIP transformers are highly susceptible to spurious features. Next, we introduce a new learning paradigm called Core Risk Minimization (CoRM) whose objective ensures that the model predicts a class using its core features. We evaluate different computational approaches for solving CoRM and achieve significantly higher (+12%) core accuracy (accuracy when non-core regions are corrupted using noise) with no drop in clean accuracy compared to models trained via Empirical Risk Minimization.

preprint, 2022, arXiv

Improved deterministic l2 robustness on CIFAR-10 and CIFAR-100

Training convolutional neural networks (CNNs) with a strict Lipschitz constraint under the $l_{2}$ norm is useful for provable adversarial robustness, interpretable gradients and stable training. While $1$-Lipschitz CNNs can be designed by enforcing a $1$-Lipschitz constraint on each layer, training such networks requires each layer to have an orthogonal Jacobian matrix (for all inputs) to prevent the gradients from vanishing during backpropagation. A layer with this property is said to be Gradient Norm Preserving (GNP). In this work, we introduce a procedure to certify the robustness of $1$-Lipschitz CNNs by relaxing the orthogonalization of the last linear layer of the network that significantly advances the state of the art for both standard and provable robust accuracies on CIFAR-100 (gains of $4.80\%$ and $4.71\%$, respectively). We further boost their robustness by introducing (i) a novel Gradient Norm Preserving activation function called the Householder activation function (that includes every $\mathrm{GroupSort}$ activation) and (ii) a certificate regularization. On CIFAR-10, we achieve significant improvements over prior works in provable robust accuracy ($5.81\%$) with only a minor drop in standard accuracy ($-0.29\%$). Code for reproducing all experiments in the paper is available at https://github.com/singlasahil14/SOC.
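The abstract does not spell out the Householder activation, so the following is only a minimal sketch under an assumed form: keep the input when it lies on the positive side of a hyperplane with normal `v`, and otherwise reflect it across that hyperplane. The function names and the choice `v = (1, -1)` are illustrative. Under that assumption the map preserves the $l_2$ norm of its input, and on a pair of coordinates it returns (max, min), which is the sorting behaviour of a size-2 GroupSort.

```python
import math


def householder_activation(z, v):
    """Assumed form: return z if <v, z> > 0, else reflect z across the
    hyperplane orthogonal to v (a Householder reflection)."""
    norm_v = math.sqrt(sum(c * c for c in v))
    u = [c / norm_v for c in v]                  # unit normal
    dot = sum(a * b for a, b in zip(u, z))
    if dot > 0:
        return list(z)
    # (I - 2 u u^T) z
    return [a - 2.0 * dot * b for a, b in zip(z, u)]


def l2_norm(z):
    return math.sqrt(sum(c * c for c in z))


# With v proportional to (1, -1), a pair is left alone when already sorted
# in decreasing order and swapped otherwise.
v = [1.0, -1.0]
print(householder_activation([3.0, 1.0], v))
print(householder_activation([1.0, 3.0], v))
```

Because each branch is either the identity or a reflection, the Jacobian is orthogonal wherever it is defined, which is the Gradient Norm Preserving property described above.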

preprint, 2022, arXiv

Salient ImageNet: How to discover spurious features in Deep Learning?

Deep neural networks can be unreliable in the real world, especially when they heavily use spurious features for their predictions. Focusing on image classification, we define core features as the set of visual features that are always a part of the object definition, while spurious features are the ones that are likely to co-occur with the object but are not a part of it (e.g., attribute "fingers" for class "band aid"). Traditional methods for discovering spurious features either require extensive human annotations (thus, not scalable), or are useful only on specific models. In this work, we introduce a general framework to discover a subset of spurious and core visual features used in inferences of a general model and localize them on a large number of images with minimal human supervision. Our methodology is based on this key idea: to identify spurious or core visual features used in model predictions, we identify spurious or core neural features (penultimate layer neurons of a robust model) via limited human supervision (e.g., using top 5 activating images per feature). We then show that these neural feature annotations generalize extremely well to many more images without any human supervision. We use the activation maps for these neural features as the soft masks to highlight spurious or core visual features. Using this methodology, we introduce the Salient Imagenet dataset containing core and spurious masks for a large set of samples from Imagenet. Using this dataset, we show that several popular Imagenet models rely heavily on various spurious features in their predictions, indicating that standard accuracy alone is not sufficient to fully assess model performance. Code and dataset for reproducing all experiments in the paper are available at https://github.com/singlasahil14/salient_imagenet.

preprint, 2022, arXiv

Smoothed Analysis of the Komlós Conjecture

The well-known Komlós conjecture states that given $n$ vectors in $\mathbb{R}^d$ with Euclidean norm at most one, there always exists a $\pm 1$ coloring such that the $\ell_{\infty}$ norm of the signed-sum vector is a constant independent of $n$ and $d$. We prove this conjecture in a smoothed analysis setting where the vectors are perturbed by adding a small Gaussian noise and when the number of vectors $n = \omega(d\log d)$. The dependence of $n$ on $d$ is the best possible even in a completely random setting. Our proof relies on a weighted second moment method, where instead of considering uniformly random colorings we apply the second moment method on an implicit distribution on colorings obtained by applying the Gram-Schmidt walk algorithm to a suitable set of vectors. The main technical idea is to use various properties of these colorings, including subgaussianity, to control the second moment.

preprint, 2022, arXiv

Submodular Dominance and Applications

In submodular optimization we often deal with the expected value of a submodular function $f$ on a distribution $\mathcal{D}$ over sets of elements. In this work we study such submodular expectations for negatively dependent distributions. We introduce a natural notion of negative dependence, which we call Weak Negative Regression (WNR), that generalizes both Negative Association and Negative Regression. We observe that WNR distributions satisfy Submodular Dominance, whereby the expected value of $f$ under $\mathcal{D}$ is at least the expected value of $f$ under a product distribution with the same element-marginals. Next, we give several applications of Submodular Dominance to submodular optimization. In particular, we improve the best known submodular prophet inequalities, we develop new rounding techniques for polytopes of set systems that admit negatively dependent distributions, and we prove existence of contention resolution schemes for WNR distributions.
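A tiny worked instance may make the Submodular Dominance inequality concrete. The sketch below is illustrative only: the two-element ground set, the function `f(S) = min(|S|, 1)` and the "exactly one of a, b" distribution are assumptions chosen for the example, not taken from the paper. That distribution is negatively associated, so the inequality should apply to it.

```python
from itertools import product


def f(S):
    """A simple monotone submodular function: 1 if S is non-empty, else 0."""
    return min(len(S), 1)


def expectation(dist):
    """Expected value of f under a distribution {frozenset: probability}."""
    return sum(p * f(S) for S, p in dist.items())


def marginals(dist, elements):
    """Per-element inclusion probabilities."""
    return {e: sum(p for S, p in dist.items() if e in S) for e in elements}


def product_distribution(marg):
    """Independent distribution with the given element marginals."""
    elements = sorted(marg)
    dist = {}
    for bits in product([0, 1], repeat=len(elements)):
        S = frozenset(e for e, b in zip(elements, bits) if b)
        p = 1.0
        for e, b in zip(elements, bits):
            p *= marg[e] if b else 1.0 - marg[e]
        dist[S] = p
    return dist


# Negatively dependent D: exactly one of {a, b} is chosen, each with prob. 1/2.
D = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.5}
independent = product_distribution(marginals(D, ["a", "b"]))

print(expectation(D), expectation(independent))
```

Under `D` the set is never empty, whereas the product distribution with the same marginals is empty a quarter of the time, so the expectation under `D` is the larger of the two.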

preprint, 2021, arXiv

Fairness Through Robustness: Investigating Robustness Disparity in Deep Learning

Deep neural networks (DNNs) are increasingly used in real-world applications (e.g. facial recognition). This has resulted in concerns about the fairness of decisions made by these models. Various notions and measures of fairness have been proposed to ensure that a decision-making system does not disproportionately harm (or benefit) particular subgroups of the population. In this paper, we argue that traditional notions of fairness that are only based on models' outputs are not sufficient when the model is vulnerable to adversarial attacks. We argue that in some cases, it may be easier for an attacker to target a particular subgroup, resulting in a form of robustness bias. We show that measuring robustness bias is a challenging task for DNNs and propose two methods to measure this form of bias. We then conduct an empirical study on state-of-the-art neural networks on commonly used real-world datasets such as CIFAR-10, CIFAR-100, Adience, and UTKFace and show that in almost all cases there are subgroups (in some cases based on sensitive attributes like race, gender, etc.) which are less robust and are thus at a disadvantage. We argue that this kind of bias arises due to both the data distribution and the highly complex nature of the learned decision boundary in the case of DNNs, thus making mitigation of such biases a non-trivial task. Our results show that robustness bias is an important criterion to consider while auditing real-world systems that rely on DNNs for decision making. Code to reproduce all our results can be found here: https://github.com/nvedant07/Fairness-Through-Robustness

preprint, 2020, arXiv

Online Discrepancy Minimization for Stochastic Arrivals

In the stochastic online vector balancing problem, vectors $v_1,v_2,\ldots,v_T$ chosen independently from an arbitrary distribution in $\mathbb{R}^n$ arrive one-by-one and must be immediately given a $\pm$ sign. The goal is to keep the norm of the discrepancy vector, i.e., the signed prefix-sum, as small as possible for a given target norm. We consider some of the most well-known problems in discrepancy theory in the above online stochastic setting, and give algorithms that match the known offline bounds up to $\mathsf{polylog}(nT)$ factors. This substantially generalizes and improves upon the previous results of Bansal, Jiang, Singla, and Sinha (STOC '20). In particular, for the Komlós problem where $\|v_t\|_2\leq 1$ for each $t$, our algorithm achieves $\tilde{O}(1)$ discrepancy with high probability, improving upon the previous $\tilde{O}(n^{3/2})$ bound. For Tusnády's problem of minimizing the discrepancy of axis-aligned boxes, we obtain an $O(\log^{d+4} T)$ bound for an arbitrary distribution over points. Previous techniques only worked for product distributions and gave a weaker $O(\log^{2d+1} T)$ bound. We also consider the Banaszczyk setting, where given a symmetric convex body $K$ with Gaussian measure at least $1/2$, our algorithm achieves $\tilde{O}(1)$ discrepancy with respect to the norm given by $K$ for input distributions with sub-exponential tails. Our key idea is to introduce a potential that also enforces constraints on how the discrepancy vector evolves, allowing us to maintain certain anti-concentration properties. For the Banaszczyk setting, we further enhance this potential by combining it with ideas from generic chaining. Finally, we also extend these results to the setting of online multi-color discrepancy.

preprint, 2020, arXiv

Online Vector Balancing and Geometric Discrepancy

We consider an online vector balancing question where $T$ vectors, chosen from an arbitrary distribution over $[-1,1]^n$, arrive one-by-one and must be immediately given a $\pm$ sign. The goal is to keep the discrepancy as small as possible. A concrete example is the online interval discrepancy problem where $T$ points are sampled uniformly in $[0,1]$, and the goal is to immediately color them $\pm$ such that every sub-interval remains nearly balanced. As random coloring incurs $\Omega(T^{1/2})$ discrepancy, while the offline bounds are $\Theta(\sqrt{n \log (T/n)})$ for vector balancing and $1$ for interval balancing, a natural question is whether one can (nearly) match the offline bounds in the online setting for these problems. One must utilize the stochasticity, as in the worst-case scenario it is known that discrepancy is $\Omega(T^{1/2})$ for any online algorithm. Bansal and Spencer recently showed an $O(\sqrt{n}\log T)$ bound when each coordinate is independent. When there are dependencies among the coordinates, the problem becomes much more challenging, as evidenced by a recent work of Jiang, Kulkarni, and Singla that gives a non-trivial $O(T^{1/\log\log T})$ bound for online interval discrepancy. Although this beats random coloring, it is still far from the offline bound. In this work, we introduce a new framework for online vector balancing when the input distribution has dependencies across coordinates. This lets us obtain a $\mathrm{poly}(n, \log T)$ bound for online vector balancing under arbitrary input distributions, and a $\mathrm{poly}(\log T)$ bound for online interval discrepancy. Our framework is powerful enough to capture other well-studied geometric discrepancy problems; e.g., a $\mathrm{poly}(\log^d (T))$ bound for the online $d$-dimensional Tusnády's problem. A key new technical ingredient is an anti-concentration inequality for sums of pairwise uncorrelated random variables.
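To make the online setting concrete, here is a minimal sketch of the problem itself, with two simple baselines: uniformly random signs and a greedy rule that picks the sign shrinking the Euclidean norm of the running sum. Neither is the framework introduced in the paper. The function names, the uniform input distribution and the seed are assumptions for illustration.

```python
import random


def max_prefix_discrepancy(vectors, signs):
    """Largest l_infinity norm reached by any signed prefix sum."""
    prefix = [0.0] * len(vectors[0])
    worst = 0.0
    for v, s in zip(vectors, signs):
        prefix = [p + s * c for p, c in zip(prefix, v)]
        worst = max(worst, max(abs(p) for p in prefix))
    return worst


def random_signs(vectors, rng):
    """Baseline: an independent uniform sign for every arriving vector."""
    return [rng.choice((-1, 1)) for _ in vectors]


def greedy_signs(vectors):
    """Baseline: choose the sign opposite to <prefix, v>, which never makes
    the squared l_2 norm grow by more than |v|^2."""
    prefix = [0.0] * len(vectors[0])
    signs = []
    for v in vectors:
        dot = sum(p * c for p, c in zip(prefix, v))
        s = -1 if dot > 0 else 1
        signs.append(s)
        prefix = [p + s * c for p, c in zip(prefix, v)]
    return signs


rng = random.Random(0)  # fixed seed, chosen arbitrarily
T, n = 2000, 5
vectors = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(T)]
print("random:", max_prefix_discrepancy(vectors, random_signs(vectors, rng)))
print("greedy:", max_prefix_discrepancy(vectors, greedy_signs(vectors)))
```

Both rules decide each sign immediately on arrival, which is the constraint that separates the online problem from its offline counterpart.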

preprint, 2020, arXiv

Prophet Inequalities with Linear Correlations and Augmentations

In a classical online decision problem, a decision-maker who is trying to maximize her value inspects a sequence of arriving items to learn their values (drawn from known distributions), and decides when to stop the process by taking the current item. The goal is to prove a "prophet inequality": that she can do approximately as well as a prophet with foreknowledge of all the values. In this work, we investigate this problem when the values are allowed to be correlated. Since non-trivial guarantees are impossible for arbitrary correlations, we consider a natural "linear" correlation structure introduced by Bateni et al. [ESA 2015] as a generalization of the common-base value model of Chawla et al. [GEB 2015]. A key challenge is that threshold-based algorithms, which are commonly used for prophet inequalities, no longer guarantee good performance for linear correlations. We relate this roadblock to another "augmentations" challenge that might be of independent interest: many existing prophet inequality algorithms are not robust to slight increases in the values of the arriving items. We leverage this intuition to prove bounds (matching up to constant factors) that decay gracefully with the amount of correlation of the arriving items. We extend these results to the case of selecting multiple items by designing a new $(1+o(1))$ approximation ratio algorithm that is robust to augmentations.
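For readers unfamiliar with threshold-based algorithms, the sketch below illustrates the classical independent-values setting only, not the linearly correlated setting studied in the paper. It uses the well-known rule "accept the first value that is at least half the expected maximum" and evaluates it exactly by enumerating a small discrete instance. The instance and function names are assumptions for illustration.

```python
from itertools import product


def enumerate_outcomes(dists):
    """Yield (values, probability) for independent discrete distributions,
    each given as a list of (value, probability) pairs."""
    for combo in product(*dists):
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield [v for v, _ in combo], prob


def expected_max(dists):
    """The prophet's benchmark: the expected maximum value."""
    return sum(prob * max(values) for values, prob in enumerate_outcomes(dists))


def threshold_value(dists, tau):
    """Expected value of taking the first arriving item with value >= tau
    (and getting nothing if no item clears the threshold)."""
    total = 0.0
    for values, prob in enumerate_outcomes(dists):
        taken = next((v for v in values if v >= tau), 0.0)
        total += prob * taken
    return total


# Item 1 is always worth 1; item 2 is worth 10 with probability 0.1, else 0.
dists = [[(1.0, 1.0)], [(0.0, 0.9), (10.0, 0.1)]]
prophet = expected_max(dists)
algo = threshold_value(dists, prophet / 2.0)
print(prophet, algo, algo / prophet)
```

In this instance the threshold rule always stops at the first item, so it earns more than half of the prophet's value but misses the rare large second item.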

preprint, 2020, arXiv

Random-Order Models

This chapter introduces the random-order model in online algorithms. In this model, the input is chosen by an adversary, then randomly permuted before being presented to the algorithm. This reshuffling often weakens the power of the adversary and allows for improved algorithmic guarantees. We show such improvements for two broad classes of problems: packing problems where we must pick a constrained set of items to maximize total value, and covering problems where we must satisfy given requirements at minimum total cost. We also discuss how the random-order model relates to other stochastic models used for non-worst-case competitive analysis.
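The classical secretary problem is a standard illustration of how random arrival order helps; it is used here as an assumed example, since the abstract does not name it. The sketch evaluates the "skip the first few, then take the first record" rule exactly, by enumerating every arrival order of a small adversarial input. Function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def picks_best(order, skip):
    """Observe the first `skip` items without selecting, then select the
    first later item that beats everything seen so far. Return True if the
    selected item is the overall best."""
    best_seen = max(order[:skip], default=float("-inf"))
    for value in order[skip:]:
        if value > best_seen:
            return value == max(order)
    return False  # never selected anything


def success_probability(n, skip):
    """Exact probability of picking the best of n distinct values when the
    arrival order is a uniformly random permutation."""
    orders = list(permutations(range(n)))
    wins = sum(picks_best(order, skip) for order in orders)
    return Fraction(wins, len(orders))


for skip in range(5):
    print(skip, success_probability(5, skip))
```

With `skip = 0` the rule takes the first item and succeeds with probability `1/n`. Skipping a constant fraction of the input does markedly better, and that gain comes entirely from the random permutation.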

preprint, 2020, arXiv

Second-Order Provable Defenses against Adversarial Attacks

A robustness certificate is the minimum distance of a given input to the decision boundary of the classifier (or its lower bound). For any input perturbations with a magnitude smaller than the certificate value, the classification output will provably remain unchanged. Exactly computing the robustness certificates for neural networks is difficult since it requires solving a non-convex optimization. In this paper, we provide computationally-efficient robustness certificates for neural networks with differentiable activation functions in two steps. First, we show that if the eigenvalues of the Hessian of the network are bounded, we can compute a robustness certificate in the $l_2$ norm efficiently using convex optimization. Second, we derive a computationally-efficient differentiable upper bound on the curvature of a deep network. We also use the curvature bound as a regularization term during the training of the network to boost its certified robustness. Putting these results together leads to our proposed Curvature-based Robustness Certificate (CRC) and Curvature-based Robust Training (CRT). Our numerical results show that CRT leads to significantly higher certified robust accuracy compared to interval-bound propagation (IBP) based training. We achieve certified robust accuracy 69.79%, 57.78% and 53.19% while IBP-based methods achieve 44.96%, 44.74% and 44.66% on 2-, 3- and 4-layer networks respectively on the MNIST dataset.
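The definition of a certificate is easiest to see in the one case where it has a closed form: a linear binary classifier, whose $l_2$ distance to the decision boundary is $|w \cdot x + b| / \|w\|_2$. The sketch below covers only that linear case, as an assumed illustration. It is not the curvature-based certificate of the paper, and the weights and input are made up.

```python
import math


def margin(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Sign of the linear score."""
    return 1 if margin(w, b, x) > 0 else -1


def linear_certificate(w, b, x):
    """Exact l_2 distance from x to the hyperplane w.x + b = 0."""
    return abs(margin(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


def worst_case_perturb(w, b, x, radius):
    """Move x by `radius` straight toward the decision boundary."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    direction = -1.0 if margin(w, b, x) > 0 else 1.0
    return [xi + direction * radius * wi / norm_w for xi, wi in zip(x, w)]


w, b, x = [3.0, 4.0], -5.0, [3.0, 4.0]
cert = linear_certificate(w, b, x)
inside = worst_case_perturb(w, b, x, 0.975 * cert)
outside = worst_case_perturb(w, b, x, 1.025 * cert)
print(cert, predict(w, b, x), predict(w, b, inside), predict(w, b, outside))
```

Even the worst-case perturbation leaves the prediction unchanged while its norm stays below the certificate, and flips it just beyond.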

preprint, 2019, arXiv

Approximation Schemes for a Unit-Demand Buyer with Independent Items via Symmetries

We consider a revenue-maximizing seller with $n$ items facing a single buyer. We introduce the notion of symmetric menu complexity of a mechanism, which counts the number of distinct options the buyer may purchase, up to permutations of the items. Our main result is that a mechanism of quasi-polynomial symmetric menu complexity suffices to guarantee a $(1-\varepsilon)$-approximation when the buyer is unit-demand over independent items, even when the value distribution is unbounded, and that this mechanism can be found in quasi-polynomial time. Our key technical result is a polynomial time, (symmetric) menu-complexity-preserving black-box reduction from achieving a $(1-\varepsilon)$-approximation for unbounded valuations that are subadditive over independent items to achieving a $(1-O(\varepsilon))$-approximation when the values are bounded (and still subadditive over independent items). We further apply this reduction to deduce approximation schemes for a suite of valuation classes beyond our main result. Finally, we show that selling separately (which has exponential menu complexity) can be approximated up to a $(1-\varepsilon)$ factor with a menu of efficient-linear $(f(\varepsilon) \cdot n)$ symmetric menu complexity.

preprint, 2016, arXiv

Adaptivity Gaps for Stochastic Probing: Submodular and XOS Functions

Suppose we are given a submodular function $f$ over a set of elements, and we want to maximize its value subject to certain constraints. Good approximation algorithms are known for such problems under both monotone and non-monotone submodular functions. We consider these problems in a stochastic setting, where elements are not all active and we can only get value from active elements. Each element $e$ is active independently with some known probability $p_e$, but we don't know the element's status a priori. We find it out only when we probe the element $e$: probing reveals whether it's active or not, whereafter we can use this information to decide which other elements to probe. Eventually, if we have a probed set $S$ and a subset $\text{active}(S)$ of active elements in $S$, we can pick any $T \subseteq \text{active}(S)$ and get value $f(T)$. Moreover, the sequence of elements we probe must satisfy a given prefix-closed constraint (e.g., one given by a matroid, an orienteering constraint, a deadline, a precedence constraint, or an arbitrary downward-closed constraint): if we can probe some sequence of elements, we can probe any prefix of it. What is a good strategy to probe elements to maximize the expected value? In this paper we study the gap between adaptive and non-adaptive strategies for $f$ being a submodular or a fractionally subadditive (XOS) function. If this gap is small, we can focus on finding good non-adaptive strategies instead, which are easier to find as well as to represent. We show that the adaptivity gap is a constant for monotone and non-monotone submodular functions, and logarithmic for XOS functions of small width. These bounds are nearly tight. Our techniques show new ways of arguing about the optimal adaptive decision tree for stochastic problems.

preprint, 2016, arXiv

Combinatorial Prophet Inequalities

We introduce a novel framework of Prophet Inequalities for combinatorial valuation functions. For a (non-monotone) submodular objective function over an arbitrary matroid feasibility constraint, we give an $O(1)$-competitive algorithm. For a monotone subadditive objective function over an arbitrary downward-closed feasibility constraint, we give an $O(\log n \log^2 r)$-competitive algorithm (where $r$ is the cardinality of the largest feasible subset). Inspired by the proof of our subadditive prophet inequality, we also obtain an $O(\log n \cdot \log^2 r)$-competitive algorithm for the Secretary Problem with a monotone subadditive objective function subject to an arbitrary downward-closed feasibility constraint. Even for the special case of a cardinality feasibility constraint, our algorithm circumvents an $\Omega(\sqrt{n})$ lower bound by Bateni, Hajiaghayi, and Zadimoghaddam in a restricted query model. En route to our submodular prophet inequality, we prove a technical result of independent interest: we show a variant of the Correlation Gap Lemma for non-monotone submodular functions.

preprint, 2016, arXiv

How to morph planar graph drawings

Given an $n$-vertex graph and two straight-line planar drawings of the graph that have the same faces and the same outer face, we show that there is a morph (i.e., a continuous transformation) between the two drawings that preserves straight-line planarity and consists of $O(n)$ steps, which we prove is optimal in the worst case. Each step is a unidirectional linear morph, which means that every vertex moves at constant speed along a straight line, and the lines are parallel although the vertex speeds may differ. Thus we provide an efficient version of Cairns' 1944 proof of the existence of straight-line planarity-preserving morphs for triangulated graphs, which required an exponential number of steps.
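A single step of the kind described above can be sketched as plain linear interpolation of vertex positions, together with a check that all displacement lines are parallel. This is only an illustration of the definition of a unidirectional linear morph. It does not verify planarity and it is not the paper's algorithm for finding the $O(n)$ steps. The function names and coordinates are assumptions.

```python
def is_unidirectional(start, end, tol=1e-12):
    """True if all non-zero vertex displacements lie on parallel lines."""
    ref = None
    for (x0, y0), (x1, y1) in zip(start, end):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue              # a stationary vertex is compatible with any direction
        if ref is None:
            ref = (dx, dy)
            continue
        # The 2D cross product vanishes exactly when the vectors are parallel.
        if abs(ref[0] * dy - ref[1] * dx) > tol:
            return False
    return True


def linear_morph(start, end, t):
    """Vertex positions at time t in [0, 1]; every vertex moves at constant
    speed along the segment from its start to its end position."""
    return [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(start, end)]


# A triangle whose vertices slide horizontally at different speeds.
start = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]
end = [(1.0, 0.0), (2.0, 0.0), (3.0, 2.0)]
print(is_unidirectional(start, end))
print(linear_morph(start, end, 0.5))
```

A full morph in the sense of the abstract would chain several such steps, each with its own direction.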

preprint, 2014, arXiv

Exact Analysis of TTL Cache Networks: The Case of Caching Policies driven by Stopping Times

TTL caching models have recently regained significant research interest, largely due to their ability to fit popular caching policies such as LRU. This paper advances the state-of-the-art analysis of TTL-based cache networks by developing two exact methods with orthogonal generality and computational complexity. The first method generalizes existing results for line networks under renewal requests to the broad class of caching policies whereby evictions are driven by stopping times. The obtained results are further generalized, using the second method, to feedforward networks with Markov arrival processes (MAP) requests. MAPs are particularly suitable for non-line networks because they are closed not only under superposition and splitting, as known, but also under input-output caching operations as proven herein for phase-type TTL distributions. The crucial benefit of the two closure properties is that they jointly enable the first exact analysis of feedforward networks of TTL caches in great generality.
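As a rough illustration of what a TTL cache does for a single object, the sketch below counts hits on a deterministic request trace under two common timer conventions: the timer is either reset on every hit or started only on a miss. The trace, the TTL value and the function name are assumptions. The stopping-time policies analysed in the paper are a broader class than these two variants.

```python
def ttl_hits(request_times, ttl, reset_on_hit=True):
    """Count cache hits for one object.

    The object is cached on a miss with expiry `t + ttl`. A later request
    is a hit if it arrives strictly before the expiry time. With
    `reset_on_hit`, each hit restarts the timer.
    """
    hits = 0
    expiry = None
    for t in request_times:
        if expiry is not None and t < expiry:
            hits += 1
            if reset_on_hit:
                expiry = t + ttl
        else:
            expiry = t + ttl   # miss: fetch the object and start its timer
    return hits


trace = [0.0, 1.0, 2.0, 3.0, 10.0]
print(ttl_hits(trace, ttl=1.5, reset_on_hit=True))
print(ttl_hits(trace, ttl=1.5, reset_on_hit=False))
```

With resetting, an object that keeps being requested stays cached, which is loosely how a TTL model can mimic recency-based policies such as LRU.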

preprint, 2014, arXiv

On Integrality Ratios for Asymmetric TSP in the Sherali-Adams Hierarchy

We study the ATSP (Asymmetric Traveling Salesman Problem), and our focus is on negative results in the framework of the Sherali-Adams (SA) Lift and Project method. Our main result pertains to the standard LP (linear programming) relaxation of ATSP, due to Dantzig, Fulkerson, and Johnson. For any fixed integer $t\geq 0$ and small $\epsilon$, $0<\epsilon\ll 1$, there exists a digraph $G$ on $\nu=\nu(t,\epsilon)=O(t/\epsilon)$ vertices such that the integrality ratio for level $t$ of the SA system starting with the standard LP on $G$ is $\ge 1+\frac{1-\epsilon}{2t+3} \approx \frac43, \frac65, \frac87, \dots$. Thus, in terms of the input size, the result holds for any $t = 0,1,\dots,\Theta(\nu)$ levels. Our key contribution is to identify a structural property of digraphs that allows us to construct fractional feasible solutions for any level $t$ of the SA system starting from the standard LP. Our hard instances are simple and satisfy the structural property. There is a further relaxation of the standard LP called the balanced LP, and our methods simplify considerably when the starting LP for the SA system is the balanced LP; in particular, the relevant structural property (of digraphs) simplifies such that it is satisfied by the digraphs given by the well-known construction of Charikar, Goemans and Karloff (CGK). Consequently, the CGK digraphs serve as hard instances, and we obtain an integrality ratio of $1 +\frac{1-\epsilon}{t+1}$ for any level $t$ of the SA system, where $0<\epsilon\ll 1$ and the number of vertices is $\nu(t,\epsilon)=O((t/\epsilon)^{(t/\epsilon)})$. Also, our results for the standard LP extend to the Path-ATSP (find a min cost Hamiltonian dipath from a given source vertex to a given sink vertex).

preprint, 2010, arXiv

Exhaustive Verification of Weak Reconstruction For Self Complementary Graphs

This paper presents an exhaustive approach for verification of the weak reconstruction of Self Complementary (SC) Graphs up to 17 vertices. It describes the general problem of the Reconstruction Conjecture, explaining the complexity involved in checking deck-isomorphism between two graphs. In order to improve the computation time, various pruning techniques have been employed to reduce the number of graph-isomorphism comparisons. These techniques offer great help in proceeding with a reconstructive approach. An analysis of the numbers involved is provided, along with the various limitations of this approach. A list enumerating the number of SC graphs up to 101 vertices is also appended.
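For orientation, a graph is self complementary when it is isomorphic to its own complement. The brute-force check below tries every vertex relabelling, so it is only usable on very small graphs. It is a sketch of the definition, not of the pruning techniques the paper describes, and the function names are illustrative.

```python
from itertools import combinations, permutations


def complement(n, edges):
    """Edge set of the complement of a simple graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    all_pairs = {frozenset(p) for p in combinations(range(n), 2)}
    return all_pairs - edge_set


def is_self_complementary(n, edges):
    """Brute-force isomorphism test between a graph and its complement."""
    edge_set = {frozenset(e) for e in edges}
    comp = complement(n, edges)
    if len(edge_set) != len(comp):
        return False  # an isomorphism must preserve the number of edges
    for perm in permutations(range(n)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
        if mapped == comp:
            return True
    return False


path_4 = [(0, 1), (1, 2), (2, 3)]
cycle_5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
cycle_4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_self_complementary(4, path_4))
print(is_self_complementary(5, cycle_5))
print(is_self_complementary(4, cycle_4))
```

The edge-count test is the simplest pruning step: a self complementary graph on `n` vertices must have exactly `n(n-1)/4` edges, which already rules out the 4-cycle.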
