Researcher profile

Sanjoy Dasgupta

Sanjoy Dasgupta contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

A Theoretical Perspective on Hyperdimensional Computing

Hyperdimensional (HD) computing is a set of neurally inspired methods for obtaining high-dimensional, low-precision, distributed representations of data. These representations can be combined with simple, neurally plausible algorithms to effect a variety of information processing tasks. HD computing has recently garnered significant interest from the computer hardware community as an energy-efficient, low-latency, and noise-robust tool for solving learning problems. In this review, we present a unified treatment of the theoretical foundations of HD computing with a focus on the suitability of representations for learning.

preprint2022arXiv

Framework for Evaluating Faithfulness of Local Explanations

We study the faithfulness of an explanation system to the underlying prediction model. We show that this can be captured by two properties, consistency and sufficiency, and introduce quantitative measures of the extent to which these hold. Interestingly, these measures depend on the test-time data distribution. For a variety of existing explanation systems, such as anchors, we analytically study these quantities. We also provide estimators and sample complexity bounds for empirically determining the faithfulness of black-box explanation systems. Finally, we experimentally validate the new properties and estimators.

preprint2022arXiv

Online $k$-means Clustering on Arbitrary Data Streams

We consider online $k$-means clustering where each new point is assigned to the nearest cluster center, after which the algorithm may update its centers. The loss incurred is the sum of squared distances from new points to their assigned cluster centers. The goal over a data stream $X$ is to achieve loss that is a constant factor of $L(X, OPT_k)$, the best possible loss using $k$ fixed points in hindsight. We propose a data parameter, $Λ(X)$, such that for any algorithm maintaining $O(k\text{poly}(\log n))$ centers at time $n$, there exists a data stream $X$ for which a loss of $Ω(Λ(X))$ is inevitable. We then give a randomized algorithm that achieves clustering loss $O(Λ(X) + L(X, OPT_k))$. Our algorithm uses $O(k\text{poly}(\log n))$ memory and maintains $O(k\text{poly}(\log n))$ cluster centers. Our algorithm also enjoys a running time of $O(k\text{poly}(\log n))$ and is the first algorithm to achieve polynomial space and time complexity in this setting. It also is the first to have provable guarantees without making any assumptions on the input data.

preprint2020arXiv

A Non-Parametric Test to Detect Data-Copying in Generative Models

Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call {\em{data-copying}} -- where the generative model memorizes and outputs training samples or small variations thereof. We provide a three sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and study the performance of our test on several canonical models and datasets. For code \& examples, visit https://github.com/casey-meehan/data-copying

preprint2020arXiv

Explainable $k$-Means and $k$-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data. Unfortunately, many clustering algorithms lead to cluster assignments that are hard to explain, partially because they depend on all the features of the data in a complicated way. To improve interpretability, we consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner. We study this problem from a theoretical viewpoint, measuring cluster quality by the $k$-means and $k$-medians objectives: Must there exist a tree-induced clustering whose cost is comparable to that of the best unconstrained clustering, and if so, how can it be found? In terms of negative results, we show, first, that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and second, that any tree-induced clustering must in general incur an $Ω(\log k)$ approximation factor compared to the optimal clustering. On the positive side, we design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves. For two means/medians, we show that a single threshold cut suffices to achieve a constant factor approximation, and we give nearly-matching lower bounds. For general $k \geq 2$, our algorithm is an $O(k)$ approximation to the optimal $k$-medians and an $O(k^2)$ approximation to the optimal $k$-means. Prior to our work, no algorithms were known with provable guarantees independent of dimension and input size.

preprint2020arXiv

Expressivity of expand-and-sparsify representations

A simple sparse coding mechanism appears in the sensory systems of several organisms: to a coarse approximation, an input $x \in \R^d$ is mapped to much higher dimension $m \gg d$ by a random linear transformation, and is then sparsified by a winner-take-all process in which only the positions of the top $k$ values are retained, yielding a $k$-sparse vector $z \in \{0,1\}^m$. We study the benefits of this representation for subsequent learning. We first show a universal approximation property, that arbitrary continuous functions of $x$ are well approximated by linear functions of $z$, provided $m$ is large enough. This can be interpreted as saying that $z$ unpacks the information in $x$ and makes it more readily accessible. The linear functions can be specified explicitly and are easy to learn, and we give bounds on how large $m$ needs to be as a function of the input dimension $d$ and the smoothness of the target function. Next, we consider whether the representation is adaptive to manifold structure in the input space. This is highly dependent on the specific method of sparsification: we show that adaptivity is not obtained under the winner-take-all mechanism, but does hold under a slight variant. Finally we consider mappings to the representation space that are random but are attuned to the data distribution, and we give favorable approximation bounds in this setting.