Source author record

Chinmay Hegde

Chinmay Hegde appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision Information Theory math.IT Artificial Intelligence Computational Complexity Cryptography and Security Data Structures and Algorithms Graphics

Catalog footprint

What is connected

21works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

ADKO: Agentic Decentralized Knowledge Optimization

We present Agentic Decentralized Knowledge Optimization (ADKO), a framework for collaborative black-box optimization across autonomous agents that achieves sample efficiency, privacy preservation, heterogeneous-objective handling, and communication efficiency. Each agent maintains a private Gaussian Process (GP) surrogate trained on local data and communicates only through knowledge tokens-compact, lossy summaries containing directional signals, advantage scores, and optional language-model (LM) insights-without sharing raw data or model parameters. ADKO unifies GP-Upper Confidence Bound (GP-UCB), parallel Bayesian optimization, decentralized learning, and LM-guided discovery. We provide the first formal analysis of dual information loss: token compression, quantified via mutual-information-based fidelity, and LM approximation error, decomposed into bias and stochastic noise. Our main result shows cumulative regret decomposes into GP error, LM bias, LM noise, and compression loss, with necessary and sufficient conditions for sublinear regret. We also propose fidelity-aware token pruning to preserve high-information tokens under memory budget. Experiments on neural architecture search and scientific discovery validate the theory and show consistent improvements over strong baselines.

preprint2023arXiv

Pathfinding Neural Cellular Automata

Pathfinding makes up an important sub-component of a broad range of complex tasks in AI, such as robot path planning, transport routing, and game playing. While classical algorithms can efficiently compute shortest paths, neural networks could be better suited to adapting these sub-routines to more complex and intractable tasks. As a step toward developing such networks, we hand-code and learn models for Breadth-First Search (BFS), i.e. shortest path finding, using the unified architectural framework of Neural Cellular Automata, which are iterative neural networks with equal-size inputs and outputs. Similarly, we present a neural implementation of Depth-First Search (DFS), and outline how it can be combined with neural BFS to produce an NCA for computing diameter of a graph. We experiment with architectural modifications inspired by these hand-coded NCAs, training networks from scratch to solve the diameter problem on grid mazes while exhibiting strong generalization ability. Finally, we introduce a scheme in which data points are mutated adversarially during training. We find that adversarially evolving mazes leads to increased generalization on out-of-distribution examples, while at the same time generating data-sets with significantly more complex solutions for reasoning tasks.

preprint2022arXiv

A Meta-Analysis of Distributionally-Robust Models

State-of-the-art image classifiers trained on massive datasets (such as ImageNet) have been shown to be vulnerable to a range of both intentional and incidental distribution shifts. On the other hand, several recent classifiers with favorable out-of-distribution (OOD) robustness properties have emerged, achieving high accuracy on their target tasks while maintaining their in-distribution accuracy on challenging benchmarks. We present a meta-analysis on a wide range of publicly released models, most of which have been published over the last twelve months. Through this meta-analysis, we empirically identify four main commonalities for all the best-performing OOD-robust models, all of which illuminate the considerable promise of vision-language pre-training.

preprint2022arXiv

NURBS-Diff: A Differentiable Programming Module for NURBS

Boundary representations (B-reps) using Non-Uniform Rational B-splines (NURBS) are the de facto standard used in CAD, but their utility in deep learning-based approaches is not well researched. We propose a differentiable NURBS module to integrate NURBS representations of CAD models with deep learning methods. We mathematically define the derivatives of the NURBS curves or surfaces with respect to the input parameters (control points, weights, and the knot vector). These derivatives are used to define an approximate Jacobian used for performing the "backward" evaluation to train the deep learning models. We have implemented our NURBS module using GPU-accelerated algorithms and integrated it with PyTorch, a popular deep learning framework. We demonstrate the efficacy of our NURBS module in performing CAD operations such as curve or surface fitting and surface offsetting. Further, we show its utility in deep learning for unsupervised point cloud reconstruction and enforce analysis constraints. These examples show that our module performs better for certain deep learning frameworks and can be directly integrated with any deep-learning framework requiring NURBS.

preprint2022arXiv

On The Computational Complexity of Self-Attention

Transformer architectures have led to remarkable progress in many state-of-art applications. However, despite their successes, modern transformers rely on the self-attention mechanism, whose time- and space-complexity is quadratic in the length of the input. Several approaches have been proposed to speed up self-attention mechanisms to achieve sub-quadratic running time; however, the large majority of these works are not accompanied by rigorous error guarantees. In this work, we establish lower bounds on the computational complexity of self-attention in a number of scenarios. We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false. This argument holds even if the attention computation is performed only approximately, and for a variety of attention mechanisms. As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear-time, at the cost of having an exponential dependence on the polynomial order.

preprint2022arXiv

One-Shot Neural Architecture Search via Compressive Sensing

Neural Architecture Search remains a very challenging meta-learning problem. Several recent techniques based on parameter-sharing idea have focused on reducing the NAS running time by leveraging proxy models, leading to architectures with competitive performance compared to those with hand-crafted designs. In this paper, we propose an iterative technique for NAS, inspired by algorithms for learning low-degree sparse Boolean functions. We validate our approach on the DARTs search space (Liu et al., 2018b) and NAS-Bench-201 (Yang et al., 2020). In addition, we provide theoretical analysis via upper bounds on the number of validation error measurements needed for reliable learning, and include ablation studies to further in-depth understanding of our technique.

preprint2022arXiv

Revisiting Self-Distillation

Knowledge distillation is the procedure of transferring "knowledge" from a large model (the teacher) to a more compact one (the student), often being used in the context of model compression. When both models have the same architecture, this procedure is called self-distillation. Several works have anecdotally shown that a self-distilled student can outperform the teacher on held-out data. In this work, we systematically study self-distillation in a number of settings. We first show that even with a highly accurate teacher, self-distillation allows a student to surpass the teacher in all cases. Secondly, we revisit existing theoretical explanations of (self) distillation and identify contradicting examples, revealing possible drawbacks of these explanations. Finally, we provide an alternative explanation for the dynamics of self-distillation through the lens of loss landscape geometry. We conduct extensive experiments to show that self-distillation leads to flatter minima, thereby resulting in better generalization.

preprint2022arXiv

Selective Network Linearization for Efficient Private Inference

Private inference (PI) enables inference directly on cryptographically secure data.While promising to address many privacy issues, it has seen limited use due to extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI non-linear functions (namely ReLU) are the bottleneck. Thus, practical PI demands novel ReLU-aware optimizations. To reduce PI latency we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70\%) than the current state of the art and advance the Pareto frontier across the latency-accuracy space. To complement empirical results, we present a "no free lunch" theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy. Public code is available at \url{https://github.com/NYU-DICE-Lab/selective_network_linearization}.

preprint2022arXiv

Smooth-Reduce: Leveraging Patches for Improved Certified Robustness

Randomized smoothing (RS) has been shown to be a fast, scalable technique for certifying the robustness of deep neural network classifiers. However, methods based on RS require augmenting data with large amounts of noise, which leads to significant drops in accuracy. We propose a training-free, modified smoothing approach, Smooth-Reduce, that leverages patching and aggregation to provide improved classifier certificates. Our algorithm classifies overlapping patches extracted from an input image, and aggregates the predicted logits to certify a larger radius around the input. We study two aggregation schemes -- max and mean -- and show that both approaches provide better certificates in terms of certified accuracy, average certified radii and abstention rates as compared to concurrent approaches. We also provide theoretical guarantees for such certificates, and empirically show significant improvements over other randomized smoothing methods that require expensive retraining. Further, we extend our approach to videos and provide meaningful certificates for video classifiers. A project page can be found at https://nyu-dice-lab.github.io/SmoothReduce/

preprint2021arXiv

Adversarially Robust Learning via Entropic Regularization

In this paper we propose a new family of algorithms, ATENT, for training adversarially robust deep neural networks. We formulate a new loss function that is equipped with an additional entropic regularization. Our loss function considers the contribution of adversarial samples that are drawn from a specially designed distribution in the data space that assigns high probability to points with high loss and in the immediate neighborhood of training samples. Our proposed algorithms optimize this loss to seek adversarially robust valleys of the loss landscape. Our approach achieves competitive (or better) performance in terms of robust classification accuracy as compared to several state-of-the-art robust learning approaches on benchmark datasets such as MNIST and CIFAR-10.

preprint2021arXiv

Provable Compressed Sensing with Generative Priors via Langevin Dynamics

Deep generative models have emerged as a powerful class of priors for signals in various inverse problems such as compressed sensing, phase retrieval and super-resolution. Here, we assume an unknown signal to lie in the range of some pre-trained generative model. A popular approach for signal recovery is via gradient descent in the low-dimensional latent space. While gradient descent has achieved good empirical performance, its theoretical behavior is not well understood. In this paper, we introduce the use of stochastic gradient Langevin dynamics (SGLD) for compressed sensing with a generative prior. Under mild assumptions on the generative model, we prove the convergence of SGLD to the true signal. We also demonstrate competitive empirical performance to standard gradient descent.

preprint2020arXiv

Algorithmic Guarantees for Inverse Imaging with Untrained Network Priors

Deep neural networks as image priors have been recently introduced for problems such as denoising, super-resolution and inpainting with promising performance gains over hand-crafted image priors such as sparsity and low-rank. Unlike learned generative priors they do not require any training over large datasets. However, few theoretical guarantees exist in the scope of using untrained neural network priors for inverse imaging problems. We explore new applications and theory for untrained neural network priors. Specifically, we consider the problem of solving linear inverse problems, such as compressive sensing, as well as non-linear problems, such as compressive phase retrieval. We model images to lie in the range of an untrained deep generative network with a fixed seed. We further present a projected gradient descent scheme that can be used for both compressive sensing and phase retrieval and provide rigorous theoretical guarantees for its convergence. We also show both theoretically as well as empirically that with deep network priors, one can achieve better compression rates for the same image quality compared to hand crafted priors.

preprint2020arXiv

Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

A remarkable recent discovery in machine learning has been that deep neural networks can achieve impressive performance (in terms of both lower training error and higher generalization capacity) in the regime where they are massively over-parameterized. Consequently, over the past year, the community has devoted growing interest in analyzing optimization and generalization properties of over-parameterized networks, and several breakthrough works have led to important theoretical progress. However, the majority of existing work only applies to supervised learning scenarios and hence are limited to settings such as classification and regression. In contrast, the role of over-parameterization in the unsupervised setting has gained far less attention. In this paper, we study the gradient dynamics of two-layer over-parameterized autoencoders with ReLU activation. We make very few assumptions about the given training dataset (other than mild non-degeneracy conditions). Starting from a randomly initialized autoencoder network, we rigorously prove the linear convergence of gradient descent in two learning regimes, namely: (i) the weakly-trained regime where only the encoder is trained, and (ii) the jointly-trained regime where both the encoder and the decoder are trained. Our results indicate the considerable benefits of joint training over weak training for finding global optima, achieving a dramatic decrease in the required level of over-parameterization. We also analyze the case of weight-tied autoencoders (which is a commonly used architectural choice in practical settings) and prove that in the over-parameterized setting, training such networks from randomly initialized points leads to certain unexpected degeneracies.

preprint2020arXiv

Deep Generative Models that Solve PDEs: Distributed Computing for Training Large Data-Free Models

Recent progress in scientific machine learning (SciML) has opened up the possibility of training novel neural network architectures that solve complex partial differential equations (PDEs). Several (nearly data free) approaches have been recently reported that successfully solve PDEs, with examples including deep feed forward networks, generative networks, and deep encoder-decoder networks. However, practical adoption of these approaches is limited by the difficulty in training these models, especially to make predictions at large output resolutions ($\geq 1024 \times 1024$). Here we report on a software framework for data parallel distributed deep learning that resolves the twin challenges of training these large SciML models - training in reasonable time as well as distributing the storage requirements. Our framework provides several out of the box functionality including (a) loss integrity independent of number of processes, (b) synchronized batch normalization, and (c) distributed higher-order optimization methods. We show excellent scalability of this framework on both cloud as well as HPC clusters, and report on the interplay between bandwidth, network topology and bare metal vs cloud. We deploy this approach to train generative models of sizes hitherto not possible, showing that neural PDE solvers can be viably trained for practical applications. We also demonstrate that distributed higher-order optimization methods are $2-3\times$ faster than stochastic gradient-based methods and provide minimal convergence drift with higher batch-size.

preprint2020arXiv

ESPN: Extremely Sparse Pruned Networks

Deep neural networks are often highly overparameterized, prohibiting their use in compute-limited systems. However, a line of recent works has shown that the size of deep networks can be considerably reduced by identifying a subset of neuron indicators (or mask) that correspond to significant weights prior to training. We demonstrate that an simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks. Our algorithm represents a hybrid approach between single shot network pruning methods (such as SNIP) with Lottery-Ticket type approaches. We validate our approach on several datasets and outperform several existing pruning approaches in both test accuracy and compression ratio.

preprint2020arXiv

Hyperparameter Optimization in Neural Networks via Structured Sparse Recovery

In this paper, we study two important problems in the automated design of neural networks -- Hyper-parameter Optimization (HPO), and Neural Architecture Search (NAS) -- through the lens of sparse recovery methods. In the first part of this paper, we establish a novel connection between HPO and structured sparse recovery. In particular, we show that a special encoding of the hyperparameter space enables a natural group-sparse recovery formulation, which when coupled with HyperBand (a multi-armed bandit strategy), leads to improvement over existing hyperparameter optimization methods. Experimental results on image datasets such as CIFAR-10 confirm the benefits of our approach. In the second part of this paper, we establish a connection between NAS and structured sparse recovery. Building upon ``one-shot'' approaches in NAS, we propose a novel algorithm that we call CoNAS by merging ideas from one-shot approaches with a techniques for learning low-degree sparse Boolean polynomials. We provide theoretical analysis on the number of validation error measurements. Finally, we validate our approach on several datasets and discover novel architectures hitherto unreported, achieving competitive (or better) results in both performance and search time compared to the existing NAS approaches.

preprint2015arXiv

Approximation Algorithms for Model-Based Compressive Sensing

Compressive Sensing (CS) stipulates that a sparse signal can be recovered from a small number of linear measurements, and that this recovery can be performed efficiently in polynomial time. The framework of model-based compressive sensing (model-CS) leverages additional structure in the signal and prescribes new recovery schemes that can reduce the number of measurements even further. However, model-CS requires an algorithm that solves the model-projection problem: given a query signal, produce the signal in the model that is also closest to the query signal. Often, this optimization can be computationally very expensive. Moreover, an approximation algorithm is not sufficient for this optimization task. As a result, the model-projection problem poses a fundamental obstacle for extending model-CS to many interesting models. In this paper, we introduce a new framework that we call approximation-tolerant model-based compressive sensing. This framework includes a range of algorithms for sparse recovery that require only approximate solutions for the model-projection problem. In essence, our work removes the aforementioned obstacle to model-based compressive sensing, thereby extending the applicability of model-CS to a much wider class of models. We instantiate this new framework for the Constrained Earth Mover Distance (CEMD) model, which is particularly useful for signal ensembles where the positions of the nonzero coefficients do not change significantly as a function of spatial (or temporal) location. We develop novel approximation algorithms for both the maximization and the minimization versions of the model-projection problem via graph optimization techniques. Leveraging these algorithms into our framework results in a nearly sample-optimal sparse recovery scheme for the CEMD model.

preprint2015arXiv

Efficient Upsampling of Natural Images

We propose a novel method of efficient upsampling of a single natural image. Current methods for image upsampling tend to produce high-resolution images with either blurry salient edges, or loss of fine textural detail, or spurious noise artifacts. In our method, we mitigate these effects by modeling the input image as a sum of edge and detail layers, operating upon these layers separately, and merging the upscaled results in an automatic fashion. We formulate the upsampled output image as the solution to a non-convex energy minimization problem, and propose an algorithm to obtain a tractable approximate solution. Our algorithm comprises two main stages. 1) For the edge layer, we use a nonparametric approach by constructing a dictionary of patches from a given image, and synthesize edge regions in a higher-resolution version of the image. 2) For the detail layer, we use a global parametric texture enhancement approach to synthesize detail regions across the image. We demonstrate that our method is able to accurately reproduce sharp edges as well as synthesize photorealistic textures, while avoiding common artifacts such as ringing and haloing. In addition, our method involves no training phase or estimation of model parameters, and is easily parallelizable. We demonstrate the utility of our method on a number of challenging standard test photos.

preprint2012arXiv

Signal Recovery on Incoherent Manifolds

Suppose that we observe noisy linear measurements of an unknown signal that can be modeled as the sum of two component signals, each of which arises from a nonlinear sub-manifold of a high dimensional ambient space. We introduce SPIN, a first order projected gradient method to recover the signal components. Despite the nonconvex nature of the recovery problem and the possibility of underdetermined measurements, SPIN provably recovers the signal components, provided that the signal manifolds are incoherent and that the measurement operator satisfies a certain restricted isometry property. SPIN significantly extends the scope of current recovery models and algorithms for low dimensional linear inverse problems and matches (or exceeds) the current state of the art in terms of performance.

preprint2010arXiv

Sampling and Recovery of Pulse Streams

Compressive Sensing (CS) is a new technique for the efficient acquisition of signals, images, and other data that have a sparse representation in some basis, frame, or dictionary. By sparse we mean that the N-dimensional basis representation has just K<<N significant coefficients; in this case, the CS theory maintains that just M = K log N random linear signal measurements will both preserve all of the signal information and enable robust signal reconstruction in polynomial time. In this paper, we extend the CS theory to pulse stream data, which correspond to S-sparse signals/images that are convolved with an unknown F-sparse pulse shape. Ignoring their convolutional structure, a pulse stream signal is K=SF sparse. Such signals figure prominently in a number of applications, from neuroscience to astronomy. Our specific contributions are threefold. First, we propose a pulse stream signal model and show that it is equivalent to an infinite union of subspaces. Second, we derive a lower bound on the number of measurements M required to preserve the essential information present in pulse streams. The bound is linear in the total number of degrees of freedom S + F, which is significantly smaller than the naive bound based on the total signal sparsity K=SF. Third, we develop an efficient signal recovery algorithm that infers both the shape of the impulse response as well as the locations and amplitudes of the pulses. The algorithm alternatively estimates the pulse locations and the pulse shape in a manner reminiscent of classical deconvolution algorithms. Numerical experiments on synthetic and real data demonstrate the advantages of our approach over standard CS.

preprint2009arXiv

Model-Based Compressive Sensing

Compressive sensing (CS) is an alternative to Shannon/Nyquist sampling for the acquisition of sparse or compressible signals that can be well approximated by just K << N elements from an N-dimensional basis. Instead of taking periodic samples, CS measures inner products with M < N random vectors and then recovers the signal via a sparsity-seeking optimization or greedy algorithm. Standard CS dictates that robust signal recovery is possible from M = O(K log(N/K)) measurements. It is possible to substantially decrease M without sacrificing robustness by leveraging more realistic signal models that go beyond simple sparsity and compressibility by including structural dependencies between the values and locations of the signal coefficients. This paper introduces a model-based CS theory that parallels the conventional theory and provides concrete guidelines on how to create model-based recovery algorithms with provable performance guarantees. A highlight is the introduction of a new class of structured compressible signals along with a new sufficient condition for robust structured compressible signal recovery that we dub the restricted amplification property, which is the natural counterpart to the restricted isometry property of conventional CS. Two examples integrate two relevant signal models - wavelet trees and block sparsity - into two state-of-the-art CS recovery algorithms and prove that they offer robust recovery from just M=O(K) measurements. Extensive numerical simulations demonstrate the validity and applicability of our new theory and algorithms.

Chinmay Hegde

What is connected

Connect this record

See the researcher in context

Building this map preview

21 published item(s)

ADKO: Agentic Decentralized Knowledge Optimization

Pathfinding Neural Cellular Automata

A Meta-Analysis of Distributionally-Robust Models

NURBS-Diff: A Differentiable Programming Module for NURBS

On The Computational Complexity of Self-Attention

One-Shot Neural Architecture Search via Compressive Sensing

Revisiting Self-Distillation

Selective Network Linearization for Efficient Private Inference

Smooth-Reduce: Leveraging Patches for Improved Certified Robustness

Adversarially Robust Learning via Entropic Regularization

Provable Compressed Sensing with Generative Priors via Langevin Dynamics

Algorithmic Guarantees for Inverse Imaging with Untrained Network Priors

Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

Deep Generative Models that Solve PDEs: Distributed Computing for Training Large Data-Free Models

ESPN: Extremely Sparse Pruned Networks

Hyperparameter Optimization in Neural Networks via Structured Sparse Recovery

Approximation Algorithms for Model-Based Compressive Sensing

Efficient Upsampling of Natural Images

Signal Recovery on Incoherent Manifolds

Sampling and Recovery of Pulse Streams

Model-Based Compressive Sensing