Researcher profile

Arushi Gupta

Arushi Gupta contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
1topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2026arXiv

RubiConv -- Efficient Boundary-Respecting Convolutions

Convolutional architectures have emerged as powerful alternatives to Transformers for sequence modeling. The primary advantage is that they offer improved theoretical sequence length complexity by leveraging the Fast Fourier Transform (FFT). However, this theoretical improvement does not always meaningfully land in practice. One critical obstacle is that applying standard FFTs is not amenable to the large-scale training pipeline wherein data is packed from different sources into a single sequence for hardware efficiency. Indeed, standard FFT algorithms are not easily amenable to document packing. Existing workarounds suffer from severe inefficiencies, crippling the practical performance of convolutional architectures. We close this gap with RubiConv, a novel algorithm for performing hardware-efficient, boundary-respecting convolutions on packed sequences. Extensive experiments show that RubiConv achieves significant speedups over both attention and standard FFT-based baselines. This work makes the theoretical efficiency of long convolutional models a practical reality for large-scale, real-world data packing.

preprint2022arXiv

On Predicting Generalization using GANs

Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters. While generalization bounds can give many insights about architecture design, training algorithms, etc., what they do not currently do is yield good predictions for actual test error. A recently introduced Predicting Generalization in Deep Learning competition~\citep{jiang2020neurips} aims to encourage discovery of methods to better predict test error. The current paper investigates a simple idea: can test error be predicted using {\em synthetic data,} produced using a Generative Adversarial Network (GAN) that was trained on the same training dataset? Upon investigating several GAN models and architectures, we find that this turns out to be the case. In fact, using GANs pre-trained on standard datasets, the test error can be predicted without requiring any additional hyper-parameter tuning. This result is surprising because GANs have well-known limitations (e.g. mode collapse) and are known to not learn the data distribution accurately. Yet the generated samples are good enough to substitute for test data. Several additional experiments are presented to explore reasons why GANs do well at this task. In addition to a new approach for predicting generalization, the counter-intuitive phenomena presented in our work may also call for a better understanding of GANs' strengths and limitations.

preprint2020arXiv

Inherent Noise in Gradient Based Methods

Previous work has examined the ability of larger capacity neural networks to generalize better than smaller ones, even without explicit regularizers, by analyzing gradient based algorithms such as GD and SGD. The presence of noise and its effect on robustness to parameter perturbations has been linked to generalization. We examine a property of GD and SGD, namely that instead of iterating through all scalar weights in the network and updating them one by one, GD (and SGD) updates all the parameters at the same time. As a result, each parameter $w^i$ calculates its partial derivative at the stale parameter $\mathbf{w_t}$, but then suffers loss $\hat{L}(\mathbf{w_{t+1}})$. We show that this causes noise to be introduced into the optimization. We find that this noise penalizes models that are sensitive to perturbations in the weights. We find that penalties are most pronounced for batches that are currently being used to update, and are higher for larger models.