Source author record

Arushi Gupta

Arushi Gupta appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, astro-ph.CO

Catalog footprint

What is connected

4 works

2 topics

4 close collaborators

Preprint, 2026, arXiv

RubiConv -- Efficient Boundary-Respecting Convolutions

Convolutional architectures have emerged as powerful alternatives to Transformers for sequence modeling. Their primary advantage is improved theoretical complexity in sequence length, obtained by leveraging the Fast Fourier Transform (FFT). However, this theoretical improvement does not always translate into practical gains. One critical obstacle is that standard FFT algorithms are not easily amenable to document packing, the large-scale training practice in which data from different sources is packed into a single sequence for hardware efficiency. Existing workarounds suffer from severe inefficiencies, crippling the practical performance of convolutional architectures. We close this gap with RubiConv, a novel algorithm for performing hardware-efficient, boundary-respecting convolutions on packed sequences. Extensive experiments show that RubiConv achieves significant speedups over both attention and standard FFT-based baselines. This work makes the theoretical efficiency of long convolutional models a practical reality for large-scale, real-world data packing.
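
To make the packing problem concrete, here is a minimal sketch of a naive workaround, not RubiConv itself: it runs one zero-padded FFT convolution per document so that no output position sees tokens from a neighbouring document. The function names, the `doc_lengths` argument, and the use of NumPy are assumptions made for illustration only.

```python
import numpy as np

def causal_fft_conv(x, kernel):
    """Causal convolution of one sequence with a kernel via a zero-padded FFT."""
    n = len(x)
    size = n + len(kernel) - 1  # pad so the circular FFT product equals a linear convolution
    spectrum = np.fft.rfft(x, size) * np.fft.rfft(kernel, size)
    return np.fft.irfft(spectrum, size)[:n]

def packed_conv_naive(packed, doc_lengths, kernel):
    """Hypothetical baseline: convolve each packed document separately.

    Boundaries are respected because every document gets its own FFT,
    at the cost of many small transforms instead of one large one.
    """
    packed = np.asarray(packed, dtype=float)
    assert sum(doc_lengths) == len(packed)
    out = np.empty_like(packed)
    start = 0
    for length in doc_lengths:
        stop = start + length
        out[start:stop] = causal_fft_conv(packed[start:stop], kernel)
        start = stop
    return out
```

With `packed = [1, 2, 3, 4, 5]`, `doc_lengths = [3, 2]` and `kernel = [1, 1]`, a single FFT over the whole pack mixes the last token of the first document into the first output of the second, while the per-document loop does not. The loop of small FFTs is the kind of inefficiency the abstract refers to.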

Preprint, 2022, arXiv

On Predicting Generalization using GANs

Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters. While generalization bounds can give many insights about architecture design, training algorithms, etc., what they do not currently do is yield good predictions for actual test error. A recently introduced Predicting Generalization in Deep Learning competition \citep{jiang2020neurips} aims to encourage discovery of methods to better predict test error. The current paper investigates a simple idea: can test error be predicted using synthetic data, produced using a Generative Adversarial Network (GAN) that was trained on the same training dataset? Upon investigating several GAN models and architectures, we find that this turns out to be the case. In fact, using GANs pre-trained on standard datasets, the test error can be predicted without requiring any additional hyper-parameter tuning. This result is surprising because GANs have well-known limitations (e.g. mode collapse) and are known to not learn the data distribution accurately. Yet the generated samples are good enough to substitute for test data. Several additional experiments are presented to explore reasons why GANs do well at this task. In addition to a new approach for predicting generalization, the counter-intuitive phenomena presented in our work may also call for a better understanding of GANs' strengths and limitations.
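
The core idea can be sketched in a few lines, under the assumption that the generator is class-conditional and therefore returns labelled samples. Both callables below, `classify` and `sample_synthetic`, are hypothetical placeholders rather than anything defined in the paper.

```python
import numpy as np

def predict_test_error(classify, sample_synthetic, n_samples=1000, seed=0):
    """Estimate test error as the error rate on labelled synthetic samples.

    classify:         hypothetical callable, maps inputs to predicted labels.
    sample_synthetic: hypothetical callable (n, rng) -> (inputs, labels),
                      standing in for a class-conditional GAN generator.
    """
    rng = np.random.default_rng(seed)
    inputs, labels = sample_synthetic(n_samples, rng)
    predictions = np.asarray(classify(inputs))
    return float(np.mean(predictions != np.asarray(labels)))
```

The estimate is only as good as the synthetic samples, which is why the abstract calls the result surprising.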

Preprint, 2020, arXiv

Inherent Noise in Gradient Based Methods

Previous work has examined the ability of larger-capacity neural networks to generalize better than smaller ones, even without explicit regularizers, by analyzing gradient-based algorithms such as GD and SGD. The presence of noise and its effect on robustness to parameter perturbations has been linked to generalization. We examine a property of GD and SGD, namely that instead of iterating through all scalar weights in the network and updating them one by one, GD (and SGD) updates all the parameters at the same time. As a result, each parameter $w^i$ calculates its partial derivative at the stale parameter $\mathbf{w_t}$, but then suffers loss $\hat{L}(\mathbf{w_{t+1}})$. We show that this causes noise to be introduced into the optimization. We find that this noise penalizes models that are sensitive to perturbations in the weights. We find that penalties are most pronounced for batches that are currently being used to update, and are higher for larger models.
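
The distinction between simultaneous and one-by-one updates can be illustrated on a toy quadratic loss. This is a sketch under assumed settings (the matrix `A`, the learning rate, and the function names are all illustrative), not the paper's analysis.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w; the off-diagonal terms couple the weights.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

def grad(w):
    """Gradient of the toy loss."""
    return A @ w

def gd_step(w, lr):
    """Simultaneous update: every partial derivative is taken at the same w_t."""
    return w - lr * grad(w)

def coordinate_step(w, lr):
    """One-by-one update: each weight sees the weights already updated before it."""
    w = w.copy()
    for i in range(len(w)):
        w[i] -= lr * grad(w)[i]
    return w
```

Starting from `w = [1, 1]` with `lr = 0.1`, `gd_step` moves to approximately `[0.7, 0.7]` while `coordinate_step` moves to approximately `[0.7, 0.73]`. The second weight differs because in the simultaneous update its partial derivative was computed at the stale first weight.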

Preprint, 2016, arXiv

Do dark matter halos explain lensing peaks?

We have investigated a recently proposed halo-based model, Camelus, for predicting weak-lensing peak counts, and compared its results over a collection of 162 cosmologies with those from N-body simulations. While counts from both models agree for peaks with $\mathcal{S/N}>1$ (where $\mathcal{S/N}$ is the ratio of the peak height to the r.m.s. shape noise), we find $\approx 50\%$ fewer counts for peaks near $\mathcal{S/N}=0$ and significantly higher counts in the negative $\mathcal{S/N}$ tail. Adding shape noise reduces the differences to within $20\%$ for all cosmologies. We also found larger covariances that are more sensitive to cosmological parameters. As a result, credibility regions in the $\{\Omega_m, \sigma_8\}$ plane are $\approx 30\%$ larger. Even though the credible contours are commensurate, each model draws its predictive power from different types of peaks. Low peaks, especially those with $2<\mathcal{S/N}<3$, convey important cosmological information in N-body data, as shown in \cite{DietrichHartlap, Kratochvil2010}, but Camelus constrains cosmology almost exclusively from high-significance peaks $(\mathcal{S/N}>3)$. Our results confirm the importance of using a cosmology-dependent covariance, with at least a 14\% improvement in parameter constraints. We identified the covariance estimation as the main driver behind differences in inference, and suggest possible ways to make Camelus even more useful as a highly accurate peak count emulator.
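
As a rough illustration of what a peak count is, the sketch below finds local maxima in a 2D map, converts their heights to $\mathcal{S/N}$ by dividing by the r.m.s. shape noise, and histograms them. The 8-neighbour peak definition, the function names, and the bin edges are assumptions for illustration; this is not the Camelus or N-body pipeline.

```python
import numpy as np

def peak_snr_values(kappa, sigma_noise):
    """Return the S/N of each interior pixel strictly higher than its 8 neighbours."""
    snr = np.asarray(kappa, dtype=float) / sigma_noise
    height, width = snr.shape
    center = snr[1:-1, 1:-1]
    is_peak = np.ones(center.shape, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            # Same-sized window shifted by (dy, dx) relative to the interior pixels.
            neighbour = snr[1 + dy:height - 1 + dy, 1 + dx:width - 1 + dx]
            is_peak &= center > neighbour
    return center[is_peak]

def count_peaks(kappa, sigma_noise, bin_edges):
    """Histogram peak S/N values into the given bins."""
    counts, _ = np.histogram(peak_snr_values(kappa, sigma_noise), bins=bin_edges)
    return counts
```

Comparing such histograms between two models, bin by bin, is the kind of comparison the abstract describes for low, negative, and high $\mathcal{S/N}$ peaks.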