Researcher profile

Neha Gupta

Neha Gupta contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
6 works
0 followers
6 topics
4 close collaborators

Published work

6 published items

preprint, 2026, arXiv

Ministral 3

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, a technique that iterates pruning and continued training with distillation. Each model comes with image understanding capabilities, and all are released under the Apache 2.0 license.

preprint, 2022, arXiv

Ensembling over Classifiers: a Bias-Variance Perspective

Ensembles are a straightforward, remarkably effective method for improving the accuracy, calibration, and robustness of models on classification tasks; yet, the reasons that underlie their success remain an active area of research. We build upon the extension to the bias-variance decomposition by Pfau (2013) in order to gain crucial insights into the behavior of ensembles of classifiers. Introducing a dual reparameterization of the bias-variance tradeoff, we first derive generalized laws of total expectation and variance for nonsymmetric losses typical of classification tasks. Comparing conditional and bootstrap bias/variance estimates, we then show that conditional estimates necessarily incur an irreducible error. Next, we show that ensembling in dual space reduces the variance and leaves the bias unchanged, whereas standard ensembling can arbitrarily affect the bias. Empirically, standard ensembling reduces the bias, leading us to hypothesize that ensembles of classifiers may perform well in part because of this unexpected reduction. We conclude with an empirical analysis of recent deep learning methods that ensemble over hyperparameters, revealing that these techniques indeed favor bias reduction. This suggests that, contrary to classical wisdom, targeting bias reduction may be a promising direction for classifier ensembles.
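The contrast between dual-space and standard ensembling can be illustrated with a small numerical sketch. It is not the paper's code: it assumes the KL divergence as the loss, for which averaging in dual space amounts to averaging log-probabilities and renormalizing, and it uses a made-up finite population of predictions (`singles`) and a made-up mean label (`y_bar`) so that every expectation is an exact finite sum.

```python
import itertools
import numpy as np

def kl(p, q):
    """KL divergence between two strictly positive probability vectors."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def dual_mean(preds):
    """Normalized geometric mean: average in log space, then renormalize."""
    log_mean = np.mean(np.log(preds), axis=0)
    weights = np.exp(log_mean - log_mean.max())
    return weights / weights.sum()

def bias_and_variance(y_bar, preds):
    """Bias and variance of a finite, equally likely population of predictions."""
    center = dual_mean(preds)  # central prediction under the KL loss
    bias = kl(y_bar, center)
    variance = float(np.mean([kl(center, p) for p in preds]))
    return bias, variance

# Hypothetical setup: one mean label and six equally likely single-model outputs.
rng = np.random.default_rng(0)
y_bar = np.array([0.7, 0.2, 0.1])
singles = rng.dirichlet(np.ones(3), size=6)

# All ordered pairs of models, i.e. ensembles of size two drawn with replacement.
pairs = list(itertools.product(singles, repeat=2))
dual_ensembles = np.array([dual_mean(np.stack(pair)) for pair in pairs])
prob_ensembles = np.array([np.mean(np.stack(pair), axis=0) for pair in pairs])

bias_single, var_single = bias_and_variance(y_bar, singles)
bias_dual, var_dual = bias_and_variance(y_bar, dual_ensembles)
bias_prob, var_prob = bias_and_variance(y_bar, prob_ensembles)

print("single model:      ", bias_single, var_single)
print("dual-space pairs:  ", bias_dual, var_dual)
print("probability pairs: ", bias_prob, var_prob)
```

In this setup the dual-space ensemble has the same bias as a single model and no larger variance, while averaging probabilities generally shifts the bias as well.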

preprint, 2022, arXiv

Understanding the bias-variance tradeoff of Bregman divergences

This paper builds upon the work of Pfau (2013), which generalized the bias-variance tradeoff to any Bregman divergence loss function. Pfau (2013) showed that for Bregman divergences, the bias and variance are defined with respect to a central label, defined as the mean of the label variable, and a central prediction, of a more complex form. We show that, similarly to the label, the central prediction can be interpreted as the mean of a random variable, where the mean operates in a dual space defined by the loss function itself. Viewing the bias-variance tradeoff through operations taken in dual space, we subsequently derive several results of interest. In particular, (a) the variance terms satisfy a generalized law of total variance; (b) if a source of randomness cannot be controlled, its contribution to the bias and variance has a closed form; (c) there exist natural ensembling operations in the label and prediction spaces which reduce the variance and do not affect the bias.
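As a hedged illustration of the decomposition described above, the following sketch checks it numerically for one particular Bregman divergence, the KL divergence on the probability simplex. The central label is the ordinary mean and the central prediction is the mean taken in log space and renormalized. The label and prediction populations are invented for the example, and labels and predictions are treated as independent.

```python
import numpy as np

def kl(p, q):
    """KL divergence between two strictly positive probability vectors."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def dual_mean(preds):
    """Central prediction for KL: mean of log-probabilities, renormalized."""
    log_mean = np.mean(np.log(preds), axis=0)
    weights = np.exp(log_mean - log_mean.max())
    return weights / weights.sum()

# Hypothetical finite populations, each element equally likely.
rng = np.random.default_rng(1)
labels = rng.dirichlet(np.ones(3), size=5)  # soft labels Y
preds = rng.dirichlet(np.ones(3), size=7)   # model predictions

central_label = labels.mean(axis=0)         # mean in the primal space
central_pred = dual_mean(preds)             # mean in the dual space

# Expected loss over all (label, prediction) pairs, i.e. independent draws.
expected_loss = float(np.mean([kl(y, p) for y in labels for p in preds]))

noise = float(np.mean([kl(y, central_label) for y in labels]))
bias = kl(central_label, central_pred)
variance = float(np.mean([kl(central_pred, p) for p in preds]))

print("expected loss:          ", expected_loss)
print("noise + bias + variance:", noise + bias + variance)
```

The two printed values should agree up to floating-point error, which is the three-term decomposition into noise, bias and variance.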

preprint, 2020, arXiv

Active Local Learning

In this work we consider active local learning: given a query point $x$, and active access to an unlabeled training set $S$, output the prediction $h(x)$ of a near-optimal $h \in H$ using significantly fewer labels than would be needed to actually learn $h$ fully. In particular, the number of label queries should be independent of the complexity of $H$, and the function $h$ should be well-defined, independent of $x$. This immediately also implies an algorithm for distance estimation: estimating the value $opt(H)$ from many fewer labels than needed to actually learn a near-optimal $h \in H$, by running local learning on a few random query points and computing the average error. For the hypothesis class consisting of functions supported on the interval $[0,1]$ with Lipschitz constant bounded by $L$, we present an algorithm that makes $O(({1 / ε^6}) \log(1/ε))$ label queries from an unlabeled pool of $O(({L / ε^4})\log(1/ε))$ samples. It estimates the distance to the best hypothesis in the class to an additive error of $ε$ for an arbitrary underlying distribution. We further generalize our algorithm to more than one dimension. We emphasize that the number of labels used is independent of the complexity of the hypothesis class which depends on $L$. Furthermore, we give an algorithm to locally estimate the values of a near-optimal function at a few query points of interest with a number of labels independent of $L$. We also consider the related problem of approximating the minimum error that can be achieved by the Nadaraya-Watson estimator under a linear diagonal transformation with eigenvalues coming from a small range. For a $d$-dimensional pointset of size $N$, our algorithm achieves an additive approximation of $ε$, makes $\tilde{O}({d}/{ε^2})$ queries and runs in $\tilde{O}({d^2}/{ε^{d+4}}+{dN}/{ε^2})$ time.

preprint, 2020, arXiv

Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process

We consider networks, trained via stochastic gradient descent to minimize $\ell_2$ loss, with the training labels perturbed by independent noise at each iteration. We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term corresponding to the sum, over the data points, of the squared $\ell_2$ norm of the gradient of the model with respect to the parameter vector, evaluated at each data point. This holds for networks of any connectivity, width, depth, and choice of activation function. We interpret this implicit regularization term for three simple settings: matrix sensing, two-layer ReLU networks trained on one-dimensional data, and two-layer networks with sigmoid activations trained on a single datapoint. For these settings, we show why this new and general implicit regularization effect drives the networks towards "simple" models.
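The regularization term described above, the sum over data points of the squared norm of the model's gradient with respect to its parameters, can be computed directly for a small model. The sketch below assumes a two-layer ReLU network on one-dimensional inputs, parameterized as f(x) = sum_j a_j * relu(w_j * x + b_j). This parameterization and the function names are illustrative choices, and any scaling constant in front of the term is omitted because the abstract does not state one.

```python
import numpy as np

def model_gradient(params, x):
    """Gradient of f(x) = sum_j a_j * relu(w_j * x + b_j) w.r.t. (a, w, b)."""
    a, w, b = params
    pre = w * x + b
    active = (pre > 0).astype(float)  # ReLU derivative (taken as 0 at the kink)
    grad_a = np.maximum(pre, 0.0)     # df/da_j = relu(w_j * x + b_j)
    grad_w = a * active * x           # df/dw_j = a_j * 1[pre_j > 0] * x
    grad_b = a * active               # df/db_j = a_j * 1[pre_j > 0]
    return np.concatenate([grad_a, grad_w, grad_b])

def implicit_regularizer(params, xs):
    """Unscaled sum over data points of the squared l2 norm of the gradient."""
    return float(sum(np.sum(model_gradient(params, x) ** 2) for x in xs))

# Tiny hand-checkable case: one hidden unit, one data point.
params = (np.array([1.0]), np.array([2.0]), np.array([0.0]))
xs = np.array([3.0])
value = implicit_regularizer(params, xs)
print(value)  # gradients are (6, 3, 1), so the value is 36 + 9 + 1 = 46
```

Only the model's outputs enter this quantity, not the labels, so it can be evaluated at any parameter vector.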

preprint, 2020, arXiv

Projections for COVID-19 spread in India and its worst affected five states using the Modified SEIRD and LSTM models

The last leg of the year 2019 gave rise to a virus named COVID-19 (Corona Virus Disease 2019). Since the beginning of this infection in India, the government has implemented several policies and restrictions to curtail its spread among the population. As time passed, these restrictions were relaxed and people were advised to follow precautionary measures by themselves. These timely decisions taken by the Indian government helped in decelerating the spread of COVID-19 to a large extent. Despite these decisions, the pandemic continues to spread, and hence there is an urgent need to plan and control the spread of this disease. This is possible by predicting the future spread. Scientists across the globe are working towards estimating the future growth of COVID-19. This paper proposes a Modified SEIRD (Susceptible-Exposed-Infected-Recovered-Deceased) model for projecting COVID-19 infections in India and its five states having the highest number of total cases. In this model, the exposed compartment contains individuals who may be asymptomatic but infectious. A deep-learning-based Long Short-Term Memory (LSTM) model has also been used in this paper to perform short-term projections. The projections obtained from the proposed Modified SEIRD model have also been compared with the projections made by the LSTM for the next 30 days. Epidemiological data up to 15th August 2020 has been used for carrying out predictions in this paper. These predictions will help in arranging adequate medical infrastructure and providing proper preventive measures to handle the current pandemic. The effect of different lockdowns imposed by the Indian government has also been incorporated into the modelling and analysis in the proposed Modified SEIRD model. The results presented in this paper will act as a beacon for future policy-making to control the COVID-19 spread in India.
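For readers unfamiliar with compartmental models, here is a minimal SEIRD sketch in which exposed individuals can also transmit the infection, as the abstract describes. It is not the paper's model: the exact equations, the parameter values, the initial conditions and the forward-Euler integration are all assumptions made for illustration, and lockdown effects are not modelled.

```python
def simulate_seird(initial, beta_i, beta_e, sigma, gamma, mu, days, dt=0.1):
    """Forward-Euler simulation of a simple SEIRD model.

    Assumed (hypothetical) dynamics:
      dS/dt = -(beta_i * I + beta_e * E) * S / N
      dE/dt =  (beta_i * I + beta_e * E) * S / N - sigma * E
      dI/dt =  sigma * E - (gamma + mu) * I
      dR/dt =  gamma * I
      dD/dt =  mu * I
    N is the initial total population and is kept fixed for simplicity.
    """
    s, e, i, r, d = (float(v) for v in initial)
    n = s + e + i + r + d
    steps_per_day = round(1.0 / dt)
    history = [(s, e, i, r, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = (beta_i * i + beta_e * e) * s / n * dt
            new_infected = sigma * e * dt
            new_recovered = gamma * i * dt
            new_deceased = mu * i * dt
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_recovered - new_deceased
            r += new_recovered
            d += new_deceased
        history.append((s, e, i, r, d))  # one snapshot per simulated day
    return history

# Hypothetical parameters and initial state, chosen only to exercise the code.
history = simulate_seird(
    initial=(999_000, 500, 500, 0, 0),
    beta_i=0.3, beta_e=0.1, sigma=0.2, gamma=0.1, mu=0.005,
    days=30,
)
for day in (0, 10, 20, 30):
    print(day, [round(v) for v in history[day]])
```

Setting `beta_e` to zero recovers a standard SEIRD model in which only the infected compartment transmits.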