Source author record

Krishnan Raghavan

Krishnan Raghavan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Topics: Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; math.DS; math.OC; physics.data-an

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators


Preprint, 2026, arXiv

FedQueue: Queue-Aware Federated Learning for Cross-Facility HPC Training

Federated learning (FL) across multiple HPC facilities faces stochastic admission delays from batch schedulers, and these delays dominate wall-clock time. Synchronous FL suffers from severe stragglers, while asynchronous FL accumulates stale updates when queues spike. We propose FedQueue, a queue-aware FL protocol that incorporates scheduler delays directly into training and aggregation. FedQueue (i) predicts per-facility queue delays online to budget local work, (ii) applies cutoff-based admission that buffers late arrivals to bound staleness, and (iii) performs staleness-aware aggregation to stabilize heterogeneous local workloads. We prove convergence for non-convex objectives at rate $\mathcal{O}(1/\sqrt{R})$ under bounded staleness, and show that the admission controls yield bounded staleness with high probability under queue-prediction error. A real-world cross-facility deployment of FedQueue shows a 20.5% improvement over baseline algorithms. Controlled queue simulations demonstrate robust improvement over the baselines, in particular about a 34% reduction in time to reach a target accuracy level under high queue variance and non-IID partitions.
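To make the admission and aggregation ideas in this abstract concrete, here is a minimal sketch of cutoff-based admission followed by staleness-weighted averaging. It is not the FedQueue algorithm: the `Update` type, the `admit` and `aggregate` helpers, and the weight $1/(1+s)$ for staleness $s$ are all assumptions chosen for illustration, and the online queue-delay prediction step is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """Hypothetical container for one facility's contribution."""
    delta: list[float]    # model update computed by the facility
    base_round: int       # round whose global model the facility started from
    arrival_time: float   # seconds after the round opened


def admit(updates, cutoff):
    """Split updates into those arriving by the cutoff and a buffer of late ones."""
    on_time = [u for u in updates if u.arrival_time <= cutoff]
    late = [u for u in updates if u.arrival_time > cutoff]
    return on_time, late


def aggregate(global_model, admitted, current_round):
    """Add a weighted average of the admitted updates to the global model.

    Assumed weighting: an update with staleness s gets weight 1 / (1 + s),
    normalised so the weights sum to one.
    """
    if not admitted:
        return list(global_model)
    weights = [1.0 / (1 + (current_round - u.base_round)) for u in admitted]
    total = sum(weights)
    new_model = list(global_model)
    for w, u in zip(weights, admitted):
        for i, d in enumerate(u.delta):
            new_model[i] += (w / total) * d
    return new_model


updates = [
    Update([1.0, 1.0], base_round=3, arrival_time=10.0),
    Update([4.0, 4.0], base_round=2, arrival_time=20.0),
    Update([9.0, 9.0], base_round=3, arrival_time=99.0),  # misses the cutoff
]
on_time, buffered = admit(updates, cutoff=30.0)
model = aggregate([0.0, 0.0], on_time, current_round=3)
```

In this sketch the buffered update would be offered again in the next round, where its staleness, and therefore its down-weighting, would be one round larger.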

Preprint, 2022, arXiv

AutoDEUQ: Automated Deep Ensemble with Uncertainty Quantification

Deep neural networks are powerful predictors for a variety of tasks. However, they do not capture uncertainty directly. Using neural network ensembles to quantify uncertainty is competitive with approaches based on Bayesian neural networks while benefiting from better computational scalability. However, building ensembles of neural networks is a challenging task because, in addition to choosing the right neural architecture or hyperparameters for each member of the ensemble, there is an added cost of training each model. We propose AutoDEUQ, an automated approach for generating an ensemble of deep neural networks. Our approach leverages joint neural architecture and hyperparameter search to generate ensembles. We use the law of total variance to decompose the predictive variance of deep ensembles into aleatoric (data) and epistemic (model) uncertainties. We show that AutoDEUQ outperforms probabilistic backpropagation, Monte Carlo dropout, deep ensemble, distribution-free ensembles, and hyper ensemble methods on a number of regression benchmarks.
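The law-of-total-variance decomposition mentioned in this abstract can be illustrated for a single input as follows. This is a sketch under assumptions, not the AutoDEUQ code: it supposes each ensemble member predicts a mean and a variance, and the function name `decompose_uncertainty` is hypothetical. The aleatoric term is the average of the members' predicted variances, and the epistemic term is the variance of their predicted means.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split ensemble predictive variance into aleatoric and epistemic parts.

    member_means[k] and member_variances[k] are the mean and variance
    predicted by ensemble member k for one input.
    """
    if not member_means or len(member_means) != len(member_variances):
        raise ValueError("need one mean and one variance per ensemble member")
    aleatoric = fmean(member_variances)  # E[Var(y | member)]
    epistemic = pvariance(member_means)  # Var(E[y | member])
    return aleatoric, epistemic, aleatoric + epistemic


aleatoric, epistemic, total = decompose_uncertainty(
    member_means=[1.0, 2.0, 3.0],
    member_variances=[0.5, 0.5, 0.5],
)
```

Here the members agree on the noise level but disagree on the mean, so the epistemic term is larger than the aleatoric one.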

Preprint, 2022, arXiv

Classification of events from $\alpha$-induced reactions in the MUSIC detector via statistical and ML methods

The Multi-Sampling Ionization Chamber (MUSIC) detector is typically used to measure nuclear reaction cross sections relevant for nuclear astrophysics, fusion studies, and other applications. From the MUSIC data produced in one experiment, scientists carefully extract on the order of $10^3$ events of interest from about $10^{9}$ total events, where each event can be represented by an 18-dimensional vector. However, the standard data classification process is based on expert-driven, manually intensive data analysis techniques that require several months to identify patterns and classify the relevant events from the collected data. To address this issue, we present a method for the classification of events originating from specific $\alpha$-induced reactions by combining statistical and machine learning methods that require significantly less input from the domain scientist than the standard technique. We applied the new method to two experimental data sets and compared our results with those obtained using traditional methods. With few exceptions, the number of events classified by our method agrees within $\pm20\%$ with the results obtained using traditional methods. With the present method, which is the first of its kind for the MUSIC data, we have established the foundation for the automated extraction of physical events of interest from experiments using the MUSIC detector.

Preprint, 2022, arXiv

Formalizing the Generalization-Forgetting Trade-off in Continual Learning

We formulate the continual learning (CL) problem via dynamic programming and model the trade-off between catastrophic forgetting and generalization as a two-player sequential game. In this approach, player 1 maximizes the cost due to lack of generalization, whereas player 2 minimizes the cost due to catastrophic forgetting. We show theoretically that a balance point between the two players exists for each task and that this point is stable (once the balance is achieved, the two players stay at the balance point). Next, we introduce balanced continual learning (BCL), which is designed to attain balance between generalization and forgetting, and we empirically demonstrate that BCL is comparable to or better than the state of the art.
