Researcher profile

Rahul Jain

Rahul Jain contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
19 works
0 followers
14 topics
4 close collaborators

Published work

19 published items

preprint · 2026 · arXiv

MechVerse: Evaluating Physical Motion Consistency in Video Generation Models

Text- and image-conditioned video generation models have achieved strong visual fidelity and temporal coherence, but they often fail to generate motion governed by kinematic and geometric constraints. In these settings, object parts must remain rigid, maintain contact or coupling with neighboring components, and transfer motion consistently across connected parts. These requirements are especially explicit in articulated mechanical assemblies, where motion is constrained by rigid-link geometry, contact/coupling relations, and transmission through kinematic chains. A generated video may therefore appear plausible while violating the intended mechanism, such as rotating a part that should translate, deforming a rigid component, breaking coupling between parts, or failing to move downstream components. To evaluate this gap, we introduce MechVerse, a benchmark for mechanically consistent image-to-video generation. MechVerse contains 21,156 synthetic clips from 1,357 mechanical assemblies across 141 categories, organized into three tiers of increasing kinematic complexity: independent articulation, pairwise coupling, and densely coupled multi-part mechanisms. Each clip is paired with a structured prompt describing part identities, stationary supports, moving components, motion primitives, direction, speed/extent, and inter-part dependencies. We evaluate proprietary, open-source, and fine-tuned image-to-video models using standard video metrics, instruction-following scores, and human judgments of motion correctness and kinematic coupling. Results show that current models can preserve appearance and smoothness while failing to generate mechanically admissible motion, with errors increasing as coupling complexity grows. MechVerse provides a benchmark for measuring and improving mechanism-aware video generation from image and language inputs.

preprint · 2026 · arXiv

Robust LLM Alignment via Distributionally Robust Direct Preference Optimization

A major challenge in aligning large language models (LLMs) with human preferences is the issue of distribution shift. LLM alignment algorithms rely on static preference datasets, assuming that they accurately represent real-world user preferences. However, user preferences vary significantly across geographical regions, demographics, linguistic patterns, and evolving cultural trends. This preference distribution shift leads to catastrophic alignment failures in many real-world applications. We address this problem using the principled framework of distributionally robust optimization, and develop two novel distributionally robust direct preference optimization (DPO) algorithms, namely, Wasserstein DPO (WDPO) and Kullback-Leibler DPO (KLDPO). We characterize the sample complexity of learning the optimal policy parameters for WDPO and KLDPO. Moreover, we propose scalable gradient descent-style learning algorithms by developing suitable approximations for the challenging minimax loss functions of WDPO and KLDPO. Our empirical experiments using benchmark data sets and LLMs demonstrate the superior performance of WDPO and KLDPO in substantially improving the alignment when there is a preference distribution shift.
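For orientation, the following is a minimal sketch of the standard, non-robust DPO loss for a single preference pair, which is the objective that WDPO and KLDPO start from. It does not reproduce the robust variants or their approximations from the paper. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard (non-robust) DPO loss for one preference pair.

    All inputs are log-probabilities of full responses. The names and the
    default beta are hypothetical and chosen only for this illustration.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; the loss falls as the policy shifts probability toward the chosen response.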

preprint · 2026 · arXiv

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

Behavior Foundation Models (BFMs) enable scalable imitation learning (IL) by pretraining task-agnostic representations that can be rapidly adapted to new tasks. However, existing BFMs assume fixed environment dynamics, limiting their robustness under real-world shifts such as changes in friction, actuation, or sensor noise. We address this by formulating BFM task inference as a robust minimax optimization problem, enabling adaptation to worst-case dynamics perturbations without modifying pretraining. To the best of our knowledge, this is the first BFM-based framework that achieves robustness to dynamics shifts while relying solely on offline data from a single nominal environment. Our approach significantly outperforms standard BFM and robust offline IL baselines under dynamics shifts. These results demonstrate that robust policies can be obtained entirely at task-inference time, improving the practicality of BFMs in dynamic settings.

preprint · 2022 · arXiv

Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints. We start by designing a policy optimization algorithm with a carefully designed action-value estimator and bonus term, and show that for ergodic MDPs, our algorithm ensures $\widetilde{O}(\sqrt{T})$ regret and constant constraint violation, where $T$ is the total number of time steps. This strictly improves over the algorithm of Singh et al. (2020), whose regret and constraint violation are both $\widetilde{O}(T^{2/3})$. Next, we consider the most general class of weakly communicating MDPs. Through a finite-horizon approximation, we develop another algorithm with $\widetilde{O}(T^{2/3})$ regret and constraint violation, which can be further improved to $\widetilde{O}(\sqrt{T})$ via a simple modification, albeit at the price of making the algorithm computationally inefficient. As far as we know, these are the first provable algorithms for weakly communicating MDPs with cost constraints.

preprint · 2022 · arXiv

Online Bayesian Optimization for Beam Alignment in the SECAR Recoil Mass Separator

The SEparator for CApture Reactions (SECAR) is a next-generation recoil separator system at the Facility for Rare Isotope Beams (FRIB) designed for the direct measurement of capture reactions on unstable nuclei in inverse kinematics. To maximize the performance of the device, careful beam alignment to the central ion optical axis needs to be achieved. This can be difficult to attain through manual tuning by human operators without potentially leaving the system in a sub-optimal and irreproducible state. In this work, we present the first development of online Bayesian optimization with a Gaussian process model to tune an ion beam through a nuclear astrophysics recoil separator. We show that the method achieves small incoming angular deviations (0-1 mrad) in an efficient and reproducible manner that is at least 3 times faster than standard hand-tuning. This method is now routinely used for all separator tuning.

preprint · 2022 · arXiv

Optimal Communication and Control Strategies for a Multi-Agent System in the Presence of an Adversary

We consider a multi-agent system in which a decentralized team of agents controls a stochastic system in the presence of an adversary. Instead of committing to a fixed information sharing protocol, the agents can strategically decide at each time whether to share their private information with each other or not. The agents incur a cost whenever they communicate with each other and the adversary may eavesdrop on their communication. Thus, the agents in the team must effectively coordinate with each other while being robust to the adversary's malicious actions. We model this interaction between the team and the adversary as a stochastic zero-sum game where the team aims to minimize a cost while the adversary aims to maximize it. Under some assumptions on the adversary's capabilities, we characterize a min-max control and communication strategy for the team. We supplement this characterization with several structural results that can make the computation of the min-max strategy more tractable.

preprint · 2022 · arXiv

Optimal Control of Partially Observable Markov Decision Processes with Finite Linear Temporal Logic Constraints

Autonomous agents often operate in scenarios where the state is partially observed. In addition to maximizing their cumulative reward, agents must execute complex tasks with rich temporal and logical structures. These tasks can be expressed using temporal logic languages like finite linear temporal logic (LTL_f). This paper, for the first time, provides a structured framework for designing agent policies that maximize the reward while ensuring that the probability of satisfying the temporal logic specification is sufficiently high. We reformulate the problem as a constrained partially observable Markov decision process (POMDP) and provide a novel approach that can leverage off-the-shelf unconstrained POMDP solvers for solving it. Our approach guarantees approximate optimality and constraint satisfaction with high probability. We demonstrate its effectiveness by implementing it on several models of interest.

preprint · 2020 · arXiv

A Direct Product Theorem for One-Way Quantum Communication

We prove a direct product theorem for the one-way entanglement-assisted quantum communication complexity of a general relation $f\subseteq\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$. For any $\varepsilon, \zeta > 0$ and any $k\geq1$, we show that \[ \mathrm{Q}^1_{1-(1-\varepsilon)^{\Omega(\zeta^6k/\log|\mathcal{Z}|)}}(f^k) = \Omega\left(k\left(\zeta^5\cdot\mathrm{Q}^1_{\varepsilon + 12\zeta}(f) - \log\log(1/\zeta)\right)\right),\] where $\mathrm{Q}^1_{\varepsilon}(f)$ represents the one-way entanglement-assisted quantum communication complexity of $f$ with worst-case error $\varepsilon$ and $f^k$ denotes $k$ parallel instances of $f$. As far as we are aware, this is the first direct product theorem for quantum communication. Our techniques are inspired by the parallel repetition theorems for the entangled value of two-player non-local games, under product distributions due to Jain, Pereszlényi and Yao, and under anchored distributions due to Bavarian, Vidick and Yuen, as well as message compression for quantum protocols due to Jain, Radhakrishnan and Sen. Our techniques also work for entangled non-local games which have input distributions anchored on any one side. In particular, we show that for any game $G = (q, \mathcal{X}\times\mathcal{Y}, \mathcal{A}\times\mathcal{B}, \mathsf{V})$, where $q$ is a distribution on $\mathcal{X}\times\mathcal{Y}$ anchored on any one side with anchoring probability $\zeta$, \[ \omega^*(G^k) = \left(1 - (1-\omega^*(G))^5\right)^{\Omega\left(\frac{\zeta^2 k}{\log(|\mathcal{A}|\cdot|\mathcal{B}|)}\right)}\] where $\omega^*(G)$ represents the entangled value of the game $G$. This is a generalization of the result of Bavarian, Vidick and Yuen, who proved a parallel repetition theorem for games anchored on both sides, and potentially a simplification of their proof.

preprint · 2020 · arXiv

A near-optimal direct-sum theorem for communication complexity

We show a near-optimal direct-sum theorem for two-party randomized communication complexity. Let $f\subseteq X \times Y\times Z$ be a relation, $\varepsilon> 0$ and $k$ be an integer. We show that $$\mathrm{R}^{\mathrm{pub}}_\varepsilon(f^k) \cdot \log(\mathrm{R}^{\mathrm{pub}}_\varepsilon(f^k)) \ge \Omega(k \cdot \mathrm{R}^{\mathrm{pub}}_\varepsilon(f)) \enspace,$$ where $f^k= f \times \ldots \times f$ ($k$ times) and $\mathrm{R}^{\mathrm{pub}}_\varepsilon(\cdot)$ represents the public-coin randomized communication complexity with worst-case error $\varepsilon$. Given a protocol $\mathcal{P}$ for $f^k$ with communication cost $c \cdot k$ and worst-case error $\varepsilon$, we exhibit a protocol $\mathcal{Q}$ for $f$ with external information cost $O(c)$ and worst-case error $\varepsilon$. We then use a message compression protocol due to Barak, Braverman, Chen and Rao [2013] for simulating $\mathcal{Q}$ with communication $O(c \cdot \log(c\cdot k))$ to arrive at our result. To show this reduction, we prove some new chain rules for capacity, the maximum information that can be transmitted by a communication channel. We use the powerful concept of Nash equilibrium in game theory, and its existence in suitably defined games, to arrive at the chain rules for capacity. These chain rules are of independent interest.

preprint · 2020 · arXiv

A Risk Aware Two-Stage Market Mechanism for Electricity with Renewable Generation

Over the last few decades, electricity markets around the world have adopted multi-settlement structures, allowing for balancing of supply and demand as more accurate forecast information becomes available. Given increasing uncertainty due to adoption of renewables, more recent market design work has focused on optimizing the expectation of some quantity, e.g. social welfare. However, social planners and policy makers are often risk averse, so such risk-neutral formulations neither adequately reflect prevailing attitudes towards risk nor explain the decisions that follow. Hence we incorporate the commonly used risk measure conditional value at risk (CVaR) into the central planning objective, and study how a two-stage market operates when the individual generators are risk neutral. Our primary result is to show existence (by construction) of a sequential competitive equilibrium (SCEq) in this risk-aware two-stage market. Given equilibrium prices, we design a market mechanism which achieves social cost minimization assuming that agents are non-strategic.
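To make the risk measure concrete, here is a minimal sketch of empirical CVaR for a finite set of equally likely cost scenarios, computed with the Rockafellar-Uryasev minimization formula. It illustrates only the risk measure, not the paper's market mechanism; the function name `cvar` and the convention that higher cost is worse are assumptions made for this example.

```python
def cvar(costs, alpha):
    """Empirical CVaR at level alpha for equally likely cost scenarios.

    Uses CVaR_alpha(X) = min_t { t + E[(X - t)^+] / (1 - alpha) }.
    Higher cost is treated as worse (an assumption of this sketch).
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    if not costs:
        raise ValueError("costs must be non-empty")
    n = len(costs)

    def objective(t):
        excess = sum(max(c - t, 0.0) for c in costs)
        return t + excess / ((1 - alpha) * n)

    # The objective is convex and piecewise linear in t with breakpoints at
    # the sample values, so the minimum is attained at one of the samples.
    return min(objective(t) for t in costs)
```

With `alpha = 0` this reduces to the plain expectation, which is the risk-neutral objective the abstract contrasts against; larger `alpha` averages over an increasingly bad tail of scenarios.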

preprint · 2020 · arXiv

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory and computation efficient and more amenable to large-scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov Decision Processes (MDPs). The first algorithm reduces the problem to the discounted-reward version and achieves $\mathcal{O}(T^{2/3})$ regret after $T$ steps, under the minimal assumption of weakly communicating MDPs. To our knowledge, this is the first model-free algorithm for general MDPs in this setting. The second algorithm makes use of recent advances in adaptive algorithms for adversarial multi-armed bandits and improves the regret to $\mathcal{O}(\sqrt{T})$, albeit with a stronger ergodic assumption. This result significantly improves over the $\mathcal{O}(T^{3/4})$ regret achieved by the only existing model-free algorithm by Abbasi-Yadkori et al. (2019a) for ergodic MDPs in the infinite-horizon average-reward setting.

preprint · 2020 · arXiv

Multiple Source Replacement Path Problem

One of the classical lines of work in graph algorithms has been the Replacement Path Problem: given a graph $G$ and vertices $s$ and $t$, find shortest paths from $s$ to $t$ avoiding each edge $e$ on the shortest path from $s$ to $t$. These paths are called replacement paths in the literature. For an undirected and unweighted graph, (Malik, Mittal, and Gupta, Operations Research Letters, 1989) and (Hershberger and Suri, FOCS 2001) designed an algorithm that solves the replacement path problem in $\tilde O(m+n)$ time. It is natural to ask whether we can generalize the replacement path problem: can we find all replacement paths from a source $s$ to all vertices in $G$? This problem is called the Single Source Replacement Path Problem. Recently, (Chechik and Cohen, SODA 2019) designed a randomized combinatorial algorithm that solves the Single Source Replacement Path Problem in $\tilde O(m\sqrt{n} + n^2)$ time. One of the questions left unanswered by their work is the case when there are many sources, not one. When there are $n$ sources, the combinatorial algorithm of (Bernstein and Karger, STOC 2009) can be used to find all-pairs replacement paths in $\tilde O(mn + n^3)$ time. However, there is no result known for general $\sigma$. Thus, the problem we study is defined as follows: given a set of $\sigma$ sources, we want to find the replacement paths from these sources to all vertices in $G$. We give a randomized combinatorial algorithm for this problem that takes $\tilde O(m\sqrt{n \sigma} + \sigma n^2)$ time. This result generalizes both results known for this problem. Our algorithm is quite different from, and arguably simpler than, that of (Chechik and Cohen, SODA 2019). Like them, we show a matching conditional lower bound using the Boolean Matrix Multiplication conjecture.
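The problem statement for a single source-target pair can be illustrated with a naive baseline: find one shortest path with BFS, then rerun BFS once per edge on that path with the edge removed. This is only a sketch of the definition, costing one BFS per path edge, and is not the algorithm of the paper or of any of the cited works. The adjacency-dictionary input format and the function names are assumptions.

```python
from collections import deque


def bfs_path(adj, s, t, banned=None):
    """Shortest s-t path in an unweighted undirected graph.

    `adj` maps each vertex to a list of neighbours (assumed format).
    `banned` is an optional undirected edge (u, v) that must not be used.
    Returns the path as a list of vertices, or None if t is unreachable.
    """
    banned_edge = set(banned) if banned is not None else None
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in adj[u]:
            if banned_edge is not None and {u, v} == banned_edge:
                continue
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if t not in parent:
        return None
    path = []
    node = t
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def replacement_path_lengths(adj, s, t):
    """Map each edge on one shortest s-t path to its replacement path length.

    The value is None when removing the edge disconnects s from t.
    """
    base = bfs_path(adj, s, t)
    if base is None:
        return {}
    lengths = {}
    for u, v in zip(base, base[1:]):
        alternative = bfs_path(adj, s, t, banned=(u, v))
        lengths[(u, v)] = None if alternative is None else len(alternative) - 1
    return lengths
```

On a 4-cycle `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}` with `s = 0` and `t = 2`, each edge of the shortest path can be bypassed by going around the other side of the cycle.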

preprint · 2020 · arXiv

Non-indexability of the Stochastic Appointment Scheduling Problem

Consider a set of jobs with independent random service times to be scheduled on a single machine. The jobs can be surgeries in an operating room, patients' appointments in outpatient clinics, etc. The challenge is to determine the optimal sequence and appointment times of jobs to minimize some function of the server idle time and service start-time delay. We introduce a generalized objective function of delay and idle time, and consider $l_1$-type and $l_2$-type cost functions as special cases of interest. Determining an index-based policy for the optimal sequence in which to schedule jobs has been an open problem for many years. For example, it was conjectured that the 'least variance first' (LVF) policy is optimal for the $l_1$-type objective. This is known to be true for the case of two jobs with specific distributions. A key result in this paper is that the optimal sequencing problem is non-indexable, i.e., neither the variance nor any other such index can be used to determine the optimal sequence in which to schedule jobs for $l_1$-type and $l_2$-type objectives. We then show that, given a sequence in which to schedule the jobs, sample average approximation yields a solution which is statistically consistent.
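As a rough illustration of the sample average approximation step, the sketch below evaluates an $l_1$-type cost for a fixed job sequence and fixed appointment times over a set of sampled service-time scenarios. The exact cost model is an assumption: the server is taken to be free at the first appointment time, and delay and idle time are summed with hypothetical weights `delay_weight` and `idle_weight`. The paper's generalized objective may differ.

```python
def schedule_cost(appointments, durations, delay_weight=1.0, idle_weight=1.0):
    """l1-type cost of one scenario: weighted start delays plus server idle time.

    `appointments` are scheduled start times in sequence order and `durations`
    are the realized service times for the same jobs (assumed model).
    """
    clock = appointments[0]  # assume the server is free at the first appointment
    cost = 0.0
    for appointment, duration in zip(appointments, durations):
        start = max(appointment, clock)
        delay = start - appointment  # job waits because the server is busy
        idle = start - clock         # server waits because the job is not due yet
        cost += delay_weight * delay + idle_weight * idle
        clock = start + duration
    return cost


def saa_cost(appointments, scenarios, delay_weight=1.0, idle_weight=1.0):
    """Sample average of the scenario costs for fixed appointment times."""
    total = sum(
        schedule_cost(appointments, durations, delay_weight, idle_weight)
        for durations in scenarios
    )
    return total / len(scenarios)
```

Minimizing `saa_cost` over the appointment times, for a given sequence, is the kind of sampled problem whose solution the abstract describes as statistically consistent.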

preprint · 2020 · arXiv

Time Space Optimal Algorithm for Computing Separators in Bounded Genus Graphs

A graph separator is a subset of vertices of a graph whose removal divides the graph into small components. Computing small graph separators for various classes of graphs is an important computational task. In this paper, we present a polynomial time algorithm that uses $O(g^{1/2}n^{1/2}\log n)$-space to find an $O(g^{1/2}n^{1/2})$-sized separator of a graph having $n$ vertices and embedded on a surface of genus $g$.

preprint · 2019 · arXiv

A minimax approach to one-shot entropy inequalities

One-shot information theory entertains a plethora of entropic quantities, such as the smooth max-divergence, hypothesis testing divergence and information spectrum divergence, that characterize various operational tasks and are used to prove the asymptotic behavior of various tasks in quantum information theory. Tight inequalities between these quantities are thus of immediate interest. In this note we use a minimax approach (appearing previously for example in the proofs of the quantum substate theorem), to simplify the quantum problem to a commutative one, which allows us to derive such inequalities. Our derivations are conceptually different from previous arguments and in some cases lead to tighter relations. We hope that the approach discussed here can lead to progress in open problems in quantum Shannon theory, and exemplify this by applying it to a simple case of the joint smoothing problem.

preprint · 2019 · arXiv

Efficient methods for one-shot quantum communication

We address the question of efficient implementation of quantum protocols, with small communication and entanglement, and short-depth circuits for encoding or decoding. We introduce two new methods to achieve this: the first involves two new versions of the convex-split lemma that use a small amount of additional resource (in comparison to the prior version), and the second is inspired by the technique of classical correlated sampling in computer science. These lead to a series of new consequences, as follows. First, we consider the task of quantum decoupling, where the aim is to apply an operation on an $n$-qubit register so as to make it independent of an inaccessible quantum system. Many previous works achieve decoupling with the aid of a random unitary. It is known that random unitaries can be replaced by random circuits of size $O(n\log n)$ and depth $\mathrm{poly}(\log n)$, or unitary 2-designs based on Clifford circuits of similar size and depth. We show that given any choice of basis such as the computational basis, decoupling can be achieved by a unitary that takes basis vectors to basis vectors. Thus, the circuit acts in a 'classical' manner and additionally uses $O(n)$ catalytic qubits in the maximally mixed quantum state. Our unitary performs addition and multiplication modulo a prime and hence achieves a circuit size of $O(n\log n)$ and logarithmic depth. This shows that the circuit complexity of integer multiplication (modulo a prime) is lower bounded by the optimal circuit complexity of decoupling. Next, we construct a new one-shot entanglement-assisted protocol for quantum channel coding that achieves near-optimal communication through a given channel. The number of qubits of pre-shared entanglement is exponentially smaller than that in the previous protocol that is near-optimal in communication. We also achieve similar results for one-shot quantum state redistribution.

preprint · 2018 · arXiv

Parallel Device-Independent Quantum Key Distribution

A prominent application of quantum cryptography is the distribution of cryptographic keys that are provably secure. Recently, such security proofs were extended by Vazirani and Vidick (Physical Review Letters, 113, 140501, 2014) to the device-independent (DI) scenario, where the users do not need to trust the integrity of the underlying quantum devices. The protocols analyzed by them and by subsequent authors all require a sequential execution of N multiplayer games, where N is the security parameter. In this work, we prove unconditional security of a protocol where all games are executed in parallel. Besides decreasing the number of time-steps necessary for key generation, this result reduces the security requirements for DI-QKD by allowing arbitrary information leakage of each user's inputs within his or her lab. To the best of our knowledge, this is the first parallel security proof for a fully device-independent QKD protocol. Our protocol tolerates a constant level of device imprecision and achieves a linear key rate.

preprint · 2018 · arXiv

Partially smoothed information measures

Smooth entropies are a tool for quantifying resource trade-offs in (quantum) information theory and cryptography. In typical bi- and multi-partite problems, however, some of the sub-systems are often left unchanged and this is not reflected by the standard smoothing of information measures over a ball of close states. We propose to smooth instead only over a ball of close states which also have some of the reduced states on the relevant sub-systems fixed. This partial smoothing of information measures naturally allows us to give more refined characterizations of various information-theoretic problems in the one-shot setting. In particular, we immediately get asymptotic second-order characterizations for tasks such as privacy amplification against classical side information or classical state splitting. For quantum problems like state merging, the general resource trade-off is tightly characterized by partially smoothed information measures as well.

preprint · 2016 · arXiv

Partition bound is quadratically tight for product distributions

Let $f : \{0,1\}^n \times \{0,1\}^n \rightarrow \{0,1\}$ be a 2-party function. For every product distribution $\mu$ on $\{0,1\}^n \times \{0,1\}^n$, we show that $$\mathsf{CC}^\mu_{0.49}(f) = O\left(\left(\log \mathsf{prt}_{1/8}(f) \cdot \log \log \mathsf{prt}_{1/8}(f)\right)^2\right),$$ where $\mathsf{CC}^\mu_\varepsilon(f)$ is the distributional communication complexity of $f$ with error at most $\varepsilon$ under the distribution $\mu$ and $\mathsf{prt}_{1/8}(f)$ is the partition bound of $f$, as defined by Jain and Klauck [Proc. 25th CCC, 2010]. We also prove a similar bound in terms of $\mathsf{IC}_{1/8}(f)$, the information complexity of $f$, namely, $$\mathsf{CC}^\mu_{0.49}(f) = O\left(\left(\mathsf{IC}_{1/8}(f) \cdot \log \mathsf{IC}_{1/8}(f)\right)^2\right).$$ The latter bound was recently and independently established by Kol [Proc. 48th STOC, 2016] using a different technique. We show a similar result for query complexity under product distributions. Let $g : \{0,1\}^n \rightarrow \{0,1\}$ be a function. For every bit-wise product distribution $\mu$ on $\{0,1\}^n$, we show that $$\mathsf{QC}^\mu_{0.49}(g) = O\left(\left( \log \mathsf{qprt}_{1/8}(g) \cdot \log \log\mathsf{qprt}_{1/8}(g) \right)^2 \right),$$ where $\mathsf{QC}^\mu_{\varepsilon}(g)$ is the distributional query complexity of $g$ with error at most $\varepsilon$ under the distribution $\mu$ and $\mathsf{qprt}_{1/8}(g)$ is the query partition bound of the function $g$. Partition bounds were introduced (in both communication complexity and query complexity models) to provide LP-based lower bounds for randomized communication complexity and randomized query complexity. Our results demonstrate that these lower bounds are polynomially tight for product distributions.