Source author record

Ali Rad

Ali Rad appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Artificial Intelligence; Computation and Language; Machine Learning; quant-ph

Catalog footprint

What is connected

2 works

4 topics

4 close collaborators

Preprint, 2026, arXiv

Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards

Reinforcement learning with verifiable rewards (RLVR) is a simple but powerful paradigm for training LLMs: sample a completion, verify it, and update. In practice, however, the verifier is almost never clean--unit tests probe only limited corner cases; human and synthetic labels are imperfect; and LLM judges (e.g., RLAIF) are noisy and can be exploited--and this problem worsens on harder domains (especially coding) where tests are sparse and increasingly model-generated. We ask a pragmatic question: does the verification noise merely slow down the learning (rate), or can it flip the outcome (fate)? To address this, we develop an analytically tractable multi-armed bandit view of RLVR dynamics, instantiated with GRPO and validated in controlled experiments. Modeling false positives and false negatives and grouping completions into recurring reasoning modes yields a replicator-style (natural-selection) flow on the probability simplex. The dynamics decouples into within-correct-mode competition and a one-dimensional evolution for the mass on incorrect modes, whose drift is determined solely by Youden's index J=TPR-FPR. This yields a sharp phase transition: when J>0, the incorrect mass is driven toward extinction (learning); when J=0, the process is neutral; and when J<0, incorrect modes amplify until they dominate (anti-learning and collapse). In the learning regime J>0, noise primarily rescales convergence time ("rate, not fate"). Experiments on verifiable programming tasks under synthetic noise reproduce the predicted J=0 boundary. Beyond noise, the framework offers a general lens for analyzing RLVR stability, convergence, and algorithmic interventions.
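The phase transition described in the abstract can be illustrated with a minimal sketch. It assumes a plain replicator equation in which each mode's fitness is its expected verifier reward (TPR for correct modes, FPR for incorrect ones); under that assumption the incorrect mass q obeys dq/dt = -J q (1 - q). The paper's GRPO-derived flow may differ from this by a rescaling, and the function names, step size and threshold below are hypothetical choices made for illustration.

```python
def youden_j(tpr, fpr):
    """Youden's index J = TPR - FPR of the (noisy) verifier."""
    return tpr - fpr


def incorrect_mass(q0, tpr, fpr, t_end=20.0, dt=0.01):
    """Euler-integrate dq/dt = -J * q * (1 - q) from q(0) = q0 up to t_end.

    Assumption: plain replicator dynamics with expected reward as fitness.
    """
    j = youden_j(tpr, fpr)
    q = q0
    for _ in range(int(round(t_end / dt))):
        q += dt * (-j * q * (1.0 - q))
    return q


def time_to_threshold(q0, tpr, fpr, eps=0.01, dt=0.01, t_max=1000.0):
    """Time until the incorrect mass falls below eps, or None if it never does."""
    j = youden_j(tpr, fpr)
    if j <= 0.0:
        # Neutral (J = 0) or anti-learning (J < 0): incorrect mass never shrinks.
        return None
    q, t = q0, 0.0
    while q >= eps and t < t_max:
        q += dt * (-j * q * (1.0 - q))
        t += dt
    return t if q < eps else None


if __name__ == "__main__":
    for tpr, fpr in [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9)]:
        print(tpr, fpr, incorrect_mass(0.5, tpr, fpr))
    # Halving J (0.7 -> 0.35) should roughly double the time to threshold.
    print(time_to_threshold(0.5, 0.9, 0.2), time_to_threshold(0.5, 0.55, 0.2))
```

In this toy model a noisier verifier with J still positive only stretches the time axis, while J at or below zero changes the outcome, which matches the "rate, not fate" reading of the abstract.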

Preprint, 2022, arXiv

Surviving The Barren Plateau in Variational Quantum Circuits with Bayesian Learning Initialization

Variational quantum-classical hybrid algorithms are seen as a promising strategy for solving practical problems on quantum computers in the near term. While this approach reduces the number of qubits and operations required from the quantum machine, it places a heavy load on a classical optimizer. While often under-appreciated, the latter is a computationally hard task due to the barren plateau phenomenon in parameterized quantum circuits. The absence of guiding features like gradients renders conventional optimization strategies ineffective as the number of qubits increases. Here, we introduce the fast-and-slow algorithm, which uses Bayesian Learning to identify a promising region in parameter space. This is used to initialize a fast local optimizer to find the global optimum point efficiently. We illustrate the effectiveness of this method on the Bars-and-Stripes (BAS) quantum generative model, which has been studied on several quantum hardware platforms. Our results move variational quantum algorithms closer to their envisioned applications in quantum chemistry, combinatorial optimization, and quantum simulation problems.
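The two-stage idea in the abstract (a slow Bayesian search that locates a promising region, followed by a fast local optimizer started from it) can be sketched on a toy problem. Everything here is an assumption made for illustration, not the authors' implementation: the 1-D `cost` function stands in for a plateau-dominated circuit landscape, the slow phase is a small Gaussian-process Bayesian optimization with a lower-confidence-bound rule, and the fast phase is finite-difference gradient descent.

```python
import math


def cost(theta):
    """Hypothetical landscape: flat almost everywhere, narrow well at theta = 1.3."""
    return -math.exp(-((theta - 1.3) ** 2) / 0.18)


def rbf(a, b, length=0.5):
    """Squared-exponential kernel."""
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))


def slow_phase(lo, hi, n_init=4, n_iter=6, kappa=2.0):
    """Bayesian search: return the best parameter value observed."""
    xs = [lo + (hi - lo) * (i + 0.5) / n_init for i in range(n_init)]
    ys = [cost(x) for x in xs]
    grid = [lo + (hi - lo) * i / 80 for i in range(81)]
    for _ in range(n_iter):
        def lcb(x):
            m, s = gp_posterior(xs, ys, x)
            return m - kappa * s  # low mean or high uncertainty is attractive
        x_next = min(grid, key=lcb)
        xs.append(x_next)
        ys.append(cost(x_next))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best]


def fast_phase(theta, lr=0.05, steps=200, h=1e-5):
    """Local refinement by gradient descent with central finite differences."""
    for _ in range(steps):
        grad = (cost(theta + h) - cost(theta - h)) / (2.0 * h)
        theta -= lr * grad
    return theta


if __name__ == "__main__":
    start = slow_phase(-3.0, 5.0)
    print("slow-phase start:", start, "refined:", fast_phase(start))
    print("local optimizer alone from -2.0:", fast_phase(-2.0))
```

The last line of the usage block shows the motivation in miniature: started on the flat part of this toy landscape, the local optimizer sees a vanishing gradient and does not move, whereas a start chosen by the slow phase lies inside the well.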