Researcher profile

Ali Rad

Ali Rad contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards

Reinforcement learning with verifiable rewards (RLVR) is a simple but powerful paradigm for training LLMs: sample a completion, verify it, and update. In practice, however, the verifier is almost never clean--unit tests probe only limited corner cases; human and synthetic labels are imperfect; and LLM judges (e.g., RLAIF) are noisy and can be exploited--and this problem worsens on harder domains (especially coding) where tests are sparse and increasingly model-generated. We ask a pragmatic question: does the verification noise merely slow down the learning (rate), or can it flip the outcome (fate)? To address this, we develop an analytically tractable multi-armed bandit view of RLVR dynamics, instantiated with GRPO and validated in controlled experiments. Modeling false positives and false negatives and grouping completions into recurring reasoning modes yields a replicator-style (natural-selection) flow on the probability simplex. The dynamics decouples into within-correct-mode competition and a one-dimensional evolution for the mass on incorrect modes, whose drift is determined solely by Youden's index J=TPR-FPR. This yields a sharp phase transition: when J>0, the incorrect mass is driven toward extinction (learning); when J=0, the process is neutral; and when J<0, incorrect modes amplify until they dominate (anti-learning and collapse). In the learning regime J>0, noise primarily rescales convergence time ("rate, not fate"). Experiments on verifiable programming tasks under synthetic noise reproduce the predicted J=0 boundary. Beyond noise, the framework offers a general lens for analyzing RLVR stability, convergence, and algorithmic interventions.
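
The phase transition described in the abstract above can be illustrated with a toy model. What follows is a minimal sketch, not the paper's GRPO-derived dynamics. It assumes one incorrect mode with mass p competing against one correct mode, a verifier that accepts correct completions with probability TPR and incorrect ones with probability FPR, and a plain two-type replicator equation dp/dt = -J p (1 - p) integrated with Euler steps. The function names, step size, and step count are hypothetical choices made for illustration.

```python
def youden_index(tpr: float, fpr: float) -> float:
    """Youden's index J = TPR - FPR of the (noisy) verifier."""
    return tpr - fpr


def regime(tpr: float, fpr: float, tol: float = 1e-12) -> str:
    """Classify the regime by the sign of J, as stated in the abstract."""
    j = youden_index(tpr, fpr)
    if j > tol:
        return "learning"
    if j < -tol:
        return "anti-learning"
    return "neutral"


def simulate_incorrect_mass(
    tpr: float,
    fpr: float,
    p0: float = 0.5,
    step: float = 0.1,
    n_steps: int = 2000,
) -> float:
    """Euler-integrate the toy replicator equation dp/dt = -J * p * (1 - p).

    ASSUMPTION: this two-type replicator form is an illustrative stand-in;
    the paper's actual drift term under GRPO may differ in its prefactor.
    Here p is the probability mass on the incorrect mode.
    """
    j = youden_index(tpr, fpr)
    p = p0
    for _ in range(n_steps):
        p += step * (-j * p * (1.0 - p))
        p = min(1.0, max(0.0, p))  # keep p a valid probability
    return p


if __name__ == "__main__":
    for tpr, fpr in [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9)]:
        final_mass = simulate_incorrect_mass(tpr, fpr)
        print(regime(tpr, fpr), round(final_mass, 4))
```

In this toy model J multiplies the entire drift term, so a smaller positive J only stretches the time scale of convergence, while a negative J reverses the direction of the flow.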

Preprint, 2022, arXiv

Surviving The Barren Plateau in Variational Quantum Circuits with Bayesian Learning Initialization

Variational quantum-classical hybrid algorithms are seen as a promising strategy for solving practical problems on quantum computers in the near term. While this approach reduces the number of qubits and operations required from the quantum machine, it places a heavy load on a classical optimizer. While often under-appreciated, the latter is a computationally hard task due to the barren plateau phenomenon in parameterized quantum circuits. The absence of guiding features like gradients renders conventional optimization strategies ineffective as the number of qubits increases. Here, we introduce the fast-and-slow algorithm, which uses Bayesian Learning to identify a promising region in parameter space. This is used to initialize a fast local optimizer to find the global optimum point efficiently. We illustrate the effectiveness of this method on the Bars-and-Stripes (BAS) quantum generative model, which has been studied on several quantum hardware platforms. Our results move variational quantum algorithms closer to their envisioned applications in quantum chemistry, combinatorial optimization, and quantum simulation problems.