Source author record

Ehsan Kamalinejad

Ehsan Kamalinejad appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.AP, Artificial Intelligence, Computation and Language, Machine Learning

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators


preprint · 2026 · arXiv

Rate or Fate? RLV$^\varepsilon$R: Reinforcement Learning with Verifiable Noisy Rewards

Reinforcement learning with verifiable rewards (RLVR) is a simple but powerful paradigm for training LLMs: sample a completion, verify it, and update. In practice, however, the verifier is almost never clean--unit tests probe only limited corner cases; human and synthetic labels are imperfect; and LLM judges (e.g., RLAIF) are noisy and can be exploited--and this problem worsens on harder domains (especially coding) where tests are sparse and increasingly model-generated. We ask a pragmatic question: does the verification noise merely slow down the learning (rate), or can it flip the outcome (fate)? To address this, we develop an analytically tractable multi-armed bandit view of RLVR dynamics, instantiated with GRPO and validated in controlled experiments. Modeling false positives and false negatives and grouping completions into recurring reasoning modes yields a replicator-style (natural-selection) flow on the probability simplex. The dynamics decouples into within-correct-mode competition and a one-dimensional evolution for the mass on incorrect modes, whose drift is determined solely by Youden's index J=TPR-FPR. This yields a sharp phase transition: when J>0, the incorrect mass is driven toward extinction (learning); when J=0, the process is neutral; and when J<0, incorrect modes amplify until they dominate (anti-learning and collapse). In the learning regime J>0, noise primarily rescales convergence time ("rate, not fate"). Experiments on verifiable programming tasks under synthetic noise reproduce the predicted J=0 boundary. Beyond noise, the framework offers a general lens for analyzing RLVR stability, convergence, and algorithmic interventions.
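The phase transition described in this abstract can be illustrated with a minimal two-type sketch. It is not the paper's model: it assumes a plain replicator equation in which correct completions are rewarded with probability TPR and incorrect ones with probability FPR, which gives dp/dt = -J p (1 - p) for the incorrect mass p, with J = TPR - FPR. The function name, step size, and step count are hypothetical choices for illustration, and GRPO-specific normalization is ignored.

```python
def simulate_incorrect_mass(p0, tpr, fpr, lr=0.1, steps=2000):
    """Euler-integrate dp/dt = -J * p * (1 - p).

    p is the probability mass on incorrect modes.
    Assumption (not from the source): a two-type replicator flow where
    the fitness of a type is its probability of being rewarded.
    """
    j = tpr - fpr  # Youden's index J = TPR - FPR
    p = p0
    for _ in range(steps):
        # Drift sign is set only by J: negative drift when J > 0.
        p += lr * (-j) * p * (1.0 - p)
    return p


if __name__ == "__main__":
    for tpr, fpr in [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]:
        final = simulate_incorrect_mass(0.3, tpr, fpr)
        print(f"J={tpr - fpr:+.1f}  final incorrect mass={final:.6f}")
```

Under these assumptions, J > 0 drives the incorrect mass toward 0, J = 0 leaves it where it started, and J < 0 drives it toward 1. Shrinking a positive J only stretches the time scale of the same trajectory, which is one way to read "rate, not fate".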

preprint · 2012 · arXiv

Radial Symmetry of Large Solutions of Semilinear Elliptic Equations with Convection

We study radial symmetry of large solutions of the semilinear elliptic problem $\Delta u + \nabla h \cdot \nabla u = f(|x|, u)$, and we provide sharp conditions under which the problem has a radial solution. The result is independent of the rate of growth of the solution at infinity.

preprint · 2012 · arXiv

Well-posedness of Wasserstein Gradient Flow Solutions of Higher Order Evolution Equations

A relaxed notion of displacement convexity is defined and used to establish short time existence and uniqueness of Wasserstein gradient flows for higher order energy functionals. As an application, local and global well-posedness of different higher order non-linear evolution equations are derived. Examples include the thin-film equation and the quantum drift diffusion equation in one spatial variable.