Researcher profile

Arash Ahmadi

Arash Ahmadi contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Trust: 19 (Baseline)
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

Preprint, 2026, arXiv

Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning

Mathematical reasoning is a key benchmark for large language models. Reinforcement learning is a standard post-training mechanism for improving the reasoning capabilities of large language models, yet performance remains sensitive to the design of the reward function that drives policy optimization. This paper introduces a search-driven framework that treats the reward specification itself as an object of optimization. The setting of interest is one in which the base model is held fixed and the reward specification is the primary remaining design lever. Candidate reward functions are generated by a frontier language model, validated automatically, screened through 500-step Group Relative Policy Optimization (GRPO) training runs on a Llama-3.2-3B-Instruct base model with Low-Rank Adaptation (LoRA), and ranked by F1 on the GSM8K test set. Ranked summaries from prior rounds are then fed back into the next round of generation. Over five rounds, the search produces 50 candidate rewards. The mean F1 rises from 0.596 in Round 1 to 0.632 in Round 5, and the top individual reward reaches F1 = 0.787. Seven ensemble configurations of top-ranked rewards are evaluated. The best ensemble achieves F1 = 0.795 (95% bootstrap CI [0.756, 0.832]) and accuracy 0.660 [0.635, 0.686], a 0.19 absolute F1 gain over a base-rewards-only GRPO baseline (F1 = 0.609). Pairwise McNemar tests with Bonferroni correction show all five-or-more-reward configurations are statistically indistinguishable at α = 0.05/21. A three-seed re-training of the best ensemble yields F1 of 0.785. A randomly drawn 5-reward control collapses to F1 = 0.047, which shows that the ranked-feedback loop, not the additive signal of having more rewards, drives the gain.
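The training signal described above is GRPO driven by one or more reward functions. The sketch below shows how a group of completions sampled for one prompt could be scored by an ensemble of rewards and then converted into group-relative advantages. The two reward functions (`correctness_reward`, `format_reward`) and the plain-sum ensemble rule are hypothetical stand-ins, not the rewards generated in the paper. Only the per-group normalization A_i = (r_i - mean(r)) / std(r) is the standard GRPO step, and implementations differ on whether they use the population or the sample standard deviation.

```python
# Sketch: ensemble reward scoring + GRPO group-relative advantages.
# The reward functions and the sum-based ensemble are HYPOTHETICAL examples.
import re
import statistics
from typing import Callable, List

RewardFn = Callable[[str, str], float]  # (completion, reference answer) -> reward


def correctness_reward(completion: str, answer: str) -> float:
    # Hypothetical: 1.0 if the last number in the completion equals the reference.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == answer else 0.0


def format_reward(completion: str, answer: str) -> float:
    # Hypothetical: small bonus when the completion states an explicit final answer.
    return 0.2 if "Answer:" in completion else 0.0


def ensemble_reward(fns: List[RewardFn], completion: str, answer: str) -> float:
    # Assumed combination rule: unweighted sum of the individual rewards.
    return sum(fn(completion, answer) for fn in fns)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # GRPO scores each sample against the other samples for the same prompt,
    # so no learned value function (critic) is required.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    answer = "18"
    group = [  # several sampled completions for one GSM8K-style prompt
        "She earns 9 * 2 = 18 dollars. Answer: 18",
        "She earns 9 + 2 = 11 dollars. Answer: 11",
        "The total is 18",
        "I am not sure.",
    ]
    fns = [correctness_reward, format_reward]
    rewards = [ensemble_reward(fns, c, answer) for c in group]
    print(rewards)
    print([round(a, 3) for a in group_advantages(rewards)])
```

Because each completion is compared only with its siblings for the same prompt, a reward that gives every sample in a group the same score yields zero advantages and contributes no policy-gradient signal, which is one reason the reward specification is such an important design lever in this setting.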

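The threshold α = 0.05/21 presumably comes from a Bonferroni correction over every pair of the seven ensemble configurations, since C(7, 2) = 21. The following is a minimal sketch of such a pairwise comparison. It assumes that each configuration's per-question correctness is available as a list of booleans aligned on the same test items, and it uses the exact binomial form of McNemar's test, which may differ from the variant used in the paper. The names `mcnemar_exact` and `pairwise_tests` are illustrative.

```python
# Sketch: pairwise exact McNemar tests with a Bonferroni-adjusted threshold.
from itertools import combinations
from math import comb
from typing import Dict, List, Sequence, Tuple


def mcnemar_exact(a: Sequence[bool], b: Sequence[bool]) -> float:
    # Only discordant items matter: n01 = A wrong/B right, n10 = A right/B wrong.
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n = n01 + n10
    if n == 0:
        return 1.0  # identical predictions: no evidence of a difference
    k = min(n01, n10)
    # Two-sided p-value: twice the lower Binomial(n, 0.5) tail, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def pairwise_tests(
    results: Dict[str, List[bool]], alpha: float = 0.05
) -> Tuple[float, List[Tuple[str, str, float, bool]]]:
    names = sorted(results)
    pairs = list(combinations(names, 2))  # 7 configurations -> 21 pairs
    threshold = alpha / len(pairs)        # Bonferroni: alpha / number of tests
    rows = []
    for x, y in pairs:
        p = mcnemar_exact(results[x], results[y])
        rows.append((x, y, p, p < threshold))  # True means "significantly different"
    return threshold, rows
```

Under this reading, "statistically indistinguishable" means that no pair's p-value falls below 0.05/21 ≈ 0.00238.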
Preprint, 2025, arXiv

Controlled Displacement of Stored Light at Room Temperature

We report the demonstration of spatially translating a stored optical pulse at room temperature over distances exceeding one optical wavelength. By implementing an interferometric scheme, we further measure the average speed of this linear translation, thus harnessing a stopped-light experiment for a sensing application. This work extends the use of quantum memories beyond quantum communication and information contexts, opening a pathway to novel methods of velocity measurements with high sensitivity.

Preprint, 2016, arXiv

evt_MNIST: A spike based version of traditional MNIST

Benchmarks and datasets play an important role in the evaluation of machine learning algorithms and neural network implementations. Traditional image datasets such as MNIST are used to evaluate the efficiency of different training algorithms in neural networks. The requirements are different for Spiking Neural Networks (SNNs), which need spiking inputs. It is widely believed that spike timing in the biological cortex is irregular, and Poisson distributions adequately describe this irregularity, which makes them suitable for generating appropriate spikes. Here, we introduce a spike-based version of MNIST (the handwritten digits dataset) generated using a Poisson distribution, and we show the Poissonian property of the generated streams. The resulting dataset, evt_MNIST, can be used for neural network evaluation.
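The abstract describes converting static MNIST images into spike streams drawn from a Poisson distribution. The sketch below shows one common way to do this: rate coding with an independent homogeneous Poisson process per pixel, generated from exponential inter-spike intervals. It also computes the Fano factor (variance over mean of spike counts), which is close to 1 for Poisson streams and offers one simple check of the "Poissonian property". The linear intensity-to-rate mapping, `max_rate_hz`, and the 0.5 s window are assumptions for illustration. They are not parameters taken from evt_MNIST.

```python
# Sketch: Poisson rate coding of pixel intensities (assumed parameters).
import random
from typing import List


def poisson_spike_train(rate_hz: float, duration_s: float, rng: random.Random) -> List[float]:
    # Exponential inter-spike intervals with mean 1/rate produce a Poisson process.
    spikes: List[float] = []
    if rate_hz <= 0:
        return spikes  # a black pixel never fires
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)
    return spikes


def encode_image(pixels: List[int], duration_s: float = 0.5,
                 max_rate_hz: float = 100.0, seed: int = 0) -> List[List[float]]:
    # One spike train per pixel; intensity 0..255 maps linearly to 0..max_rate_hz.
    rng = random.Random(seed)
    return [poisson_spike_train(max_rate_hz * p / 255.0, duration_s, rng) for p in pixels]


def fano_factor(counts: List[int]) -> float:
    # Sample variance / mean of spike counts across repeated trials.
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


if __name__ == "__main__":
    trains = encode_image([0, 128, 255])
    print([len(t) for t in trains])  # brighter pixels -> more spikes on average
    rng = random.Random(1)
    counts = [len(poisson_spike_train(50.0, 1.0, rng)) for _ in range(2000)]
    print(round(fano_factor(counts), 2))  # should be near 1.0
```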

Preprint, 2015, arXiv

Optimized Implementation of Memristor-Based Full Adder by Material Implication Logic

Memristor-based applications and circuits have recently received increasing attention, and memristors are also being applied in logic circuit design. Material implication logic is one of the main approaches to memristor-based logic. In this paper, an optimized memristor-based full adder design using material implication logic is presented. This design needs 27 memristors and occupies less area than typical CMOS-based 8-bit full adders. The presented full adder also needs only 184 computational steps, which improves the speed of the former full adder design by 20 percent.
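The sketch below models how a full adder can be built from the two primitive memristive operations of material implication logic. It is a functional model under simplifying assumptions. Each memristor is a Boolean state, IMPLY(p, q) overwrites the target memristor q with (NOT p) OR q, and FALSE resets a memristor to 0. The circuit is the textbook nine-NAND full adder, with each NAND realized as one FALSE plus two IMPLY steps. It is not the optimized design from the paper, and its memristor and step counts should not be compared with the 27 memristors and 184 steps reported above. The class `ImplyMachine` is a hypothetical name.

```python
# Sketch: full adder from IMPLY + FALSE operations (textbook NAND construction,
# NOT the paper's optimized design).
from typing import Dict, Tuple


class ImplyMachine:
    def __init__(self) -> None:
        self.state: Dict[str, int] = {}  # memristor name -> logic state (0/1)
        self.steps = 0                   # each FALSE or IMPLY counts as one step

    def false(self, m: str) -> None:
        self.state[m] = 0
        self.steps += 1

    def imply(self, p: str, q: str) -> None:
        # Material implication written back into memristor q: q <- (NOT p) OR q.
        self.state[q] = int((not self.state[p]) or self.state[q])
        self.steps += 1

    def nand(self, p: str, q: str, out: str) -> None:
        # out <- 0; out <- NOT q; out <- (NOT p) OR (NOT q) = NAND(p, q).
        self.false(out)
        self.imply(q, out)
        self.imply(p, out)


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int, int]:
    m = ImplyMachine()
    m.state.update({"a": a, "b": b, "c": cin})
    m.nand("a", "b", "n1")
    m.nand("a", "n1", "n2")
    m.nand("b", "n1", "n3")
    m.nand("n2", "n3", "x")     # x = a XOR b
    m.nand("x", "c", "n4")
    m.nand("x", "n4", "n5")
    m.nand("c", "n4", "n6")
    m.nand("n5", "n6", "sum")   # sum = a XOR b XOR cin
    m.nand("n1", "n4", "cout")  # cout = ab OR (a XOR b) cin
    return m.state["sum"], m.state["cout"], m.steps


if __name__ == "__main__":
    for bits in [(0, 0, 0), (1, 0, 1), (1, 1, 1)]:
        print(bits, full_adder(*bits))
```

Optimized designs like the one in the abstract reduce these counts by reusing work memristors and avoiding a NAND-per-gate decomposition.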

Preprint, 2012, arXiv

Biologically Inspired Spiking Neurons: Piecewise Linear Models and Digital Implementation

There has recently been a strong push to examine biological-scale simulations of neuromorphic algorithms to achieve stronger inference capabilities. This paper presents a set of piecewise linear spiking neuron models that can reproduce behaviors similar to those of biological neurons, both for a single neuron and for a network of neurons. The proposed models are investigated in terms of digital implementation feasibility and cost, targeting large-scale hardware implementation. Hardware synthesis and physical implementations on FPGA show that the proposed models can produce precise neural behaviors with higher performance and considerably lower implementation costs than the original model. Accordingly, a compact structure of the models, which can be trained with supervised and unsupervised learning algorithms, has been developed. Using this structure and spike rate coding, a character recognition case study has been implemented and tested.
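The central idea of this abstract is to replace a nonlinear neuron model with straight-line segments that are cheap to implement in hardware. The sketch below illustrates that idea under explicit assumptions. The abstract only refers to "the original model", so the Izhikevich neuron, v' = 0.04v² + 5v + 140 - u + I and u' = a(bv - u), is assumed here as the reference. Its quadratic term is replaced by linear interpolation between hypothetical breakpoints. The update uses plain forward Euler in floating point rather than the fixed-point arithmetic a digital design would use. The breakpoints, `pwl`, and `simulate` are illustrative, not the paper's models.

```python
# Sketch: piecewise linear (PWL) replacement of a quadratic spiking neuron.
# ASSUMED reference model: Izhikevich; breakpoints are HYPOTHETICAL.
from typing import List, Tuple

BREAKPOINTS: List[float] = [-90.0, -62.5, -40.0, 30.0]  # mV, hypothetical


def quadratic(v: float) -> float:
    return 0.04 * v * v + 5.0 * v + 140.0


# Sample the quadratic at the breakpoints; straight lines join the samples.
KNOTS: List[Tuple[float, float]] = [(x, quadratic(x)) for x in BREAKPOINTS]


def pwl(v: float) -> float:
    segments = list(zip(KNOTS, KNOTS[1:]))
    for (x0, y0), (x1, y1) in segments[:-1]:
        if v <= x1:  # first segment also extrapolates below the lowest knot
            return y0 + (y1 - y0) * (v - x0) / (x1 - x0)
    (x0, y0), (x1, y1) = segments[-1]  # last segment extrapolates upward
    return y0 + (y1 - y0) * (v - x0) / (x1 - x0)


def simulate(current: float, t_ms: float = 1000.0, dt: float = 0.5,
             a: float = 0.02, b: float = 0.2, c: float = -65.0,
             d: float = 8.0) -> List[float]:
    # Regular-spiking parameters; returns spike times in ms.
    v, u = c, b * c
    spikes: List[float] = []
    for k in range(int(t_ms / dt)):
        v += dt * (pwl(v) - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:  # spike threshold, then reset
            spikes.append(k * dt)
            v, u = c, u + d
    return spikes


if __name__ == "__main__":
    print(len(simulate(0.0)), len(simulate(10.0)))  # silent at rest, tonic firing with input
```

With only additions and constant-slope multiplications per segment, each update maps naturally onto simple digital arithmetic, which is the kind of saving the abstract attributes to the piecewise linear models.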