Researcher profile

Ishan Singh

Ishan Singh contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
2 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

An Empirical Study of Automating Agent Evaluation

Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate this evaluation process? Our study shows that simply prompting coding assistants is insufficient for this task. Without domain-specific evaluation knowledge, frontier coding assistants achieve only a 30% execution success rate and produce over-engineered evaluations averaging 12+ metrics per agent, indicating that strong coding ability does not automatically translate to reliable agent evaluation. We introduce EvalAgent, an AI assistant that automates the end-to-end agent evaluation pipeline. EvalAgent encodes evaluation domain expertise as evaluation skills (procedural instructions, reusable code and templates, and dynamically retrieved API documentation) that compose into a trace-based pipeline producing complete evaluation artifacts including metrics, executable code, and reports. To systematically assess generated evaluations, we introduce a meta-evaluation framework alongside AgentEvalBench, a benchmark comprising 20 agents, each paired with evaluation requirements and test scenarios. We further propose the Eval@1 metric to measure whether generated evaluation code both executes and yields meaningful results on the first run. Our experiments show that EvalAgent produces focused evaluations, improving Eval@1 from 17.5% to 65%, and achieving 79.5% human expert preference over baseline approaches. Further ablation studies show that evaluation skills are critical for handling complex evaluation: removing them causes Eval@1 to drop significantly from 65% to 30%.
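The abstract describes Eval@1 as measuring whether generated evaluation code both executes and yields meaningful results on the first run. A minimal sketch of one way such a score could be computed is shown here. The `FirstRun` record, the `eval_at_1` function, and the sample data are all hypothetical illustrations, not the paper's implementation, and the abstract does not say how "meaningful" is judged or how scores are aggregated.

```python
from dataclasses import dataclass


@dataclass
class FirstRun:
    """Outcome of the first run of one generated evaluation (hypothetical record)."""
    executed: bool    # the generated evaluation code ran without error
    meaningful: bool  # its output was judged meaningful (criterion assumed)


def eval_at_1(runs):
    """Fraction of first runs that both executed and produced meaningful results."""
    if not runs:
        raise ValueError("no runs to score")
    passed = sum(1 for r in runs if r.executed and r.meaningful)
    return passed / len(runs)


# Made-up outcomes for four agents, for illustration only.
runs = [
    FirstRun(executed=True, meaningful=True),
    FirstRun(executed=True, meaningful=False),   # ran, but results not meaningful
    FirstRun(executed=False, meaningful=False),  # failed to execute
    FirstRun(executed=True, meaningful=True),
]
score = eval_at_1(runs)
print(f"Eval@1 = {score:.1%}")
```

Under this reading, a run that executes but produces uninformative output counts as a failure, which is what separates Eval@1 from a plain execution success rate.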

Preprint, 2022, arXiv

A Numerical Study of Lid Driven Cavity with Mixed Convection

Direct numerical simulations have been carried out for two-dimensional flow in a lid-driven cavity at Reynolds number 5000 and Prandtl number 7, with water as the working fluid. Both side walls of the enclosure are insulated (i.e., adiabatic boundary condition), while the bottom plate is at a higher temperature and the top wall is at a lower temperature. The effects of heating the bottom wall and moving the top lid have been investigated by conducting numerical simulations at different Richardson numbers, varied over low and moderate magnitudes within the limits of the Boussinesq approximation. Three standard cases have been compared. In the first case, heating effects are not taken into account and only the flow due to the shear action of the plate is studied. In the second case, only the heating effects are taken into account and shear effects are neglected. In the third case, the effects of both heating and shear action are taken into consideration (i.e., mixed convection). The drag force on the moving plate is calculated in all three cases, and the effect of temperature on the drag force is studied. To run these simulations, a code has been developed and validated by comparing its results with those of Ghia et al. for the non-heating case.
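For readers unfamiliar with the dimensionless groups named in the abstract, the standard textbook definitions are sketched below. The abstract itself does not state them, so the symbols are assumptions: lid speed $U$, cavity side length $L$, kinematic viscosity $\nu$, thermal diffusivity $\alpha$, thermal expansion coefficient $\beta$, gravitational acceleration $g$, and the bottom-to-top wall temperature difference $\Delta T$.

```latex
\begin{align}
  \mathrm{Re} &= \frac{U L}{\nu}, &
  \mathrm{Pr} &= \frac{\nu}{\alpha}, &
  \mathrm{Gr} &= \frac{g \beta \, \Delta T \, L^{3}}{\nu^{2}}, \\
  \mathrm{Ri} &= \frac{\mathrm{Gr}}{\mathrm{Re}^{2}}
               = \frac{g \beta \, \Delta T \, L}{U^{2}}.
\end{align}
```

With these definitions, $\mathrm{Ri} \to 0$ corresponds to the purely shear-driven first case, $\mathrm{Ri} \to \infty$ to the purely buoyancy-driven second case, and intermediate values to the mixed-convection third case.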