Source author record

Yicheng Gao

Yicheng Gao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computation and Language Computer Vision Networking and Internet Architecture Performance

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments

Real-world clinical diagnosis is a complex process in which the doctor is required to obtain information from both interaction with the patient and conducting medical exams. Additionally, the doctor needs to adapt to different patient personas, as well as noisy and incomplete information that can happen at any time during the process. However, existing benchmarks for medical LLMs and methods for automatic diagnosis largely simplify this process by reducing it to single-turn question answering, noise-free conversations, or sequential exam making, etc., ignoring the interactive and uncertain nature of clinical diagnosis. In this paper, we aim to address this gap by formalizing clinical diagnosis as a Partially Observable Markov Decision Process (POMDP) with three action types: questioning the patient, ordering medical exams as tool calls, and issuing a diagnosis. We also introduce a systematic noise model comprising seven patient noise types and three exam noise types. Using our proposed environment, we train an effective diagnosis agent, \textbf{MedExAgent}, through a two-stage pipeline that first performs supervised finetuning on synthetic conversations structured after the Calgary-Cambridge model for clinical interviews, and then applies DAPO to optimize a composite reward capturing diagnostic accuracy, tool call quality, and exam cost including financial cost and patient discomfort. Through extensive experiments and ablation studies, we demonstrate that MedExAgent achieves diagnostic performance comparable to larger models while maintaining cost-efficient examination strategies.

preprint2026arXiv

When Simulation Lies: A Sim-to-Real Benchmark and Domain-Randomized RL Recipe for Tool-Use Agents

Tool-use language agents are evaluated on benchmarks that assume clean inputs, unambiguous tool registries, and reliable APIs. Real deployments violate all these assumptions: user typos propagate into hallucinated tool names, a misconfigured request timeout can stall an agent indefinitely, and duplicate tool names across servers can freeze an SDK. We study these failures as a sim-to-real gap in the tool-use partially observable Markov decision process (POMDP), where deployment noise enters through the observation, action space, reward-relevant metadata, or transition dynamics. We introduce RobustBench-TC, a benchmark with 22 perturbation types organized by these four POMDP components, each grounded in a verified GitHub issue or documented tool-calling failure. Across 21 models from 1.5B to 32B parameters (including the closed-source o4-mini), the robustness profile is sharply uneven: observation perturbations reduce accuracy by less than 5%, while reward-relevant and transition perturbations reduce accuracy by roughly 40% and 30%, respectively; scale alone does not close these gaps. We then propose ToolRL-DR, a domain-randomization reinforcement learning (RL) recipe that trains a tool-use agent on perturbation-augmented trajectories spanning the three statically encodable POMDP components. On a 3B backbone, ToolRL-DR-Full retains roughly three-quarters of clean accuracy and reaches an aggregate perturbed accuracy comparable to open-source 14B function-calling baselines while substantially narrowing the gap to o4-mini. It closes approximately 27% of the Transition gap despite never seeing transition perturbations in training, suggesting that RL on adversarial static tool-use inputs induces a more persistent retry policy that transfers to unseen runtime failures. The dataset, code and benchmark leaderboard are publicly available.

preprint2022arXiv

JCSP: Joint Caching and Service Placement for Edge Computing Systems

With constrained resources, what, where, and how to cache at the edge is one of the key challenges for edge computing systems. The cached items include not only the application data contents but also the local caching of edge services that handle incoming requests. However, current systems separate the contents and services without considering the latency interplay of caching and queueing. Therefore, in this paper, we propose a novel class of stochastic models that enable the optimization of content caching and service placement decisions jointly. We first explain how to apply layered queueing networks (LQNs) models for edge service placement and show that combining this with genetic algorithms provides higher accuracy in resource allocation than an established baseline. Next, we extend LQNs with caching components to establish a joint modeling method for content caching and service placement (JCSP) and present analytical methods to analyze the resulting model. Finally, we simulate real-world Azure traces to evaluate the JCSP method and find that JCSP achieves up to 35% improvement in response time and 500MB reduction in memory usage than baseline heuristics for edge caching resource allocation.

preprint2014arXiv

Nuclear Norm based Matrix Regression with Applications to Face Recognition with Occlusion and Illumination Changes

Recently regression analysis becomes a popular tool for face recognition. The existing regression methods all use the one-dimensional pixel-based error model, which characterizes the representation error pixel by pixel individually and thus neglects the whole structure of the error image. We observe that occlusion and illumination changes generally lead to a low-rank error image. To make use of this low-rank structural information, this paper presents a two-dimensional image matrix based error model, i.e. matrix regression, for face representation and classification. Our model uses the minimal nuclear norm of representation error image as a criterion, and the alternating direction method of multipliers method to calculate the regression coefficients. Compared with the current regression methods, the proposed Nuclear Norm based Matrix Regression (NMR) model is more robust for alleviating the effect of illumination, and more intuitive and powerful for removing the structural noise caused by occlusion. We experiment using four popular face image databases, the Extended Yale B database, the AR database, the Multi-PIE and the FRGC database. Experimental results demonstrate the performance advantage of NMR over the state-of-the-art regression based face recognition methods.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint