Source author record

Dakuan Lu

Dakuan Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computation and Language

Catalog footprint

What is connected

2works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents

Recent advances in vision-language models (VLMs) and reinforcement learning (RL) have driven progress in GUI automation. However, most existing methods rely on static, one-shot visual inputs and passive perception, lacking the ability to adaptively determine when, whether, and how to observe the interface. We present GUI-Eyes, a reinforcement learning framework for active visual perception in GUI tasks. To acquire more informative observations, the agent learns to make strategic decisions on both whether and how to invoke visual tools, such as cropping or zooming, within a two-stage reasoning process. To support this behavior, we introduce a progressive perception strategy that decomposes decision-making into coarse exploration and fine-grained grounding, coordinated by a two-level policy. In addition, we design a spatially continuous reward function tailored to tool usage, which integrates both location proximity and region overlap to provide dense supervision and alleviate the reward sparsity common in GUI environments. On the ScreenSpot-Pro benchmark, GUI-Eyes-3B achieves 44.8% grounding accuracy using only 3k labeled samples, significantly outperforming both supervised and RL-based baselines. These results highlight that tool-aware active perception, enabled by staged policy reasoning and fine-grained reward feedback, is critical for building robust and data-efficient GUI agents.

preprint2026arXiv

ScRPO: From Errors to Insights

We introduce Self-correction Relative Policy Optimization (ScRPO), a novel reinforcement learning framework designed to empower large language models with advanced mathematical reasoning capabilities through iterative self-reflection and error correction. The ScRPO framework operates in two distinct phases: (1) Trial-and-error learning stage, where the model is trained via GRPO, and incorrect responses are collected to form an "error pool"; and (2) Self-correction learning stage, which guides the model to introspectively analyze and rectify the reasoning flaws behind its previous errors. Extensive evaluations across challenging mathematical benchmarks, including AIME, AMC, Olympiad, MATH-500, and GSM8k, validate the efficacy of our approach. Using DeepSeek-R1-Distill-Qwen-1.5B and 7B as backbones, ScRPO achieves average accuracies of 64.8% and 77.8%, respectively. This represents a significant improvement of 6.0% and 3.2% over vanilla baselines, consistently outperforming strong post-training methods such as DAPO and GRPO. These findings establish ScRPO as a robust paradigm for enabling autonomous self-improvement in AI systems, particularly in tasks with limited external feedback.

Dakuan Lu

What is connected

Connect this record

See the researcher in context

Building this map preview

2 published item(s)

GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents

ScRPO: From Errors to Insights