Source author record

Zihan Xu

Zihan Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

cond-mat.mtrl-sci, Artificial Intelligence, cond-mat.mes-hall, Computation and Language, Computer Vision, eess.SP, Information Theory, Machine Learning, math.IT, Multiagent Systems

Catalog footprint: 9 works, 10 topics, 4 close collaborators


preprint, 2026, arXiv

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

Agentic reinforcement learning (RL) holds great promise for the development of autonomous agents under complex GUI tasks, but its scalability remains severely hampered by the verification of task completion. Existing task verification is treated as a passive, post-hoc process: a verifier (i.e., rule-based scoring script, reward or critic model, and LLM-as-a-Judge) analyzes the agent's entire interaction trajectory to determine whether the agent succeeds. Processing such verbose context, which contains irrelevant, noisy history, poses challenges to the verification protocols and therefore leads to prohibitive cost and low reliability. To overcome this bottleneck, we propose SmartSnap, a paradigm shift from this passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent designed with dual missions: not only to complete a task but also to prove its accomplishment with curated snapshot evidence. Guided by our proposed 3C Principles (Completeness, Conciseness, and Creativity), the agent leverages its access to the online environment to perform self-verification on a minimal, decisive set of snapshots. This evidence is provided as the sole material for a general LLM-as-a-Judge verifier to determine its validity and relevance. Experiments on mobile tasks across model families and scales demonstrate that our SmartSnap paradigm allows training LLM-driven agents in a scalable manner, bringing performance gains up to 26.08% and 16.66% respectively to 8B and 30B models. The synergy between solution finding and evidence seeking facilitates the cultivation of efficient, self-verifying agents with competitive performance against DeepSeek V3.1 and Qwen3-235B-A22B. Code is available at: https://github.com/TencentYoutuResearch/SmartSnap

preprint, 2025, arXiv

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Existing Large Language Model (LLM) agent frameworks face two significant challenges: high configuration costs and static capabilities. Building a high-quality agent often requires extensive manual effort in tool integration and prompt engineering, while deployed agents struggle to adapt to dynamic environments without expensive fine-tuning. To address these issues, we propose Youtu-Agent, a modular framework designed for the automated generation and continuous evolution of LLM agents. Youtu-Agent features a structured configuration system that decouples execution environments, toolkits, and context management, enabling flexible reuse and automated synthesis. We introduce two generation paradigms: a Workflow mode for standard tasks and a Meta-Agent mode for complex, non-standard requirements, capable of automatically generating tool code, prompts, and configurations. Furthermore, Youtu-Agent establishes a hybrid policy optimization system: (1) an Agent Practice module that enables agents to accumulate experience and improve performance through in-context optimization without parameter updates; and (2) an Agent RL module that integrates with distributed training frameworks to enable scalable and stable reinforcement learning of any Youtu-Agents in an end-to-end, large-scale manner. Experiments demonstrate that Youtu-Agent achieves state-of-the-art performance on WebWalkerQA (71.47%) and GAIA (72.8%) using open-weight models. Our automated generation pipeline achieves over 81% tool synthesis success rate, while the Practice module improves performance on AIME 2024/2025 by +2.7% and +5.4% respectively. Moreover, our Agent RL training achieves 40% speedup with steady performance improvement on 7B LLMs, enhancing coding/reasoning and searching capabilities respectively up to 35% and 21% on Maths and general/multi-hop QA benchmarks.

preprint, 2022, arXiv

PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining

Large-scale vision-language pre-training has achieved promising results on downstream tasks. Existing methods rely heavily on the assumption that the image-text pairs crawled from the Internet are in perfect one-to-one correspondence. However, in real scenarios, this assumption can be difficult to hold: the text description, obtained by crawling the affiliated metadata of the image, often suffers from semantic mismatch and mutual compatibility. To address these issues, we introduce PyramidCLIP, which constructs an input pyramid with different semantic levels for each modality, and aligns visual elements and linguistic elements in the form of a hierarchy via peer-level semantics alignment and cross-level relation alignment. Furthermore, we soften the loss of negative samples (unpaired samples) so as to weaken the strict constraint during the pre-training stage, thus mitigating the risk of forcing the model to distinguish compatible negative pairs. Experiments on five downstream tasks demonstrate the effectiveness of the proposed PyramidCLIP. In particular, with the same amount of 15 million pre-training image-text pairs, PyramidCLIP exceeds CLIP on ImageNet zero-shot classification top-1 accuracy by 10.6%/13.2%/10.0% with ResNet50/ViT-B32/ViT-B16 based image encoders respectively. When scaling to larger datasets, PyramidCLIP achieves state-of-the-art results on several downstream tasks. In particular, the results of PyramidCLIP-ResNet50 trained on 143M image-text pairs surpass those of CLIP using 400M data on the ImageNet zero-shot classification task, significantly improving the data efficiency of CLIP.
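The idea of softening the loss on negative pairs can be illustrated with a minimal sketch. It assumes the softening is done by label smoothing on a symmetric CLIP-style contrastive loss; the abstract does not specify the exact form, so the function names, the `alpha` smoothing weight and the `temperature` value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def softened_contrastive_loss(image_emb, text_emb, temperature=0.07, alpha=0.2):
    """Symmetric contrastive loss with smoothed targets (assumed form).

    alpha = 0 recovers the usual one-hot objective; alpha > 0 moves that
    much target mass from the paired sample onto the unpaired ones, so
    negatives are no longer pushed toward zero probability.
    Requires a batch of at least two pairs.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # pairwise cosine similarities, scaled
    n = logits.shape[0]

    # Soft targets: 1 - alpha on the diagonal, alpha spread over negatives.
    targets = np.full((n, n), alpha / (n - 1))
    np.fill_diagonal(targets, 1.0 - alpha)

    loss_i2t = -(targets * log_softmax(logits, axis=1)).sum(axis=1).mean()
    loss_t2i = -(targets * log_softmax(logits.T, axis=1)).sum(axis=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)


# Toy batch: four perfectly separated image/text pairs.
emb = np.eye(4)
hard = softened_contrastive_loss(emb, emb, alpha=0.0)
soft = softened_contrastive_loss(emb, emb, alpha=0.2)
```

On this toy batch the one-hot loss is nearly zero while the smoothed loss is not, which shows that the smoothed objective no longer rewards driving every unpaired similarity as low as possible.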

preprint, 2022, arXiv

Spectral and Energy Efficiency of DCO-OFDM in Visible Light Communication Systems with Finite-Alphabet Inputs

The bound on the information transmission rate of direct current biased optical orthogonal frequency division multiplexing (DCO-OFDM) for visible light communication (VLC) with finite-alphabet inputs is still unknown, and the corresponding spectral efficiency (SE) and energy efficiency (EE) remain open research problems. In this paper, we derive the exact achievable rate of the DCO-OFDM system with finite-alphabet inputs for the first time. Furthermore, we investigate SE maximization problems of the DCO-OFDM system subject to both electrical and optical power constraints. By exploiting the relationship between the mutual information and the minimum mean-squared error, we propose a multi-level mercury-water-filling power allocation scheme to achieve the maximum SE. Moreover, the EE maximization problems of the DCO-OFDM system are studied, and a Dinkelbach-type power allocation scheme is developed for the maximum EE. Numerical results verify the effectiveness of the proposed theories and power allocation schemes.
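A Dinkelbach-type iteration for an energy-efficiency ratio can be sketched as follows. This is a simplified stand-in, not the scheme from the paper: it swaps in the Gaussian-input rate log2(1 + g·p) for the finite-alphabet mutual information, uses classical water-filling rather than mercury-water-filling, and keeps only a single total-power constraint. The names `gains`, `p_circuit` and `p_max` are hypothetical.

```python
import numpy as np


def rate(p, gains):
    """Sum rate in bits per channel use (Gaussian-input stand-in)."""
    return float(np.sum(np.log2(1.0 + gains * p)))


def waterfill(level, gains):
    """Classical water-filling allocation for a given water level."""
    return np.maximum(0.0, level - 1.0 / gains)


def inner_solve(lam, gains, p_max):
    """Maximize rate(p) - lam * sum(p) subject to sum(p) <= p_max, p >= 0."""
    level = 1.0 / (lam * np.log(2.0))
    p = waterfill(level, gains)
    if p.sum() <= p_max:
        return p
    lo, hi = 0.0, level  # budget is active: bisect the water level down
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if waterfill(mid, gains).sum() > p_max:
            hi = mid
        else:
            lo = mid
    return waterfill(lo, gains)


def dinkelbach_ee(gains, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Maximize rate(p) / (p_circuit + sum(p)) by Dinkelbach's method."""
    gains = np.asarray(gains, dtype=float)
    p = np.full(gains.shape, p_max / gains.size)  # feasible starting point
    lam = rate(p, gains) / (p_circuit + p.sum())
    for _ in range(max_iter):
        p = inner_solve(lam, gains, p_max)
        total = p_circuit + p.sum()
        gap = rate(p, gains) - lam * total  # zero at the optimal ratio
        lam = rate(p, gains) / total
        if abs(gap) < tol:
            break
    return p, lam


gains = np.array([2.0, 1.0, 0.5])
p_opt, ee_opt = dinkelbach_ee(gains, p_circuit=1.0, p_max=10.0)
```

Each outer step replaces the fractional objective with the parametric problem rate(p) − λ·(p_circuit + Σp), which is concave in p, and then updates λ to the ratio achieved by the solution. The finite-alphabet case would change only the inner solver.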

preprint, 2020, arXiv

CREDIT: Coarse-to-Fine Sequence Generation for Dialogue State Tracking

In dialogue systems, a dialogue state tracker aims to accurately find a compact representation of the current dialogue status, based on the entire dialogue history. While previous approaches often define dialogue states as a combination of separate triples (domain-slot-value), in this paper, we employ a structured state representation and cast dialogue state tracking as a sequence generation problem. Based on this new formulation, we propose a CoaRsE-to-fine DIalogue state Tracking (CREDIT) approach. Taking advantage of the structured state representation, which is a marked language sequence, we can further fine-tune the pre-trained model (by supervised learning) by optimizing natural language metrics with the policy gradient method. Like all generative state tracking methods, CREDIT does not rely on pre-defined dialogue ontology enumerating all possible slot values. Experiments demonstrate our tracker achieves encouraging joint goal accuracy for the five domains in MultiWOZ 2.0 and MultiWOZ 2.1 datasets.

preprint, 2012, arXiv

Electricity generated from Ambient Heat by Pencils

The idea of generating electricity from ambient heat is significant for both science and engineering. Here, we present an interesting idea of using pencil leads, which are made of graphite and clay, to generate electricity from the thermal motion of ions in aqueous electrolyte solution at room temperature. When two pencil leads were placed in parallel in the solutions, output powers of 0.655, 1.023, 1.023 and 1.828 nW were generated in 3 M KCl, NaCl, NiCl2 and CuCl2 solutions, respectively. In addition, we demonstrate that two pieces of reduced graphene oxide films and/or few-layer graphene films can generate much more electricity when dipped into the solutions, even though the electrodes did not contact the solution. This finding further verifies that the electricity did not result from a chemical reaction between the electrodes and the solutions. The results also demonstrate that ambient thermal energy can be harvested with low-dimensional materials, such as graphene, or with the surface of a solid material without the presence of a temperature gradient. However, the mechanism is still unclear.

preprint, 2012, arXiv

Electricity Harvested from Ambient Heat across Silicon Surface

We report that electricity can be generated from the limitless thermal motion of ions by the two-dimensional (2D) surface of a silicon wafer at room temperature. A typical silicon device, on which asymmetric electrodes with Au and Ag thin films were fabricated, can generate an open-circuit voltage of up to 0.40 V in 5 M CuCl2 solution and an output current of over 11 μA when a 25 kΩ resistor was loaded into the circuit. A positive correlation between the output current and the temperature, as well as the concentration, was observed. The maximum output current and power density are 17 μA and 8.6 μW/cm², respectively. The possibility of a chemical reaction was excluded by four groups of control experiments. A possible dynamic drag mechanism was proposed to explain the experimental results. This finding further demonstrates that ambient heat in the environment can be harvested by 2D semiconductor surfaces or low-dimensional materials and would contribute significantly to the research of renewable energy. However, this finding does not agree with the second law of thermodynamics. Much future work will be needed to study the mechanism behind this phenomenon.
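As a quick arithmetic check on the loaded-circuit figures above, Ohm's law gives the voltage across the resistor and the power it dissipates. The sketch below uses only the 11 μA and 25 kΩ values quoted in the abstract; because the current is reported as "over 11 μA", both results are lower bounds. The function and variable names are illustrative, and no comparison with the quoted power density is attempted because the abstract does not give the device area.

```python
def load_voltage_and_power(current_a, resistance_ohm):
    """Return (voltage in V, power in W) for a resistive load: V = I*R, P = I^2*R."""
    voltage_v = current_a * resistance_ohm
    power_w = current_a ** 2 * resistance_ohm
    return voltage_v, power_w


# Values from the abstract: 11 microamperes through a 25 kilo-ohm resistor.
voltage_v, power_w = load_voltage_and_power(11e-6, 25e3)
print(f"load voltage >= {voltage_v:.3f} V, load power >= {power_w * 1e6:.3f} uW")
```

The result is about 0.275 V across the load and about 3.0 μW dissipated, which sits below the reported 0.40 V open-circuit voltage, as expected for a loaded source.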

preprint, 2012, arXiv

Graphene Battery made of Low Cost Reduced Graphene Oxide

Graphene can collect energy from ambient heat and convert it to electricity, which makes it an ideal candidate for the fabrication of self-powered devices. However, this technology suffers from high cost, which limits its practical use. In this work, we demonstrate that the cost can be reduced by using low-cost reduced graphene oxide (RGO), graphite electrodes and low-cost glass substrates. The results show that this technology can be of practical value for the "battery" industry.

preprint, 2012, arXiv

Self-Charged Graphene Battery Harvests Electricity from Thermal Energy of the Environment

The energy of ionic thermal motion is universally present and is as high as 4 kJ·kg⁻¹·K⁻¹ in aqueous solution, where the thermal velocity of ions is on the order of hundreds of meters per second at room temperature [1,2]. Moreover, the thermal velocity of ions can be maintained by the external environment, which means it is unlimited. However, few studies have been reported on converting the ionic thermal energy into electricity. Here we present a graphene device with an asymmetric electrode configuration to capture such ionic thermal energy and convert it into electricity. An output voltage of around 0.35 V was generated when the device was dipped into saturated CuCl2 solution, and this value lasted over twenty days. A positive correlation between the open-circuit voltage and the temperature, as well as the cation concentration, was observed. Furthermore, we demonstrated that this finding is of practical value by lighting up a commercial light-emitting diode with six such graphene devices connected in series. This finding provides a new way to understand the behavior of graphene at the molecular scale and represents a huge breakthrough for the research of self-powered technology. Moreover, the finding will benefit quite a few applications, such as artificial organs, clean renewable energy and portable electronics.
