Researcher profile

Wenxun Wu

Wenxun Wu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
3 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning

Recent advances in large language models (LLMs) have popularized test-time scaling, where models generate additional reasoning tokens before producing final answers. These approaches have demonstrated significant performance improvements on benchmarks involving mathematical reasoning. However, language models relying solely on direct inference still struggle with tasks demanding up-to-date knowledge or computational tools such as calculators and code interpreters for complex arithmetic operations. To overcome these limitations, we propose Tool-Augmented Policy Optimization (TAPO), a novel reinforcement learning framework that systematically integrates multi-hop reasoning with adaptive tool-calling capabilities. Our approach employs a modified version of Dynamic Sampling Policy Optimization (DAPO), a recently developed RL paradigm, which we adapt specifically for tool invocation scenarios, enabling models to dynamically interleave complex reasoning with on-demand tool usage (including search APIs and Python interpreters). To support this research, we introduce two new datasets: TAPO-easy-60K and TAPO-hard-18K, specifically designed to train and evaluate both fact-based reasoning and mathematical calculation capabilities. Our experiments on Qwen2.5-3B and Qwen2.5-7B models demonstrate the effectiveness of our approach, with both models achieving state-of-the-art performance on tasks requiring external knowledge and mathematical computation among methods with comparable parameters. Notably, TAPO achieves more efficient tool utilization than baseline methods while preventing excessive calls caused by reward hacking. These results highlight the significant potential of combining advanced reasoning with tool usage to enhance model performance in knowledge-intensive and computationally demanding tasks.

Preprint, 2022, arXiv

Quantum Computation for Pricing Caps using the LIBOR Market Model

The LIBOR Market Model (LMM) is a widely used model for pricing interest rate derivatives. While the Black-Scholes model is well known for pricing stock derivatives such as stock options, a larger portion of derivatives are based on interest rates rather than stocks. Pricing interest rate derivatives used to be challenging, because earlier models employed either the instantaneous interest rate or the instantaneous forward rate, neither of which can be directly observed in the market. This improved considerably once the LMM was introduced, as it uses directly observable interbank offered rates and is expected to be more precise. Recently, quantum computing has been used to speed up option pricing tasks, but rarely for structured interest rate derivatives. Given the size of the interest rate derivatives market and the widespread use of the LMM, we employ quantum computing to price an interest rate derivative, the cap, based on the LMM. Because cap pricing involves path-dependent Monte Carlo iterations across different tenors, which is common for many complex structured derivatives, we develop a hybrid classical-quantum approach that applies the quantum amplitude estimation algorithm to estimate the expectation for the last tenor. We show that our hybrid approach still achieves better convergence than purely classical Monte Carlo methods, providing a useful case study for applying quantum computing to a greater diversity of derivatives.
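For readers unfamiliar with the classical baseline mentioned in this abstract, the following is a minimal sketch of Monte Carlo pricing for a single caplet (a cap is a sum of caplets). It is not the paper's hybrid method. It assumes one forward rate that is lognormal under its own forward measure with constant volatility, which is the simplest LMM-style setting. The function names and all parameter values are hypothetical and chosen only for illustration. The closed-form Black price is included as a check on the simulation.

```python
import math
import random


def black_caplet(f0, strike, sigma, expiry, tau, discount):
    """Closed-form Black price of a caplet on a lognormal forward rate.

    f0: today's forward rate, expiry: rate fixing time in years,
    tau: accrual fraction, discount: price of the zero-coupon bond
    maturing at the payment date (all values here are assumptions).
    """
    std = sigma * math.sqrt(expiry)
    d1 = (math.log(f0 / strike) + 0.5 * std * std) / std
    d2 = d1 - std

    def cdf(x):
        # Standard normal CDF via the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return discount * tau * (f0 * cdf(d1) - strike * cdf(d2))


def mc_caplet(f0, strike, sigma, expiry, tau, discount, n_paths, seed=0):
    """Classical Monte Carlo estimate of the same caplet price.

    Under the forward measure the forward rate is a driftless lognormal
    process, so the fixing can be sampled in one exact step.
    """
    rng = random.Random(seed)
    std = sigma * math.sqrt(expiry)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        f_t = f0 * math.exp(-0.5 * std * std + std * z)
        total += max(f_t - strike, 0.0)
    return discount * tau * total / n_paths


if __name__ == "__main__":
    args = (0.03, 0.03, 0.2, 1.0, 0.5, 0.96)  # illustrative inputs only
    print("Black:", black_caplet(*args))
    print("Monte Carlo:", mc_caplet(*args, n_paths=100_000, seed=1))
```

The statistical error of an estimate like `mc_caplet` shrinks roughly as one over the square root of the number of paths. Quantum amplitude estimation is known to offer, in principle, a quadratically better scaling in the number of oracle queries, which is the kind of convergence advantage the abstract refers to.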