Researcher profile

Jiahao Liu

Jiahao Liu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
6 works
0 followers
9 topics
4 close collaborators

Published work

6 published items

preprint, 2026, arXiv

Teacher-Guided Policy Optimization for LLM Distillation

The convergence of reinforcement learning and imitation learning has positioned Reverse KL (RKL) as a promising paradigm for on-policy LLM distillation, aiming to unify exploration with teacher supervision. However, we identify a critical limitation: when the student and teacher distributions diverge significantly, standard RKL often fails to yield meaningful improvement due to uninformative negative feedback. To address this inefficiency, we propose Teacher-Guided Policy Optimization (TGPO), an on-policy algorithm that incorporates dense directional guidance by leveraging teacher predictions conditioned on the student's rollout. Because TGPO remains on-policy, the algorithm integrates seamlessly with existing RLVR frameworks without requiring additional data annotation. Experiments on complex reasoning benchmarks demonstrate that TGPO significantly outperforms standard baselines and is robust to different teachers.
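
For orientation, the reverse KL quantity named in this abstract can be sketched for a single token position. This is only an illustration of standard RKL, KL(student || teacher), and not the TGPO algorithm itself, whose details the abstract does not give. The function names and the toy logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at one token position.

    The expectation is taken under the student distribution, which is
    why this divergence suits on-policy training: it is evaluated on
    what the student itself would generate.
    """
    log_p = log_softmax(student_logits)  # student log-probabilities
    log_q = log_softmax(teacher_logits)  # teacher log-probabilities
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Toy logits over a three-token vocabulary (assumed values).
print(reverse_kl([2.0, 0.0, 0.0], [0.0, 0.0, 2.0]))
```

The value is zero when the two distributions match and grows as the student places mass where the teacher places little.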

preprint, 2026, arXiv

The Evolution of Thought: Tracking LLM Overthinking via Reasoning Dynamics Analysis

Test-time scaling via explicit reasoning trajectories significantly boosts large language model (LLM) performance but often triggers overthinking. To explore this, we analyze reasoning through two lenses: Reasoning Length Dynamics, which reveals a compensatory trade-off between thinking and answer content length that eventually leads to thinking redundancy, and Reasoning Semantic Dynamics, which identifies semantic convergence and repetitive oscillations. These dynamics uncover an instance-specific Reasoning Completion Point (RCP), beyond which computation continues without further performance gain. Since the RCP varies across instances, we propose a Reasoning Completion Point Detector (RCPD), an inference-time early-exit method that identifies the RCP by monitoring the rank dynamics of termination tokens (e.g., </think>). Across AIME and GPQA benchmarks using Qwen3 and DeepSeek-R1, RCPD reduces token usage by up to 44% while preserving accuracy, offering a principled approach to efficient test-time scaling.
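
The idea of early exit by watching the rank of a termination token can be sketched as follows. This is a minimal illustration under stated assumptions, not the RCPD method as published: the stopping rule (rank at or below `max_rank` for `patience` consecutive steps), both parameter names, and the toy scores are hypothetical.

```python
def token_rank(scores, token):
    """1-based rank of `token` among candidate scores (rank 1 = highest)."""
    target = scores[token]
    return 1 + sum(1 for s in scores.values() if s > target)


def find_exit_step(step_scores, end_token="</think>", max_rank=2, patience=2):
    """Return the first step at which the end token has stayed within the
    top `max_rank` candidates for `patience` consecutive steps, or None.

    `step_scores` is a sequence of dicts mapping token -> score, one per
    decoding step. The thresholding rule is an assumption for illustration.
    """
    streak = 0
    for step, scores in enumerate(step_scores):
        if token_rank(scores, end_token) <= max_rank:
            streak += 1
            if streak >= patience:
                return step
        else:
            streak = 0  # the rank rose again, so restart the count
    return None


# Toy trace: the end token climbs from rank 3 to rank 1.
trace = [
    {"a": 3.0, "b": 2.0, "</think>": 1.0},
    {"a": 3.0, "</think>": 2.5, "b": 1.0},
    {"</think>": 4.0, "a": 3.0, "b": 1.0},
]
print(find_exit_step(trace))
```

Requiring several consecutive steps guards against stopping on a single transient spike in the end token's rank.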

preprint, 2024, arXiv

View Distribution Alignment with Progressive Adversarial Learning for UAV Visual Geo-Localization

Unmanned Aerial Vehicle (UAV) visual geo-localization aims to match images of the same geographic target captured from different views, i.e., the UAV view and the satellite view. The task is very challenging because of the large appearance differences between UAV and satellite image pairs. Previous works map images captured by UAVs and satellites to a shared feature space and employ a classification framework to learn location-dependent features, while neglecting the overall distribution shift between the UAV view and the satellite view. In this paper, we address these limitations by introducing distribution alignment of the two views to shorten their distance in a common space. Specifically, we propose an end-to-end network called PVDA (Progressive View Distribution Alignment). During training, the feature encoder, location classifier, and view discriminator are jointly optimized by a novel progressive adversarial learning strategy. Competition between the feature encoder and the view discriminator drives both to become stronger. The adversarial learning is progressively emphasized until UAV-view images are indistinguishable from satellite-view images. As a result, the proposed PVDA learns location-dependent yet view-invariant features that scale well to unseen images of new locations. Compared with state-of-the-art methods, PVDA requires less inference time and achieves superior performance on the University-1652 dataset.

preprint, 2022, arXiv

Sampling Gaussian Stationary Random Fields: A Stochastic Realization Approach

Generating large-scale samples of stationary random fields is of great importance in fields such as geomaterial modeling and uncertainty quantification. Traditional methodologies based on covariance matrix decomposition are computationally expensive, a difficulty that becomes more serious when the dimension of the random field is large. This paper proposes an efficient stochastic realization approach for sampling Gaussian stationary random fields from a systems and control point of view. Specifically, we take the exponential and Gaussian covariance functions as examples and make a decoupling assumption when there are multiple dimensions. Then a rational spectral density is constructed in each dimension using techniques from covariance extension, and the corresponding autoregressive moving-average (ARMA) model is obtained via spectral factorization. As a result, samples of the random field with a specific covariance function can be generated very efficiently in the space domain by implementing the ARMA recursion using a white noise input. Such a procedure is computationally cheap because the constructed ARMA model has a low order. Furthermore, the same method is integrated into multiscale simulations, where interpolations of the generated samples are achieved when one zooms into finer scales. Both theoretical analysis and simulation results show that our approach performs favorably compared with covariance matrix decomposition methods.
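
The simplest instance of sampling by recursion is the one-dimensional exponential covariance, C(h) = sigma^2 exp(-|h| / ell), which a first-order autoregressive (AR(1)) model driven by white noise reproduces exactly on a uniform grid. The sketch below shows only that special case, not the paper's covariance-extension and spectral-factorization construction. The function names and parameter values are hypothetical.

```python
import math
import random


def sample_exponential_field(n, dx, ell, sigma=1.0, rng=None):
    """Sample n grid points of a 1-D Gaussian field with covariance
    sigma^2 * exp(-|h| / ell), using an AR(1) recursion.

    Cost is O(n), with no covariance matrix to factorize.
    """
    rng = rng or random.Random()
    a = math.exp(-dx / ell)                     # AR(1) coefficient
    innov_std = sigma * math.sqrt(1.0 - a * a)  # keeps the variance stationary
    x = [rng.gauss(0.0, sigma)]                 # start in the stationary law
    for _ in range(n - 1):
        x.append(a * x[-1] + innov_std * rng.gauss(0.0, 1.0))
    return x


def empirical_covariance(x, lag):
    """Sample covariance at a given lag for a zero-mean series."""
    m = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(m)) / m


field = sample_exponential_field(200_000, dx=0.1, ell=1.0, rng=random.Random(0))
# Lag 10 corresponds to distance 1.0, so the target is exp(-1), about 0.368.
print(empirical_covariance(field, 10), math.exp(-1.0))
```

The Gaussian covariance has no exact finite-order recursion of this kind, which is where a rational approximation of the spectral density, as the abstract describes, becomes necessary.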

preprint, 2022, arXiv

Study of exotic hadrons with machine learning

We analyzed the invariant mass spectrum of near-threshold exotic states for one-channel candidates with a deep neural network. The network can extract the scattering length and effective range, which shed light on the nature of the given states, from the experimental mass spectrum. As an application, the mass spectra of the $X(3872)$ and the $T_{cc}^+$ are studied. The obtained scattering lengths, effective ranges, and most relevant thresholds are consistent with those from fitting to the experimental data. The advantage of the neural network is that it is more stable than the fitting, especially for low-statistics data. The network, which provides another way to analyze the experimental data, can also be applied to other one-channel near-threshold exotic candidates.

preprint, 2022, arXiv

The symbology of Feynman integrals from twistor geometries

We study the symbology of planar Feynman integrals in dimensional regularization by considering geometric configurations in momentum twistor space corresponding to their leading singularities (LS). Cutting propagators in momentum twistor space amounts to intersecting lines associated with loop and external dual momenta, including the special line associated with the point at infinity, which breaks dual conformal symmetry. We show that cross-ratios of intersection points on these lines, especially those on the infinity line, naturally produce symbol letters for Feynman integrals in $D=4-2ε$, which include and generalize their LS. At one loop, we obtain all symbol letters using intersection points from quadruple cuts for integrals up to pentagon kinematics with two massive corners, which agree perfectly with canonical differential equation (CDE) results. We then obtain all two-loop letters, for up to four-mass box and one-mass pentagon kinematics, by considering more intersections arising from two-loop cuts. Finally we comment on how cluster algebras appear from this construction, and importantly how we may extend the method to non-planar integrals.