Researcher profile

Jiaqi Liu

Jiaqi Liu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
21works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

21 published item(s)

preprint2026arXiv

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives, experiments fail and inform the next attempt, and lessons accumulate across cycles. Existing autonomous research systems often model this process as a linear pipeline: they rely on single-agent reasoning, stop when execution fails, and do not carry experience across runs. We present AutoResearchClaw, a multi-agent autonomous research pipeline built on five mechanisms: structured multi-agent debate for hypothesis generation and result analysis, a self-healing executor with a \textsc{Pivot}/\textsc{Refine} decision loop that transforms failures into information, verifiable result reporting that prevents fabricated numbers and hallucinated citations, human-in-the-loop collaboration with seven intervention modes spanning full autonomy to step-by-step oversight, and cross-run evolution that converts past mistakes into future safeguards. On ARC-Bench, a 25-topic experiment-stage benchmark, AutoResearchClaw outperforms AI Scientist v2 by 54.7%. A human-in-the-loop ablation across seven intervention modes reveals that precise, targeted collaboration at high-leverage decision points consistently outperforms both full autonomy and exhaustive step-by-step oversight. We position AutoResearchClaw as a research amplifier that augments rather than replaces human scientific judgment. Code is available at https://github.com/aiming-lab/AutoResearchClaw.

preprint2026arXiv

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

Interactive agent benchmarks face a tension between scalable construction and realistic workflow evaluation. Hand-authored tasks are expensive to extend and revise, while static prompt evaluation misses failures that only appear when agents operate over persistent state. Existing interactive benchmarks have advanced agent evaluation significantly, but most initialize tasks from clean state and do not systematically test how agents handle pre-existing partial, stale, or conflicting artifacts. We present \textbf{ClawForge}, a generator-backed benchmark framework for executable command-line workflows under state conflict. The framework compiles scenario templates, grounded slots, initialized state, reference trajectories, and validators into reproducible task specifications, and evaluates agents step by step over persistent workflow surfaces using normalized end state and observable side effects rather than exact trajectory matching. We instantiate this framework as the ClawForge-Bench (17 scenarios, 6 ability categories). Results across seven frontier models show that the best model reaches only 45.3% strict accuracy, wrong-state replacement remains below 17\% for all models, and the widest model separation (17% to 90%) is driven by whether agents inspect existing state before acting. Partial-credit and step-efficiency analyses further reveal that many failures are near-miss closures rather than early breakdowns, and that models exhibit qualitatively different failure styles under state conflict.

preprint2026arXiv

Dynamic Graph Neural Networks for Physiological Based Pharmacokinetic Modeling: A Novel Data Driven Approach to Drug Concentration Prediction

Physiologically Based Pharmacokinetic (PBPK) modeling is a key tool in drug development for predicting drug concentration dynamics across organs. Traditional PBPK approaches rely on ordinary differential equations with simplifying assumptions that limit their ability to capture nonlinear and system-level physiological interactions. In this work, we investigate data-driven PBPK modeling using deep learning. We implement two baseline architectures -- a multilayer perceptron (MLP) and a long short-term memory (LSTM) network -- and propose a Dynamic Graph Neural Network (Dynamic GNN) that explicitly models inter-organ interactions through recurrent message passing on a physiological graph. Experiments on a multi-organ pharmacokinetic dataset show that the Dynamic GNN achieves the lowest mean absolute percentage error (MAPE) of 15.7% among all models, demonstrating improved relative accuracy despite slightly higher absolute error compared to the MLP baseline. The model attains an R2 of 0.9342 with more stable error behavior and better captures inter-organ pharmacokinetic relationships. These results highlight the importance of structure-aware modeling for PBPK applications and demonstrate that the proposed Dynamic GNN offers a scalable, equation-free alternative for data-driven pharmacokinetic prediction.

preprint2026arXiv

EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents

Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.

preprint2026arXiv

MAP-Law: Coverage-Driven Retrieval Control for Multi-Turn Legal Consultation

Legal consultation is a high-stakes, knowledge-intensive task that requires agents to identify relevant legal issues, retrieve authoritative support, and determine when evidence is sufficient for a recommendation. Although retrieval-augmented generation has improved grounding in legal question answering, many multi-turn legal agents still rely on fixed retrieval depth or coarse heuristic control. This often leads to either insufficient support for key legal elements or excessive retrieval that increases context burden and weakens answer focus. We propose MAP-Law, a coverage-driven framework for retrieval control in multi-turn legal consultation. MAP-Law models consultation as a controlled retrieval process over a joint structured state consisting of issue nodes, legal element nodes, and evidence nodes. After each retrieval round, the agent computes Element Coverage, Evidence Coverage, and Marginal Gain, and uses these signals to decide whether to continue retrieval, redirect the search, or generate the final response. In this way, MAP-Law turns stopping from a fixed hyperparameter into an interpretable and auditable decision aligned with legal argumentative structure. Experiments on a self-constructed dataset of 50 cases across eight labor-law scenarios show that MAP-Law with DeepSeek as the action selector achieves an Element Coverage of 0.860 using only 2.9 retrieval rounds and 5.8 evidence pieces on average. Compared with a fixed seven-round baseline, it reduces evidence volume by over 80% and retrieval rounds by 58%. Ablation results further confirm the independent contributions of coverage-driven stopping, joint graph representation, and LLM-based action selection.

preprint2026arXiv

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

We introduce SciEvalKit, a unified benchmarking toolkit designed to evaluate AI models for science across a broad range of scientific disciplines and task capabilities. Unlike general-purpose evaluation platforms, SciEvalKit focuses on the core competencies of scientific intelligence, including Scientific Multimodal Perception, Scientific Multimodal Reasoning, Scientific Multimodal Understanding, Scientific Symbolic Reasoning, Scientific Code Generation, Science Hypothesis Generation and Scientific Knowledge Understanding. It supports six major scientific domains, spanning from physics and chemistry to astronomy and materials science. SciEvalKit builds a foundation of expert-grade scientific benchmarks, curated from real-world, domain-specific datasets, ensuring that tasks reflect authentic scientific challenges. The toolkit features a flexible, extensible evaluation pipeline that enables batch evaluation across models and datasets, supports custom model and dataset integration, and provides transparent, reproducible, and comparable results. By bridging capability-based evaluation and disciplinary diversity, SciEvalKit offers a standardized yet customizable infrastructure to benchmark the next generation of scientific foundation models and intelligent agents. The toolkit is open-sourced and actively maintained to foster community-driven development and progress in AI4Science.

preprint2026arXiv

SkyNative: A Native Multimodal Framework for Remote Sensing Visual Evidence Reasoning

Remote sensing vision-language models commonly rely on pretrained visual encoders to convert images into semantic features before language-model reasoning. While effective for scene-level understanding, this pipeline may prematurely compress local visual evidence, making fine-grained spatial reasoning vulnerable to language priors, especially in ultra-high-resolution remote sensing imagery. We present SkyNative, a native multimodal framework for remote sensing that adopts an encoder-free architecture, removing the pretrained visual backbone to directly represent images as raw patch tokens in the language-model token space. To reconcile low-level visual patches with textual tokens, SkyNative introduces a modality-aware decoupling mechanism that uses modality-specific parameters within a unified autoregressive backbone. We further introduce a visual reliance benchmark that diagnoses whether models ground their answers in image evidence through progressive visual degradation and misleading textual prompts. Across standard remote sensing understanding tasks and large-format spatial reasoning evaluations, SkyNative shows stronger image-grounded perception and improved robustness against prompt-induced language priors. These results suggest that native patch-level multimodal modeling is a promising direction for reliable remote sensing vision-language reasoning.

preprint2026arXiv

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Following the recent achievement of gold-medal performance on the IMO by frontier LLMs, the community is searching for the next meaningful and challenging target for measuring LLM reasoning. Whereas olympiad-style problems measure step-by-step reasoning alone, research-level problems use such reasoning to advance the frontier of mathematical knowledge itself, emerging as a compelling alternative. Yet research-level math benchmarks remain scarce because such problems are difficult to source (e.g., Riemann Bench and FrontierMath-Tier 4 contain 25 and 50 problems, respectively). To support reliable evaluation of next-generation frontier models, we introduce Soohak, a 439-problem benchmark newly authored from scratch by 64 mathematicians. Soohak comprises two subsets. On the Challenge subset, frontier models including Gemini-3-Pro, GPT-5, and Claude-Opus-4.5 reach 30.4%, 26.4%, and 10.4% respectively, leaving substantial headroom, while leading open-weight models such as Qwen3-235B, GPT-OSS-120B, and Kimi-2.5 remain below 15%. Notably, beyond standard problem solving, Soohak introduces a refusal subset that probes a capability intrinsic to research mathematics: recognizing ill-posed problems and pausing rather than producing confident but unjustified answers. On this subset, no model exceeds 50%, identifying refusal as a new optimization target that current models do not directly address. To prevent contamination, the dataset will be publicly released in late 2026, with model evaluations available upon request in the interim.

preprint2026arXiv

STEPS: A Temporal Smooth Error Propagation Solver on the Manifolds for Test-Time Adaptation in Time Series Forecasting

Test-Time Adaptation (TTA) aims to improve time series forecasting under distribution shifts by using limited observations revealed during inference. However, forecasting TTA must operate in a source-free online setting, where the adaptation signal is short, temporally correlated, and potentially noisy. Existing methods can therefore suffer from weak identifiability, error accumulation, and unstable long-horizon corrections when the revealed prefix is sparse or contaminated. To address these issues, we propose STEPS, a Smooth Temporal Error Propagation Solver for TTA in time-series forecasting. STEPS reformulates forecasting TTA as a Dirichlet Boundary Value Problem on a temporal manifold, where the revealed prefix error serves as the boundary condition for the unknown future error field. Then, STEPS solves a smooth and bounded correction field in prediction space: a Local Solver propagates prefix errors under temporal smoothness, a Global Solver retrieves stable cross-window error memory and Spatiotemporal Manifold Fusion (SMF) integrates both solutions into the final correction. Across six standard benchmarks and four frozen backbones, STEPS achieves an average relative MSE reduction of 26.82% over the zero-shot backbone, exceeding the strongest compared TTA baseline by 12.77%. Additional sparse prefix and contamination tests confirm the robustness of STEPS under limited and noisy prefixes.

preprint2026arXiv

Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer

In this work, we target the task of text-driven style transfer in the context of text-to-image (T2I) diffusion models. The main challenge is consistent structure preservation while enabling effective style transfer effects. The past approaches in this field directly concatenate the content and style prompts for a prompt-level style injection, leading to unavoidable structure distortions. In this work, we propose a novel solution to the text-driven style transfer task, namely, Adaptive Style Incorporation~(ASI), to achieve fine-grained feature-level style incorporation. It consists of the Siamese Cross-Attention~(SiCA) to decouple the single-track cross-attention to a dual-track structure to obtain separate content and style features, and the Adaptive Content-Style Blending (AdaBlending) module to couple the content and style information from a structure-consistent manner. Experimentally, our method exhibits much better performance in both structure preservation and stylized effects.

preprint2024arXiv

Unsupervised Continual Anomaly Detection with Contrastively-learned Prompt

Unsupervised Anomaly Detection (UAD) with incremental training is crucial in industrial manufacturing, as unpredictable defects make obtaining sufficient labeled data infeasible. However, continual learning methods primarily rely on supervised annotations, while the application in UAD is limited due to the absence of supervision. Current UAD methods train separate models for different classes sequentially, leading to catastrophic forgetting and a heavy computational burden. To address this issue, we introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD, which equips the UAD with continual learning capability through contrastively-learned prompts. In the proposed UCAD, we design a Continual Prompting Module (CPM) by utilizing a concise key-prompt-knowledge memory bank to guide task-invariant `anomaly' model predictions using task-specific `normal' knowledge. Moreover, Structure-based Contrastive Learning (SCL) is designed with the Segment Anything Model (SAM) to improve prompt learning and anomaly segmentation results. Specifically, by treating SAM's masks as structure, we draw features within the same mask closer and push others apart for general feature representations. We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation, demonstrating that our method is significantly better than anomaly detection methods, even with rehearsal training. The code will be available at https://github.com/shirowalker/UCAD.

preprint2022arXiv

A Yaglom type asymptotic result for subcritical branching Brownian motion with absorption

We consider a slightly subcritical branching Brownian motion with absorption, where particles move as Brownian motions with drift $-\sqrt{2+2\varepsilon}$, undergo dyadic fission at rate $1$, and are killed when they reach the origin. We obtain a Yaglom type asymptotic result, showing that the long run expected number of particles conditioned on survival grows exponentially as $1/\sqrt{\varepsilon}$ as the process approaches criticality.

preprint2022arXiv

Boosting Distributed Training Performance of the Unpadded BERT Model

Pre-training models are an important tool in Natural Language Processing (NLP), while the BERT model is a classic pre-training model whose structure has been widely adopted by followers. It was even chosen as the reference model for the MLPerf training benchmark. The distributed training performance optimization of BERT models plays an important role in accelerating the solutions of most NLP tasks. BERT model often uses padding tensors as its inputs, leading to excessive redundant computations. Thus, removing these redundant computations is essential to improve the distributed training performance. This paper designs a new approach to train BERT models with variable-length inputs efficiently. Firstly, we propose a general structure for the variable-length BERT models, and accelerate the encoder layer via our grouped multi-stream FMHA (Fused Multi-Head Attention) method. Secondly, through data exchange, we address the unbalanced workload problem caused by the variable-length inputs, which overlaps highly with the training process. Finally, we optimize the overall performance of the BERT model, such as kernel fusion, and operator optimization. Our experimental results show that our highly optimized BERT model achieves state-of-the-art throughput and ranks first in MLPerf Training v2.0 within the same GPU configuration. The optimizations in this paper can be applied to more BERT-like models in our future works.

preprint2022arXiv

Egocentric Human Trajectory Forecasting with a Wearable Camera and Multi-Modal Fusion

In this paper, we address the problem of forecasting the trajectory of an egocentric camera wearer (ego-person) in crowded spaces. The trajectory forecasting ability learned from the data of different camera wearers walking around in the real world can be transferred to assist visually impaired people in navigation, as well as to instill human navigation behaviours in mobile robots, enabling better human-robot interactions. To this end, a novel egocentric human trajectory forecasting dataset was constructed, containing real trajectories of people navigating in crowded spaces wearing a camera, as well as extracted rich contextual data. We extract and utilize three different modalities to forecast the trajectory of the camera wearer, i.e., his/her past trajectory, the past trajectories of nearby people, and the environment such as the scene semantics or the depth of the scene. A Transformer-based encoder-decoder neural network model, integrated with a novel cascaded cross-attention mechanism that fuses multiple modalities, has been designed to predict the future trajectory of the camera wearer. Extensive experiments have been conducted, with results showing that our model outperforms the state-of-the-art methods in egocentric human trajectory forecasting.

preprint2022arXiv

Label Distribution Learning for Generalizable Multi-source Person Re-identification

Person re-identification (Re-ID) is a critical technique in the video surveillance system, which has achieved significant success in the supervised setting. However, it is difficult to directly apply the supervised model to arbitrary unseen domains due to the domain gap between the available source domains and unseen target domains. In this paper, we propose a novel label distribution learning (LDL) method to address the generalizable multi-source person Re-ID task (i.e., there are multiple available source domains, and the testing domain is unseen during training), which aims to explore the relation of different classes and mitigate the domain-shift across different domains so as to improve the discrimination of the model and learn the domain-invariant feature, simultaneously. Specifically, during the training process, we produce the label distribution via the online manner to mine the relation information of different classes, thus it is beneficial for extracting the discriminative feature. Besides, for the label distribution of each class, we further revise it to give more and equal attention to the other domains that the class does not belong to, which can effectively reduce the domain gap across different domains and obtain the domain-invariant feature. Furthermore, we also give the theoretical analysis to demonstrate that the proposed method can effectively deal with the domain-shift issue. Extensive experiments on multiple benchmark datasets validate the effectiveness of the proposed method and show that the proposed method can outperform the state-of-the-art methods. Besides, further analysis also reveals the superiority of the proposed method.

preprint2022arXiv

Long-time asymptotics and stability for the sine-Gordon equation

In this paper, we study the long-time dynamics and stability properties of the sine-Gordon equation $$f_{tt}-f_{xx}+\sin f=0.$$ Firstly, we use the nonlinear steepest descent for Riemann-Hilbert problems to compute the long-time asymptotics of the solutions to the sine-Gordon equation whose initial condition belongs to some weighted Sobolev spaces. Secondly, we study the asymptotic stability of the sine-Gordon equation. It is known that the obstruction to the asymptotic stability of the sine-Gordon equation in the energy space is the existence of small breathers which is also closely related to the emergence of wobbling kinks. Combining the long-time asymptotics and a refined approximation argument, we analyze the asymptotic stability properties of the sine-Gordon equation in weighted energy spaces. Our stability analysis gives a criterion for the weight which is sharp up to the endpoint so that the asymptotic stability holds.

preprint2022arXiv

Long-time asymptotics of the modified KdV equation in weighted Sobolev spaces

The long time behavior of solutions to the defocusing modified Korteweg-de vries (MKdV) equation is established for initial conditions in some weighted Sobolev spaces. Our approach is based on the nonlinear steepest descent method of Deift and Zhou and its reformulation by Dieng and McLaughlin through $\overline{\partial}$-derivatives. To extend the asymptotics to solutions with initial data in lower regularity spaces, we apply a global approximation via PDE techniques.

preprint2022arXiv

Particle configurations for branching Brownian motion with an inhomogeneous branching rate

Aiming to understand the distribution of fitness levels of individuals in a large population undergoing selection, we study the particle configurations of branching Brownian motion where each particle independently moves as Brownian motion with negative drift, particles can die or undergo dyadic fission, and the difference between the birth rate and the death rate is proportional to the particle's location. Under some assumptions, we obtain the limit in probability of the number of particles in any given interval and an explicit formula for the asymptotic empirical density of the fitness distribution. We show that after a sufficiently long time, the fitness distribution from the lowest to the highest fitness levels approximately evolves as a traveling wave with a profile which is asymptotically related the the Airy function. Our work complements the results in Roberts and Schweinsberg (2021), giving a fuller picture of the fitness distribution.

preprint2020arXiv

Combine Convolution with Recurrent Networks for Text Classification

Convolutional neural network (CNN) and recurrent neural network (RNN) are two popular architectures used in text classification. Traditional methods to combine the strengths of the two networks rely on streamlining them or concatenating features extracted from them. In this paper, we propose a novel method to keep the strengths of the two networks to a great extent. In the proposed model, a convolutional neural network is applied to learn a 2D weight matrix where each row reflects the importance of each word from different aspects. Meanwhile, we use a bi-directional RNN to process each word and employ a neural tensor layer that fuses forward and backward hidden states to get word representations. In the end, the weight matrix and word representations are combined to obtain the representation in a 2D matrix form for the text. We carry out experiments on a number of datasets for text classification. The experimental results confirm the effectiveness of the proposed method.

preprint2020arXiv

Modeling, Analysis, and Optimization of Grant-Free NOMA in Massive MTC via Stochastic Geometry

Massive machine-type communications (mMTC) is a crucial scenario to support booming Internet of Things (IoTs) applications. In mMTC, although a large number of devices are registered to an access point (AP), very few of them are active with uplink short packet transmission at the same time, which requires novel design of protocols and receivers to enable efficient data transmission and accurate multi-user detection (MUD). Aiming at this problem, grant-free non-orthogonal multiple access (GF-NOMA) protocol is proposed. In GF-NOMA, active devices can directly transmit their preambles and data symbols altogether within one time frame, without grant from the AP. Compressive sensing (CS)-based receivers are adopted for non-orthogonal preambles (NOP)-based MUD, and successive interference cancellation is exploited to decode the superimposed data signals. In this paper, we model, analyze, and optimize the CS-based GF-MONA mMTC system via stochastic geometry (SG), from an aspect of network deployment. Based on the SG network model, we first analyze the success probability as well as the channel estimation error of the CS-based MUD in the preamble phase and then analyze the average aggregate data rate in the data phase. As IoT applications highly demands low energy consumption, low infrastructure cost, and flexible deployment, we optimize the energy efficiency and AP coverage efficiency of GF-NOMA via numerical methods. The validity of our analysis is verified via Monte Carlo simulations. Simulation results also show that CS-based GF-NOMA with NOP yields better MUD and data rate performances than contention-based GF-NOMA with orthogonal preambles and CS-based grant-free orthogonal multiple access.

preprint2019arXiv

Global Existence for the Derivative Nonlinear Schrödinger Equation with Arbitrary Spectral Singularities

We show that the derivative nonlinear Schrödinger (DNLS) equation is globally well-posed in the weighted Sobolev space $H^{2,2}(\mathbb{R})$. Our result exploits the complete integrability of DNLS and removes certain spectral conditions on the initial data, thanks to Xin Zhou's analysis on spectral singularities in the context of inverse scattering.