Researcher profile

Jiahao Zhang

Jiahao Zhang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
14 works
0 followers
13 topics
4 close collaborators

Published work

14 published items

Preprint · 2026 · arXiv

Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC

Personal LLM agents increasingly combine foreground reactive interactions with background proactive monitoring, forming long-lived, stateful LLM flows that interleave prefill and token-by-token decode. While modern heterogeneous SoCs integrate CPUs, iGPUs, and NPUs to support on-device intelligence, existing LLM engines assume static, single-shot inference and lack mechanisms for flow-level concurrency, prioritization, and efficient accelerator coordination. As a result, commodity SoCs remain poorly matched to the dynamic, mixed-criticality execution patterns of personal agents. This paper presents Agent.xpu, the first LLM engine that orchestrates concurrent reactive and proactive LLM flows on commodity SoCs. Extensive profiling uncovers SoC-specific characteristics that differ from cloud-serving assumptions: operator-accelerator affinity, asymmetric DDR contention, and stage-divergent batching behavior. Agent.xpu introduces three key techniques:

- a heterogeneous execution graph (HEG) that captures NPU/iGPU affinity and enables elastic operator binding;
- flow-aware NPU-iGPU coordination with stage elasticity, which decouples prefill and decode to reduce bandwidth contention and enforce priorities;
- fine-grained preemption with slack-aware piggybacking, which guarantees reactive responsiveness without starving proactive work.

Across realistic personal-agent workloads, Agent.xpu delivers 1.2-4.9× the proactive throughput and reduces reactive latency by at least 91%. Both results are relative to an industrial iGPU-only serving engine and to NPU-iGPU static inference with optimal tensor-partitioning schemes. Agent.xpu also minimizes energy consumption and graphics interference through controlled iGPU usage.
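
The following toy illustrates only the prioritization idea. It assumes that work is split into discrete steps (for example, one decode step per tick) and that a reactive flow may preempt a proactive one only at a step boundary. `Flow` and `schedule` are hypothetical names. This is not Agent.xpu's scheduler, and it models neither accelerator placement nor slack-aware piggybacking.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Flow:
    name: str
    priority: int  # 0 = reactive (user-facing), 1 = proactive (background)
    arrival: int   # tick at which the flow becomes ready
    steps: int     # remaining work units, e.g. decode steps


def schedule(flows: List[Flow]) -> Dict[str, int]:
    """Run one work unit per tick, always choosing the highest-priority ready flow.

    Preemption happens only at step boundaries: a reactive flow that arrives
    mid-way takes over at the next tick. Ties are broken by arrival time.
    Mutates each flow's `steps`. Returns {flow name: tick at which it finished}.
    """
    pending = sorted(flows, key=lambda f: f.arrival)
    ready: list = []  # heap of (priority, arrival, name, flow); names must be unique
    done: Dict[str, int] = {}
    tick, i = 0, 0
    while i < len(pending) or ready:
        # Admit every flow that has arrived by the current tick.
        while i < len(pending) and pending[i].arrival <= tick:
            f = pending[i]
            heapq.heappush(ready, (f.priority, f.arrival, f.name, f))
            i += 1
        if not ready:
            tick = pending[i].arrival  # idle until the next arrival
            continue
        prio, arr, name, f = heapq.heappop(ready)
        f.steps -= 1  # execute one unit of this flow
        tick += 1
        if f.steps == 0:
            done[name] = tick
        else:
            heapq.heappush(ready, (prio, arr, name, f))
    return done


if __name__ == "__main__":
    flows = [Flow("background-monitor", 1, 0, 5), Flow("user-chat", 0, 2, 2)]
    # The reactive flow preempts the monitor at tick 2 and finishes first.
    print(schedule(flows))  # {'user-chat': 4, 'background-monitor': 7}
```

The proactive flow is delayed but not starved: it resumes as soon as no reactive work is ready.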

Preprint · 2026 · arXiv

AssemblyBench: Physics-Aware Assembly of Complex Industrial Objects

Assembling objects from parts requires understanding multimodal instructions, linking them to 3D components, and predicting physically plausible 6-DoF motions for each assembly step. Existing datasets focus on simplified scenarios and overlook the shape complexity and assembly trajectories of industrial assemblies. We introduce AssemblyBench, a synthetic dataset of 2,789 industrial objects with multimodal instruction manuals, corresponding 3D part models, and part assembly trajectories. We also propose AssemblyDyno, a transformer-based model that uses the instruction manual and the 3D shape of each part to jointly predict assembly order and part assembly trajectories. AssemblyDyno outperforms prior work in both assembly pose estimation and trajectory feasibility, the latter evaluated with our physics-based simulations.

Preprint · 2026 · arXiv

Engagement Process: Rethinking the Temporal Interface of Action and Observation

Task completion in digital and physical environments increasingly involves complex temporal interaction, in which actions and observations unfold over different time scales rather than aligning with fixed observation-action steps. To model such interactions, we propose Engagement Process (EP), an interaction formalism that inherits the decision-theoretic structure of POMDPs while making time explicit in the action-observation interface. EP represents actions and observations as decoupled event streams along time rather than as updates paired at fixed decision steps. This interface captures single-agent timing issues such as deliberation latency, delayed feedback, and persistent actions. It also supports richer agent-side organization, multi-rate coordination, and compositional interaction among subsystems. Across toy, LLM-agent, and learning experiments, EP exposes temporal behaviors hidden by step-based interfaces and enables policies to adapt under explicit time costs.
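
The sketch below shows the data-representation idea only: two independently timed streams merged onto one clock, instead of (action, observation) pairs. EP itself is a formalism, not this code. `Event`, `interleave`, and `observations_during` are hypothetical names, and the timestamps are made up for illustration.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    time: float    # timestamp on a shared clock (seconds)
    stream: str    # "action" or "observation"
    payload: str


def interleave(actions: Iterable[Event], observations: Iterable[Event]) -> List[Event]:
    """Merge two independently timed streams (each already sorted by time)."""
    return list(heapq.merge(actions, observations, key=lambda e: e.time))


def observations_during(timeline: List[Event], start: float, end: float) -> List[str]:
    """Observations that arrive while an action persists over [start, end)."""
    return [e.payload for e in timeline
            if e.stream == "observation" and start <= e.time < end]


if __name__ == "__main__":
    actions = [Event(0.0, "action", "start_moving"), Event(2.5, "action", "stop")]
    observations = [Event(t, "observation", f"frame@{t}") for t in (0.4, 1.2, 3.0)]
    timeline = interleave(actions, observations)
    for e in timeline:
        print(f"{e.time:4.1f}  {e.stream:<11}  {e.payload}")
    # A step-based interface would pair each action with one observation;
    # here two frames arrive while "start_moving" is still in effect.
    print(observations_during(timeline, 0.0, 2.5))
```

The persistent action and the multi-rate observations fall out of the timeline directly. A fixed-step interface would have to discard or batch them.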

Preprint · 2026 · arXiv

From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning

Supervised fine-tuning (SFT) data selection is commonly formulated as instance ranking: score each example and retain a top-k subset. However, effective SFT training subsets are often produced by ordered curation recipes, in which filtering, mixing, and deduplication operators jointly shape the final data distribution. We formulate this problem as fixed-pool data recipe search. Given a raw instruction pool and a library of grounded operators, the goal is to discover an executable recipe that constructs a high-quality subset under a limited budget of full SFT evaluations, without generating, rewriting, or augmenting training samples. We introduce AutoSelection, a two-layer solver that decouples fixed-pool materialization from expensive full evaluation. Materialization relies on cached task-, data-, and model-side signals. The search uses warmup probes, realized subset states, local recipe edits, Gaussian-process-assisted ranking, and stagnation-triggered reseeding. Experiments on a 90K instruction pool show that AutoSelection achieves the strongest in-distribution reasoning average across three base models. It outperforms full-data training, random recipe search, random top-k, and single-operator selectors. Additional out-of-distribution graph-reasoning results, search-stability analyses, structural ablations, and 1.5B-to-7B transfer checks further show that recipe structure matters beyond individual selection operators. Code is available at https://github.com/w253/AutoSelection.
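
To make the distinction concrete, the following is a minimal sketch of the object being searched over: an ordered list of operators applied to a fixed pool, where order changes the result. The operators `keep_top_k`, `dedup_exact`, and `min_length` are hypothetical stand-ins, not AutoSelection's operator library. The sketch does not reproduce its search procedure.

```python
from typing import Callable, Dict, List

Example = Dict[str, object]                 # e.g. {"text": ..., "score": ...}
Operator = Callable[[List[Example]], List[Example]]


def keep_top_k(k: int) -> Operator:
    """Instance-ranking selector: keep the k highest-scoring examples."""
    def op(pool: List[Example]) -> List[Example]:
        return sorted(pool, key=lambda ex: ex["score"], reverse=True)[:k]
    return op


def dedup_exact(pool: List[Example]) -> List[Example]:
    """Drop examples whose text was already seen (first occurrence wins)."""
    seen, out = set(), []
    for ex in pool:
        if ex["text"] not in seen:
            seen.add(ex["text"])
            out.append(ex)
    return out


def min_length(n: int) -> Operator:
    """Filter out examples whose text is shorter than n characters."""
    def op(pool: List[Example]) -> List[Example]:
        return [ex for ex in pool if len(ex["text"]) >= n]
    return op


def run_recipe(pool: List[Example], recipe: List[Operator]) -> List[Example]:
    """Apply operators in order; the selected subset is whatever remains."""
    for op in recipe:
        pool = op(pool)
    return pool


if __name__ == "__main__":
    pool = [{"text": "a", "score": 0.9}, {"text": "a", "score": 0.9},
            {"text": "b", "score": 0.8}, {"text": "c", "score": 0.1}]
    # Same operators, different order, different subsets.
    print([ex["text"] for ex in run_recipe(pool, [keep_top_k(2), dedup_exact])])  # ['a']
    print([ex["text"] for ex in run_recipe(pool, [dedup_exact, keep_top_k(2)])])  # ['a', 'b']
```

Plain top-k ranking is the one-operator special case `[keep_top_k(k)]`. The search space described in the abstract contains many richer orderings.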

Preprint · 2026 · arXiv

TriALS: Triphasic-Aided Liver Lesion Segmentation Benchmark in Non-Contrast CT

Automated segmentation of liver lesions on non-contrast computed tomography (NCCT) is clinically important but fundamentally challenging, particularly in low-resource settings across Africa and Asia where contrast agents are frequently unavailable. Progress has been limited by the absence of annotated NCCT benchmarks. Here we describe the TriALS challenge for automated liver lesion segmentation under contrast-limited conditions, supported by a multi-centre dataset of 150 cases with four-phase CT acquisitions (600 volumes) from Egyptian and Chinese institutions. Algorithms were evaluated on 70 cases from three institutions, including an independent external cohort. The top-performing method achieved a mean venous-phase Dice of 0.754, consistent with human-level performance, yet dropped to 0.57 on NCCT. On external validation, the leading method outperformed off-the-shelf models by up to 28% in Dice on NCCT. Algorithm performance was most strongly predicted by training data scale and pre-training strategy. A cross-year comparison exposed a persistent perceptual barrier on NCCT that scaling pre-training alone cannot overcome. Data, annotations, and code are available at https://github.com/xmed-lab/TriALS.
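
Methods in the challenge are compared by Dice overlap. Below is a minimal sketch of the metric on flattened binary masks, averaging per-case scores into a mean Dice. The function names are hypothetical, and scoring two empty masks as 1.0 is an assumed convention. The challenge's official evaluation code may treat empty masks, lesion-wise scoring, or aggregation differently.

```python
from typing import Iterable, Sequence, Tuple


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * inter / total


def mean_dice(cases: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average the per-case Dice over (prediction, ground truth) pairs."""
    scores = [dice(p, t) for p, t in cases]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
    print(mean_dice([([1, 1, 0, 0], [1, 0, 1, 0]), ([1, 0], [1, 0])]))  # 0.75
```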

Preprint · 2022 · arXiv

MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems

This paper proposes a new data-driven method for operator learning of stochastic differential equations (SDEs). The central goal is to solve forward and inverse stochastic problems more effectively using limited data. The deep operator network (DeepONet) was recently proposed for operator learning. Unlike other neural networks, which learn functions, it targets the learning of nonlinear operators. However, learning nonlinear operators for high-dimensional stochastic problems with the original model can be challenging. To address this difficulty, we propose a multi-resolution autoencoder DeepONet model, referred to as MultiAuto-DeepONet, built with the aid of a convolutional autoencoder. The encoder reduces the dimensionality of the high-dimensional stochastic inputs and discovers their hidden features. The decoder has a special structure in the form of DeepONet. The first DeepONet in the decoder reconstructs the input function involving randomness, while the second approximates the solution of the desired equations. The two DeepONets share a common branch net and have two independent trunk nets. This architecture handles multi-resolution inputs naturally. After adding L1 regularization to the network, we found that the outputs of the branch net and both trunk nets have sparse structures. This sparsity reduces the number of trainable parameters and thus makes the model more efficient. Finally, we conduct several numerical experiments to illustrate the effectiveness of the proposed MultiAuto-DeepONet model with uncertainty quantification.
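
The shared-branch decoder can be written in the standard DeepONet form G(u)(y) ≈ Σ_k b_k(u) t_k(y), here omitting the bias term. The sketch below rests on three assumptions:

- the branch net acts on the encoder's latent code z = E(u);
- both trunk nets have the same width p;
- the L1 penalty is applied to the branch and trunk outputs with a single weight λ.

These are illustrative choices, not the paper's exact loss.

```latex
\begin{aligned}
z &= E(u) && \text{(convolutional encoder)}\\
\hat{u}(x) &= \sum_{k=1}^{p} b_k(z)\, t^{(1)}_k(x) && \text{(reconstruct the random input)}\\
\hat{s}(y) &= \sum_{k=1}^{p} b_k(z)\, t^{(2)}_k(y) && \text{(approximate the solution)}\\
\mathcal{L} &= \lVert \hat{u} - u \rVert_2^2 + \lVert \hat{s} - s \rVert_2^2
  + \lambda \bigl( \lVert b(z) \rVert_1 + \lVert t^{(1)}(x) \rVert_1 + \lVert t^{(2)}(y) \rVert_1 \bigr)
\end{aligned}
```

The trunk nets take coordinates x and y as input, so the same code z can be evaluated on grids of any resolution. This is one plausible reading of the multi-resolution claim.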

Preprint · 2022 · arXiv

PAGP: A physics-assisted Gaussian process framework with active learning for forward and inverse problems of partial differential equations

In this work, we develop physics-assisted Gaussian processes (PAGP), a Gaussian process regression (GPR) model that incorporates the physical information given by partial differential equations (PDEs). The model targets two types of problems for given PDEs with initial and boundary conditions: finding their solutions and discovering their unknown coefficients. We introduce three models: continuous-time, discrete-time, and hybrid. The given physical information is integrated into the Gaussian process model through our designed GP loss functions. Three types of loss function are provided, based on two different approaches to training the standard GP model.

The first part of the paper introduces the continuous-time model, which treats the temporal domain the same as the spatial domain. The unknown coefficients in the given PDEs can be learned jointly with the GP hyperparameters by minimizing the designed loss function. In the discrete-time models, we first choose a time discretization scheme for the temporal domain. The PAGP model is then applied at each time step, together with the scheme, to approximate PDE solutions at given test points at the final time. To discover unknown coefficients in this setting, observations at two specific times are needed, and a mixed mean-square-error function is constructed to obtain the optimal coefficients.

In the last part, a novel hybrid model combining the continuous- and discrete-time models is presented. It merges the flexibility of the continuous-time model with the accuracy of the discrete-time model. The performance of the different models under different GP loss functions is also discussed. The effectiveness of the proposed PAGP methods is illustrated in the numerical section.
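
For orientation, the block shows two standard ingredients for a PDE written as u_t = 𝒩_λ[u] with unknown coefficients λ. Both are assumptions about how PAGP's loss functions could be instantiated, not the paper's definitions:

- the standard GP negative log marginal likelihood, with λ assumed to enter through the covariance K so that it is learned jointly with the GP hyperparameters θ (continuous-time model);
- a backward-Euler step as one example of a time-discretization scheme (discrete-time model).

```latex
\begin{aligned}
\mathcal{L}(\theta, \lambda) &= \tfrac{1}{2}\,\mathbf{y}^{\top} K_{\theta,\lambda}^{-1}\,\mathbf{y}
  + \tfrac{1}{2}\log\det K_{\theta,\lambda} + \tfrac{n}{2}\log 2\pi,\\
(\hat{\theta}, \hat{\lambda}) &= \arg\min_{\theta,\lambda}\, \mathcal{L}(\theta, \lambda),\\
u^{n} &= u^{n-1} + \Delta t\, \mathcal{N}_{\lambda}\bigl[u^{n}\bigr] \qquad \text{(backward Euler)}
\end{aligned}
```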

Preprint · 2022 · arXiv

RMFGP: Rotated Multi-fidelity Gaussian process with Dimension Reduction for High-dimensional Uncertainty Quantification

Multi-fidelity modelling arises in many situations in computational science and engineering. It enables accurate inference even when only a small set of accurate data is available. Such data often come from a high-fidelity model, which is computationally expensive. By combining the realizations of the high-fidelity model with one or more low-fidelity models, the multi-fidelity method can make accurate predictions of quantities of interest. This paper proposes a new dimension reduction framework based on rotated multi-fidelity Gaussian process regression, together with a Bayesian active learning scheme for when the available precise observations are insufficient. Samples are drawn from the trained rotated multi-fidelity model. These samples allow the supervised dimension reduction problem to be solved following the idea of the sliced average variance estimation (SAVE) method, combined with a Gaussian process regression dimension reduction technique. The framework can effectively solve high-dimensional problems even when the data are insufficient for traditional dimension reduction methods. Moreover, a more accurate surrogate Gaussian process model of the original problem can be obtained from the trained model. The effectiveness of the proposed rotated multi-fidelity Gaussian process (RMFGP) model is demonstrated on four numerical examples. The results show that the method performs better in all cases. An uncertainty propagation analysis is also performed for the last two cases, which involve stochastic partial differential equations.
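
Two textbook ingredients help make the abstract concrete. RMFGP's exact formulation may differ from both.

- The first line is the classical autoregressive multi-fidelity model of Kennedy and O'Hagan, which links a low-fidelity GP to the high-fidelity output through a scale ρ and a discrepancy GP δ.
- The remaining lines are the SAVE kernel matrix on standardized inputs z. Its leading eigenvectors v_1, …, v_r span the estimated dimension-reduction subspace; in the original coordinates, the corresponding directions are Σ_x^{-1/2} v_i.

Here, samples drawn from the trained surrogate are assumed to supply the (x, y) pairs used to estimate Cov(z | y) by slicing on y.

```latex
\begin{aligned}
f_{H}(\mathbf{x}) &= \rho\, f_{L}(\mathbf{x}) + \delta(\mathbf{x}),
  \qquad f_{L} \sim \mathcal{GP},\; \delta \sim \mathcal{GP},\\
\mathbf{z} &= \Sigma_{\mathbf{x}}^{-1/2}\bigl(\mathbf{x} - \mathbb{E}[\mathbf{x}]\bigr),\\
M_{\mathrm{SAVE}} &= \mathbb{E}\Bigl[\bigl(I_{d} - \operatorname{Cov}(\mathbf{z} \mid y)\bigr)^{2}\Bigr],\\
\mathbf{z} &\mapsto A^{\top}\mathbf{z}, \qquad A = [\mathbf{v}_1, \dots, \mathbf{v}_r]
\end{aligned}
```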

Preprint · 2022 · arXiv

Spatiotemporal differentiators generating optical vortices with transverse orbital angular momentum and detecting sharp change of pulse envelope

Spatiotemporal optical vortices (STOVs) carry transverse orbital angular momentum, a new degree of freedom for optical manipulation, and have recently been demonstrated experimentally with pulse shapers. Here, a spatiotemporal differentiator is proposed to generate STOVs with transverse orbital angular momentum. To create a phase singularity in the spatiotemporal domain, the differentiator is designed by breaking spatial mirror symmetry. In contrast to pulse shapers, the proposed device is a simple one-dimensional periodic nanostructure and is therefore much more compact. For a normally incident pulse, the differentiator generates a transmitted STOV pulse with transverse orbital angular momentum. Furthermore, interference of the generated STOVs can be used to detect sharp changes in pulse envelopes in both the spatial and temporal dimensions.
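
An idealized calculation shows why a first-order spatiotemporal differentiator can imprint a vortex. It assumes a Gaussian input envelope in dimensionless coordinates (x scaled by beam width, t by pulse duration) and an ideal operator ∂_x + i∂_t. The actual transfer function of the proposed nanostructure is not given here, so this only sketches the mechanism. Polar coordinates in the (x, t) plane are x = ρ cos φ and t = ρ sin φ.

```latex
\begin{aligned}
E_{\mathrm{in}}(x,t) &= e^{-(x^{2}+t^{2})},\\
E_{\mathrm{out}}(x,t) &= \bigl(\partial_x + i\,\partial_t\bigr) E_{\mathrm{in}}
  = -2\,(x + i t)\, e^{-(x^{2}+t^{2})}
  = -2\,\rho\, e^{i\varphi}\, e^{-\rho^{2}}
\end{aligned}
```

The output vanishes at ρ = 0, and its phase winds by 2π around that point in the (x, t) plane, giving a topological charge of one. Because the output is built from derivatives of the envelope, its amplitude is largest where the envelope changes fastest, which is why differentiation can flag sharp changes.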

Preprint · 2020 · arXiv

The tetragonal phase of CH₃NH₃PbI₃ is strongly anharmonic

Halide perovskite (HP) semiconductors exhibit a unique, strong coupling between electronic and structural dynamics. The high-temperature cubic phase of HPs is known to be entropically stabilized, with imaginary frequencies in the calculated phonon dispersion relation. Similar calculations, based on the static average crystal structure, predict a stable tetragonal phase with no imaginary modes. This work shows that, contrary to standard theoretical predictions, the room-temperature tetragonal phase of CH₃NH₃PbI₃ is strongly anharmonic. We use Raman polarization-orientation (PO) measurements and ab initio molecular dynamics (AIMD) to investigate the origin and temperature evolution of the strong structural anharmonicity throughout the tetragonal phase. Raman PO measurements reveal a new spectral feature that resembles a soft mode. This mode shows an unusual continuous increase in damping with temperature, which is indicative of an anharmonic potential surface. Analysis of the AIMD trajectories identifies two major sources of anharmonicity: the orientational unlocking of the [CH₃NH₃]⁺ ions and large-amplitude octahedral tilting that increases continuously with temperature. Our work suggests that the standard phonon picture cannot describe the structural dynamics of tetragonal CH₃NH₃PbI₃.

Preprint · 2019 · arXiv

Halide perovskites under polarized light: Vibrational symmetry analysis using polarized Raman

In the last decade, hybrid organic-inorganic halide perovskites have emerged as a new type of semiconductor for photovoltaics and other optoelectronic applications. Unlike standard tetrahedrally bonded semiconductors (e.g. Si and GaAs), halide perovskites have ionic thermal fluctuations (i.e. structural dynamics) that are strongly coupled to the electronic dynamics. It is therefore crucial to obtain accurate and detailed knowledge of the nature of atomic motions within the crystal. This has proved challenging because of the low thermal stability and the complex, temperature-dependent structural phase sequence of the halide perovskites. Here, these challenges are overcome, and a detailed analysis of the mode symmetries is provided for the low-temperature orthorhombic phase of methylammonium lead iodide. Raman measurements using linearly and circularly polarized light at 1.16 eV excitation are combined with density functional perturbation theory (DFPT). The crystal orientation is determined by an iterative analysis of the Raman polarization-orientation dependence and the DFPT mode analysis. Subsequently, after accounting for birefringence effects detected with circularly polarized excitation, the symmetries of all the observed Raman-active modes at 10 K are assigned.
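
A polarization-orientation scan compares measured intensities with the textbook selection rule I ∝ |e_s · R · e_i|². Here R is a mode's Raman tensor, and e_i and e_s are the incident and scattered polarization vectors. The sketch below evaluates that rule for backscattering along z with a rotating in-plane polarization. `raman_intensity` and `po_curve` are hypothetical helpers, and the tensor in the example is made up. The sketch assumes a real-valued tensor and ignores birefringence, which the work above explicitly corrects for.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def raman_intensity(R: Matrix, e_in: Sequence[float], e_out: Sequence[float]) -> float:
    """Relative intensity |e_out . R . e_in|^2 for a real 3x3 Raman tensor R."""
    amp = sum(e_out[i] * R[i][j] * e_in[j] for i in range(3) for j in range(3))
    return amp * amp


def po_curve(R: Matrix, angles_deg: Sequence[float], crossed: bool = False) -> List[float]:
    """Polarization-orientation scan for backscattering along z.

    The incident polarization is rotated in the xy plane by theta; the analyzer
    is parallel to it, or perpendicular to it when crossed=True.
    """
    out = []
    for deg in angles_deg:
        th = math.radians(deg)
        e_in = (math.cos(th), math.sin(th), 0.0)
        e_out = (-math.sin(th), math.cos(th), 0.0) if crossed else e_in
        out.append(raman_intensity(R, e_in, e_out))
    return out


if __name__ == "__main__":
    R = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]  # hypothetical diagonal tensor
    print(po_curve(R, [0, 45, 90]))                # roughly [1.0, 2.25, 4.0]
    print(po_curve(R, [0, 45, 90], crossed=True))  # roughly [0.0, 0.25, 0.0]
```

Fitting such curves against measured angular dependence is what constrains the mode symmetry and the crystal orientation.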

Preprint · 2019 · arXiv

Quantum-critical phase out of frustrated magnetism in a strongly correlated metal

Strange-metal phenomena often develop at the border of antiferromagnetic order in strongly correlated metals. It has been well established that they can originate from the fluctuations anchored by the point of continuous quantum phase transition out of the antiferromagnetic order, i.e., a quantum critical point. What has been unclear is how these phenomena can be associated with a potential new phase of matter at zero temperature. Here we show that magnetic frustration of the 4f-local moments in the distorted Kagome intermetallic compound CePdAl gives rise to such a paramagnetic quantum-critical phase. Moreover, we demonstrate that this phase turns into a Fermi liquid through a Mott-like crossover; in a two-dimensional parameter space of pressure and magnetic field, this crossover is linked to a line of zero-temperature 4f-electron localization-delocalization phase transitions at low and moderate pressures. Our discovery motivates a new design principle for strongly correlated metallic states with unconventional excitations that may underlie the development of such effects as high temperature superconductivity.