Source author record

Jian Cao

Jian Cao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

20 works

22 topics

4 close collaborators


preprint, 2026, arXiv

AP2O-Coder: Adaptively Progressive Preference Optimization for Reducing Compilation and Runtime Errors in LLM-Generated Code

LLMs' code generation capabilities have yielded substantial improvements in the effectiveness of programming tasks. However, LLM-generated code still suffers from compilation and runtime errors. Existing offline preference optimization methods primarily focus on enhancing LLMs' coding abilities using pass/fail signals in the preference data, overlooking the deep-level error types in the failed codes. To address this, we propose Adaptively Progressive Preference Optimization (AP2O) for coding (i.e., AP2O-Coder), a method that guides LLMs adaptively and methodically to reduce code errors for code generation. Specifically, we construct an error notebook from failed codes and progressively optimize the LLM to correct errors type by type. Furthermore, we adaptively replay error types to tailor to the LLM's changing weaknesses throughout the training process. Through extensive experiments on both code and general LLMs (Llama, Qwen, and DeepSeek series) with parameters ranging from 0.5B to 34B, our AP2O-Coder improves code generation performance by up to 3% in pass@k while using less preference data. Code: https://github.com/TsingZ0/AP2O

preprint, 2026, arXiv

GAPO: Robust Advantage Estimation for Real-World Code LLMs

Reinforcement learning (RL) is widely used for post-training large language models (LLMs) in code editing, where group-relative methods, such as GRPO, are popular due to their critic-free and normalized advantage estimation. However, in real-world code-editing scenarios, reward distributions are often skewed with unpredictable noise, leading to distorted advantage computation and increased rollout outliers. To address this issue, we propose Group Adaptive Policy Optimization (GAPO), which adaptively finds an interval with the highest SNR (Signal to Noise Ratio) per prompt and uses the median of that interval as an adaptive Q to replace the group mean in advantage calculation to reduce noise further. This adaptive Q robustly handles rollout noise while remaining plug-and-play and efficient. We evaluate GAPO on nine instruction-tuned LLMs (3B-14B) using a collected large dataset of 51,844 real-world, history-aware code-editing tasks spanning 10 programming languages. GAPO yields up to 4.35 in-domain (ID) and 5.30 out-of-domain (OOD) exact-match improvements over GRPO and its variant DAPO, while achieving lower clipping ratios and higher GPU throughput. Code: https://github.com/TsingZ0/verl-GAPO.
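To make the motivation concrete, the sketch below compares a group-mean baseline (as in GRPO-style advantages) with a median baseline on one skewed group of rewards. It is a simplification and not GAPO itself: the search for the highest-SNR interval described in the abstract is omitted, the plain group median stands in for the adaptive Q, and the usual standard-deviation normalization is left out. The reward values and function names are illustrative assumptions.

```python
from statistics import mean, median

def advantages(rewards, baseline):
    """Center each rollout reward on a per-prompt baseline."""
    return [r - baseline for r in rewards]

# Hypothetical rewards for one prompt: five similar rollouts and one outlier.
rewards = [0.10, 0.12, 0.11, 0.09, 0.10, 1.00]

# Group-mean baseline: the single outlier drags the baseline upward.
mean_adv = advantages(rewards, mean(rewards))

# Median baseline (stand-in for the adaptive Q; the SNR interval search is omitted).
median_adv = advantages(rewards, median(rewards))

print("mean baseline:  ", round(mean(rewards), 3))
print("median baseline:", round(median(rewards), 3))
print("mean-based advantages:  ", [round(a, 3) for a in mean_adv])
print("median-based advantages:", [round(a, 3) for a in median_adv])
```

With the mean baseline, all five typical rollouts receive clearly negative advantages because of one outlier; with the median baseline they stay close to zero.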

preprint, 2026, arXiv

Machine Learning-Driven Creep Law Discovery Across Alloy Compositional Space

High-temperature creep characterization of structural alloys traditionally relies on serial uniaxial tests, which are highly inefficient for exploring the large search space of alloy compositions and for material discovery. Here, we introduce a machine-learning-assisted, high-throughput framework for creep law identification based on a dimple array bulge instrument (DABI) configuration, which enables parallel creep testing of 25 dimples, each fabricated from a different alloy, in a single experiment. Full-field surface displacements of dimples undergoing time-dependent creep-induced bulging under inert gas pressure are measured by 3D digital image correlation. We train a recurrent neural network (RNN) as a surrogate model, mapping creep parameters and loading conditions to the time-dependent deformation response of DABI. Coupling this surrogate with a particle swarm optimization scheme enables rapid and global inverse identification with sparsity regularization of creep parameters from experimental displacement-time histories. In addition, we propose a phenomenological creep law with a time-dependent stress exponent that captures the sigmoidal primary creep observed in wrought INCONEL 625 and extracts its temperature dependence from DABI tests at multiple temperatures. Furthermore, we employ a general creep law combining several conventional forms together with regularized inversion to identify the creep laws for 47 additional Fe-, Ni-, and Co-rich alloys and to automatically select the dominant functional form for each alloy. This workflow, combined with DABI experiments, provides a quantitative, high-throughput creep characterization platform that is compatible with data mining, composition-property modeling, and nonlinear structural optimization with creep behavior across a large alloy design space.

preprint, 2026, arXiv

Permutation-preserving Functions and Neural Vecchia Covariance Kernels

We introduce a novel framework for constructing scalable and flexible covariance kernels for Gaussian processes (GPs) by directly learning the covariance structure under a regression-type parameterization induced by Vecchia approximations, using deep neural architectures. Specifically, we model kriging coefficients and conditional standard deviations, deterministic quantities that uniquely characterize the covariance, providing stable and informative learning targets. Exploiting the permutation-equivariant structure of conditioning sets in the Vecchia factorization, we derive a universal representation for permutation-preserving functions and design neural architectures that respect this symmetry, leading to improved training stability and data efficiency. The proposed approach enables expressive, non-stationary kernel learning while maintaining computational scalability, thereby bridging classical GP methodology with modern deep learning.

preprint, 2026, arXiv

STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models

Computer-aided design (CAD) is vital to modern manufacturing, yet model creation remains labor-intensive and expertise-heavy. To enable non-experts to translate intuitive design intent into manufacturable artifacts, recent large language model-based text-to-CAD efforts focus on command sequences or script-based formats like CadQuery. However, these formats are kernel-dependent and lack universality for manufacturing. In contrast, the Standard for the Exchange of Product Data (STEP, ISO 10303) file is a widely adopted, neutral boundary representation (B-rep) format directly compatible with manufacturing, but its graph-structured, cross-referenced nature poses unique challenges for auto-regressive LLMs. To address this, we curate a dataset of ~40K STEP-caption pairs and introduce novel preprocessing tailored for the graph-structured format of STEP, including a depth-first search-based reserialization that linearizes cross-references while preserving locality, and chain-of-thought (CoT)-style structural annotations that guide global coherence. We integrate retrieval-augmented generation to ground predictions in relevant examples for supervised fine-tuning, and refine generation quality through reinforcement learning with a specific Chamfer Distance-based geometric reward. Experiments demonstrate consistent gains of our STEP-LLM in geometric fidelity over the Text2CAD baseline, with improvements arising from multiple stages of our framework: the RAG module substantially enhances completeness and renderability, the DFS-based reserialization strengthens overall accuracy, and the RL further reduces geometric discrepancy. Both metrics and visual comparisons confirm that STEP-LLM generates shapes with higher fidelity than Text2CAD. These results show the feasibility of LLM-driven STEP model generation from natural language and its potential to democratize CAD design for manufacturing.

preprint, 2025, arXiv

Efficient GPU-computing simulation platform JAX-PF for differentiable phase field model

We present JAX-PF, an open-source, GPU-accelerated, and differentiable Phase Field (PF) software package, supporting both explicit and implicit time stepping schemes. Leveraging the modern computing architecture JAX, JAX-PF achieves high performance through array programming and GPU acceleration, delivering ~5x speedup over PRISMS-PF with MPI (24 CPU cores) for systems with ~4.19 million degrees of freedom using explicit schemes, and scaling efficiently with implicit schemes for large-size problems. Furthermore, a key feature of JAX-PF is automatic differentiation (AD), eliminating manual derivations of free-energy functionals and Jacobians. Beyond forward simulations, JAX-PF demonstrates its potential in inverse design by providing sensitivities for gradient-based optimization. We demonstrate, for the first time, the calibration of PF material parameters using AD-based sensitivities, highlighting its capability for high-dimensional inverse problems. By combining efficiency, flexibility, and full differentiability, JAX-PF offers a fast, practical, and integrated tool for forward simulation and inverse design, advancing co-designing of material and manufacturing processes and supporting the goals of the Materials Genome Initiative.

preprint, 2024, arXiv

FedTGP: Trainable Global Prototypes with Adaptive-Margin-Enhanced Contrastive Learning for Data and Model Heterogeneity in Federated Learning

Recently, Heterogeneous Federated Learning (HtFL) has attracted attention due to its ability to support heterogeneous models and data. To reduce the high communication cost of transmitting model parameters, a major challenge in HtFL, prototype-based HtFL methods are proposed to solely share class representatives, a.k.a. prototypes, among heterogeneous clients while maintaining the privacy of clients' models. However, these prototypes are naively aggregated into global prototypes on the server using weighted averaging, resulting in suboptimal global knowledge which negatively impacts the performance of clients. To overcome this challenge, we introduce a novel HtFL approach called FedTGP, which leverages our Adaptive-margin-enhanced Contrastive Learning (ACL) to learn Trainable Global Prototypes (TGP) on the server. By incorporating ACL, our approach enhances prototype separability while preserving semantic meaning. Extensive experiments with twelve heterogeneous models demonstrate that our FedTGP surpasses state-of-the-art methods by up to 9.08% in accuracy while maintaining the communication and privacy advantages of prototype-based HtFL. Our code is available at https://github.com/TsingZ0/FedTGP.
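The weighted-averaging baseline that the abstract criticizes can be sketched as follows. This illustrates only the naive server-side aggregation, not FedTGP's trainable prototypes or its contrastive objective. Weighting by each client's per-class sample count is an assumption (the abstract says only "weighted averaging"), and the function name and data are hypothetical.

```python
def aggregate_prototypes(client_protos, client_counts):
    """Weighted average of per-client prototypes for a single class.

    client_protos: list of equal-length feature vectors, one per client.
    client_counts: assumed weights, here the number of samples of this
                   class held by each client.
    """
    total = sum(client_counts)
    dim = len(client_protos[0])
    return [
        sum(p[d] * n for p, n in zip(client_protos, client_counts)) / total
        for d in range(dim)
    ]

# Hypothetical 2-D prototypes for one class from three clients.
protos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
counts = [10, 30, 60]

global_proto = aggregate_prototypes(protos, counts)
print(global_proto)
```

Because the result is a fixed convex combination of the client prototypes, the server has no way to push prototypes of different classes apart, which is the limitation the abstract targets.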

preprint, 2023, arXiv

Boosting Pruned Networks with Linear Over-parameterization

Structured pruning compresses neural networks by reducing channels (filters) for fast inference and low footprint at run-time. To restore accuracy after pruning, fine-tuning is usually applied to pruned networks. However, too few remaining parameters in pruned networks inevitably bring a great challenge to fine-tuning to restore accuracy. To address this challenge, we propose a novel method that first linearly over-parameterizes the compact layers in pruned networks to enlarge the number of fine-tuning parameters and then re-parameterizes them to the original layers after fine-tuning. Specifically, we equivalently expand the convolution/linear layer with several consecutive convolution/linear layers that do not alter the current output feature maps. Furthermore, we utilize similarity-preserving knowledge distillation that encourages the over-parameterized block to learn the immediate data-to-data similarities of the corresponding dense layer to maintain its feature learning ability. The proposed method is comprehensively evaluated on CIFAR-10 and ImageNet which significantly outperforms the vanilla fine-tuning strategy, especially for large pruning ratio.
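The re-parameterization step relies on a basic fact: consecutive linear maps with no nonlinearity between them collapse into a single linear map. A minimal sketch of that fact is shown below, assuming bias-free fully connected layers with arbitrary illustrative weights; it does not reproduce the paper's expansion scheme or its distillation loss, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def apply_linear(w, x):
    """Apply a bias-free linear layer with weight matrix w to vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

# Over-parameterized form: a 2 -> 3 layer followed by a 3 -> 2 layer.
w1 = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
w2 = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]

# Re-parameterized form: one 2 -> 2 layer with weight w2 @ w1.
merged = matmul(w2, w1)

x = [1.0, -2.0]
sequential_out = apply_linear(w2, apply_linear(w1, x))
merged_out = apply_linear(merged, x)
print(sequential_out, merged_out)
```

The expanded form has 12 trainable weights against 4 in the merged layer, yet both compute the same function, so the extra parameters cost nothing at inference time once merged.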

preprint, 2023, arXiv

Hybrid thermal modeling of additive manufacturing processes using physics-informed neural networks for temperature prediction and parameter identification

Understanding the thermal behavior of additive manufacturing (AM) processes is crucial for enhancing quality control and enabling customized process design. Most purely physics-based computational models suffer from intensive computational costs and the need to calibrate unknown parameters, and are thus not suitable for online control and iterative design applications. Data-driven models that take advantage of recently developed computational tools can serve as more efficient surrogates, but they are usually trained on a large amount of simulation data and often fail to make effective use of small but high-quality experimental data. In this work, we developed a hybrid physics-based data-driven thermal modeling approach for AM processes using physics-informed neural networks. Specifically, partially observed temperature data measured with an infrared camera is combined with the physics laws to predict the full-field temperature history and to discover unknown material and process parameters. In the numerical and experimental examples, the effectiveness of adding auxiliary training data and using the pretrained model on training efficiency and prediction accuracy, as well as the ability to identify unknown parameters with partially observed data, are demonstrated. The results show that the hybrid thermal model can effectively identify unknown parameters and capture the full-field temperature accurately, and thus it has the potential to be used in iterative process design and real-time process control of AM.

preprint, 2022, arXiv

Evaluation of the systematic shifts of a ${}^{40}\textrm{Ca}^+-{}^{27}\textrm{Al}^+$ optical clock

Quantum-logic-based ${}^{27}\textrm{Al}^+$ optical clocks have been demonstrated in several schemes, as there are different choices of auxiliary ion species. In this paper, we present the first detailed evaluation of the systematic shift and the total uncertainty of an ${}^{27}\textrm{Al}^+$ optical clock sympathetically cooled by a ${}^{40}\textrm{Ca}^+$ ion. The total systematic uncertainty of the ${}^{40}\textrm{Ca}^+ - {}^{27}\textrm{Al}^+$ quantum logic clock has been estimated to be $7.9 \times 10^{-18}$, which was mainly limited by the uncertainty of the quadratic Zeeman shift. By comparing the frequency of two counter-propagating clock beams on the same ion, we measured the frequency stability to be $3.7 \times 10^{-14}/\sqrt{\tau}$.

preprint, 2022, arXiv

Locally anisotropic covariance functions on the sphere

Rapid developments in satellite remote-sensing technology have enabled the collection of geospatial data on a global scale, hence increasing the need for covariance functions that can capture spatial dependence on spherical domains. We propose a general method of constructing nonstationary, locally anisotropic covariance functions on the sphere based on covariance functions in R^3. We also provide theorems that specify the conditions under which the resulting correlation function is isotropic or axially symmetric. For large datasets on the sphere commonly seen in modern applications, the Vecchia approximation is used to achieve higher scalability on statistical inference. The importance of flexible covariance structures is demonstrated numerically using simulated data and a precipitation dataset.

preprint, 2022, arXiv

Scalable computation of predictive probabilities in probit models with Gaussian process priors

Predictive models for binary data are fundamental in various fields, and the growing complexity of modern applications has motivated several flexible specifications for modeling the relationship between the observed predictors and the binary responses. A widely-implemented solution is to express the probability parameter via a probit mapping of a Gaussian process indexed by predictors. However, unlike for continuous settings, there is a lack of closed-form results for predictive distributions in binary models with Gaussian process priors. Markov chain Monte Carlo methods and approximation strategies provide common solutions to this problem, but state-of-the-art algorithms are either computationally intractable or inaccurate in moderate-to-high dimensions. In this article, we aim to cover this gap by deriving closed-form expressions for the predictive probabilities in probit Gaussian processes that rely either on cumulative distribution functions of multivariate Gaussians or on functionals of multivariate truncated normals. To evaluate these quantities we develop novel scalable solutions based on tile-low-rank Monte Carlo methods for computing multivariate Gaussian probabilities, and on mean-field variational approximations of multivariate truncated normals. Closed-form expressions for the marginal likelihood and for the posterior distribution of the Gaussian process are also discussed. As shown in simulated and real-world empirical studies, the proposed methods scale to dimensions where state-of-the-art solutions are impractical.

preprint, 2021, arXiv

A note on fractional Askey--Wilson integrals

In this paper, we generalize fractional $q$-integrals by the method of $q$-difference equation. In addition, we deduce fractional Askey--Wilson integral, reversal type fractional Askey--Wilson integral and Ramanujan type fractional Askey--Wilson integral.

preprint, 2021, arXiv

Dynamic Social Media Monitoring for Fast-Evolving Online Discussions

Tracking and collecting fast-evolving online discussions provides vast data for studying social media usage and its role in people's public lives. However, collecting social media data using a static set of keywords fails to satisfy the growing need to monitor dynamic conversations and to study fast-changing topics. We propose a dynamic keyword search method to maximize the coverage of relevant information in fast-evolving online discussions. The method uses word embedding models to represent the semantic relations between keywords and predictive models to forecast the future time series. We also implement a visual user interface to aid in the decision-making process in each round of keyword updates. This allows for both human-assisted tracking and fully-automated data collection. In simulations using historical #MeToo data from 2017, our human-assisted tracking method significantly outperforms the traditional static baseline, with a 37.1% higher F-1 score in tracking the top trending keywords. We conduct a contemporary case study to cover dynamic conversations about the recent Presidential Inauguration and to test the dynamic data collection system. Our case studies reflect the effectiveness of our process and also point to potential challenges in future deployment.

preprint, 2021, arXiv

MetaFEM: A Generic FEM Solver By Meta-expressions

Current multi-physics Finite Element Method (FEM) solvers are complex systems in terms of both their mathematical complexity and lines of code. This paper proposes a skeleton generic FEM solver, named MetaFEM, in total about 5,000 lines of Julia code, which translates generic input Partial Differential Equation (PDE) weak forms into corresponding GPU-accelerated simulations with a grammar similar to FEniCS or FreeFEM. Two novel approaches differentiate MetaFEM from the common solvers: (1) the FEM kernel is based on an original theory/algorithm which explicitly processes meta-expressions, as the name suggests, and (2) the symbolic engine is a rule-based Computer Algebra System (CAS), i.e., the equations are rewritten/derived according to a set of rewriting rules instead of going through completely fixed routines, supporting easy customization by developers. Example cases in thermal conduction, linear elasticity and incompressible flow are presented to demonstrate utility.

preprint, 2016, arXiv

A transportable 40Ca+ single-ion clock with $7.7\times 10^{-17}$ systematic uncertainty

A transportable optical clock referenced to the $4s\,^2S_{1/2} - 3d\,^2D_{5/2}$ electric quadrupole transition at 729 nm of a single $^{40}\mathrm{Ca}^+$ ion trapped in a mini Paul trap has been developed. The physical system of the $^{40}\mathrm{Ca}^+$ optical clock is re-engineered from a bulky and complex setup into an integration of two subsystems: a compact single-ion unit including ion trapping and detection modules, and a compact laser unit including laser sources, beam distributor and frequency reference modules. Apart from the electronics, the whole equipment has been constructed within a volume of 0.54 m$^3$. The systematic fractional uncertainty has been evaluated to be $7.7\times 10^{-17}$, and the Allan deviation is fitted to be $2.3\times 10^{-14}/\sqrt{\tau}$ by clock self-comparison with a probe pulse time of 20 ms.
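As a rough illustration of what these two figures imply together, the sketch below extrapolates the fitted stability to find the averaging time at which the statistical uncertainty would match the quoted systematic uncertainty. It assumes the $1/\sqrt{\tau}$ scaling holds over the whole averaging interval, which the abstract does not claim; the function names are hypothetical.

```python
def allan_deviation(tau_s, sigma_1s=2.3e-14):
    """Fractional instability after averaging for tau_s seconds,
    assuming the fitted 1/sqrt(tau) law holds at all averaging times."""
    return sigma_1s / tau_s ** 0.5

def averaging_time(target, sigma_1s=2.3e-14):
    """Averaging time in seconds at which the instability reaches `target`."""
    return (sigma_1s / target) ** 2

# Time for the statistical term to match the systematic uncertainty.
tau = averaging_time(7.7e-17)
print(f"{tau:.0f} s, about {tau / 3600:.1f} hours")
```

Under that assumption the answer is on the order of one day of continuous averaging.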

preprint, 2016, arXiv

Evaluation of blackbody radiation shift with temperature associated fractional uncertainty at 10E-18 level for 40Ca+ ion optical clock

In this paper, the blackbody radiation (BBR) temperature rise seen by the $^{40}$Ca$^+$ ion confined in a miniature Paul trap and its uncertainty have been evaluated via finite-element method (FEM) modelling. The FEM model was validated by comparison with thermal camera measurements, which were calibrated by a PT1000 resistance thermometer, at several points on a dummy trap. The input modelling parameters were analyzed in detail, and their contributions to the uncertainty of the environment temperature were evaluated on the validated FEM model. The result shows that the temperature rise seen by the $^{40}$Ca$^+$ ion is 1.72 K with an uncertainty of 0.46 K. It results in a contribution of 2.2 mHz to the systematic uncertainty of the $^{40}$Ca$^+$ ion optical clock, corresponding to a fractional uncertainty of 5.4$\times$10$^{-18}$. This is much smaller than the uncertainty caused by the BBR shift coefficient, which is evaluated to be 4.8 mHz and at the 10$^{-17}$ level in fractional frequency units.
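The conversion from an absolute frequency uncertainty to a fractional one is a division by the clock transition frequency. The sketch below checks the two figures quoted above. The transition frequency of roughly 411.042 THz (the 729 nm line) is not stated in the abstract and is supplied here as an approximate value; the constant and function names are hypothetical.

```python
# Approximate frequency of the 40Ca+ 729 nm clock transition.
# Assumption: this value is not given in the abstract.
CLOCK_FREQUENCY_HZ = 411.042e12

def fractional_uncertainty(delta_hz, nu_hz=CLOCK_FREQUENCY_HZ):
    """Convert an absolute frequency uncertainty to a fractional one."""
    return delta_hz / nu_hz

temperature_term = fractional_uncertainty(2.2e-3)  # 2.2 mHz from temperature
coefficient_term = fractional_uncertainty(4.8e-3)  # 4.8 mHz from BBR coefficient

print(f"temperature term: {temperature_term:.2e}")
print(f"coefficient term: {coefficient_term:.2e}")
```

The first value agrees with the quoted 5.4$\times$10$^{-18}$ to within rounding of the 2.2 mHz input, and the second falls at the 10$^{-17}$ level as stated.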

preprint, 2015, arXiv

Towards a Decoupled Context-Oriented Programming Language for the Internet of Things

Programming behaviors easily is a major issue for large, reconfigurable deployments in the Internet of Things. Such devices often need to externalize part of their behavior, such as sensing, data aggregation or code offloading. Most existing context-oriented programming languages integrate the whole behavior in the same class or in closely coupled layers. We propose to abstract and separate context tracking from the decision process, and to use event-based handlers to interconnect them. We keep a very simple declarative and non-layered programming model. We illustrate the approach by defining an extension to Golo, a JVM-based dynamic language.

preprint, 2012, arXiv

Two sharp inequalities for bounding the Seiffert mean by the arithmetic, centroidal, and contra-harmonic means

In the paper, the authors find the best possible constants appearing in two inequalities that bound the Seiffert mean by linear combinations of the arithmetic, centroidal, and contra-harmonic means.

preprint, 2006, arXiv

Asymptotically Optimal Multiple-access Communication via Distributed Rate Splitting

We consider the multiple-access communication problem in a distributed setting for both the additive white Gaussian noise channel and the discrete memoryless channel. We propose a scheme called Distributed Rate Splitting to achieve the optimal rates allowed by information theory in a distributed manner. In this scheme, each real user creates a number of virtual users via a power/rate splitting mechanism in the M-user Gaussian channel or via a random switching mechanism in the M-user discrete memoryless channel. At the receiver, all virtual users are successively decoded. Compared with other multiple-access techniques, Distributed Rate Splitting can be implemented with lower complexity and less coordination. Furthermore, in a symmetric setting, we show that the rate tuple achieved by this scheme converges to the maximum equal rate point allowed by the information-theoretic bound as the number of virtual users per real user tends to infinity. When the capacity regions are asymmetric, we show that a point on the dominant face can be achieved asymptotically. Finally, when there is an unequal number of virtual users per real user, we show that differential user rate requirements can be accommodated in a distributed fashion.
