Source author record

Yufei Zhang

Yufei Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; math.OC; Artificial Intelligence; math.NA; Numerical Analysis; physics.flu-dyn; astro-ph.CO; Computation and Language; Computer Vision; Distributed, Parallel, and Cluster Computing; hep-ex; hep-ph; math.GM

Catalog footprint

What is connected

17 works

13 topics

4 close collaborators


Preprint · 2026 · arXiv

Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents

Memory enables Large Language Model (LLM) agents to perceive, store, and use information from past dialogues, which is essential for personalization. However, existing methods fail to properly model the temporal dimension of memory in two aspects: 1) Temporal inaccuracy: memories are organized by dialogue time rather than their actual occurrence time; 2) Temporal fragmentation: existing methods focus on point-wise memory, losing durative information that captures persistent states and evolving patterns. To address these limitations, we propose Temporal Semantic Memory (TSM), a memory framework that models semantic time for point-wise memory and supports the construction and utilization of durative memory. During memory construction, it first builds a semantic timeline rather than a dialogue one. Then, it consolidates temporally continuous and semantically related information into a durative memory. During memory utilization, it incorporates the query's temporal intent on the semantic timeline, enabling the retrieval of temporally appropriate durative memories and providing time-valid, duration-consistent context to support response generation. Experiments on LongMemEval and LoCoMo show that TSM consistently outperforms existing methods and achieves up to 12.2% absolute improvement in accuracy, demonstrating the effectiveness of the proposed method.
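The abstract describes three steps: building memory on a semantic (event) timeline, consolidating temporally continuous and related items into durative memories, and retrieving by the query's temporal intent. The toy sketch below illustrates that flow under heavy simplifying assumptions. "Semantically related" is reduced to an exact topic label, "temporally continuous" becomes a maximum gap in days, and "temporal intent" is a date window supplied with the query. All names (`PointMemory`, `DurativeMemory`, `consolidate`, `retrieve`, `max_gap_days`) are hypothetical, and this is not TSM's implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PointMemory:
    topic: str       # hypothetical stand-in for "semantically related"
    occurred: date   # semantic (event) time, not the time of the dialogue
    text: str


@dataclass
class DurativeMemory:
    topic: str
    start: date
    end: date
    texts: list


def consolidate(points, max_gap_days=1):
    """Merge same-topic point memories whose event dates are at most max_gap_days apart."""
    by_topic = {}
    for p in sorted(points, key=lambda p: (p.topic, p.occurred)):
        by_topic.setdefault(p.topic, []).append(p)
    merged = []
    for topic, items in by_topic.items():
        first = items[0]
        current = DurativeMemory(topic, first.occurred, first.occurred, [first.text])
        for p in items[1:]:
            if (p.occurred - current.end).days <= max_gap_days:
                # Temporally continuous: extend the current interval.
                current.end = p.occurred
                current.texts.append(p.text)
            else:
                merged.append(current)
                current = DurativeMemory(topic, p.occurred, p.occurred, [p.text])
        merged.append(current)
    return merged


def retrieve(memories, query_start, query_end):
    """Return durative memories whose interval overlaps the query's time window."""
    return [m for m in memories if m.start <= query_end and m.end >= query_start]
```

Because retrieval matches on interval overlap rather than on dialogue timestamps, a question about a period can pick up a memory whose events span that period, even if they were mentioned in a much later conversation.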

Preprint · 2026 · arXiv

DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training

Reinforcement learning (RL) has become a critical paradigm for LLM post-training, yet the rollout phase, which accounts for 50–80% of total step time, is bottlenecked by skewed generation: long-tailed trajectories that are indispensable for model performance block the entire training pipeline. Asynchronous training offers a natural remedy by overlapping generation with training, but it introduces a fundamental tension between efficiency and algorithmic correctness. We identify three constraints that asynchronous training must satisfy to preserve convergence: intra-trajectory policy consistency, data integrity, and bounded staleness. Existing approaches either fail to address the long-tailed trajectory problem intrinsically, a problem further exacerbated by the imbalance characteristic of Mixture-of-Experts models, or deviate from the standard RL training formulation, thereby hindering model convergence. We therefore propose DORA (Dynamic ORchestration for Asynchronous Rollout), which addresses this challenge through algorithm-system co-design. DORA introduces multi-version streaming rollout, a novel asynchronous paradigm that maintains multiple policy versions concurrently, achieving full bubble elimination without compromising the algorithmic constraints. Experimental results show that DORA substantially improves throughput, reaching up to 2–3 times that of state-of-the-art systems on open-source benchmarks, without compromising convergence. Furthermore, in large-scale industrial applications with tens of thousands of accelerators, DORA accelerates RL training by 2–4 times compared to synchronous training across various scenarios. The resulting open-source model, LongCat-Flash-Thinking, exhibits competitive performance on complex reasoning benchmarks, matching the capability of most advanced LLMs.
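The abstract names three constraints for asynchronous RL training. As a toy illustration only, the sketch below checks two of them for a single trajectory. It assumes each generated token is tagged with the integer version of the policy that produced it. `Trajectory`, `token_versions`, `check_trajectory`, and `max_staleness` are hypothetical names. Neither the multi-version streaming rollout scheduler nor the data-integrity constraint is modeled.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record: the policy version that generated each token.
    token_versions: list


def check_trajectory(traj, trainer_version, max_staleness):
    """Toy check of two constraints for one trajectory.

    - intra-trajectory policy consistency: every token comes from one policy version;
    - bounded staleness: that version lags the trainer by at most max_staleness.
    """
    if not traj.token_versions:
        return False
    versions = set(traj.token_versions)
    if len(versions) != 1:
        return False  # mixed versions break intra-trajectory consistency
    lag = trainer_version - versions.pop()
    return 0 <= lag <= max_staleness
```

Under this reading, a system that keeps several policy versions alive can let an older version finish a long-tailed trajectory. The result stays usable as long as that same version produced every token and the trajectory is consumed before its lag exceeds the bound.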

Preprint · 2026 · arXiv

FormalASR: End-to-End Spoken Chinese to Formal Text

Automatic speech recognition (ASR) systems are typically optimized for verbatim transcription, which preserves disfluencies, filler words, and informal spoken structures that are often unsuitable for downstream writing-oriented applications. A common workaround is a two-stage ASR+LLM pipeline for post-editing, but this design increases latency and memory cost and is difficult to deploy on-device. We present FormalASR, two compact end-to-end models (0.6B and 1.7B) that directly transcribe spoken Chinese into formal written text. To enable this setting, we build WenetSpeech-Formal and Speechio-Formal, two large-scale spoken-to-formal datasets constructed through LLM-based rewriting and quality filtering. We then apply supervised fine-tuning to Qwen3-ASR at both scales. Experiments on WenetSpeech-Formal and Speechio-Formal show that FormalASR achieves up to 37.4% relative CER reduction over verbatim baselines while also improving ROUGE-L and BERTScore. Because FormalASR requires no post-processing LLM at deployment time, it provides a lightweight, on-device solution for spoken-to-formal transcription.
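The abstract reports a relative CER reduction without defining the metric. The sketch below assumes the usual definition: character-level Levenshtein distance divided by the reference length. It also shows how a relative reduction is computed from two such rates. The function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: character strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by the reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, new):
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - new) / baseline
```

For instance, a hypothetical drop from a CER of 0.20 to 0.125 is a 37.5% relative reduction, even though the absolute drop is only 7.5 points. For spoken-to-formal transcription, the reference would be the formal written text rather than the verbatim transcript.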

Preprint · 2026 · arXiv

LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services

Recent advances in large reasoning models (LRMs) have enabled agentic search systems to perform complex multi-step reasoning across multiple sources. However, most studies focus on general information retrieval and rarely explore vertical domains with unique challenges. In this work, we focus on local life services and introduce LocalSearchBench, which encompasses diverse and complex business scenarios. Real-world queries in this domain are often ambiguous and require multi-hop reasoning across merchants and products; they remain challenging and not fully addressed. As the first comprehensive benchmark for agentic search in local life services, LocalSearchBench comprises a database of over 1.3M merchant entries across 6 service categories and 9 major cities, and 900 multi-hop QA tasks from real user queries that require multi-step reasoning. We also develop LocalPlayground, a unified environment that integrates multiple tools for LRM interaction. Experiments show that even state-of-the-art LRMs struggle on LocalSearchBench: the best model (DeepSeek-V3.2) achieves only 35.60% correctness, and most models have issues with completeness (average 60.32%) and faithfulness (average 30.72%). This highlights the need for specialized benchmarks and domain-specific agent training in local life services. The code, benchmark, and leaderboard are available at https://localsearchbench.github.io/.

Preprint · 2022 · arXiv

Data augmented turbulence modeling for three-dimensional separation flows

Field inversion and machine learning are implemented in this study to describe the three-dimensional separation flow around an axisymmetric hill and to augment the Spalart-Allmaras model. The discrete adjoint method is used to solve the field inversion problem, and an artificial neural network is used as the machine learning model. A validation process for field inversion is proposed to adjust the hyperparameters and obtain a physically acceptable solution. The field inversion results show that non-equilibrium turbulence effects in the boundary layer upstream of the mean separation line and in the separating shear layer dominate the flow structure in the 3-D separating flow, which agrees with prior physical knowledge. However, the effect of turbulence anisotropy on the mean flow appears to be limited. Two approaches are proposed and implemented in the machine learning stage to overcome sample imbalance while reducing the computational cost of training. All results are satisfactory, demonstrating the effectiveness of the proposed approaches.

Preprint · 2022 · arXiv

Deeply Learned Preselection of Higgs Dijet Decays at Future Lepton Colliders

Future electron-positron colliders will play a leading role in the precision measurement of Higgs boson couplings, which is one of the central interests of particle physics. To maximize the precision with which the Higgs couplings to bottom, charm, and strange quarks can be measured, we develop machine learning methods that improve the selection of events in which a Higgs boson decays to dijets. Our methods are based on the Boosted Decision Tree (BDT), the Fully-Connected Neural Network (FCNN), and the Convolutional Neural Network (CNN). We find that the BDT- and FCNN-based algorithms outperform the conventional cut-based method. Using the FCNN-based selection of Higgs-to-dijet events, the charm quark signal strength is measured with a $16\%$ error, roughly a factor of two better than the $34\%$ precision obtained by the cut-based analysis. The FCNN also constrains the strange quark signal strength to $μ_{ss} \lesssim 35$ at $95\%$ C.L., compared with $μ_{ss} \lesssim 70$ from the cut-based method.

Preprint · 2022 · arXiv

Logarithmic regret for episodic continuous-time linear-quadratic reinforcement learning over a finite-time horizon

We study finite-time horizon continuous-time linear-quadratic reinforcement learning problems in an episodic setting, where both the state and control coefficients are unknown to the controller. We first propose a least-squares algorithm based on continuous-time observations and controls, and establish a logarithmic regret bound of order $O((\ln M)(\ln\ln M))$, with $M$ being the number of learning episodes. The analysis has two parts: a perturbation analysis, which exploits the regularity and robustness of the associated Riccati differential equation, and a bound on the parameter estimation error, which relies on sub-exponential properties of continuous-time least-squares estimators. We further propose a practically implementable least-squares algorithm based on discrete-time observations and piecewise constant controls, which achieves similar logarithmic regret with an additional term depending explicitly on the time step sizes used in the algorithm.
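The discrete-time variant above estimates the unknown coefficients by least squares from sampled states and piecewise constant controls. The sketch below illustrates only that estimation step, in an assumed scalar setting $dX_t = (aX_t + bu_t)\,dt + σ\,dW_t$ simulated with Euler-Maruyama. It minimizes $\sum_i (ΔX_i - (aX_i + bu_i)Δt)^2$ over $(a, b)$. The episodic structure, the Riccati-based controller, and the regret analysis of the paper are not reproduced. The names `simulate` and `least_squares_drift` are illustrative.

```python
import math
import random


def simulate(a, b, x0, controls, dt, sigma=0.0, rng=None):
    """Euler-Maruyama path of dX = (a X + b u) dt + sigma dW with piecewise-constant u."""
    rng = rng or random.Random(0)
    xs = [x0]
    for u in controls:
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        xs.append(xs[-1] + (a * xs[-1] + b * u) * dt + noise)
    return xs


def least_squares_drift(xs, controls, dt):
    """Estimate (a, b) by minimizing sum_i (dX_i - (a X_i + b u_i) dt)^2."""
    sxx = sxu = suu = sxd = sud = 0.0
    for x, u, x_next in zip(xs, controls, xs[1:]):
        dx = x_next - x
        sxx += x * x * dt
        sxu += x * u * dt
        suu += u * u * dt
        sxd += x * dx
        sud += u * dx
    det = sxx * suu - sxu * sxu
    if det == 0:
        raise ValueError("states and controls are collinear; (a, b) not identifiable")
    # Solve the 2x2 normal equations [sxx sxu; sxu suu] [a; b] = [sxd; sud].
    a_hat = (sxd * suu - sxu * sud) / det
    b_hat = (sxx * sud - sxu * sxd) / det
    return a_hat, b_hat
```

With `sigma=0` the estimator recovers $(a, b)$ up to rounding. With noise, for example `simulate(-0.5, 2.0, 1.0, controls, 0.01, sigma=0.1, rng=random.Random(1))`, the estimate fluctuates around the true values. Varying the controls (here any non-constant sequence) keeps the normal equations well posed.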

Preprint · 2022 · arXiv

Model Pruning Based on Quantified Similarity of Feature Maps

Convolutional Neural Networks (CNNs) have been deployed on numerous Internet of Things (IoT) devices for a variety of downstream tasks. However, as the amount of data on edge devices grows, CNNs struggle to complete some tasks in time with limited computing and storage resources. Recently, filter pruning has been regarded as an effective technique to compress and accelerate CNNs, but existing methods rarely prune CNNs from the perspective of compressing high-dimensional tensors. In this paper, we propose a novel theory for finding redundant information in three-dimensional tensors, namely Quantified Similarity between Feature Maps (QSFM), and use it to guide the filter pruning procedure. We evaluate QSFM on datasets (CIFAR-10, CIFAR-100, and ILSVRC-12) and on edge devices. The results demonstrate that the proposed method effectively finds redundant information in neural networks, achieving comparable compression with a tolerable drop in accuracy. Without any fine-tuning, QSFM significantly compresses ResNet-56 on CIFAR-10 (reducing FLOPs by 48.7% and parameters by 57.9%) with a loss of only 0.54% in top-1 accuracy. For practical deployment on edge devices, QSFM accelerates MobileNet-V2 inference by 1.53 times with a loss of only 1.23% in ILSVRC-12 top-1 accuracy.

Preprint · 2022 · arXiv

Physically Interpretable Feature Learning and Inverse Design of Supercritical Airfoils

Machine-learning models have demonstrated a great ability to learn complex patterns and make predictions. In high-dimensional nonlinear problems of fluid dynamics, data representation often greatly affects the performance and interpretability of machine learning algorithms. With the increasing application of machine learning in fluid dynamics studies, the need for physically explainable models continues to grow. This paper proposes a feature learning algorithm based on variational autoencoders, which is able to assign physical features to some latent variables of the variational autoencoder. In addition, it is theoretically proved that the remaining latent variables are independent of the physical features. The proposed algorithm is trained to include shock wave features in its latent variables for the reconstruction of supercritical pressure distributions. The reconstruction accuracy and physical interpretability are also compared with those of other variational autoencoders. Then, the proposed algorithm is used for the inverse design of supercritical airfoils, which enables the generation of airfoil geometries based on physical features rather than the complete pressure distributions. It also demonstrates the ability to manipulate certain pressure distribution features of the airfoil without changing the others.

Preprint · 2022 · arXiv

Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls

We study finite-time horizon continuous-time linear-convex reinforcement learning problems in an episodic setting. In this problem, the unknown linear jump-diffusion process is controlled subject to nonsmooth convex costs. We show that the associated linear-convex control problems admit Lipschitz continuous optimal feedback controls and further prove the Lipschitz stability of the feedback controls: the performance gap between applying feedback controls for an incorrect model and for the true model depends Lipschitz-continuously on the magnitude of perturbations in the model coefficients. The proof relies on a stability analysis of the associated forward-backward stochastic differential equation. We then propose a novel least-squares algorithm that achieves a regret of order $O(\sqrt{N\ln N})$ on linear-convex learning problems with jumps, where $N$ is the number of learning episodes; the analysis leverages the Lipschitz stability of feedback controls and concentration properties of sub-Weibull random variables. Numerical experiments confirm the convergence and robustness of the proposed algorithm.

Preprint · 2022 · arXiv

SolarGAN: Synthetic Annual Solar Irradiance Time Series on Urban Building Facades via Deep Generative Networks

Building Integrated Photovoltaics (BIPV) is a promising technology for decarbonizing urban energy systems by harnessing the solar energy available on building envelopes. While methods to assess solar irradiation, especially on rooftops, are well established, assessment on building facades usually requires more effort because of more complex urban features and obstructions. A drawback of existing physics-based simulation programs is that they require significant manual modelling effort and computing time to generate time-resolved deterministic results. Yet solar irradiation is highly intermittent, and representing its inherent uncertainty may be required for designing robust BIPV energy systems. To address these drawbacks, this paper proposes a data-driven model based on Deep Generative Networks (DGN). The model efficiently generates high-fidelity stochastic ensembles of annual hourly solar irradiance time series on building facades, with uncompromised spatiotemporal resolution at the urban scale. The only required input is a set of simple, easily obtainable fisheye images, captured from 3D models, that serve as categorical shading masks. In principle, even actual photographs of urban contexts can be used, provided they are semantically segmented. Our validation demonstrates the high fidelity of the generated time series compared with the physics-based simulator. To demonstrate the model's relevance for urban energy planning, we showcase its potential for generative design. We parametrically alter characteristic features of the urban environment and produce the corresponding time series on building facades under different climatic contexts in real time.

Preprint · 2021 · arXiv

Data-driven turbulence modeling in separated flows considering physical mechanism analysis

Accurate simulation of turbulent flows with separation is an important but challenging problem. In this paper, a data-driven Reynolds-averaged turbulence modeling approach, field inversion and machine learning, is implemented to modify the Spalart-Allmaras model separately for three cases: the S809 airfoil, a periodic hill, and the GLC305 airfoil with ice shape 944. Field inversion based on a discrete adjoint method is used to quantify the model-form uncertainty with limited experimental data. An artificial neural network is trained to predict the model corrections from local flow features in order to extract generalized modeling knowledge. Physical knowledge of nonequilibrium turbulence in the separating shear layer is considered when setting the prior model uncertainty. The results show that the model corrections from field inversion are strongly consistent with the underlying physical mechanism of nonequilibrium turbulence. The augmented model reproduces the quantity of interest from the observation data with relatively high accuracy. In addition, validation under similar flow conditions shows a degree of generalization ability.

Preprint · 2021 · arXiv

Geometrical constraints on curvature from galaxy-lensing cross-correlations

Accurate constraints on curvature provide a powerful probe of inflation. However, curvature constraints based on specific assumptions about dark energy may lead to unreliable conclusions when used to test inflation models. To avoid this, it is important to obtain constraints that are independent of assumptions about dark energy. In this paper, we investigate such constraints on curvature from a geometrical probe constructed from galaxy-lensing cross-correlations. We comprehensively study the cross-correlations of galaxies with magnification measured from type Ia supernova brightnesses ("$gκ^{\rm SN}$"), with shear ("$gκ^{\rm g}$"), and with CMB lensing ("$gκ^{\rm CMB}$"). We find that, for the LSST and Stage IV CMB surveys, "$gκ^{\rm SN}$", "$gκ^{\rm g}$", and "$gκ^{\rm CMB}$" can be detected with signal-to-noise ratios of $S/N = 104$, $2291$, and $1842$, respectively. When these are combined with the supernova Hubble diagram ("SN") to constrain curvature, galaxy-lensing cross-correlations become increasingly important as more degrees of freedom are allowed in dark energy. Without any priors, we obtain errors on $Ω_K$ of $0.723$ from "SN + $gκ^{\rm SN}$", $0.0417$ from "SN + $gκ^{\rm g}$", and $0.04$ from "SN + $gκ^{\rm g}$ + $gκ^{\rm CMB}$" for the LSST and Stage IV CMB surveys. The last of these is more competitive than a Stage IV BAO survey ("BAO"). When galaxy-lensing cross-correlations are added to the combined probe "SN + BAO + CMB", where "CMB" denotes the Planck measurement of the CMB acoustic scale, we obtain a constraint on $Ω_K$ of $0.0013$. This is a factor of 7 improvement over "SN + BAO + CMB". We also study how these results improve when the supernova sample is extended to higher redshift.

Preprint · 2020 · arXiv

A neural network based policy iteration algorithm with global $H^2$-superlinear convergence for stochastic games on domains

In this work, we propose a class of numerical schemes for solving semilinear Hamilton-Jacobi-Bellman-Isaacs (HJBI) boundary value problems which arise naturally from exit time problems of diffusion processes with controlled drift. We exploit policy iteration to reduce the semilinear problem into a sequence of linear Dirichlet problems, which are subsequently approximated by a multilayer feedforward neural network ansatz. We establish that the numerical solutions converge globally in the $H^2$-norm, and further demonstrate that this convergence is superlinear, by interpreting the algorithm as an inexact Newton iteration for the HJBI equation. Moreover, we construct the optimal feedback controls from the numerical value functions and deduce convergence. The numerical schemes and convergence results are then extended to HJBI boundary value problems corresponding to controlled diffusion processes with oblique boundary reflection. Numerical experiments on the stochastic Zermelo navigation problem are presented to illustrate the theoretical results and to demonstrate the effectiveness of the method.

Preprint · 2020 · arXiv

Error estimates of penalty schemes for quasi-variational inequalities arising from impulse control problems

This paper proposes penalty schemes for a class of weakly coupled systems of Hamilton-Jacobi-Bellman quasi-variational inequalities (HJBQVIs) arising from stochastic hybrid control problems of regime-switching models with both continuous and impulse controls. We show that the solutions of the penalized equations converge monotonically to those of the HJBQVIs. We further establish that the schemes are half-order accurate for HJBQVIs with Lipschitz coefficients, and first-order accurate for equations with more regular coefficients. Moreover, we construct the action regions and optimal impulse controls based on the error estimates and the penalized solutions. The penalty schemes and convergence results are then extended to HJBQVIs with possibly negative impulse costs. We also demonstrate the convergence of monotone discretizations of the penalized equations, and establish that policy iteration applied to the discrete equation is monotonically convergent with an arbitrary initial guess in an infinite dimensional setting. Numerical examples for infinite-horizon optimal switching problems are presented to illustrate the effectiveness of the penalty schemes over the conventional direct control scheme.
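The penalty idea, combined with policy iteration on the discrete equation, can be seen on a much simpler problem than the HJBQVIs treated in the paper. The sketch below is a swapped-in toy: a one-dimensional obstacle problem $\min(Au - f,\ u - ψ) = 0$ with a fixed obstacle $ψ$. In a quasi-variational inequality, the obstacle would instead depend on $u$ through the impulse operator. The constraint is replaced by the penalized equation $Au - f - ρ\max(ψ - u, 0) = 0$. Policy iteration then fixes, at each step, the set of nodes where $u < ψ$ and solves one linear system. $A$ is assumed to be the standard finite-difference $-d^2/dx^2$ with zero Dirichlet data. All names and parameter values are illustrative.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system; lower[0] and upper[-1] are unused."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def penalized_obstacle(psi, f, h, rho=1e6, max_iter=200):
    """Solve A u - f - rho * max(psi - u, 0) = 0 by policy iteration.

    A is the 1-D finite-difference -d^2/dx^2 on interior nodes with zero
    Dirichlet boundary values (an M-matrix).
    """
    n = len(psi)
    u = [0.0] * n
    active = None
    for _ in range(max_iter):
        # Policy: penalize exactly the nodes where the constraint u >= psi is violated.
        new_active = [psi[i] - u[i] > 0 for i in range(n)]
        if new_active == active:
            return u  # policy unchanged, so u solves the penalized equation
        active = new_active
        # Linear system for the current policy: (A + rho P) u = f + rho P psi.
        diag = [2.0 / h**2 + (rho if active[i] else 0.0) for i in range(n)]
        off = [-1.0 / h**2] * n
        rhs = [f[i] + (rho * psi[i] if active[i] else 0.0) for i in range(n)]
        u = solve_tridiagonal(off, diag, off, rhs)
    raise RuntimeError("policy iteration did not converge")
```

As $ρ$ grows, the penalized solution approaches the solution of the obstacle problem. In this example, the remaining violation $ψ - u$ on the contact set is $(Au - f)/ρ$, which is of order $1/ρ$.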

Preprint · 2020 · arXiv

Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems

In this paper, we establish that for a wide class of controlled stochastic differential equations (SDEs) with stiff coefficients, the value functions of corresponding zero-sum games can be represented by a deep artificial neural network (DNN), whose complexity grows at most polynomially in both the dimension of the state equation and the reciprocal of the required accuracy. Such nonlinear stiff systems may arise, for example, from Galerkin approximations of controlled stochastic partial differential equations (SPDEs), or controlled PDEs with uncertain initial conditions and source terms. This implies that DNNs can break the curse of dimensionality in numerical approximations and optimal control of PDEs and SPDEs. The main ingredient of our proof is to construct a suitable discrete-time system to effectively approximate the evolution of the underlying stochastic dynamics. Similar ideas can also be applied to obtain expression rates of DNNs for value functions induced by stiff systems with regime switching coefficients and driven by general Lévy noise.

Preprint · 2020 · arXiv

Total Difference Chromatic Numbers of Graphs

Inspired by graceful labelings and total labelings of graphs, we introduce the idea of total difference labelings. A $k$-total labeling of a graph $G$ is an assignment of $k$ distinct labels to the edges and vertices of a graph so that adjacent vertices, incident edges, and an edge and its incident vertices receive different labels. A $k$-total difference labeling of a graph $G$ is a function $f$ from the set of edges and vertices of $G$ to the set $\{1,2,\ldots,k\}$, that is a $k$-total labeling of $G$ and for which $f(\{u,v\})=|f(u)-f(v)|$ for any two adjacent vertices $u$ and $v$ of $G$ with incident edge $\{u,v\}$. The least positive integer $k$ for which $G$ has a $k$-total difference labeling is its total difference chromatic number, $χ_{td}(G)$. We determine the total difference chromatic number of paths, cycles, stars, wheels, gears and helms. We also provide bounds for total difference chromatic numbers of caterpillars, lobsters, and general trees.
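The definition above can be checked mechanically. The sketch below verifies each condition for a candidate labeling. For very small graphs, it also finds $χ_{td}(G)$ by brute force. The search uses the fact, implied by the definition, that edge labels are determined by the vertex labels, so only vertex labelings need to be enumerated. Function names are illustrative, and the search is exponential in the number of vertices.

```python
from itertools import combinations, product


def is_total_difference_labeling(vertices, edges, f, k):
    """Return True if f is a k-total difference labeling of the graph (V, E).

    vertices: list of hashable vertex names
    edges: list of (u, v) pairs
    f: dict mapping each vertex, and each edge as frozenset({u, v}), to a label
    """
    edge_keys = [frozenset(e) for e in edges]
    # Every vertex and edge gets a label from {1, ..., k}.
    if any(not 1 <= f[x] <= k for x in list(vertices) + edge_keys):
        return False
    for e in edge_keys:
        u, v = tuple(e)
        if f[u] == f[v]:              # adjacent vertices differ
            return False
        if f[e] in (f[u], f[v]):      # an edge differs from its endpoints
            return False
        if f[e] != abs(f[u] - f[v]):  # difference condition
            return False
    # Incident edges (sharing an endpoint) receive different labels.
    for e1, e2 in combinations(edge_keys, 2):
        if e1 & e2 and f[e1] == f[e2]:
            return False
    return True


def total_difference_chromatic_number(vertices, edges, k_max=10):
    """Brute-force the least k admitting a k-total difference labeling (tiny graphs only)."""
    vertices = list(vertices)
    for k in range(1, k_max + 1):
        for labels in product(range(1, k + 1), repeat=len(vertices)):
            f = dict(zip(vertices, labels))
            # Edge labels are forced by the difference condition.
            for u, v in edges:
                f[frozenset((u, v))] = abs(f[u] - f[v])
            if is_total_difference_labeling(vertices, edges, f, k):
                return k
    return None  # no labeling found with k <= k_max
```

For example, on the path a-b-c the vertex labels 2, 3, 1 force edge labels 1 and 2, which gives a 3-total difference labeling. A single edge already needs three distinct labels (two endpoints and the edge), so the total difference chromatic number of this path is 3.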
