Researcher profile

Yu-Feng Li

Yu-Feng Li contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
19 works
0 followers
11 topics
4 close collaborators

Published work

19 published items

preprint, 2026, arXiv

Activation Compression in LLMs: Theoretical Analysis and Efficient Algorithm

Training large language models (LLMs) is highly memory-intensive, as training must store not only weights and optimizer states but also intermediate activations for backpropagation. While existing memory-efficient methods largely focus on gradients and optimizer states, activation compression is less well established due to the lack of LLM-tailored theory and guarantees. In this work, we develop a theoretical framework showing that activation compression is safe for linear operators when the compression is unbiased, but problematic for nonlinear ones. We further derive a gradient variance bound and establish convergence guarantees for applying activation compression to all linear operators under the standard $L$-smoothness assumption, showing that it does not change the convergence rate. Guided by the theory, we propose an activation-gradient co-compression method that reuses low-rank activation factors to compress linear-layer gradients without extra computation or additional gradient error. We conduct extensive experiments on Qwen and LLaMA models using a pretraining benchmark and multiple fine-tuning benchmarks to validate our theory and demonstrate competitive performance of our method in both accuracy and compression efficiency. We provide our code in the supplementary material for reproducibility.
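The claim that unbiased activation compression is safe for linear operators can be illustrated with a small numerical sketch. For a linear layer $y = xW$, the weight gradient is $x^\top g$, which is linear in the stored activation $x$, so any compressor with $\mathbb{E}[\hat{x}] = x$ gives an unbiased gradient estimate. The sketch below is not the paper's method: it swaps in random sparsification with rescaling as a stand-in unbiased compressor (the abstract describes low-rank factors), and all function names and toy values are hypothetical.

```python
import random

def matmul_t(x, g):
    """Return x^T g for x of shape (n, d_in) and g of shape (n, d_out)."""
    n, d_in, d_out = len(x), len(x[0]), len(g[0])
    return [[sum(x[k][i] * g[k][j] for k in range(n)) for j in range(d_out)]
            for i in range(d_in)]

def sparsify(x, keep_prob, rng):
    """Assumed stand-in compressor: keep each entry with probability
    keep_prob and rescale by 1/keep_prob, so that E[x_hat] = x."""
    return [[(v / keep_prob if rng.random() < keep_prob else 0.0) for v in row]
            for row in x]

def mean_compressed_grad(x, g, keep_prob, trials, seed=0):
    """Average the weight gradient computed from compressed activations."""
    rng = random.Random(seed)
    d_in, d_out = len(x[0]), len(g[0])
    acc = [[0.0] * d_out for _ in range(d_in)]
    for _ in range(trials):
        est = matmul_t(sparsify(x, keep_prob, rng), g)
        for i in range(d_in):
            for j in range(d_out):
                acc[i][j] += est[i][j] / trials
    return acc

# Toy activations x (3 samples, 2 features) and upstream gradients g.
x = [[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]]
g = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.6]]

exact = matmul_t(x, g)
approx = mean_compressed_grad(x, g, keep_prob=0.5, trials=20000)
max_err = max(abs(a - e) for ra, re in zip(approx, exact) for a, e in zip(ra, re))
print("largest deviation from the exact gradient:", max_err)
```

The averaged estimate approaches the exact gradient as the number of trials grows. The same argument does not carry over to a nonlinear operator, whose backward pass depends nonlinearly on the stored activation, which is consistent with the distinction drawn in the abstract.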

preprint, 2026, arXiv

Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks

Foundation model-driven agents often struggle with long-horizon planning due to the transient nature of purely prompting-based reasoning. While existing skill induction methods mitigate this by distilling experience into state-blind parameterized scripts, they fail to capture the conditional logic required for robust execution in dynamic environments. In this paper, we propose Neuro-Symbolic Skill Induction (NSI), a framework that lifts interaction traces into modular, logic-grounded programs. By synthesizing explicit control flows and dynamic variable binding, NSI empowers agents to discover when and why to act. This paradigm enables efficient generalization, allowing agents to induce skills from few-shot examples and flexibly adapt to unseen goals. Experiments on a series of agentic tasks demonstrate that NSI consistently outperforms state-of-the-art baselines, empowering agents to self-evolve into architects of logic-grounded skills.

preprint, 2026, arXiv

Programmatic Context Augmentation for LLM-based Symbolic Regression

Symbolic regression (SR), the task of discovering mathematical expressions that best describe a given dataset, remains a fundamental challenge in scientific discovery. Traditional approaches, primarily based on genetic algorithms and related evolutionary methods, have proven useful but suffer from scalability and expressivity limitations. Recently, large language model (LLM)-based evolutionary search methods have been introduced into SR and show promise. However, existing LLM-based approaches typically rely on scalar evaluation metrics, such as mean squared error, as the sole source of feedback during the search process, thereby overlooking the rich information embedded in the dataset. To address this limitation, we propose a novel LLM-based evolutionary search framework that incorporates programmatic context augmentation. By enabling code-based interactions with the dataset, our method can actively perform data analysis and extract informative signals, beyond aggregated evaluation scores. We evaluate our framework on advanced benchmarks, such as LLM-SRBench, and demonstrate superior efficiency and accuracy compared to strong baselines.

preprint, 2026, arXiv

Revisiting the Travel Planning Capabilities of Large Language Models

Travel planning serves as a critical task for long-horizon reasoning, exposing significant deficits in LLMs. However, existing benchmarks and evaluations primarily assess final plans in an end-to-end manner, which lacks interpretability and makes it difficult to analyze the root causes of failures. To bridge this gap, we decompose travel planning into five atomic sub-capabilities: Constraint Extraction, Tool Use, Plan Generation, Error Identification, and Error Correction. We implement a decoupled evaluation protocol leveraging oracle intermediate contexts to rigorously isolate these components, thereby measuring the atomic performance boundary without the noise of cascading errors. Our results highlight a clear contrast in performance: while LLMs are proficient in extracting explicit constraints, they struggle to infer implicit, open-world requirements. Furthermore, they exhibit structural biases in plan generation and suffer from ineffective self-correction, characterized by excessive sensitivity and erroneous persistence. These findings offer precise directions for improving LLM reasoning and planning abilities.

preprint, 2026, arXiv

VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning

Multi-modal learning has attracted great attention in visual-text tasks. However, visual-tabular data, which plays a pivotal role in high-stakes domains like healthcare and industry, remains underexplored. In this paper, we introduce VT-Bench, the first unified benchmark for standardizing vision-tabular discriminative prediction and generative reasoning tasks. VT-Bench aggregates 14 datasets across 9 domains (medical-centric, while covering pets, media, and transportation) with over 756K samples. We evaluate 23 representative models, including unimodal experts, specialized visual-tabular models, general-purpose vision-language models (VLMs), and tool-augmented methods, highlighting substantial challenges of visual-tabular learning. We believe VT-Bench will stimulate the community to build more powerful multi-modal vision-tabular foundation models. Benchmark: https://github.com/Ziyi-Jia990/VT-Bench

preprint, 2022, arXiv

Constraining Light Mediators via Detection of Coherent Elastic Solar Neutrino Nucleus Scattering

Dark matter (DM) direct detection experiments are entering the multi-ton era and will be sensitive to the coherent elastic neutrino nucleus scattering (CE$ν$NS) of solar neutrinos, making it possible to explore contributions from new physics with light mediators in the low energy range. In this paper we consider light mediator models (scalar, vector and axial vector) and the corresponding contributions to the solar neutrino CE$ν$NS process. Motivated by the current status and future plans of the new generation of DM direct detection experiments, we study the sensitivity to light mediators in DM direct detection experiments with different nuclear targets and detector techniques. The constraints from the latest $^8$B solar neutrino measurements of XENON-1T are also derived. Finally, we show that the solar neutrino CE$ν$NS process can provide stringent limits on the $ L_μ-L_τ $ model with the vector mediator mass below 100 MeV, covering the viable parameter space of the solution to the $ (g-2)_μ$ anomaly.

preprint, 2022, arXiv

Detecting and Monitoring Tidal Dissipation of Hot Jupiters in the Era of SiTian

Transit Timing Variation (TTV) of hot Jupiters provides direct observational evidence of planetary tidal dissipation. Detecting tidal dissipation through TTV requires high-precision transit timings and long timing baselines. In this work, we predict and discuss the potential scientific contribution of the SiTian Survey in detecting and analyzing exoplanet TTV. We develop a tidal dissipation detection pipeline for the SiTian Survey, which aims at time-domain astronomy with 72 1-meter optical telescopes. The pipeline includes modules for light curve deblending, transit timing measurement, and TTV modeling. SiTian is capable of detecting more than 25,000 exoplanets, among which we expect $\sim$50 sources showing evidence of tidal dissipation. We present the detection and analysis of tidally dissipating targets, based on simulated SiTian light curves of XO-3b and WASP-161b. The transit light curve modeling gives results consistent within 1$σ$ with the input values of the simulated light curves. Also, the parameter uncertainties predicted by Markov Chain Monte Carlo are consistent with the distribution obtained from simulating and modeling the light curve 1000 times. The timing precision of SiTian observations is $\sim$ 0.5 minutes with one transit visit. We show that differences between TTV origins, e.g., tidal dissipation, apsidal precession, multiple planets, would be significant, considering the timing precision and baseline. The detection rate of tidally dissipating hot Jupiters would answer a crucial question of whether the planet migrates at an early formation stage or at random stages due to perturbations, e.g., planet scattering, secular interaction. SiTian-identified targets would be valuable, given that the sample would expand tenfold.
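The interplay between timing precision and baseline described above can be sketched with the standard quadratic ephemeris for a decaying orbit, $t_N = t_0 + P N + \tfrac{1}{2}\,\frac{dP}{dN} N^2$, where $\frac{dP}{dN} = P\,\dot{P}$. This is a minimal illustration and not the SiTian pipeline: the period and decay rate below are hypothetical placeholder values, the function names are invented for this example, and only the 0.5-minute precision is taken from the abstract.

```python
import math

SECONDS_PER_YEAR = 365.25 * 86400.0
MINUTES_PER_DAY = 1440.0

def period_change_per_epoch(period_days, pdot_ms_per_yr):
    """Convert dP/dt in ms/yr to the period change per orbit, in days."""
    pdot = pdot_ms_per_yr * 1e-3 / SECONDS_PER_YEAR  # dimensionless (s per s)
    return pdot * period_days

def transit_time(epoch, t0_days, period_days, dp_dn_days):
    """Quadratic ephemeris: mid-transit time of a given epoch, in days."""
    return t0_days + period_days * epoch + 0.5 * dp_dn_days * epoch ** 2

def years_until_detectable(threshold_min, period_days, pdot_ms_per_yr):
    """Baseline after which the quadratic term exceeds the timing threshold."""
    dp_dn = abs(period_change_per_epoch(period_days, pdot_ms_per_yr))
    epochs = math.sqrt(2.0 * (threshold_min / MINUTES_PER_DAY) / dp_dn)
    return epochs * period_days / 365.25

# Hypothetical planet: 1-day period, decay rate of -30 ms/yr (assumed values).
PERIOD_DAYS = 1.0
PDOT_MS_PER_YR = -30.0
PRECISION_MIN = 0.5  # single-transit timing precision quoted in the abstract

years = years_until_detectable(PRECISION_MIN, PERIOD_DAYS, PDOT_MS_PER_YR)
dp_dn = period_change_per_epoch(PERIOD_DAYS, PDOT_MS_PER_YR)
epochs = round(years * 365.25 / PERIOD_DAYS)
shift_min = (transit_time(epochs, 0.0, PERIOD_DAYS, dp_dn)
             - transit_time(epochs, 0.0, PERIOD_DAYS, 0.0)) * MINUTES_PER_DAY
print(f"baseline needed: {years:.2f} yr, shift at that epoch: {shift_min:.3f} min")
```

Because the deviation from a linear ephemeris grows as $N^2$, doubling the baseline quadruples the signal. A real analysis would fit many transits jointly, so the single-transit threshold used here is a conservative simplification.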

preprint, 2022, arXiv

Model-Independent Determination of Isotopic Cross Sections per Fission for Reactor Antineutrinos

Model-independent reactor isotopic cross sections per fission are determined by global fits of the reactor antineutrino data from High-Enriched Uranium (HEU) reactor rates, Low-Enriched Uranium (LEU) reactor rates, and reactor fuel evolution data. Taking account of the implicit quasi-linear relationship between the fission fractions of $^{239}\rm{Pu}$ and $^{241}\rm{Pu}$ in the LEU reactor data, the Inverse-Beta-Decay (IBD) yields and their correlations of the fissionable isotopes $^{235}\rm{U}$, $^{238}\rm{U}$, and Pu's are obtained. The data-driven isotopic IBD yields provide an anomaly-free model for the reactor isotopic cross sections per fission, where better than 1\% accuracy of the expected reactor IBD yields can be achieved for future experiments.

preprint, 2022, arXiv

Neutrinoless double beta decay in the minimal type-I seesaw model: How the enhancement or cancellation happens?

We discuss the contribution of right-handed neutrinos (RHNs) to the effective neutrino mass of the neutrinoless double beta decay within the minimal type-I seesaw model using the intrinsic seesaw relation of neutrino mass and mixing parameters and the relative mass dependence of the nuclear matrix elements. In the viable parameter space, we find the possibilities of both the enhancement and cancellation to the effective neutrino mass from RHNs. The bounds on the parameter space of the RHNs can be determined with the effective neutrino mass extracted from neutrinoless double beta decay experiments.

preprint, 2022, arXiv

Prospects for the detection of the Diffuse Supernova Neutrino Background with the experiments SK-Gd and JUNO

The advent of gadolinium-loaded Super-Kamiokande (SK-Gd) and of the soon-to-start JUNO liquid scintillator detector marks a substantial improvement in the global sensitivity for the Diffuse Supernova Neutrino Background (DSNB). The present article reviews the detector properties most relevant for the DSNB searches in both experiments and estimates the expected signal and background levels. Based on these inputs, we evaluate the sensitivity of both experiments individually and combined. Using a simplified statistical approach, we find that both SK-Gd and JUNO have the potential to reach $>$3$σ$ evidence of the DSNB signal within 10 years of measurement. The combined results are likely to enable a $5σ$ discovery of the DSNB signal within the next decade.

preprint, 2021, arXiv

Ab initio calculations of reactor antineutrino fluxes with exact lepton wave functions

New \textit{ab initio} calculations of the isotopic reactor antineutrino fluxes are provided with exact numerical calculations of the lepton wave functions, assuming all the decay branches are allowed GT transitions. We illustrate that the analytical Fermi function and finite size effect each could have the largest spectral deviation of $\mathcal{O}(10\%)$, whereas the effect of their combination could result in spectral deviations at the level of 5%-10%. Meanwhile, we also find that several forms of the extended charge distributions have negligible effects on the spectral variation. Using the state-of-the-art nuclear databases, compared to usual \textit{ab initio} calculations using the analytical single beta decay spectrum, our new calculation shows sizable but opposite spectral deviations at the level of 2%-4% for the cumulative antineutrino and electron energy spectra which may partially contribute to the observed spectral excess in the high energy antineutrino range. Finally, we observe that the bias of the analytical beta decay spectrum approximation is rather universal for all the four fissionable isotopes.

preprint, 2021, arXiv

Collective neutrino oscillations in moving and polarized matter

We consider neutrino evolution master equations in dense moving and polarized matter consisting of electrons, neutrons, protons and neutrinos. We also take into account the neutrino magnetic moment interaction with a magnetic field. We point out the mechanisms responsible for the neutrino spin precession and provide the expressions for the corresponding interaction Hamiltonians that should be taken into account in theoretical treatments of collective neutrino oscillations.

preprint, 2020, arXiv

JULOC: A Local 3-D Refined Crust Model for the Geoneutrino Measurement at JUNO

Geothermal energy is the key driver of plate tectonics and the interior thermodynamics of the Earth. The surface heat flux, as measured in boreholes, provides limited insight into the relative contributions of primordial versus radiogenic sources to the heat budget of the mantle. Geoneutrinos, electron antineutrinos that are produced in the radioactive decay of the heat-producing elements, are unique probes that bring direct information about the amount and distribution of heat-producing elements in the crust and mantle. Cosmochemical, geochemical, and geodynamic compositional models of the Bulk Silicate Earth (BSE) each predict different mantle neutrino fluxes, and can therefore be distinguished by the direct measurement of geoneutrinos. The 20 kton detector of the Jiangmen Underground Neutrino Observatory (JUNO), currently under construction in the Guangdong Province (China), is expected to provide an exciting opportunity to obtain a high-statistics measurement, which will produce sufficient data to address several key questions of geological importance. To test different compositional models of the mantle, an accurate prior estimate of the crust geoneutrino flux based on a three-dimensional (3-D) crust model is important. This paper presents a 3-D crust model covering a $10^\circ \times 10^\circ$ surface grid surrounding the JUNO detector and extending in depth down to the Moho discontinuity, based on geological, geophysical and geochemical properties. The 3-D model distinguishes the volumes of the different geological layers together with the corresponding Th and U abundances. We also present our predicted local contribution to the total geoneutrino flux and the corresponding radiogenic heat.

preprint, 2020, arXiv

Prospects for Pre-supernova Neutrino Observation in Future Large Liquid-scintillator Detectors

Before massive stars heavier than $(8 \cdots 10)$ solar masses evolve to the phase of a gravitational core collapse, they will emit a huge number of MeV-energy neutrinos that are mainly produced in the thermal processes and nuclear weak interactions. The detection of such pre-supernova (pre-SN) neutrinos could provide an important and independent early warning for the optical observations of core-collapse SNe. In this paper, we investigate the prospects of future large liquid-scintillator detectors for the observation of pre-SN neutrinos in both $\barν^{}_e + p \to e^+ + n$ and $ν(\barν) + e^- \to ν(\barν) + e^-$ reaction channels, where $ν$ ($\barν$) denotes neutrinos (antineutrinos) of all three flavors. We propose a quantitative assessment of the capability in terms of three working criteria, namely, how far the SN distance can be covered, how long the early warning before the core collapse can be sent out, and how well the direction pointing to the SN can be determined. The dependence of the final results on the different models of progenitor stars, neutrino flavor conversions and the relevant backgrounds is also discussed.

preprint, 2020, arXiv

Weakly Supervised Learning Meets Ride-Sharing User Experience Enhancement

Weakly supervised learning aims at coping with scarce labeled data. Previous weakly supervised studies typically assume that there is only one kind of weak supervision in data. In many applications, however, raw data usually contains more than one kind of weak supervision at the same time. For example, in user experience enhancement from Didi, one of the largest online ride-sharing platforms, the ride comment data contains severe label noise (due to the subjective factors of passengers) and severe label distribution bias (due to the sampling bias). We call such a problem "compound weakly supervised learning". In this paper, we propose the CWSL method to address this problem based on Didi ride-sharing comment data. Specifically, an instance reweighting strategy is employed to cope with severe label noise in comment data, where the weights for harmful noisy instances are small. Robust criteria like AUC rather than accuracy and the validation performance are optimized for the correction of biased data labels. Alternating optimization and stochastic gradient methods accelerate the optimization on large-scale data. Experiments on Didi ride-sharing comment data clearly validate the effectiveness of the method. We hope this work may shed some light on applying weakly supervised learning to complex real situations.

preprint, 2019, arXiv

Non-negligible Oscillation Effects in the Crustal Geo-neutrino Calculations

An accurate prediction of the geo-neutrino signal from the crust serves as a necessary prerequisite in the determination of the geo-neutrino flux from the mantle. In this work we report the non-negligible effect associated with the exact three-flavor antineutrino survival probability in the calculation of the crustal geo-neutrino signal, which was usually approximated as a constant average in previous studies. A geo-neutrino signal underestimation of about 1-2 TNU is observed as a result of the oscillatory behaviour within the local crustal region extending for about 300 km from the experimental site. We also estimate that the Mikheyev-Smirnov-Wolfenstein matter oscillation is responsible for a $0.1\%$-$0.3\%$ increase of the local crustal signal, depending on the detector location. This work is a reminder that the exact oscillation probability in matter should be considered for future predictions of the local crustal geo-neutrino signal.
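For orientation, the contrast between the exact survival probability and its constant average can be written out in the standard vacuum three-flavor form. This is a textbook expression given here as a sketch, not the paper's own derivation: the notation is assumed, and the matter (MSW) correction mentioned in the abstract is omitted.

```latex
% Vacuum three-flavor electron-antineutrino survival probability
P_{\bar{e}\bar{e}}(L, E) = 1
  - \cos^4\theta_{13}\,\sin^2 2\theta_{12}\,\sin^2\Delta_{21}
  - \sin^2 2\theta_{13}\left(\cos^2\theta_{12}\,\sin^2\Delta_{31}
  + \sin^2\theta_{12}\,\sin^2\Delta_{32}\right),
\qquad
\Delta_{ij} = \frac{\Delta m^2_{ij}\,L}{4E}

% Constant average, obtained by replacing each \sin^2\Delta_{ij} with 1/2
\langle P_{\bar{e}\bar{e}} \rangle = 1
  - \frac{1}{2}\left(\cos^4\theta_{13}\,\sin^2 2\theta_{12}
  + \sin^2 2\theta_{13}\right)
```

The averaged form is a good approximation only when the source region spans many oscillation lengths. For crust within a few hundred kilometres of the detector, the $\sin^2\Delta_{21}$ term has not yet averaged out, which is the regime the abstract identifies as the origin of the underestimation.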

preprint, 2019, arXiv

Towards the meV limit of the effective neutrino mass in neutrinoless double-beta decays

In this paper, we emphasize why it is important for future neutrinoless double-beta ($0νββ$) decay experiments to reach the sensitivity to the effective neutrino mass $|m^{}_{ββ}| \approx 1~{\rm meV}$. Assuming such a sensitivity and the precisions on neutrino oscillation parameters after the JUNO experiment, we fully explore the constrained regions of the lightest neutrino mass $m^{}_1$ and two Majorana-type CP-violating phases $\{ρ, σ\}$. The implications for the neutrino mass spectrum, the effective neutrino mass $m^{}_β$ in beta decays and the sum of three neutrino masses $Σ\equiv m^{}_1 + m^{}_2 + m^{}_3$ relevant for cosmological observations are also discussed.

preprint, 2013, arXiv

Convex and Scalable Weakly Labeled SVMs

In this paper, we study the problem of learning from weakly labeled data, where labels of the training examples are incomplete. This includes, for example, (i) semi-supervised learning where labels are partially known; (ii) multi-instance learning where labels are implicitly known; and (iii) clustering where labels are completely unknown. Unlike supervised learning, learning with weak labels involves a difficult Mixed-Integer Programming (MIP) problem. Therefore, it can suffer from poor scalability and may also get stuck in a local minimum. In this paper, we focus on SVMs and propose the WellSVM via a novel label generation strategy. This leads to a convex relaxation of the original MIP, which is at least as tight as existing convex Semi-Definite Programming (SDP) relaxations. Moreover, the WellSVM can be solved via a sequence of SVM subproblems that are much more scalable than previous convex SDP relaxations. Experiments on three weakly labeled learning tasks, namely, (i) semi-supervised learning; (ii) multi-instance learning for locating regions of interest in content-based information retrieval; and (iii) clustering, clearly demonstrate improved performance, and WellSVM is also readily applicable on large data sets.

preprint, 2009, arXiv

Matter Effects in Solar Neutrino Active-Sterile Oscillations

We study the matter effects for solar neutrino oscillations in a general scheme, without any constraint on the number of sterile neutrinos and the mixing matrix elements, only assuming a realistic hierarchy of neutrino squared-mass differences in which the smallest squared-mass difference is effective in solar neutrino oscillations. The validity of the analytic results is illustrated with a numerical solution of the evolution equation in the simplest case of four-neutrino mixing with the realistic matter density profile inside the Sun.