Source author record

Bowen Xiao

Bowen Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

Machine Learning; Artificial Intelligence; Computation and Language; Distributed, Parallel, and Cluster Computing; hep-ph; q-fin.TR

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

preprint, 2026, arXiv

Stable Preference Optimization: A Bilevel Approach to Catastrophic Preference Shift

Direct Preference Learning has emerged as a dominant offline paradigm for preference optimization. Most of these methods are based on the Bradley-Terry (BT) model for pairwise preference ranking, which directly aligns language models with human preferences. Prior work has observed a counter-intuitive phenomenon termed likelihood displacement, where the absolute probability of preferred responses decreases simultaneously during training. We demonstrate that such displacement can lead to a more devastating failure mode, which we define as Catastrophic Preference Shift, where the lost preference probability mass inadvertently shifts toward out-of-distribution (OOD) responses. Such a failure mode is a key limitation shared across BT-style direct preference learning methods, due to the fundamental conflict between unconstrained discriminative alignment and generative foundational capabilities, ultimately leading to severe performance degradation (e.g., SimPO suffers a significant drop in reasoning accuracy from 73.5% to 37.5%). We analyze existing BT-style methods from the probability evolution perspective and theoretically prove that these methods exhibit over-reliance on model initialization and can lead to preference shift. To resolve these counter-intuitive behaviors, we propose a theoretically grounded Stable Preference Optimization (SPO) framework that constrains preference learning within a safe alignment region. Empirical evaluations demonstrate that SPO effectively stabilizes and enhances the performance of existing BT-style preference learning methods. SPO provides new insights into the design of preference learning objectives and opens up new avenues towards more reliable and interpretable language model alignment.
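As a rough illustration of likelihood displacement, the sketch below evaluates a Bradley-Terry style pairwise loss on one preference pair. It assumes the common DPO form of the loss with a frozen reference model; the function name `bt_pair_loss`, the value of `beta`, and the log-probabilities are all made up for illustration and are not taken from the paper. Because the loss depends only on the margin between the two responses, it can fall even while the log-probability of the preferred response falls too.

```python
import math

def bt_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Bradley-Terry style loss for one (preferred, rejected) pair.

    Assumed DPO-like form: -log sigmoid(beta * margin), where the margin is
    the policy-vs-reference log-ratio of the preferred response minus that
    of the rejected response.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) equals log(1 + exp(-m))
    return math.log1p(math.exp(-margin))

# Hypothetical numbers: at initialization the policy equals the reference.
loss_before = bt_pair_loss(-10.0, -10.0, -10.0, -10.0)

# Later, BOTH log-probabilities have dropped, the rejected one much more.
loss_after = bt_pair_loss(-12.0, -22.0, -10.0, -10.0)

print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

In this toy case the loss improves although the preferred response became less likely, which leaves probability mass free to move to responses outside the pair. The abstract's claim is that this mass can land on OOD responses; the sketch does not model where it goes.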

preprint, 2022, arXiv

FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance

As deep reinforcement learning (DRL) has been recognized as an effective approach in quantitative finance, hands-on experience is attractive to beginners. However, training a practical DRL trading agent that decides where to trade, at what price, and at what quantity involves error-prone and arduous development and debugging. In this paper, we introduce a DRL library, FinRL, that helps beginners gain exposure to quantitative finance and develop their own stock trading strategies. Along with easily reproducible tutorials, the FinRL library allows users to streamline their own developments and to compare with existing schemes easily. Within FinRL, virtual environments are configured with stock market datasets, trading agents are trained with neural networks, and extensive backtesting is analyzed via trading performance. Moreover, it incorporates important trading constraints such as transaction cost, market liquidity and the investor's degree of risk-aversion. FinRL features completeness, hands-on tutorials and reproducibility that favor beginners: (i) at multiple levels of time granularity, FinRL simulates trading environments across various stock markets, including NASDAQ-100, DJIA, S&P 500, HSI, SSE 50, and CSI 300; (ii) organized in a layered architecture with modular structure, FinRL provides fine-tuned state-of-the-art DRL algorithms (DQN, DDPG, PPO, SAC, A2C, TD3, etc.), commonly used reward functions and standard evaluation baselines to alleviate the debugging workload and promote reproducibility; and (iii) being highly extendable, FinRL reserves a complete set of user-import interfaces. Furthermore, we incorporate three application demonstrations, namely single stock trading, multiple stock trading, and portfolio allocation. The FinRL library will be available on GitHub at https://github.com/AI4Finance-LLC/FinRL-Library.
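To make the idea of a transaction-cost constraint concrete, here is a minimal sketch of how a simulated trading step might charge a proportional fee and refuse infeasible orders. This is not FinRL's API: the function `apply_trade`, its parameters, and the 0.1% fee are hypothetical choices made only for illustration.

```python
def apply_trade(cash, shares, price, delta, cost_pct=0.001):
    """Buy (delta > 0) or sell (delta < 0) whole shares with a proportional fee.

    Hypothetical helper, not part of FinRL. Orders are clipped so the
    account never sells shares it does not hold or spends cash it lacks.
    """
    if delta < 0:
        # Cannot sell more shares than are currently held.
        delta = -min(-delta, shares)
    else:
        # Cannot buy more than the cash covers once the fee is included.
        max_buy = int(cash // (price * (1 + cost_pct)))
        delta = min(delta, max_buy)
    fee = abs(delta) * price * cost_pct
    cash -= delta * price + fee
    return cash, shares + delta

# Buy 5 shares at 100, then try to sell 10 (clipped to the 5 held).
cash, shares = apply_trade(1000.0, 0, 100.0, 5)
cash_after, shares_after = apply_trade(cash, shares, 100.0, -10)
print(cash, shares, cash_after, shares_after)
```

A round trip at an unchanged price ends with less cash than it started with, so an agent trained in such an environment pays for every trade it makes.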

preprint, 2019, arXiv

EdgeToll: A Blockchain-based Toll Collection System for Public Sharing of Heterogeneous Edges

Edge computing is a novel paradigm designed to improve the quality of service for latency sensitive cloud applications. However, the state-of-the-art edge services are designed for specific applications, which are isolated from each other. To better improve the utilization level of edge nodes, public resource sharing among edges from distinct service providers should be encouraged economically. In this work, we employ the payment channel techniques to design and implement EdgeToll, a blockchain-based toll collection system for heterogeneous public edge sharing. Test-bed has been developed to validate the proposal and preliminary experiments have been conducted to demonstrate the time and cost efficiency of the system.

preprint, 2010, arXiv

Phases of Augmented Hadronic Light-Front Wave Functions

It is an important question whether the final/initial state gluonic interactions which lead to naive-time-reversal-odd single-spin asymmetries and diffraction at leading twist can be associated in a definite way with the light-front wave function hadronic eigensolutions of QCD. We use light-front time-ordered perturbation theory to obtain augmented light-front wave functions which contain an imaginary phase which depends on the choice of advanced or retarded boundary condition for the gauge potential in light-cone gauge. We apply this formalism to the wave functions of the valence Fock states of nucleons and pions, and show how this illuminates the factorization properties of naive-time-reversal-odd transverse momentum dependent observables which arise from rescattering. In particular, one calculates the identical leading-twist Sivers function from the overlap of augmented light-front wavefunctions that one obtains from explicit calculations of the single-spin asymmetry in semi-inclusive deep inelastic lepton-polarized nucleon scattering where the required phases come from the final-state rescattering of the struck quark with the nucleon spectators.