Source author record

Zhihui Xie

Zhihui Xie appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher: Unclaimed source record

math-ph, math.AP, math.MP, Machine Learning, Computation and Language, Information Retrieval

Catalog footprint

What is connected

5 works

6 topics

4 close collaborators

preprint, 2026, arXiv

DISA: Offline Importance Sampling for Distribution-Matching LLM-RL

Modern reasoning agents are increasingly evaluated on their ability to generate multiple valid solution paths, plans, or tool-use traces for a given input. Standard reward-maximizing RL tends to collapse onto the most easily reinforced high-reward mode, whereas distribution-matching RL aims to allocate probability mass across the entire reward-shaped solution set. Achieving this objective requires computing a prompt-dependent partition function over the trajectory space. Because existing distribution-matching methods learn this partition function online alongside the policy, calibration errors in the partition function directly distort policy updates and remain impossible to diagnose independently. We introduce DISA, short for Decoupled Importance-Sampled Anchoring, which moves this calibration problem outside the RL loop. DISA draws proposal trajectories offline, estimates the partition function via importance sampling, and freezes the resulting partition-function estimate before policy optimization begins. This decoupling preserves the distribution-matching objective while strictly separating partition-function estimation from policy learning in data, gradients, loss, and diagnostics. Empirically, on two open-weight backbones across six math and three code benchmarks, DISA matches or exceeds the online-coupled distribution-matching baseline FlowRL, outperforms reward-maximization baselines GRPO and GSPO on math averages, and exceeds LoRA-SFT distillation by up to 13.8 Mean@8 points on the same offline trajectories. An LLM-as-judge evaluation further shows that DISA retains substantially more strategy-level diversity than reward-maximization baselines, and sensitivity studies on the proposal strength and inverse temperature follow the bias-variance pattern predicted by the analysis.
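The offline estimation step described in this abstract can be illustrated with a generic self-normalizing-free importance-sampling estimate of a log-partition function. This is only a minimal sketch, not the DISA implementation: the function name `log_partition_estimate`, its arguments, and the assumption that the unnormalized target log-density and the proposal log-probability are available per sampled trajectory are all hypothetical choices made for illustration.

```python
import math


def log_partition_estimate(log_unnorm_target, log_proposal):
    """Importance-sampling estimate of log Z for one prompt.

    Assumption (not from the source): for N trajectories y_i drawn from a
    proposal q, we are given log p~(y_i) (unnormalized target, e.g. an
    inverse temperature times a reward) and log q(y_i). The estimate is
    log( (1/N) * sum_i p~(y_i) / q(y_i) ), computed stably in log space.
    """
    if not log_unnorm_target or len(log_unnorm_target) != len(log_proposal):
        raise ValueError("need two non-empty sequences of equal length")
    # Log importance weights: log p~(y_i) - log q(y_i).
    log_w = [t - q for t, q in zip(log_unnorm_target, log_proposal)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(log_w)
    log_sum = m + math.log(sum(math.exp(lw - m) for lw in log_w))
    return log_sum - math.log(len(log_w))


# Toy check on a three-trajectory space where Z is known exactly.
beta = 2.0                      # hypothetical inverse temperature
rewards = [0.0, 1.0, 0.5]       # hypothetical rewards
log_target = [beta * r for r in rewards]
true_log_z = math.log(sum(math.exp(v) for v in log_target))
# If the proposal equals the normalized target, every weight equals Z.
log_q = [v - true_log_z for v in log_target]
estimate = log_partition_estimate(log_target, log_q)
```

In a decoupled setup of the kind the abstract describes, such an estimate would be computed once from offline samples and then held fixed during policy optimization; how closely the proposal tracks the target governs the variance of the estimate.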

preprint, 2022, arXiv

Comparison-based Conversational Recommender System with Relative Bandit Feedback

With recent advances in conversational recommendation, a recommender system can actively and dynamically elicit user preferences via conversational interactions. To achieve this, the system periodically queries users' preferences on attributes and collects their feedback. However, most existing conversational recommender systems only allow the user to provide absolute feedback on the attributes. In practice, absolute feedback is usually limited, as users tend to provide biased feedback when expressing their preferences. Instead, users are often more inclined to express comparative preferences, since user preferences are inherently relative. To enable users to provide comparative preferences during conversational interactions, we propose a novel comparison-based conversational recommender system. Relative feedback, though more practical, is not easy to incorporate, since its scale is always mismatched with users' absolute preferences. To collect and interpret relative feedback effectively in an interactive manner, we further propose a new bandit algorithm, which we call RelativeConUCB. Experiments on both synthetic and real-world datasets validate the advantage of our proposed method over existing bandit algorithms for conversational recommender systems.

preprint, 2014, arXiv

Unconditional Uniqueness of the cubic Gross-Pitaevskii Hierarchy with Low Regularity

In this paper, we establish the unconditional uniqueness of solutions to the cubic Gross-Pitaevskii hierarchy on $\mathbb{R}^d$ in a low regularity Sobolev type space. More precisely, we reduce the regularity $s$ down to the currently known regularity requirement for unconditional uniqueness of solutions to the cubic nonlinear Schrödinger equation ($s\ge\frac{d}{6}$ if $d=1,2$ and $s>s_c=\frac{d-2}{2}$ if $d\ge 3$). In such a way, we extend the recent work of Chen-Hainzl-Pavlović-Seiringer.

preprint, 2014, arXiv

Uniqueness of solutions to the 3D quintic Gross-Pitaevskii Hierarchy

In this paper, we study solutions to the three-dimensional quintic Gross-Pitaevskii hierarchy. We prove unconditional uniqueness among all small solutions in the critical space $\mathfrak{H}^1$ (which corresponds to $H^1$ on the NLS level). With slight modifications to the proof, we also prove unconditional uniqueness of solutions to the Hartree hierarchy without smallness condition. Our proof uses the quantum de Finetti theorem, and is an extension of the work by Chen-Hainzl-Pavlović-Seiringer \cite{CHPS}, and our previous work \cite{UniqueLowReg}.

preprint, 2013, arXiv

Derivation of a Nonlinear Schrödinger Equation with a General power-type nonlinearity

In this paper we study the derivation of a certain type of NLS from many-body interactions of bosonic particles. We consider a model with a finite linear combination of $n$-body interactions, where $n \geq 2$ is an integer. We show that the $k$-particle marginal density of the BBGKY hierarchy converges as the particle number goes to infinity, and that the limit solves a corresponding infinite Gross-Pitaevskii hierarchy. We prove the uniqueness of factorized solutions to the Gross-Pitaevskii hierarchy based on a priori space-time estimates. The convergence is established by adapting the arguments originated or developed in \cite{ESY}, \cite{KSS} and \cite{CPquintic}. For the uniqueness part, we expand the procedure followed in \cite{KM} by introducing a different board game argument to handle the new contraction operator. This new board game argument helps us obtain a good estimate on the Duhamel terms. In \cite{KM}, the relevant space-time estimates are assumed to be true, while we give a proof of them.