Researcher profile

Xiaosong Sun

Xiaosong Sun contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust: 13 (Unverified)
Verification: L1
Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

RelayGR: Scaling Long-Sequence Generative Recommendation via Cross-Stage Relay-Race Inference

Real-time recommender systems execute multi-stage cascades (retrieval, pre-processing, fine-grained ranking) under strict tail-latency SLOs, leaving only tens of milliseconds for ranking. Generative recommendation (GR) models can improve quality by consuming long user-behavior sequences, but in production their online sequence length is tightly capped by the ranking-stage P99 budget. We observe that the majority of GR tokens encode user behaviors that are independent of the item candidates, suggesting an opportunity to pre-infer a user-behavior prefix once and reuse it during ranking rather than recomputing it on the critical path. Realizing this idea at industrial scale is non-trivial: the prefix cache must survive across multiple pipeline stages before the final ranking instance is determined, the user population implies cache footprints far beyond a single device, and indiscriminate pre-inference would overload shared resources under high QPS. We present RelayGR, a production system that enables in-HBM relay-race inference for GR. RelayGR selectively pre-infers long-term user prefixes, keeps their KV caches resident in HBM over the request lifecycle, and ensures the subsequent ranking can consume them without remote fetches. RelayGR combines three techniques: 1) a sequence-aware trigger that admits only at-risk requests under a bounded cache footprint and pre-inference load, 2) an affinity-aware router that co-locates cache production and consumption by routing both the auxiliary pre-infer signal and the ranking request to the same instance, and 3) a memory-aware expander that uses server-local DRAM to capture short-term cross-request reuse while avoiding redundant reloads. We implement RelayGR on Huawei Ascend NPUs and evaluate it with real queries. Under a fixed P99 SLO, RelayGR supports up to 1.5$\times$ longer sequences and improves SLO-compliant throughput by up to 3.6$\times$.
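The first two techniques in the abstract (admitting only long sequences under a bounded cache footprint, and routing the pre-infer signal and the ranking request to the same instance) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the names `pick_instance` and `RankingInstance`, the hash-based routing rule, the length threshold, and the dictionary standing in for an HBM-resident KV cache are hypothetical and are not taken from RelayGR itself.

```python
import hashlib


def pick_instance(user_id: str, num_instances: int) -> int:
    """Deterministically map a user to an instance index (hypothetical rule)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_instances


class RankingInstance:
    """Toy stand-in for one serving instance with a bounded prefix cache."""

    def __init__(self, max_cached_users):
        self.max_cached_users = max_cached_users
        self.prefix_cache = {}  # user_id -> placeholder for a KV cache

    def pre_infer(self, user_id, behaviors, min_len):
        """Admit only long sequences, and only while the cache has room."""
        if len(behaviors) < min_len:
            return False  # short sequence: computed at ranking time instead
        if len(self.prefix_cache) >= self.max_cached_users:
            return False  # cache budget exhausted: skip pre-inference
        self.prefix_cache[user_id] = tuple(behaviors)
        return True

    def rank(self, user_id):
        """Return True when the ranking request finds a locally cached prefix."""
        return self.prefix_cache.pop(user_id, None) is not None


instances = [RankingInstance(max_cached_users=2) for _ in range(4)]
user = "user-42"
history = [f"click-{i}" for i in range(1000)]

# Both the pre-infer signal and the ranking request use the same routing rule,
# so the cache is produced and consumed on the same instance.
producer = instances[pick_instance(user, len(instances))]
admitted = producer.pre_infer(user, history, min_len=512)
consumer = instances[pick_instance(user, len(instances))]
cache_hit = consumer.rank(user)
```

In this sketch the cache hit follows from both requests hashing the same user identifier; a production router would also have to handle instance failures and load imbalance, which the sketch ignores.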

Preprint, 2011, arXiv

Polynomial maps with invertible sums of Jacobian matrices and of directional derivatives

Let $F: C^n \rightarrow C^m$ be a polynomial map with $\deg F = d \geq 2$. We prove that $F$ is invertible if $m = n$ and $\sum^{d-1}_{i=1} JF(\alpha_i)$ is invertible for all $\alpha_i \in C^n$, which is trivially the case for invertible quadratic maps. More generally, we prove that for affine lines $L = \{\beta + \mu\gamma \mid \mu \in C\} \subseteq C^n$ ($\gamma \ne 0$), $F|_L$ is linearly rectifiable if and only if $\sum^{d-1}_{i=1} JF(\alpha_i) \cdot \gamma \ne 0$ for all $\alpha_i \in L$. This appears to be the case for all affine lines $L$ when $F$ is injective and $d \le 3$. We also prove that if $m = n$ and $\sum^{n}_{i=1} JF(\alpha_i)$ is invertible for all $\alpha_i \in C^n$, then $F$ is a composition of an invertible linear map and an invertible polynomial map $X+H$ with linear part $X$, such that the subspace generated by $\{JH(\alpha) \mid \alpha \in C^n\}$ consists of nilpotent matrices.
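As an illustration of the quadratic case mentioned in the abstract, consider $d = 2$, where the sum has a single term and the condition reduces to $JF(\alpha_1)$ being invertible for every $\alpha_1$. The map below is an example chosen here, not one taken from the paper, and the midpoint identity is a standard fact about quadratic maps offered as a plausible reading of why this case is easy, not as the authors' argument.

```latex
% Example map (chosen for illustration): n = m = 2, d = 2.
\[
F(x, y) = (x + y^2,\; y), \qquad
JF(x, y) = \begin{pmatrix} 1 & 2y \\ 0 & 1 \end{pmatrix}, \qquad
\det JF(x, y) = 1 .
\]
% With d = 2 the sum has one term, JF(\alpha_1), which is invertible at every point.
% The inverse can be written down directly:
\[
F^{-1}(u, v) = (u - v^2,\; v).
\]
% For any quadratic map the Jacobian is affine in its argument, so the
% mean value integral collapses to the midpoint:
\[
F(\alpha) - F(\beta)
  = \int_0^1 JF\bigl(\beta + t(\alpha - \beta)\bigr)\,dt \cdot (\alpha - \beta)
  = JF\!\left(\tfrac{\alpha + \beta}{2}\right) (\alpha - \beta).
\]
% Hence, if JF is invertible everywhere, F(\alpha) = F(\beta) forces \alpha = \beta.
```

For the example map the identity can be checked by hand: the first component of $F(\alpha) - F(\beta)$ is $(\alpha_x - \beta_x) + (\alpha_y^2 - \beta_y^2)$, which equals $(\alpha_x - \beta_x) + (\alpha_y + \beta_y)(\alpha_y - \beta_y)$, the first row of $JF$ at the midpoint applied to $\alpha - \beta$.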