Source author record

Xiaosong Sun

Xiaosong Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence · Distributed, Parallel, and Cluster Computing · Machine Learning · math.AC

Catalog footprint

What is connected

2 works

4 topics

4 close collaborators

Preprint · 2026 · arXiv

RelayGR: Scaling Long-Sequence Generative Recommendation via Cross-Stage Relay-Race Inference

Real-time recommender systems execute multi-stage cascades (retrieval, pre-processing, fine-grained ranking) under strict tail-latency SLOs, leaving only tens of milliseconds for ranking. Generative recommendation (GR) models can improve quality by consuming long user-behavior sequences, but in production their online sequence length is tightly capped by the ranking-stage P99 budget. We observe that the majority of GR tokens encode user behaviors that are independent of the item candidates, suggesting an opportunity to pre-infer a user-behavior prefix once and reuse it during ranking rather than recomputing it on the critical path. Realizing this idea at industrial scale is non-trivial: the prefix cache must survive across multiple pipeline stages before the final ranking instance is determined, the user population implies cache footprints far beyond a single device, and indiscriminate pre-inference would overload shared resources under high QPS. We present RelayGR, a production system that enables in-HBM relay-race inference for GR. RelayGR selectively pre-infers long-term user prefixes, keeps their KV caches resident in HBM over the request lifecycle, and ensures the subsequent ranking can consume them without remote fetches. RelayGR combines three techniques: 1) a sequence-aware trigger that admits only at-risk requests under a bounded cache footprint and pre-inference load, 2) an affinity-aware router that co-locates cache production and consumption by routing both the auxiliary pre-infer signal and the ranking request to the same instance, and 3) a memory-aware expander that uses server-local DRAM to capture short-term cross-request reuse while avoiding redundant reloads. We implement RelayGR on Huawei Ascend NPUs and evaluate it with real queries. Under a fixed P99 SLO, RelayGR supports up to 1.5$\times$ longer sequences and improves SLO-compliant throughput by up to 3.6$\times$.
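The trigger and router described in the abstract can be illustrated with a minimal sketch. It is not RelayGR's implementation: rendezvous hashing is swapped in here as one plausible way to send the pre-infer signal and the ranking request for a user to the same instance, and every name and threshold (`pick_instance`, `should_pre_infer`, `len_threshold`, `max_cached`) is hypothetical.

```python
import hashlib


def pick_instance(user_id: str, instances: list[str]) -> str:
    """Map a user to one instance with rendezvous (highest-random-weight) hashing.

    Assumption: the abstract only says both messages reach the same instance;
    this particular hashing scheme is an illustrative stand-in.
    """
    def weight(instance: str) -> int:
        digest = hashlib.sha256(f"{instance}|{user_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(instances, key=weight)


def should_pre_infer(seq_len: int, len_threshold: int,
                     cached_prefixes: int, max_cached: int) -> bool:
    """Admit a request for pre-inference only if its sequence is long enough to
    threaten the latency budget and the cache footprint is still under its bound.

    Assumption: both thresholds are hypothetical knobs, not values from the paper.
    """
    return seq_len >= len_threshold and cached_prefixes < max_cached


instances = ["npu-0", "npu-1", "npu-2"]
# The pre-infer signal and the later ranking request use the same key,
# so they resolve to the same instance and the prefix cache is found locally.
pre_infer_target = pick_instance("user-42", instances)
ranking_target = pick_instance("user-42", instances)
```

Because the choice depends only on the user key and the instance set, neither message needs to know where the other was sent; the cost of this design is that a change in the instance set can move some users to a different instance.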

Preprint · 2011 · arXiv

Polynomial maps with invertible sums of Jacobian matrices and of directional derivatives

Let $F: C^n \rightarrow C^m$ be a polynomial map with $\deg F = d \geq 2$. We prove that $F$ is invertible if $m = n$ and $\sum^{d-1}_{i=1} JF(\alpha_i)$ is invertible for all $\alpha_i \in C^n$, which is trivially the case for invertible quadratic maps. More generally, we prove that for affine lines $L = \{\beta + \mu\gamma \mid \mu \in C\} \subseteq C^n$ ($\gamma \ne 0$), $F|_L$ is linearly rectifiable if and only if $\sum^{d-1}_{i=1} JF(\alpha_i) \cdot \gamma \ne 0$ for all $\alpha_i \in L$. This appears to be the case for all affine lines $L$ when $F$ is injective and $d \le 3$. We also prove that if $m = n$ and $\sum^{n}_{i=1} JF(\alpha_i)$ is invertible for all $\alpha_i \in C^n$, then $F$ is a composition of an invertible linear map and an invertible polynomial map $X+H$ with linear part $X$, such that the subspace generated by $\{JH(\alpha) \mid \alpha \in C^n\}$ consists of nilpotent matrices.
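The remark about invertible quadratic maps can be checked in one line. The sketch below assumes $m = n$ and $d = 2$, and writes $G$ for the inverse of $F$ (a symbol introduced here, not in the abstract): the sum collapses to a single Jacobian, and the chain rule makes that Jacobian invertible at every point.

```latex
\[
\sum_{i=1}^{d-1} JF(\alpha_i) = JF(\alpha_1) \quad (d = 2),
\qquad
G \circ F = \mathrm{id}
\;\Longrightarrow\;
JG\bigl(F(\alpha_1)\bigr) \cdot JF(\alpha_1) = I_n
\quad \text{for all } \alpha_1 \in C^n .
\]
```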