Source author record

Yunjiang Jiang

Yunjiang Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.PR, Information Retrieval, Machine Learning, math.CO, math.RT, Artificial Intelligence, Computation and Language, math.ST, Statistics Theory

Catalog footprint

What is connected

14 works

9 topics

4 close collaborators

preprint, 2026, arXiv

LIME: Link-based user-item Interaction Modeling with decoupled xor attention for Efficient test time scaling

Scaling large recommendation systems requires advancing three major frontiers: processing longer user histories, expanding candidate sets, and increasing model capacity. While transformers are promising, their computational cost scales quadratically with the user sequence length and linearly with the number of candidates. This trade-off makes it prohibitively expensive to expand candidate sets or increase sequence length at inference, despite the significant performance improvements. We introduce LIME, a novel architecture that resolves this trade-off. Through two key innovations, LIME fundamentally reduces computational complexity. First, low-rank "link embeddings" enable pre-computation of attention weights by decoupling user and candidate interactions, making the inference cost nearly independent of candidate set size. Second, a linear attention mechanism, LIME-XOR, reduces the complexity with respect to user sequence length from quadratic ($O(N^2)$) to linear ($O(N)$). Experiments on public and industrial datasets show LIME achieves near-parity with state-of-the-art transformers but with a $10\times$ inference speedup on large candidate sets or long sequence lengths. When tested on a major recommendation platform, LIME improved user engagement while maintaining minimal inference costs with respect to candidate set size and user history length, establishing a new paradigm for efficient and expressive recommendation systems.
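The quadratic-to-linear reduction mentioned in this abstract can be illustrated with generic kernelized linear attention. This is a minimal sketch and not LIME-XOR itself, whose details the abstract does not give: the feature map `phi` and the function names are assumptions chosen only for illustration. The point is that summing `phi(k) v^T` over the sequence once lets every query be answered without forming the full query-key score matrix.

```python
def phi(vec):
    """Hypothetical positive feature map (an assumption, not from the paper)."""
    return [max(x, 0.0) + 1.0 for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_attention(Q, K, V):
    """Reference form: builds one score per (query, key) pair."""
    out = []
    for q in Q:
        fq = phi(q)
        weights = [dot(fq, phi(k)) for k in K]
        z = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / z
                    for c in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    """Same result, but keys and values are summarized once in S and zsum."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]   # sum over keys of phi(k) v^T
    zsum = [0.0] * d                     # sum over keys of phi(k)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            zsum[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        z = dot(fq, zsum)
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / z
                    for c in range(dv)])
    return out

# Toy self-attention over a sequence of length 3 (Q and K coincide).
X = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
slow = quadratic_attention(X, X, V)
fast = linear_attention(X, X, V)
```

Both functions return the same numbers up to floating-point error; only the order of the sums differs, which is what changes the cost in the sequence length.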

preprint, 2022, arXiv

Givens Coordinate Descent Methods for Rotation Matrix Learning in Trainable Embedding Indexes

Product quantization (PQ), coupled with a space rotation, is widely used in modern approximate nearest neighbor (ANN) search systems to significantly compress the disk storage for embeddings and speed up the inner product computation. Existing rotation learning methods, however, minimize quantization distortion for fixed embeddings, and are therefore not applicable to an end-to-end training scenario where embeddings are updated constantly. In this paper, based on geometric intuitions from Lie group theory, in particular the special orthogonal group $SO(n)$, we propose a family of block Givens coordinate descent algorithms for learning rotation matrices; these algorithms are provably convergent on any convex objective. Compared to the state-of-the-art SVD method, the Givens algorithms are much more parallelizable, reducing runtime by orders of magnitude on modern GPUs, and converge more stably according to experimental studies. They further improve upon vanilla product quantization significantly in an end-to-end training scenario.
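For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of a Givens rotation, an element of $SO(n)$ that rotates a single coordinate plane. It does not reproduce the paper's coordinate descent; the helper names and the choice of disjoint pairs for the "block" are assumptions made for illustration. Rotations on disjoint coordinate pairs commute, which is one reason such updates can be applied in parallel.

```python
import math

def givens(n, i, j, theta):
    """n x n rotation by angle theta in the (i, j) coordinate plane."""
    g = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    c, s = math.cos(theta), math.sin(theta)
    g[i][i], g[i][j] = c, -s
    g[j][i], g[j][j] = s, c
    return g

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def is_orthogonal(a, tol=1e-9):
    """Check that a times its transpose is the identity, entry by entry."""
    n = len(a)
    p = matmul(a, transpose(a))
    return all(abs(p[r][c] - (1.0 if r == c else 0.0)) < tol
               for r in range(n) for c in range(n))

# Two rotations on disjoint coordinate pairs (0, 1) and (2, 3).
g01 = givens(4, 0, 1, 0.3)
g23 = givens(4, 2, 3, -1.1)
block = matmul(g01, g23)      # their product is again a rotation
swapped = matmul(g23, g01)    # same matrix, because the pairs are disjoint
```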

preprint, 2022, arXiv

Sequential Search with Off-Policy Reinforcement Learning

Recent years have seen significant interest in Sequential Recommendation (SR), which aims to understand and model sequential user behaviors and the interactions between users and items over time. Surprisingly, despite the huge success Sequential Recommendation has achieved, there has been little study of Sequential Search (SS), a twin learning task that takes into account a user's current and past search queries, in addition to behavior on historical query sessions. The SS learning task is even more important than its SR counterpart for most e-commerce companies due to its much larger online serving demands and traffic volume. To this end, we propose a highly scalable hybrid learning model that consists of an RNN learning framework leveraging all features in short-term user-item interactions, and an attention model utilizing selected item-only features from long-term interactions. As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly. Moreover, we explore the use of off-policy reinforcement learning in multi-session personalized search ranking. Specifically, we design a pairwise Deep Deterministic Policy Gradient model that efficiently captures users' long-term reward in terms of pairwise classification error. Extensive ablation experiments demonstrate the significant improvement each component brings over its state-of-the-art baseline, on a variety of offline and online metrics.
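The idea of fitting several short sequences into one RNN pass can be sketched as a greedy packing problem. The abstract does not specify the heuristic, so the first-fit-decreasing rule below is a swapped-in, standard choice, and `pack_sequences` and `capacity` are hypothetical names introduced only for this sketch.

```python
def pack_sequences(lengths, capacity):
    """Greedy first-fit-decreasing packing.

    Returns a list of passes; each pass is a list of indices into `lengths`
    whose total length does not exceed `capacity`.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    passes, remaining = [], []
    for i in order:
        if lengths[i] > capacity:
            raise ValueError("sequence longer than capacity")
        for p, room in enumerate(remaining):
            if lengths[i] <= room:
                # The sequence fits into an existing pass.
                passes[p].append(i)
                remaining[p] -= lengths[i]
                break
        else:
            # No existing pass has room, so open a new one.
            passes.append([i])
            remaining.append(capacity - lengths[i])
    return passes

lengths = [7, 2, 5, 3, 1, 6]
packed = pack_sequences(lengths, capacity=8)
# packed == [[0, 4], [5, 1], [2, 3]]: three passes, each filled to length 8
```

Without packing, these six sequences would occupy six padded passes of length 8; here they occupy three with no padding at all.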

preprint, 2021, arXiv

Adversarial Mixture Of Experts with Category Hierarchy Soft Constraint

Product search is the most common way for people to satisfy their shopping needs on e-commerce websites. Products are typically annotated with one of several broad categorical tags, such as "Clothing" or "Electronics", as well as finer-grained categories like "Refrigerator" or "TV", both under "Electronics". These tags are used to construct a hierarchy of query categories. Distributions of features such as price and brand popularity vary wildly across query categories. In addition, feature importance for the purpose of CTR/CVR predictions differs from one category to another. In this work, we leverage the Mixture of Experts (MoE) framework to learn a ranking model that specializes for each query category. In particular, our gate network relies solely on the category ids extracted from the user query. While classical MoEs pick expert towers spontaneously for each input example, we explore two techniques to establish more explicit and transparent connections between the experts and query categories. To help differentiate experts on their domain specialties, we introduce a form of adversarial regularization among the expert outputs, forcing them to disagree with one another. As a result, they tend to approach each prediction problem from different angles, rather than copying one another. This is validated by a much stronger clustering effect of the gate output vectors under different categories. In addition, soft gating constraints based on the categorical hierarchy are imposed to help similar products choose similar gate values and make them more likely to share similar experts. This allows aggregation of training data among smaller sibling categories to overcome data scarcity.

preprint, 2021, arXiv

Heterogeneous Network Embedding for Deep Semantic Relevance Match in E-commerce Search

Result relevance prediction is an essential task of e-commerce search engines to boost the utility of search engines and ensure a smooth user experience. The last few years have witnessed a flurry of research on the use of Transformer-style models and deep text-match models to improve relevance. However, these two types of models ignore the inherent bipartite network structures that are ubiquitous in e-commerce search logs, which limits their effectiveness. We propose in this paper a novel Second-order Relevance, which is fundamentally different from the previous First-order Relevance, to improve result relevance prediction. We design, for the first time, an end-to-end First-and-Second-order Relevance prediction model for e-commerce item relevance. The model is augmented by the neighborhood structures of bipartite networks that are built using the information of user behavioral feedback, including clicks and purchases. To ensure that edges accurately encode relevance information, we introduce external knowledge generated from BERT to refine the network of user behaviors. This allows the new model to integrate information from neighboring items and queries, which are highly relevant to the focus query-item pair under consideration. Results of offline experiments showed that the new model significantly improved the prediction accuracy in terms of human relevance judgment. An ablation study showed that the First-and-Second-order model achieved a 4.3% average gain over the First-order model. Results of an online A/B test revealed that the new model derived more commercial benefits compared to the base model.

preprint, 2020, arXiv

Fine-tune BERT for E-commerce Non-Default Search Ranking

The quality of non-default ranking on e-commerce platforms, such as ranking by ascending item price or descending historical sales volume, often suffers from acute relevance problems, since irrelevant items are much more easily exposed at the top of the ranking results. In this work, we propose a two-stage ranking scheme, which first recalls a wide range of candidate items through refined query/title keyword matching, and then classifies the recalled items using BERT-Large fine-tuned on human-labeled data. We also implemented parallel prediction on multiple GPU hosts and a C++ tokenization custom op for TensorFlow. In this data challenge, our model won 1st place in the supervised phase (based on overall F1 score) and 2nd place in the final phase (based on average per-query F1 score).

preprint, 2020, arXiv

Towards Personalized and Semantic Retrieval: An End-to-End Solution for E-commerce Search via Embedding Learning

Nowadays e-commerce search has become an integral part of many people's shopping routines. Two critical challenges remain in today's e-commerce search: how to retrieve items that are semantically relevant but not exact matches to the query terms, and how to retrieve items that are more personalized to different users for the same search query. In this paper, we present a novel approach called DPSR, which stands for Deep Personalized and Semantic Retrieval, to tackle this problem. Specifically, we share our design decisions on how to architect a retrieval system so as to serve industry-scale traffic efficiently and how to train a model so as to learn query and item semantics accurately. Based on offline evaluations and online A/B tests with live traffic, we show that the DPSR model outperforms existing models, and that the DPSR system can retrieve more personalized and semantically relevant items, significantly improving users' search experience with a +1.29% conversion rate gain overall and +10.03% for long-tail queries. As a result, our DPSR system has been successfully deployed in JD.com's search production since 2019.

preprint, 2015, arXiv

Mixing time of Metropolis chain based on random transposition walk converging to multivariate Ewens distribution

We prove sharp rates of convergence to the Ewens equilibrium distribution for a family of Metropolis algorithms based on the random transposition shuffle on the symmetric group, with starting point at the identity. The proofs rely heavily on the theory of symmetric Jack polynomials, developed initially by Jack [Proc. Roy. Soc. Edinburgh Sect. A 69 (1970/1971) 1-18], Macdonald [Symmetric Functions and Hall Polynomials (1995) New York] and Stanley [Adv. Math. 77 (1989) 76-115]. This completes the analysis started by Diaconis and Hanlon in [Contemp. Math. 138 (1992) 99-117]. In the end we also explore other integrable Markov chains that can be obtained from symmetric function theory.
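The kind of chain studied here can be simulated in a few lines. The sketch below assumes the single-parameter Ewens measure, with probability proportional to `theta ** (number of cycles)`, and a proposal that always picks two distinct positions; the multivariate version named in the title and any laziness in the paper's proposal are not covered. Multiplying by a transposition changes the cycle count by exactly one, so the Metropolis acceptance ratio is `theta` or `1 / theta`.

```python
import random

def cycle_count(perm):
    """Number of cycles of a permutation given as a list with perm[i] = image of i."""
    seen = [False] * len(perm)
    count = 0
    for start in range(len(perm)):
        if not seen[start]:
            count += 1
            k = start
            while not seen[k]:
                seen[k] = True
                k = perm[k]
    return count

def metropolis_step(perm, theta, rng):
    """Propose a random transposition and accept with the Metropolis rule."""
    i, j = rng.sample(range(len(perm)), 2)
    proposal = list(perm)
    proposal[i], proposal[j] = proposal[j], proposal[i]
    delta = cycle_count(proposal) - cycle_count(perm)   # always +1 or -1
    if rng.random() < min(1.0, theta ** delta):
        return proposal
    return list(perm)

def run_chain(n, theta, steps, seed=0):
    """Run the chain from the identity permutation, as in the abstract."""
    rng = random.Random(seed)
    perm = list(range(n))
    for _ in range(steps):
        perm = metropolis_step(perm, theta, rng)
    return perm

sample = run_chain(n=6, theta=2.0, steps=200)
```

Because the proposal is symmetric, the acceptance rule alone makes the target measure stationary; recomputing `cycle_count` from scratch each step keeps the sketch short at the cost of speed.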

preprint, 2012, arXiv

Smallest Gaps Between Eigenvalues of Random Matrices With Complex Ginibre, Wishart and Universal Unitary Ensembles

In this paper we study the limiting distribution of the $k$ smallest gaps between eigenvalues of three kinds of random matrices -- the Ginibre ensemble, the Wishart ensemble and the universal unitary ensemble. All of them follow a Poissonian ansatz. More precisely, for the Ginibre ensemble we have a global result in which the $k$-th smallest gap has typical length $n^{-3/4}$ with density $x^{4k-1}e^{-x^4}$ after normalization. For the Wishart and the universal unitary ensemble, it has typical length $n^{-4/3}$ and has density $x^{3k-1}e^{-x^3}$ after normalization.
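The densities quoted in this abstract are written without their normalizing constants. Reading them as "proportional to", the constants follow from the substitution $u = x^m$; the symbols $m$, $u$ and $f_{m,k}$ are notation introduced here for the calculation, with $m = 4$ for the Ginibre case and $m = 3$ for the Wishart and universal unitary cases.

```latex
\int_0^\infty x^{mk-1} e^{-x^m}\,dx
  = \frac{1}{m}\int_0^\infty u^{k-1} e^{-u}\,du
  = \frac{(k-1)!}{m},
\qquad\text{so}\qquad
f_{m,k}(x) = \frac{m}{(k-1)!}\, x^{mk-1} e^{-x^m}, \quad x > 0 .
```

For the smallest gap ($k = 1$) this gives $4x^3 e^{-x^4}$ and $3x^2 e^{-x^3}$ respectively.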

preprint, 2012, arXiv

Total variation bound for Kac's random walk

We show that the classical Kac random walk on the $(n-1)$-sphere $S^{n-1}$, starting from the point mass at $e_1$, mixes in $\mathcal{O}(n^5(\log n)^3)$ steps in total variation distance. The main argument uses a truncation of the running density after a burn-in period, followed by $\mathcal{L}^2$ convergence using the spectral gap information derived by other authors. This improves upon a previous bound by Diaconis and Saloff-Coste of order $\mathcal{O}(n^{2n})$.
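One step of Kac's walk is easy to state in code: pick two distinct coordinates uniformly at random and rotate the point in that plane by a uniformly random angle. The following is a minimal simulation sketch under that standard description; the function names and the seed handling are assumptions made for illustration.

```python
import math
import random

def kac_step(x, rng):
    """Rotate x by a uniform angle in a uniformly chosen coordinate plane."""
    i, j = rng.sample(range(len(x)), 2)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    y = list(x)
    y[i] = c * x[i] - s * x[j]
    y[j] = s * x[i] + c * x[j]
    return y

def kac_walk(n, steps, seed=0):
    """Run the walk on the unit sphere in R^n, started at e_1."""
    rng = random.Random(seed)
    x = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        x = kac_step(x, rng)
    return x

point = kac_walk(n=5, steps=1000)
norm_sq = sum(v * v for v in point)   # stays 1 up to rounding error
```

Each step is a planar rotation, so the squared norm is preserved and the walk never leaves the sphere.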

preprint, 2011, arXiv

Asymptotic correlations of metrics on the symmetric groups

We consider the asymptotic joint distributions among several families of well-known metrics on $S_n$, the symmetric group. These include the bi-invariant metrics such as the Cayley and Hamming distance, and the left-invariant metrics such as Spearman's footrule, Kendall's tau, and the Ulam distance. We also introduce a natural limit of the Spearman family, $\rho_\infty$, and study its asymptotic distribution and relation to other metrics. This is a continuation of earlier work on the asymptotic independence of bi-invariant metrics on both $S_n$ and general linear groups over a finite field. The technique is based on some simple observations about the record map and Hammersley's device. In several cases, we give near-optimal estimates of the error term for asymptotic independence. This significantly simplifies the proof of a central limit theorem by Bai, Chao, and Liang regarding the oscillation of a permutation.
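The metrics named in this abstract are simple to compute for a single permutation. The sketch below measures the distance from a permutation to the identity, which sidesteps the left/right multiplication conventions; the function names are chosen here for illustration, and permutations are lists with `p[i]` the image of `i`.

```python
from bisect import bisect_left

def cycle_count(p):
    seen = [False] * len(p)
    count = 0
    for start in range(len(p)):
        if not seen[start]:
            count += 1
            k = start
            while not seen[k]:
                seen[k] = True
                k = p[k]
    return count

def cayley(p):
    """Minimum number of transpositions needed: n minus the number of cycles."""
    return len(p) - cycle_count(p)

def hamming(p):
    """Number of points that are not fixed."""
    return sum(1 for i, v in enumerate(p) if i != v)

def footrule(p):
    """Spearman's footrule: total displacement."""
    return sum(abs(i - v) for i, v in enumerate(p))

def kendall_tau(p):
    """Number of inversions (quadratic-time count, fine for small n)."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

def ulam(p):
    """n minus the length of the longest increasing subsequence."""
    tails = []
    for v in p:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(p) - len(tails)

p = [2, 0, 1, 4, 3]
distances = (cayley(p), hamming(p), footrule(p), kendall_tau(p), ulam(p))
# distances == (3, 5, 6, 3, 2)
```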

preprint, 2011, arXiv

Asymptotic spectral independence of Wigner ensembles

We consider the joint distribution of eigenvalue clusters of the Wigner ensemble separated by macroscopic distances (i.e., on the same scale as the difference between the edges of the semicircle law). We prove that under an averaging condition, the correlation function governing any finite collection of clusters converges to that of independent point processes. The proof relies heavily on the machinery developed by Erdős, Ramirez, Schlein, and Yau.

preprint, 2011, arXiv

Mixing time upper bound for the uniformized Rosenthal walk on the special orthogonal groups

We prove that a uniformized variant of both the Rosenthal walk [Rosenthal] and the Kac random walk [Kac] on $SO(n)$ mixes in $\mathcal{O}(n^3)$ steps in total variation distance. The proof also extends easily to the Rosenthal walk with fixed angle $\theta \neq \pi$. To the best of our knowledge, this is the first polynomial-time bound for both walks. The techniques employed are mainly from the representation theory of $SO(n)$. A crucial new ingredient, however, is the interpretation of the Fourier coefficients of the character ratio as counting the number of particle cascade paths arising from the classical branching rules.

preprint, 2011, arXiv

On Number of Turns in Reduced Random Lattice Paths

We consider the tree-reduced path of the symmetric random walk on $\mathbb{Z}^d$. It is interesting to ask about the number of turns $T_n$ in the reduced path after $n$ steps. This question arises from inverting the signature for lattice paths. We show that, when $n$ is large, the mean and variance of $T_n$ have the same order as $n$, while the second-order terms are $O(1)$. We then use these estimates to obtain limit theorems for $T_n$. Similar results hold for any other finite pattern as well.
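The quantity $T_n$ can be computed for a concrete path as follows. This is a sketch under two assumptions that the abstract does not spell out: tree reduction is taken to mean repeatedly cancelling a step with an immediately following opposite step, and a turn is taken to mean a pair of consecutive steps in the reduced path that point in different directions. The function names are hypothetical.

```python
def tree_reduce(steps):
    """Cancel backtracks with a stack, as in free-group word reduction."""
    reduced = []
    for step in steps:
        opposite = tuple(-c for c in step)
        if reduced and reduced[-1] == opposite:
            reduced.pop()          # the new step undoes the previous one
        else:
            reduced.append(step)
    return reduced

def count_turns(reduced):
    """Count positions where the direction changes between consecutive steps."""
    return sum(1 for a, b in zip(reduced, reduced[1:]) if a != b)

# A walk on Z^2: right, up, down (cancels the up), right, up, up, left.
path = [(1, 0), (0, 1), (0, -1), (1, 0), (0, 1), (0, 1), (-1, 0)]
reduced = tree_reduce(path)
turns = count_turns(reduced)
# reduced == [(1, 0), (1, 0), (0, 1), (0, 1), (-1, 0)] and turns == 2
```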
