Source author record

Yunlong Xu

Yunlong Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Information Retrieval Distributed, Parallel, and Cluster Computing Machine Learning physics.ao-ph physics.data-an

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation

Model-based methods for recommender systems have been studied extensively for years. Modern recommender systems usually resort to 1) representation learning models which define user-item preference as the distance between their embedding representations, and 2) embedding-based Approximate Nearest Neighbor (ANN) search to tackle the efficiency problem introduced by large-scale corpus. While providing efficient retrieval, the embedding-based retrieval pattern also limits the model capacity since the form of user-item preference measure is restricted to the distance between their embedding representations. However, for other more precise user-item preference measures, e.g., preference scores directly derived from a deep neural network, they are computationally intractable because of the lack of an efficient retrieval method, and an exhaustive search for all user-item pairs is impractical. In this paper, we propose a novel method to extend ANN search to arbitrary matching functions, e.g., a deep neural network. Our main idea is to perform a greedy walk with a matching function in a similarity graph constructed from all items. To solve the problem that the similarity measures of graph construction and user-item matching function are heterogeneous, we propose a pluggable adversarial training task to ensure the graph search with arbitrary matching function can achieve fairly high precision. Experimental results in both open source and industry datasets demonstrate the effectiveness of our method. The proposed method has been fully deployed in the Taobao display advertising platform and brings a considerable advertising revenue increase. We also summarize our detailed experiences in deployment in this paper.

preprint2022arXiv

PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems

The development of personalized recommendation has significantly improved the accuracy of information matching and the revenue of e-commerce platforms. Recently, it has 2 trends: 1) recommender systems must be trained timely to cope with ever-growing new products and ever-changing user interests from online marketing and social network; 2) SOTA recommendation models introduce DNN modules to improve prediction accuracy. Traditional CPU-based recommender systems cannot meet these two trends, and GPU- centric training has become a trending approach. However, we observe that GPU devices in training recommender systems are underutilized, and they cannot attain an expected throughput improvement as what it has achieved in CV and NLP areas. This issue can be explained by two characteristics of these recommendation models: First, they contain up to a thousand input feature fields, introducing fragmentary and memory-intensive operations; Second, the multiple constituent feature interaction submodules introduce substantial small-sized compute kernels. To remove this roadblock to the development of recommender systems, we propose a novel framework named PICASSO to accelerate the training of recommendation models on commodity hardware. Specifically, we conduct a systematic analysis to reveal the bottlenecks encountered in training recommendation models. We leverage the model structure and data distribution to unleash the potential of hardware through our packing, interleaving, and caching optimization. Experiments show that PICASSO increases the hardware utilization by an order of magnitude on the basis of SOTA baselines and brings up to 6x throughput improvement for a variety of industrial recommendation models. Using the same hardware budget in production, PICASSO on average shortens the walltime of daily training tasks by 7 hours, significantly reducing the delay of continuous delivery.

preprint2022arXiv

TSUBASA: Climate Network Construction on Historical and Real-Time Data

A climate network represents the global climate system by the interactions of a set of anomaly time-series. Network science has been applied on climate data to study the dynamics of a climate network. The core task and first step to enable interactive network science on climate data is the efficient construction and update of a climate network on user-defined time-windows. We present TSUBASA, an algorithm for the efficient construction of climate networks based on the exact calculation of Pearsons correlation of large time-series. By pre-computing simple and low-overhead statistics, TSUBASA can efficiently compute the exact pairwise correlation of time-series on arbitrary time windows at query time. For real-time data, TSUBASA proposes a fast and incremental way of updating a network at interactive speed. Our experiments show that TSUBASA is faster than approximate solutions at least one order of magnitude for both historical and real-time data and outperforms a baseline for time-series correlation calculation up to two orders of magnitude.

Yunlong Xu

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation

PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems

TSUBASA: Climate Network Construction on Historical and Real-Time Data