Researcher profile

Fuyu Lv

Fuyu Lv contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2026arXiv

BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating only a subset of experts per token. However, standard MoE employs a fixed Top-K routing strategy, leading to redundant computation and suboptimal inference latency. Existing acceleration methods either require costly retraining with architectural changes or suffer from severe performance drop at high sparsity due to train-inference mismatch. To address these limitations, we propose BEAM (Binary Expert Activation Masking), a novel method that learns token-adaptive expert selection via trainable binary masks. With a straight-through estimator and an auxiliary regularization loss, BEAM induces dynamic expert sparsity through end-to-end training while maintaining model capability. We further implement an efficient custom CUDA kernel for BEAM, ensuring seamless integration with the vLLM inference framework. Experiments show that BEAM retains over 98\% of the original model's performance while reducing MoE layer FLOPs by up to 85\%, achieving up to 2.5$\times$ faster decoding and 1.4$\times$ higher throughput, demonstrating its effectiveness as a practical, plug-and-play solution for efficient MoE inference.

preprint2022arXiv

IHGNN: Interactive Hypergraph Neural Network for Personalized Product Search

A good personalized product search (PPS) system should not only focus on retrieving relevant products, but also consider user personalized preference. Recent work on PPS mainly adopts the representation learning paradigm, e.g., learning representations for each entity (including user, product and query) from historical user behaviors (aka. user-product-query interactions). However, we argue that existing methods do not sufficiently exploit the crucial collaborative signal, which is latent in historical interactions to reveal the affinity between the entities. Collaborative signal is quite helpful for generating high-quality representation, exploiting which would benefit the representation learning of one node from its connected nodes. To tackle this limitation, in this work, we propose a new model IHGNN for personalized product search. IHGNN resorts to a hypergraph constructed from the historical user-product-query interactions, which could completely preserve ternary relations and express collaborative signal based on the topological structure. On this basis, we develop a specific interactive hypergraph neural network to explicitly encode the structure information (i.e., collaborative signal) into the embedding process. It collects the information from the hypergraph neighbors and explicitly models neighbor feature interaction to enhance the representation of the target entity. Extensive experiments on three real-world datasets validate the superiority of our proposal over the state-of-the-arts.

preprint2022arXiv

Intelligent Request Strategy Design in Recommender System

Waterfall Recommender System (RS), a popular form of RS in mobile applications, is a stream of recommended items consisting of successive pages that can be browsed by scrolling. In waterfall RS, when a user finishes browsing a page, the edge (e.g., mobile phones) would send a request to the cloud server to get a new page of recommendations, known as the paging request mechanism. RSs typically put a large number of items into one page to reduce excessive resource consumption from numerous paging requests, which, however, would diminish the RSs' ability to timely renew the recommendations according to users' real-time interest and lead to a poor user experience. Intuitively, inserting additional requests inside pages to update the recommendations with a higher frequency can alleviate the problem. However, previous attempts, including only non-adaptive strategies (e.g., insert requests uniformly), would eventually lead to resource overconsumption. To this end, we envision a new learning task of edge intelligence named Intelligent Request Strategy Design (IRSD). It aims to improve the effectiveness of waterfall RSs by determining the appropriate occasions of request insertion based on users' real-time intention. Moreover, we propose a new paradigm of adaptive request insertion strategy named Uplift-based On-edge Smart Request Framework (AdaRequest). AdaRequest 1) captures the dynamic change of users' intentions by matching their real-time behaviors with their historical interests based on attention-based neural networks. 2) estimates the counterfactual uplift of user purchase brought by an inserted request based on causal inference. 3) determines the final request insertion strategy by maximizing the utility function under online resource constraints. We conduct extensive experiments on both offline dataset and online A/B test to verify the effectiveness of AdaRequest.

preprint2022arXiv

Modeling User Behavior with Graph Convolution for Personalized Product Search

User preference modeling is a vital yet challenging problem in personalized product search. In recent years, latent space based methods have achieved state-of-the-art performance by jointly learning semantic representations of products, users, and text tokens. However, existing methods are limited in their ability to model user preferences. They typically represent users by the products they visited in a short span of time using attentive models and lack the ability to exploit relational information such as user-product interactions or item co-occurrence relations. In this work, we propose to address the limitations of prior arts by exploring local and global user behavior patterns on a user successive behavior graph, which is constructed by utilizing short-term actions of all users. To capture implicit user preference signals and collaborative patterns, we use an efficient jumping graph convolution to explore high-order relations to enrich product representations for user preference modeling. Our approach can be seamlessly integrated with existing latent space based methods and be potentially applied in any product retrieval method that uses purchase history to model user preferences. Extensive experiments on eight Amazon benchmarks demonstrate the effectiveness and potential of our approach. The source code is available at \url{https://github.com/floatSDSDS/SBG}.

preprint2022arXiv

Re-weighting Negative Samples for Model-Agnostic Matching

Recommender Systems (RS), as an efficient tool to discover users' interested items from a very large corpus, has attracted more and more attention from academia and industry. As the initial stage of RS, large-scale matching is fundamental yet challenging. A typical recipe is to learn user and item representations with a two-tower architecture and then calculate the similarity score between both representation vectors, which however still struggles in how to properly deal with negative samples. In this paper, we find that the common practice that randomly sampling negative samples from the entire space and treating them equally is not an optimal choice, since the negative samples from different sub-spaces at different stages have different importance to a matching model. To address this issue, we propose a novel method named Unbiased Model-Agnostic Matching Approach (UMA$^2$). It consists of two basic modules including 1) General Matching Model (GMM), which is model-agnostic and can be implemented as any embedding-based two-tower models; and 2) Negative Samples Debias Network (NSDN), which discriminates negative samples by borrowing the idea of Inverse Propensity Weighting (IPW) and re-weighs the loss in GMM. UMA$^2$ seamlessly integrates these two modules in an end-to-end multi-task learning framework. Extensive experiments on both real-world offline dataset and online A/B test demonstrate its superiority over state-of-the-art methods.

preprint2022arXiv

XDM: Improving Sequential Deep Matching with Unclicked User Behaviors for Recommender System

Deep learning-based sequential recommender systems have recently attracted increasing attention from both academia and industry. Most of industrial Embedding-Based Retrieval (EBR) system for recommendation share the similar ideas with sequential recommenders. Among them, how to comprehensively capture sequential user interest is a fundamental problem. However, most existing sequential recommendation models take as input clicked or purchased behavior sequences from user-item interactions. This leads to incomprehensive user representation and sub-optimal model performance, since they ignore the complete user behavior exposure data, i.e., items impressed yet unclicked by users. In this work, we attempt to incorporate and model those unclicked item sequences using a new learning approach in order to explore better sequential recommendation technique. An efficient triplet metric learning algorithm is proposed to appropriately learn the representation of unclicked items. Our method can be simply integrated with existing sequential recommendation models by a confidence fusion network and further gain better user representation. The offline experimental results based on real-world E-commerce data demonstrate the effectiveness and verify the importance of unclicked items in sequential recommendation. Moreover we deploy our new model (named XDM) into EBR of recommender system at Taobao, outperforming the deployed previous generation SDM.

preprint2020arXiv

ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation

Recommender system (RS) devotes to predicting user preference to a given item and has been widely deployed in most web-scale applications. Recently, knowledge graph (KG) attracts much attention in RS due to its abundant connective information. Existing methods either explore independent meta-paths for user-item pairs over KG, or employ graph neural network (GNN) on whole KG to produce representations for users and items separately. Despite effectiveness, the former type of methods fails to fully capture structural information implied in KG, while the latter ignores the mutual effect between target user and item during the embedding propagation. In this work, we propose a new framework named Adaptive Target-Behavior Relational Graph network (ATBRG for short) to effectively capture structural relations of target user-item pairs over KG. Specifically, to associate the given target item with user behaviors over KG, we propose the graph connect and graph prune techniques to construct adaptive target-behavior relational graph. To fully distill structural information from the sub-graph connected by rich relations in an end-to-end fashion, we elaborate on the model design of ATBRG, equipped with relation-aware extractor layer and representation activation layer. We perform extensive experiments on both industrial and benchmark datasets. Empirical results show that ATBRG consistently and significantly outperforms state-of-the-art methods. Moreover, ATBRG has also achieved a performance improvement of 5.1% on CTR metric after successful deployment in one popular recommendation scenario of Taobao APP.

preprint2020arXiv

Entire Space Multi-Task Modeling via Post-Click Behavior Decomposition for Conversion Rate Prediction

Recommender system, as an essential part of modern e-commerce, consists of two fundamental modules, namely Click-Through Rate (CTR) and Conversion Rate (CVR) prediction. While CVR has a direct impact on the purchasing volume, its prediction is well-known challenging due to the Sample Selection Bias (SSB) and Data Sparsity (DS) issues. Although existing methods, typically built on the user sequential behavior path ``impression$\to$click$\to$purchase'', is effective for dealing with SSB issue, they still struggle to address the DS issue due to rare purchase training samples. Observing that users always take several purchase-related actions after clicking, we propose a novel idea of post-click behavior decomposition. Specifically, disjoint purchase-related Deterministic Action (DAction) and Other Action (OAction) are inserted between click and purchase in parallel, forming a novel user sequential behavior graph ``impression$\to$click$\to$D(O)Action$\to$purchase''. Defining model on this graph enables to leverage all the impression samples over the entire space and extra abundant supervised signals from D(O)Action, which will effectively address the SSB and DS issues together. To this end, we devise a novel deep recommendation model named Elaborated Entire Space Supervised Multi-task Model ($ESM^{2}$). According to the conditional probability rule defined on the graph, it employs multi-task learning to predict some decomposed sub-targets in parallel and compose them sequentially to formulate the final CVR. Extensive experiments on both offline and online environments demonstrate the superiority of $ESM^{2}$ over state-of-the-art models. The source code and dataset will be released.

preprint2020arXiv

MTBRN: Multiplex Target-Behavior Relation Enhanced Network for Click-Through Rate Prediction

Click-through rate (CTR) prediction is a critical task for many industrial systems, such as display advertising and recommender systems. Recently, modeling user behavior sequences attracts much attention and shows great improvements in the CTR field. Existing works mainly exploit attention mechanism based on embedding product when considering relations between user behaviors and target item. However, this methodology lacks of concrete semantics and overlooks the underlying reasons driving a user to click on a target item. In this paper, we propose a new framework named Multiplex Target-Behavior Relation enhanced Network (MTBRN) to leverage multiplex relations between user behaviors and target item to enhance CTR prediction. Multiplex relations consist of meaningful semantics, which can bring a better understanding on users' interests from different perspectives. To explore and model multiplex relations, we propose to incorporate various graphs (e.g., knowledge graph and item-item similarity graph) to construct multiple relational paths between user behaviors and target item. Then Bi-LSTM is applied to encode each path in the path extractor layer. A path fusion network and a path activation network are devised to adaptively aggregate and finally learn the representation of all paths for CTR prediction. Extensive offline and online experiments clearly verify the effectiveness of our framework.

preprint2019arXiv

SDM: Sequential Deep Matching Model for Online Large-scale Recommender System

Capturing users' precise preferences is a fundamental problem in large-scale recommender system. Currently, item-based Collaborative Filtering (CF) methods are common matching approaches in industry. However, they are not effective to model dynamic and evolving preferences of users. In this paper, we propose a new sequential deep matching (SDM) model to capture users' dynamic preferences by combining short-term sessions and long-term behaviors. Compared with existing sequence-aware recommendation methods, we tackle the following two inherent problems in real-world applications: (1) there could exist multiple interest tendencies in one session. (2) long-term preferences may not be effectively fused with current session interests. Long-term behaviors are various and complex, hence those highly related to the short-term session should be kept for fusion. We propose to encode behavior sequences with two corresponding components: multi-head self-attention module to capture multiple types of interests and long-short term gated fusion module to incorporate long-term preferences. Successive items are recommended after matching between sequential user behavior vector and item embedding vectors. Offline experiments on real-world datasets show the superior performance of the proposed SDM. Moreover, SDM has been successfully deployed on online large-scale recommender system at Taobao and achieves improvements in terms of a range of commercial metrics.