Researcher profile

Dawei Yin

Dawei Yin contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
36 works
0 followers
7 topics
4 close collaborators


Published work

36 published item(s)

preprint · 2026 · arXiv

Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models

Recent advances in synergizing large reasoning models (LRMs) with retrieval-augmented generation (RAG) have shown promising results, yet two critical challenges remain: (1) reasoning models typically operate from a single, unchallenged perspective, limiting their ability to conduct deep, self-correcting reasoning over external documents, and (2) existing training paradigms rely excessively on outcome-oriented rewards, which provide insufficient signal for shaping the complex, multi-step reasoning process. To address these issues, we propose a Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and Verifier reason over retrieved evidence and critique each other's logic while being guided by a process-aware advantage that requires no external scoring model. This reward combines explicit observational signals with internal model uncertainty to jointly optimize reasoning fidelity and verification rigor. Experiments on multiple benchmarks demonstrate the effectiveness of our method.

preprint · 2026 · arXiv

Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search

Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use. However, prevailing systems rely on monolithic agents that suffer from structural bottlenecks, including unconstrained reasoning outputs that inflate trajectories, sparse outcome-level rewards that complicate credit assignment, and stochastic search noise that destabilizes learning. To address these challenges, we propose M-ASK (Multi-Agent Search and Knowledge), a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context. This decomposition allows each agent to focus on a well-defined subtask and reduces interference between search and context construction. Furthermore, to enable stable coordination, M-ASK employs turn-level rewards to provide granular supervision for both search decisions and knowledge updates. Experiments on multi-hop QA benchmarks demonstrate that M-ASK outperforms strong baselines, achieving not only superior answer accuracy but also significantly more stable training dynamics. The source code for M-ASK is available at https://github.com/chenyiqun/M-ASK.

preprint · 2026 · arXiv

EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

Extending the context window of large language models typically requires training on sequences at the target length, incurring quadratic memory and computational costs that make long-context adaptation expensive and difficult to reproduce. We propose EndPrompt, a method that achieves effective context extension using only short training sequences. The core insight is that exposing a model to long-range relative positional distances does not require constructing full-length inputs: we preserve the original short context as an intact first segment and append a brief terminal prompt as a second segment, assigning it positional indices near the target context length. This two-segment construction introduces both local and long-range relative distances within a short physical sequence while maintaining the semantic continuity of the training text--a property absent in chunk-based simulation approaches that split contiguous context. We provide a theoretical analysis grounded in Rotary Position Embedding and the Bernstein inequality, showing that position interpolation induces a rigorous smoothness constraint over the attention function, with shared Transformer parameters further suppressing unstable extrapolation to unobserved intermediate distances. Applied to LLaMA-family models extending the context window from 8K to 64K, EndPrompt achieves an average RULER score of 76.03 and the highest average on LongBench, surpassing LCEG (72.24), LongLoRA (72.95), and full-length fine-tuning (69.23) while requiring substantially less computation. These results demonstrate that long-context generalization can be induced from sparse positional supervision, challenging the prevailing assumption that dense long-sequence training is necessary for reliable context-window extension. The code is available at https://github.com/clx1415926/EndPrompt.
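The two-segment position assignment described in this abstract can be illustrated with a minimal sketch. The helper name `two_segment_position_ids` and the choice to place the terminal prompt at the very last indices below the target length are assumptions made for illustration; the abstract only says the prompt receives indices "near the target context length".

```python
def two_segment_position_ids(context_len, prompt_len, target_len):
    """Position indices for a short context followed by a terminal prompt.

    Hypothetical helper: the context keeps indices 0..context_len-1, and the
    terminal prompt is assigned the last prompt_len indices below target_len.
    """
    if context_len + prompt_len > target_len:
        raise ValueError("segments do not fit inside the target length")
    context_ids = list(range(context_len))
    prompt_ids = list(range(target_len - prompt_len, target_len))
    return context_ids + prompt_ids


# Toy sizes chosen for readability, not taken from the paper.
ids = two_segment_position_ids(context_len=8, prompt_len=2, target_len=64)

# Largest relative distance present in this 10-token physical sequence.
max_relative_distance = ids[-1] - ids[0]
```

The physical sequence stays short (10 tokens here) while the index gap between the two segments exposes the model to relative distances close to the target length.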

preprint · 2026 · arXiv

FlexSpec: Frozen Drafts Meet Evolving Targets in Edge-Cloud Collaborative LLM Speculative Decoding

Deploying large language models (LLMs) in mobile and edge computing environments is constrained by limited on-device resources, scarce wireless bandwidth, and frequent model evolution. Although edge-cloud collaborative inference with speculative decoding (SD) can reduce end-to-end latency by executing a lightweight draft model at the edge and verifying it with a cloud-side target model, existing frameworks fundamentally rely on tight coupling between the two models. Consequently, repeated model synchronization introduces excessive communication overhead, increasing end-to-end latency, and ultimately limiting the scalability of SD in edge environments. To address these limitations, we propose FlexSpec, a communication-efficient collaborative inference framework tailored for evolving edge-cloud systems. The core design of FlexSpec is a shared-backbone architecture that allows a single and static edge-side draft model to remain compatible with a large family of evolving cloud-side target models. By decoupling edge deployment from cloud-side model updates, FlexSpec eliminates the need for edge-side retraining or repeated model downloads, substantially reducing communication and maintenance costs. Furthermore, to accommodate time-varying wireless conditions and heterogeneous device constraints, we develop a channel-aware adaptive speculation mechanism that dynamically adjusts the speculative draft length based on real-time channel state information and device energy budgets. Extensive experiments demonstrate that FlexSpec achieves superior performance compared to conventional SD approaches in terms of inference efficiency.

preprint · 2026 · arXiv

MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory. This coarse-grained credit assignment fails to distinguish effective tool calls from redundant or erroneous ones, particularly in long-horizon multi-turn scenarios. To address this, we propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation. Specifically, we formulate credit assignment as a bipartite matching problem between predicted and ground-truth traces, utilizing two assignment strategies to derive dense turn-level rewards. Furthermore, to balance local step precision with global task success, we introduce a dual-level advantage estimation scheme that integrates turn-level and trajectory-level signals, assigning distinct advantage values to individual interaction turns. Extensive experiments on three benchmarks demonstrate the superiority of MatchTIR. Notably, our 4B model surpasses the majority of 8B competitors, particularly in long-horizon and multi-turn tasks. Our codes are available at https://github.com/quchangle1/MatchTIR.
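As a rough illustration of turn-level credit assignment via matching, the sketch below pairs predicted tool calls one-to-one with ground-truth calls and gives each predicted turn the score of its match. The scoring rule in `call_similarity`, the dictionary layout of a call, and the brute-force search are all assumptions for illustration; they are not the paper's two assignment strategies, and a practical version would likely use a polynomial-time solver such as the Hungarian algorithm.

```python
from itertools import permutations


def call_similarity(predicted, truth):
    """Hypothetical score: 1.0 for same tool and arguments, 0.5 for same tool only."""
    if predicted["tool"] != truth["tool"]:
        return 0.0
    return 1.0 if predicted["args"] == truth["args"] else 0.5


def turn_rewards(predicted_calls, truth_calls):
    """Brute-force one-to-one matching; each predicted turn gets its matched score."""
    n_pred = len(predicted_calls)
    # None slots let surplus predicted calls stay unmatched (reward 0.0).
    slots = list(range(len(truth_calls))) + [None] * n_pred
    best_total, best_rewards = -1.0, [0.0] * n_pred
    for assignment in permutations(slots, n_pred):
        rewards = [
            0.0 if j is None else call_similarity(p, truth_calls[j])
            for p, j in zip(predicted_calls, assignment)
        ]
        if sum(rewards) > best_total:  # strict '>' keeps the first best assignment
            best_total, best_rewards = sum(rewards), rewards
    return best_rewards


predicted = [
    {"tool": "search", "args": {"q": "a"}},
    {"tool": "search", "args": {"q": "a"}},  # redundant repeat of the first call
    {"tool": "calc", "args": {"x": 1}},
]
truth = [
    {"tool": "search", "args": {"q": "a"}},
    {"tool": "calc", "args": {"x": 2}},
]
rewards = turn_rewards(predicted, truth)
```

Because each ground-truth call can be matched only once, the redundant second search earns no reward, which is the distinction a uniform trajectory-level reward cannot make.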

preprint · 2026 · arXiv

Measuring Maximum Activations in Open Large Language Models

The dynamic range of activations is a first-order constraint for low-bit quantization, activation scaling, and stable LLM inference. Prior work characterized outlier features and massive activations on pre-2024 LLaMA-style models, and the downstream activation-quantization stack inherits that picture without revisiting it for the post-LLaMA open-model boom. We ask the deployment-oriented question: how large can activations get in modern open LLMs, and how does this magnitude vary across families, generations, and training stages? Under a unified pipeline (5,000-sample multi-domain corpus, family-specific tokenization, identical hooks across embeddings, hidden states, attention, MLP/MoE, SwiGLU gates, and final norm), we measure global and layerwise maxima on 27 checkpoints from 8 open families spanning dense, MoE, vision-language, intermediate-training, and instruction-tuned variants. We find that (i) global maxima span nearly four orders of magnitude at comparable parameter counts, with Qwen3.5 and MoE checkpoints in the 10^2 to 10^3 range and Gemma3-27B-it reaching ~7 x 10^5; (ii) cross-family and cross-generation comparisons break simple monotonic scaling; and (iii) MoE checkpoints exhibit 14.0-23.4x lower peaks than matched-scale dense counterparts, while the residual stream carries the global maximum in 22/24 checkpoints. A lightweight INT-8 sanity check shows that measured maxima co-vary with low-bit reconstruction error via activation-scale selection. We conclude that maximum activation magnitude is a model property tied to family, architecture, and training stage - not a simple byproduct of size - and should be measured and reported alongside any open-weight release before low-bit deployment. The code is publicly available at https://github.com/clx1415926/Max_act_llm.
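To see why a large maximum activation matters for low-bit deployment, consider a minimal sketch of symmetric per-tensor INT8 quantization with an absmax scale. This scheme is an assumption chosen for illustration (the abstract does not specify how its INT-8 check is set up), and the toy values are made up apart from the 7 x 10^5 magnitude quoted above.

```python
def int8_absmax_roundtrip(values):
    """Quantize to symmetric INT8 with one absmax scale, then dequantize."""
    scale = max(abs(v) for v in values) / 127.0
    if scale == 0.0:
        return list(values)
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return [q * scale for q in quantized]


def mean_abs_error(values):
    """Mean absolute reconstruction error after the INT8 round trip."""
    restored = int8_absmax_roundtrip(values)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


typical = [0.5, -1.25, 2.0, 0.75]   # made-up "ordinary" activations
with_outlier = typical + [7.0e5]    # one massive activation sets the scale

err_typical = mean_abs_error(typical)
err_outlier = mean_abs_error(with_outlier)
```

With the outlier present, the scale grows so large that every ordinary value rounds to zero, so the reconstruction error rises by orders of magnitude.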

preprint · 2026 · arXiv

QianfanHuijin Technical Report: A Novel Multi-Stage Training Paradigm for Finance Industrial LLMs

Domain-specific enhancement of Large Language Models (LLMs) within the financial context has long been a focal point of industrial application. While previous models such as BloombergGPT and Baichuan-Finance primarily focused on knowledge enhancement, the deepening complexity of financial services has driven a growing demand for models that possess not only domain knowledge but also robust financial reasoning and agentic capabilities. In this paper, we present QianfanHuijin, a financial domain LLM, and propose a generalizable multi-stage training paradigm for industrial model enhancement. Our approach begins with Continual Pre-training (CPT) on financial corpora to consolidate the knowledge base. This is followed by a fine-grained Post-training pipeline designed with increasing specificity: starting with Financial SFT, progressing to Finance Reasoning RL and Finance Agentic RL, and culminating in General RL aligned with real-world business scenarios. Empirical results demonstrate that QianfanHuijin achieves superior performance across various authoritative financial benchmarks. Furthermore, ablation studies confirm that the targeted Reasoning RL and Agentic RL stages yield significant gains in their respective capabilities. These findings validate our motivation and suggest that this fine-grained, progressive post-training methodology is poised to become a mainstream paradigm for various industrial-enhanced LLMs.

preprint · 2026 · arXiv

RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search

In commercial web search, aligning content freshness with user intent remains challenging due to the highly varied lifespans of information. Traditional industrial approaches rely on static time-window filtering, resulting in "one-size-fits-all" rankings where content may be chronologically recent but semantically expired. To address this limitation, we present a novel Large Language Model (LLM)-based Query-Aware Dynamic Content Expiration Prediction Framework deployed in Baidu search, reformulating timeliness as a dynamic validity inference task. Our framework extracts fine-grained temporal contexts from documents and leverages LLMs to deduce a query-specific "validity horizon", a semantic boundary defining when information becomes obsolete based on user intent. Integrated with robust hallucination mitigation strategies to ensure reliability, our approach has been evaluated through offline and online A/B testing on live production traffic. Results demonstrate significant improvements in search freshness and user experience metrics, validating the effectiveness of LLM-driven reasoning for solving semantic expiration at an industrial scale.

preprint · 2026 · arXiv

Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime

Agentic reinforcement learning (RL) for software engineering spends much of its compute on stateful trajectories whose grouped binary rewards are highly skewed and weakly contrastive. We frame this as pass-rate control and show that the binary reward-side signal is strongest near a 50% rollout pass rate under four criteria: reward entropy, group-filtering survival, leave-one-out (RLOO) advantage energy under Group Relative Policy Optimization (GRPO), and success-failure pair count. We propose Prefix Sampling (PS), which replays self-generated trajectory prefixes to steer skewed groups toward this regime: successful prefixes give mostly failing groups a head start, while failing prefixes handicap mostly passing groups. Replayed states are reconstructed through the existing rollout path, and replayed tokens are masked from the loss so optimization applies only to current-policy continuations. On SWE-bench Verified, PS reaches the baseline high-score regime within evaluation variability while delivering 2.01x and 1.55x end-to-end wall-clock speedups on Qwen3-14B and Qwen3-32B; the 14B peak improves from 0.274 to 0.295. AIME 2025 experiments on 4B and 8B show the same pass-rate-control pattern, and 4B ablations attribute gains to replay, bidirectional coverage, and adaptive control.
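Two of the four criteria named in this abstract, reward entropy and success-failure pair count, can be checked numerically with a short sketch. The group size of 8 and the function names are assumptions for illustration; the sketch does not implement Prefix Sampling itself.

```python
import math


def reward_entropy(pass_rate):
    """Shannon entropy in bits of a binary reward with the given pass rate."""
    if pass_rate in (0.0, 1.0):
        return 0.0
    fail_rate = 1.0 - pass_rate
    return -(pass_rate * math.log2(pass_rate) + fail_rate * math.log2(fail_rate))


def success_failure_pairs(successes, group_size):
    """Number of (success, failure) rollout pairs in one group."""
    return successes * (group_size - successes)


group_size = 8  # illustrative rollout group size
pairs = [success_failure_pairs(k, group_size) for k in range(group_size + 1)]
entropies = [reward_entropy(k / group_size) for k in range(group_size + 1)]

# Number of successes that maximizes the contrastive pair count.
best_k = max(range(group_size + 1), key=lambda k: pairs[k])
```

Both quantities peak when half of the rollouts pass and vanish when the group is all-pass or all-fail, which is why skewed groups carry little learning signal.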

preprint · 2024 · arXiv

LLMRec: Large Language Models with Graph Augmentation for Recommendation

The problem of data sparsity has long been a challenge in recommendation systems, and previous studies have attempted to address this issue by incorporating side information. However, this approach often introduces side effects such as noise, availability issues, and low data quality, which in turn hinder the accurate modeling of user preferences and adversely impact recommendation performance. In light of the recent advancements in large language models (LLMs), which possess extensive knowledge bases and strong reasoning capabilities, we propose a novel framework called LLMRec that enhances recommender systems by employing three simple yet effective LLM-based graph augmentation strategies. Our approach leverages the rich content available within online platforms (e.g., Netflix, MovieLens) to augment the interaction graph in three ways: (i) reinforcing user-item interaction edges, (ii) enhancing the understanding of item node attributes, and (iii) conducting user node profiling, intuitively from the natural language perspective. By employing these strategies, we address the challenges posed by sparse implicit feedback and low-quality side information in recommenders. Besides, to ensure the quality of the augmentation, we develop a denoised data robustification mechanism that includes techniques of noisy implicit feedback pruning and MAE-based feature enhancement that help refine the augmented data and improve its reliability. Furthermore, we provide theoretical analysis to support the effectiveness of LLMRec and clarify the benefits of our method in facilitating model optimization. Experimental results on benchmark datasets demonstrate the superiority of our LLM-based augmentation approach over state-of-the-art techniques. To ensure reproducibility, we have made our code and augmented data publicly available at: https://github.com/HKUDS/LLMRec.git

preprint · 2024 · arXiv

Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks

Text-video retrieval is a challenging task that aims to identify relevant videos given textual queries. Compared to conventional textual retrieval, the main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content. Previous works primarily focus on aligning the query and the video by finely aggregating word-frame matching signals. Inspired by the human cognitive process of modularly judging the relevance between text and video, we argue that this judgment needs high-order matching signals, owing to the consecutive and complex nature of video content. In this paper, we propose chunk-level text-video matching, where the query chunks are extracted to describe a specific retrieval unit, and the video chunks are segmented into distinct clips from videos. We formulate the chunk-level matching as n-ary correlation modeling between words of the query and frames of the video and introduce a multi-modal hypergraph for n-ary correlation modeling. By representing textual units and video frames as nodes and using hyperedges to depict their relationships, a multi-modal hypergraph is constructed. In this way, the query and the video can be aligned in a high-order semantic space. In addition, to enhance the model's generalization ability, the extracted features are fed into a variational inference component for computation, obtaining the variational representation under the Gaussian distribution. The incorporation of hypergraphs and variational inference allows our model to capture complex, n-ary interactions among textual and visual contents. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on the text-video retrieval task.

preprint · 2023 · arXiv

Feature-Level Debiased Natural Language Understanding

Natural language understanding (NLU) models often rely on dataset biases rather than intended task-relevant features to achieve high performance on specific datasets. As a result, these models perform poorly on datasets outside the training distribution. Some recent studies address this issue by reducing the weights of biased samples during the training process. However, these methods still encode biased latent features in representations and neglect the dynamic nature of bias, which hinders model prediction. We propose an NLU debiasing method, named debiasing contrastive learning (DCT), to simultaneously alleviate the above problems based on contrastive learning. We devise a debiasing, positive sampling strategy to mitigate biased latent features by selecting the least similar biased positive samples. We also propose a dynamic negative sampling strategy to capture the dynamic influence of biases by employing a bias-only model to dynamically select the most similar biased negative samples. We conduct experiments on three NLU benchmark datasets. Experimental results show that DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance. We also verify that DCT can reduce biased latent features from the model's representation.

preprint · 2022 · arXiv

Approximated Doubly Robust Search Relevance Estimation

Extracting query-document relevance from the sparse, biased clickthrough log is among the most fundamental tasks in the web search system. Prior art mainly learns a relevance judgment model with semantic features of the query and document and ignores direct counterfactual relevance evaluation from the click log. Though the learned semantic matching models can provide relevance signals for tail queries as long as the semantic feature is available, such a paradigm lacks the capability to introspectively adjust the biased relevance estimation whenever it conflicts with massive implicit user feedback. The counterfactual evaluation methods, on the contrary, ensure unbiased relevance estimation with sufficient click information. However, they suffer from the sparse or even missing clicks caused by the long-tailed query distribution. In this paper, we propose to unify the counterfactual evaluating and learning approaches for unbiased relevance estimation on search queries with various popularities. Specifically, we theoretically develop a doubly robust estimator with low bias and variance, which intentionally combines the benefits of existing relevance evaluating and learning approaches. We further instantiate the proposed unbiased relevance estimation framework in Baidu search, with comprehensive practical solutions designed regarding the data pipeline for click behavior tracking and online relevance estimation with an approximated deep neural network. Finally, we present extensive empirical evaluations to verify the effectiveness of our proposed framework, finding that it is robust in practice and manages to improve online ranking performance substantially.

preprint · 2022 · arXiv

Contrastive Meta Learning with Behavior Multiplicity for Recommendation

A well-informed recommendation framework could not only help users identify items of interest, but also benefit the revenue of various online platforms (e.g., e-commerce, social media). Traditional recommendation models usually assume that only a single type of interaction exists between user and item, and fail to model the multiplex user-item relationships from multi-typed user behavior data, such as page view, add-to-favourite and purchase. While some recent studies propose to capture the dependencies across different types of behaviors, two important challenges have been less explored: i) dealing with the sparse supervision signal under target behaviors (e.g., purchase); ii) capturing the personalized multi-behavior patterns with customized dependency modeling. To tackle the above challenges, we devise a new model, Contrastive Meta Learning (CML), to maintain dedicated cross-type behavior dependency for different users. In particular, we propose a multi-behavior contrastive learning framework to distill transferable knowledge across different types of behaviors via the constructed contrastive loss. In addition, to capture the diverse multi-behavior patterns, we design a contrastive meta network to encode the customized behavior heterogeneity for different users. Extensive experiments on three real-world datasets indicate that our method consistently outperforms various state-of-the-art recommendation methods. Our empirical studies further suggest that the contrastive meta learning paradigm offers great potential for capturing the behavior multiplicity in recommendation. We release our model implementation at: https://github.com/weiwei1206/CML.git.

preprint · 2022 · arXiv

Enhanced Doubly Robust Learning for Debiasing Post-click Conversion Rate Estimation

Post-click conversion, as a strong signal indicating the user preference, is salutary for building recommender systems. However, accurately estimating the post-click conversion rate (CVR) is challenging due to the selection bias, i.e., the observed clicked events usually happen on users' preferred items. Currently, most existing methods utilize counterfactual learning to debias recommender systems. Among them, the doubly robust (DR) estimator has achieved competitive performance by combining the error imputation based (EIB) estimator and the inverse propensity score (IPS) estimator in a doubly robust way. However, inaccurate error imputation may result in its higher variance than the IPS estimator. Worse still, existing methods typically use simple model-agnostic methods to estimate the imputation error, which are not sufficient to approximate the dynamically changing model-correlated target (i.e., the gradient direction of the prediction model). To solve these problems, we first derive the bias and variance of the DR estimator. Based on it, a more robust doubly robust (MRDR) estimator has been proposed to further reduce its variance while retaining its double robustness. Moreover, we propose a novel double learning approach for the MRDR estimator, which can convert the error imputation into the general CVR estimation. Besides, we empirically verify that the proposed learning scheme can further eliminate the high variance problem of the imputation learning. To evaluate its effectiveness, extensive experiments are conducted on a semi-synthetic dataset and two real-world datasets. The results demonstrate the superiority of the proposed approach over the state-of-the-art methods. The code is available at https://github.com/guosyjlu/MRDR-DL.
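For orientation, the standard doubly robust (DR) loss that this abstract builds on can be sketched as follows. This is the commonly used DR form combining imputed errors with an inverse-propensity correction, not the MRDR estimator or the double learning scheme the paper proposes; the function name and toy inputs are illustrative assumptions.

```python
def doubly_robust_loss(errors, imputed_errors, clicked, propensities):
    """Standard DR estimate of the mean prediction error over all exposures.

    errors:         true errors, meaningful only where clicked == 1
    imputed_errors: imputation model's guess of the error for every exposure
    clicked:        1 if the exposure was clicked (error observed), else 0
    propensities:   estimated click probabilities, each in (0, 1]
    """
    total = 0.0
    for e, e_hat, o, p_hat in zip(errors, imputed_errors, clicked, propensities):
        # The correction term is active only for clicked items, where e is observed.
        total += e_hat + o * (e - e_hat) / p_hat
    return total / len(imputed_errors)


# Toy data: the second item is unclicked, so its true error is a placeholder.
errors = [0.2, 0.0]
imputed = [0.2, 0.8]
clicked = [1, 0]
propensities = [0.5, 0.5]
loss = doubly_robust_loss(errors, imputed, clicked, propensities)
```

The estimate is unbiased if either the imputed errors or the propensities are accurate, which is the "double robustness" the abstract refers to; the variance issue arises because the correction term is divided by the propensity.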

preprint · 2022 · arXiv

ERNIE-Search: Bridging Cross-Encoder with Dual-Encoder via Self On-the-fly Distillation for Dense Passage Retrieval

Neural retrievers based on pre-trained language models (PLMs), such as dual-encoders, have achieved promising performance on the task of open-domain question answering (QA). Their effectiveness can be pushed to a new state of the art by incorporating cross-architecture knowledge distillation. However, most existing studies directly apply conventional distillation methods and fail to consider the particular situation where the teacher and student have different structures. In this paper, we propose a novel distillation method that significantly advances cross-architecture distillation for dual-encoders. Our method 1) introduces a self on-the-fly distillation method that can effectively distill late interaction (i.e., ColBERT) to a vanilla dual-encoder, and 2) incorporates a cascade distillation process to further improve the performance with a cross-encoder teacher. Extensive experiments validate that our proposed solution outperforms strong baselines and establishes a new state of the art on open-domain QA benchmarks.

preprint · 2022 · arXiv

Factorized and Controllable Neural Re-Rendering of Outdoor Scene for Photo Extrapolation

Expanding an existing tourist photo from a partially captured scene to a full scene is one of the desired experiences for photography applications. Although photo extrapolation has been well studied, it is much more challenging to extrapolate a photo (i.e., selfie) from a narrow field of view to a wider one while maintaining a similar visual style. In this paper, we propose a factorized neural re-rendering model to produce photorealistic novel views from cluttered outdoor Internet photo collections, which enables the applications including controllable scene re-rendering, photo extrapolation and even extrapolated 3D photo generation. Specifically, we first develop a novel factorized re-rendering pipeline to handle the ambiguity in the decomposition of geometry, appearance and illumination. We also propose a composited training strategy to tackle the unexpected occlusion in Internet images. Moreover, to enhance photo-realism when extrapolating tourist photographs, we propose a novel realism augmentation process to complement appearance details, which automatically propagates the texture details from a narrow captured photo to the extrapolated neural rendered image. The experiments and photo editing examples on outdoor scenes demonstrate the superior performance of our proposed method in both photo-realism and downstream applications.

preprint · 2022 · arXiv

Geometry Contrastive Learning on Heterogeneous Graphs

Self-supervised learning (especially contrastive learning) methods on heterogeneous graphs can effectively remove the dependence on supervisory data. Meanwhile, most existing representation learning methods embed the heterogeneous graphs into a single geometric space, either Euclidean or hyperbolic. This kind of single geometric view is usually not enough to observe the complete picture of heterogeneous graphs due to their rich semantics and complex structures. Based on these observations, this paper proposes a novel self-supervised learning method, termed Geometry Contrastive Learning (GCL), to better represent heterogeneous graphs when supervisory data is unavailable. GCL views a heterogeneous graph from Euclidean and hyperbolic perspectives simultaneously, aiming to combine the ability to model rich semantics with the ability to model complex structures, which is expected to bring more benefits for downstream tasks. GCL maximizes the mutual information between two geometric views by contrasting representations at both local-local and local-global semantic levels. Extensive experiments on four benchmark data sets show that the proposed approach outperforms strong baselines, including both unsupervised and supervised methods, on three tasks: node classification, node clustering and similarity search.

preprint · 2022 · arXiv

Gumble Softmax For User Behavior Modeling

Sequential recommendation systems have become important in addressing information overload in many online services. Current methods in sequential recommendation focus on learning a fixed number of representations for each user at any time, with either a single representation or multiple representations per user. However, when a user is exploring items on an e-commerce recommendation system, the number of this user's interests may change over time (e.g., gaining or losing an interest), affected by the user's evolving needs. Moreover, different users may have different numbers of interests. In this paper, we argue that it is meaningful to explore a personalized dynamic number of user interests, and learn a dynamic group of user interest representations accordingly. We propose a sequential model with a dynamic number of representations for recommendation systems (RDRSR). Specifically, RDRSR is composed of a dynamic interest discriminator (DID) module and a dynamic interest allocator (DIA) module. The DID module explores the number of a user's interests by learning the overall sequential characteristics with bi-directional self-attention and Gumbel-Softmax. The DIA module partitions the historically clicked items into groups and constructs the user's dynamic interest representations. Experiments on real-world datasets demonstrate our model's effectiveness.
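The Gumbel-Softmax relaxation mentioned in this abstract can be sketched in a few lines. The sketch below is a generic, framework-free version of the technique, not the paper's DID module; treating the logits as scores over candidate numbers of interests is an assumption for illustration, as are the function name and the temperature value.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed categorical sample using Gumbel(0, 1) noise."""
    noisy = []
    for logit in logits:
        # Clamp away from 0 so the double logarithm is always defined.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical scores for a user having 1, 2, or 3 interests.
logits = [1.0, 2.0, 0.5]
probs = gumbel_softmax(logits, temperature=0.5, rng=random.Random(0))

# A hard choice can be read off with argmax; +1 maps index to interest count.
chosen_count = probs.index(max(probs)) + 1
```

Lower temperatures push the sample toward a one-hot vector while keeping the operation differentiable in frameworks that support automatic differentiation, which is what makes a discrete choice such as "number of interests" trainable end to end.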

preprint · 2022 · arXiv

Hypergraph Contrastive Collaborative Filtering

Collaborative Filtering (CF) has emerged as a fundamental paradigm for parameterizing users and items into a latent representation space, with their correlative patterns learned from interaction data. Among various CF techniques, GNN-based recommender systems, e.g., PinSage and LightGCN, have offered state-of-the-art performance. However, two key challenges have not been well explored in existing solutions: i) the over-smoothing effect of deeper graph-based CF architectures may cause indistinguishable user representations and degrade recommendation results; ii) the supervision signals (i.e., user-item interactions) are usually scarce and skewed in distribution in reality, which limits the representation power of CF paradigms. To tackle these challenges, we propose a new self-supervised recommendation framework, Hypergraph Contrastive Collaborative Filtering (HCCF), to jointly capture local and global collaborative relations with a hypergraph-enhanced cross-view contrastive learning architecture. In particular, the designed hypergraph structure learning enhances the discrimination ability of the GNN-based CF paradigm, so as to comprehensively capture the complex high-order dependencies among users. Additionally, our HCCF model effectively integrates the hypergraph structure encoding with self-supervised learning to reinforce the representation quality of recommender systems, based on the hypergraph-enhanced self-discrimination. Extensive experiments on three benchmark datasets demonstrate the superiority of our model over various state-of-the-art recommendation methods, and its robustness against sparse user interaction data. Our model implementation codes are available at https://github.com/akaxlh/HCCF.

preprint2022arXiv

Incorporating Explicit Knowledge in Pre-trained Language Models for Passage Re-ranking

Passage re-ranking aims to obtain a permutation over the candidate passage set from the retrieval stage. Re-rankers have been boosted by Pre-trained Language Models (PLMs) due to their overwhelming advantages in natural language understanding. However, existing PLM-based re-rankers can easily suffer from vocabulary mismatch and a lack of domain-specific knowledge. To alleviate these problems, our work carefully introduces explicit knowledge contained in a knowledge graph. Specifically, we employ an existing knowledge graph, which is incomplete and noisy, and first apply it to the passage re-ranking task. To leverage reliable knowledge, we propose a novel knowledge graph distillation method and obtain a knowledge meta graph as the bridge between query and passage. To align both kinds of embedding in the latent space, we employ a PLM as the text encoder and a graph neural network over the knowledge meta graph as the knowledge encoder. In addition, a novel knowledge injector is designed for dynamic interaction between the text and knowledge encoders. Experimental results demonstrate the effectiveness of our method, especially on queries requiring in-depth domain knowledge.

preprint2022arXiv

On Length Divergence Bias in Textual Matching Models

Despite the remarkable success deep models have achieved in Textual Matching (TM) tasks, it remains unclear whether they truly understand language or measure the semantic similarity of texts by exploiting statistical bias in datasets. In this work, we study this issue from a new perspective: the length divergence bias. We find that the length divergence heuristic widely exists in prevalent TM datasets, providing direct cues for prediction. To determine whether TM models have adopted such a heuristic, we introduce an adversarial evaluation scheme that invalidates the heuristic. In this adversarial setting, all TM models perform worse, indicating that they have indeed adopted this heuristic. Through a well-designed probing experiment, we empirically validate that the bias of TM models can be attributed in part to extracting the text length information during training. To alleviate the length divergence bias, we propose an adversarial training method. The results demonstrate that we successfully improve the robustness and generalization ability of models at the same time.
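
To make the idea of a length-based cue concrete, the sketch below checks whether the share of positive labels differs between text pairs with small and large length gaps. The abstract does not give the paper's exact definition, so `length_divergence` (absolute difference in whitespace-token counts), the threshold, and the toy pairs are all assumptions for illustration.

```python
def length_divergence(text_a, text_b):
    """Absolute difference in whitespace-token counts (an assumed definition)."""
    return abs(len(text_a.split()) - len(text_b.split()))


def positive_rate_by_divergence(pairs, threshold):
    """Return the share of positive labels for low- and high-divergence pairs."""
    buckets = {"low": [], "high": []}
    for text_a, text_b, label in pairs:
        key = "high" if length_divergence(text_a, text_b) > threshold else "low"
        buckets[key].append(label)
    # An empty bucket yields None instead of dividing by zero.
    return {k: (sum(v) / len(v) if v else None) for k, v in buckets.items()}


# Hypothetical labelled pairs: (text_a, text_b, 1 if matching else 0).
toy_pairs = [
    ("how old is the sun", "what is the age of the sun", 1),
    ("how old is the sun", "age of sun", 1),
    ("how old is the sun", "the sun is a star that sits at the centre of our solar system", 0),
    ("best pizza nearby", "where can I find a highly rated pizza place close to my current location", 0),
]
rates = positive_rate_by_divergence(toy_pairs, threshold=3)
```

If the two rates differ sharply on a real dataset, a model could predict labels from length alone, which is the kind of shortcut the adversarial evaluation is meant to remove.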

preprint2022arXiv

Sequential Recommendation with User Evolving Preference Decomposition

Modeling user sequential behaviors has recently attracted increasing attention in the recommendation domain. Existing methods mostly assume coherent preference in the same sequence. However, user personalities are volatile and easily changed, and there can be multiple mixed preferences underlying user behaviors. To solve this problem, we propose a novel sequential recommender model that decomposes and models independent user preferences. To achieve this goal, we highlight three practical challenges arising from the inconsistent, evolving and uneven nature of user behavior, which previous work has seldom noticed. To overcome these challenges in a unified framework, we introduce a reinforcement learning module to simulate the evolution of user preference. More specifically, the action aims to allocate each item into a sub-sequence or create a new one according to how the previous items are decomposed as well as the time interval between successive behaviors. The reward is associated with the final loss of the learning objective, aiming to generate sub-sequences that better fit the training data. We conduct extensive experiments on six real-world datasets across different domains. Compared with state-of-the-art methods, empirical studies show that our model improves performance on average by about 8.21%, 10.08%, 10.32%, and 9.82% on the metrics of Precision, Recall, NDCG and MRR, respectively.

preprint2022arXiv

User behavior understanding in real world settings

Extracting meaningful information from users' historical behavior plays a crucial role in recommendation. A user behavior sequence often contains multiple conceptually distinct items that belong to different item groups, and the number of item groups changes over time. It is therefore necessary to learn a dynamic group of representations according to the item groups in a user's historical behavior. However, current works learn only a predefined, fixed number of representations from the user context, whether through single-representation or multi-representation methods, which could lead to suboptimal recommendation quality. In this paper, we propose a model, AutoRep, that automatically and adaptively generates a dynamic group of representations from the user behavior. Specifically, AutoRep is composed of an informative representation construct (IRC) module and a dynamic representations construct (DRC) module. The IRC module learns the overall sequential characteristics of user behavior with a bi-directional transformer architecture. The DRC module dynamically allocates the items in the user behavior to different item groups and forms a dynamic group of representations in a differentiable manner. This design improves the model's recommendation performance. We evaluate the proposed model on five benchmark datasets. The results show that AutoRep outperforms representative baselines. A further ablation study deepens our understanding of AutoRep, including the proposed IRC and DRC modules.

preprint2021arXiv

SceneRec: Scene-Based Graph Neural Networks for Recommender Systems

Collaborative filtering has been largely used to advance modern recommender systems to predict user preference. A key component in collaborative filtering is representation learning, which aims to project users and items into a low dimensional space to capture collaborative signals. However, the scene information, which has effectively guided many recommendation tasks, is rarely considered in existing collaborative filtering methods. To bridge this gap, we focus on scene-based collaborative recommendation and propose a novel representation model SceneRec. SceneRec formally defines a scene as a set of pre-defined item categories that occur simultaneously in real-life situations and creatively designs an item-category-scene hierarchical structure to build a scene-based graph. In the scene-based graph, we adopt graph neural networks to learn scene-specific representation on each item node, which is further aggregated with latent representation learned from collaborative interactions to make recommendations. We perform extensive experiments on real-world E-commerce datasets and the results demonstrate the effectiveness of the proposed method.

preprint2021arXiv

User-Inspired Posterior Network for Recommendation Reason Generation

Recommendation reason generation, which aims to show the selling points of products to customers, plays a vital role in attracting customers' attention as well as improving user experience. A simple and effective way is to extract keywords directly from the knowledge base of products, i.e., attributes or title, as the recommendation reason. However, generating the recommendation reason from product knowledge does not naturally respond to users' interests. Fortunately, on some E-commerce websites, there is more and more user-generated content (user-content for short), i.e., product question-answering (QA) discussions, which reflects user-cared aspects. Therefore, in this paper, we consider generating the recommendation reason by taking into account not only the product attributes but also the customer-generated product QA discussions. In reality, adequate user-content is only available for the most popular commodities, whereas large numbers of long-tail products or new products cannot gather a sufficient amount of user-content. To tackle this problem, we propose a user-inspired multi-source posterior transformer (MSPT), which induces the model to reflect users' interests with a posterior multiple QA discussions module and to generate recommendation reasons containing the product attributes as well as the user-cared aspects. Experimental results show that our model is superior to traditional generative models. Additionally, the analysis shows that our model can focus more on the user-cared aspects than baselines.

preprint2020arXiv

Adaptive Parameterization for Neural Dialogue Generation

Neural conversation systems generate responses based on the sequence-to-sequence (SEQ2SEQ) paradigm. Typically, the model is equipped with a single set of learned parameters to generate responses for given input contexts. When confronting diverse conversations, its adaptability is rather limited and the model is hence prone to generate generic responses. In this work, we propose an Adaptive Neural Dialogue generation model, AdaND, which manages various conversations with conversation-specific parameterization. For each conversation, the model generates parameters of the encoder-decoder by referring to the input context. In particular, we propose two adaptive parameterization mechanisms: a context-aware and a topic-aware parameterization mechanism. The context-aware parameterization directly generates the parameters by capturing local semantics of the given context. The topic-aware parameterization enables parameter sharing among conversations with similar topics by first inferring the latent topics of the given context and then generating the parameters with respect to the distributional topics. Extensive experiments conducted on a large-scale real-world conversational dataset show that our model achieves superior performance in terms of both quantitative metrics and human evaluations.

preprint2020arXiv

CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

We study the problem of applying spectral clustering to cluster multi-scale data, which is data whose clusters are of various sizes and densities. Traditional spectral clustering techniques discover clusters by processing a similarity matrix that reflects the proximity of objects. For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart while those of a dense cluster have to be sufficiently close. Following [16], we solve the problem of spectral clustering on multi-scale data by integrating the concept of objects' "reachability similarity" with a given distance-based similarity to derive an objects' coefficient matrix. We propose the algorithm CAST that applies trace Lasso to regularize the coefficient matrix. We prove that the resulting coefficient matrix has the "grouping effect" and that it exhibits "sparsity". We show that these two characteristics imply very effective spectral clustering. We evaluate CAST and 10 other clustering methods on a wide range of datasets w.r.t. various measures. Experimental results show that CAST provides excellent performance and is highly robust across test cases of multi-scale data.

preprint2020arXiv

Data Manipulation: Towards Effective Instance Learning for Neural Dialogue Generation via Learning to Augment and Reweight

Current state-of-the-art neural dialogue models learn from human conversations following the data-driven paradigm. As such, a reliable training corpus is the crux of building a robust and well-behaved dialogue model. However, due to the open-ended nature of human conversations, the quality of user-generated training data varies greatly, and effective training samples are typically insufficient while noisy samples frequently appear. This impedes the learning of those data-driven neural dialogue models. Therefore, effective dialogue learning requires not only more reliable learning samples, but also fewer noisy samples. In this paper, we propose a data manipulation framework to proactively reshape the data distribution towards reliable samples by augmenting and highlighting effective learning samples as well as reducing the effect of inefficient samples simultaneously. In particular, the data manipulation model selectively augments the training samples and assigns an importance weight to each instance to reform the training data. Note that, the proposed data manipulation framework is fully data-driven and learnable. It not only manipulates training samples to optimize the dialogue generation model, but also learns to increase its manipulation skills through gradient descent with validation samples. Extensive experiments show that our framework can improve the dialogue generation performance with respect to various automatic evaluation metrics and human judgments.

preprint2020arXiv

Deep reinforcement learning for search, recommendation, and online advertising: a survey

Search, recommendation, and online advertising are the three most important information-providing mechanisms on the web. These information seeking techniques, which satisfy users' information needs by suggesting personalized objects (information or services) to users at the appropriate time and place, play a crucial role in mitigating the information overload problem. With recent great advances in deep reinforcement learning (DRL), there has been increasing interest in developing DRL-based information seeking techniques. These DRL-based techniques have two key advantages: (1) they are able to continuously update information seeking strategies according to users' real-time feedback, and (2) they can maximize the expected cumulative long-term reward from users, where the reward has different definitions depending on the information seeking application, such as click-through rate, revenue, user satisfaction and engagement. In this paper, we give an overview of deep reinforcement learning for search, recommendation, and online advertising from methodologies to applications, review representative algorithms, and discuss some appealing research directions.

preprint2020arXiv

Learning from Easy to Complex: Adaptive Multi-curricula Learning for Neural Dialogue Generation

Current state-of-the-art neural dialogue systems are mainly data-driven and are trained on human-generated responses. However, due to the subjectivity and open-ended nature of human conversations, the complexity of training dialogues varies greatly. The noise and uneven complexity of query-response pairs impede the learning efficiency and effects of the neural dialogue generation models. Moreover, there are so far no unified dialogue complexity measurements, and dialogue complexity embodies multiple attributes, such as specificity, repetitiveness, and relevance. Inspired by human behaviors of learning to converse, where children learn from easy dialogues to complex ones and dynamically adjust their learning progress, in this paper, we first analyze five dialogue attributes to measure the dialogue complexity from multiple perspectives on three publicly available corpora. Then, we propose an adaptive multi-curricula learning framework to schedule a committee of the organized curricula. The framework is established upon the reinforcement learning paradigm, which automatically chooses different curricula during the evolving learning process according to the learning status of the neural dialogue generation model. Extensive experiments conducted on five state-of-the-art models demonstrate its learning efficiency and effectiveness with respect to 13 automatic evaluation metrics and human judgments.

preprint2020arXiv

Neural Interactive Collaborative Filtering

In this paper, we study collaborative filtering in an interactive setting, in which the recommender agents iterate between making recommendations and updating the user profile based on the interactive feedback. The most challenging problem in this scenario is how to suggest items when the user profile has not been well established, i.e., recommending for cold-start users or warm-start users with drifting tastes. Existing approaches either rely on an overly pessimistic linear exploration strategy or adopt meta-learning based algorithms in a full exploitation way. In this work, to quickly catch up with the user's interests, we propose to represent the exploration policy with a neural network and directly learn it from the feedback data. Specifically, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction in the recommender systems. The key insight is that the satisfying recommendations triggered by an exploration recommendation can be viewed as the exploration bonus (delayed reward) for its contribution to improving the quality of the user profile. Therefore, the proposed exploration policy, which balances learning the user profile against making accurate recommendations, can be directly optimized by maximizing users' long-term satisfaction with reinforcement learning. Extensive experiments and analysis conducted on three benchmark collaborative filtering datasets have demonstrated the advantage of our method over state-of-the-art methods.
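
The delayed-reward intuition in the abstract above can be sketched with a single tabular Q-learning update. The paper encodes its policy in self-attention networks, so this table-based version is only a simplified stand-in; the state names, action names, and hyperparameters below are hypothetical.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    next_values = q.get(next_state, {})
    best_next = max(next_values.values()) if next_values else 0.0
    target = reward + gamma * best_next  # bootstrapped return estimate
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical values: exploring earns no immediate reward, but it leads to a
# state with a well-established profile where good recommendations pay off.
q_table = {
    "cold_start": {"explore": 0.0, "exploit": 0.0},
    "profiled": {"explore": 0.0, "exploit": 1.0},
}
new_value = q_update(q_table, "cold_start", "explore", reward=0.0, next_state="profiled")
```

Even with zero immediate reward, the value of exploring rises because the update propagates the payoff available in the follow-up state, which mirrors the exploration bonus described above.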

preprint2020arXiv

Posterior-GAN: Towards Informative and Coherent Response Generation with Posterior Generative Adversarial Network

Neural conversational models learn to generate responses by taking into account the dialog history. These models are typically optimized over the query-response pairs with a maximum likelihood estimation objective. However, the query-response tuples are naturally loosely coupled, and there exist multiple responses that can respond to a given query, which makes learning burdensome for the conversational model. Moreover, the general dull response problem is worsened when the model is confronted with meaningless response training instances. Intuitively, a high-quality response not only responds to the given query but also links up to the future conversations. In this paper, we leverage query-response-future turn triples to induce generated responses that consider both the given context and the future conversations. To facilitate the modeling of these triples, we further propose a novel encoder-decoder based generative adversarial learning framework, Posterior Generative Adversarial Network (Posterior-GAN), which consists of a forward and a backward generative discriminator that cooperatively encourage the generated response to be informative and coherent from two complementary assessment perspectives. Experimental results demonstrate that our method effectively boosts the informativeness and coherence of the generated response on both automatic and human evaluation, which verifies the advantages of considering two assessment perspectives.

preprint2020arXiv

Robust Reinforcement Learning with Wasserstein Constraint

Robust Reinforcement Learning aims to find the optimal policy with some degree of robustness to environmental dynamics. Existing learning algorithms usually achieve robustness by disturbing the current state or simulating environmental parameters in a heuristic way, which lacks quantified robustness to the system dynamics (i.e., transition probability). To overcome this issue, we leverage the Wasserstein distance to measure the disturbance to the reference transition kernel. With the Wasserstein distance, we are able to connect transition kernel disturbance to state disturbance, i.e., reduce an infinite-dimensional optimization problem to a finite-dimensional risk-aware problem. Through the derived risk-aware optimal Bellman equation, we show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm, the Wasserstein Robust Advantage Actor-Critic algorithm (WRAAC). The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
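
As a small illustration of the distance the abstract above relies on, the sketch below computes the 1-Wasserstein distance between two one-dimensional empirical distributions with the same number of equally weighted samples. The paper measures disturbance between transition kernels, so this scalar case is only an assumed simplification, and the function name and sample values are hypothetical.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size empirical samples."""
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal coupling matches sorted samples in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical next-state samples under a reference and a perturbed dynamics.
reference = [0.0, 1.0, 2.0, 3.0]
perturbed = [0.5, 1.5, 2.5, 3.5]
shift = wasserstein_1d(reference, perturbed)
```

Shifting every sample by the same amount yields a distance equal to that amount, which gives a feel for how the distance quantifies the size of a disturbance.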

preprint2020arXiv

Whole-Chain Recommendations

With the recent prevalence of Reinforcement Learning (RL), there has been tremendous interest in developing RL-based recommender systems. In practical recommendation sessions, users sequentially access multiple scenarios, such as the entrance pages and the item detail pages, and each scenario has its specific characteristics. However, the majority of existing RL-based recommender systems focus on optimizing one strategy for all scenarios or separately optimizing each strategy, which could lead to sub-optimal overall performance. In this paper, we study the recommendation problem with multiple (consecutive) scenarios, i.e., whole-chain recommendations. We propose a multi-agent RL-based approach (DeepChain), which can capture the sequential correlation among different scenarios and jointly optimize multiple recommendation strategies. To be specific, all recommender agents (RAs) share the same memory of users' historical behaviors, and they work collaboratively to maximize the overall reward of a session. Note that optimizing multiple recommendation strategies jointly faces two challenges in the existing model-free RL model: (i) it requires huge amounts of user behavior data, and (ii) the distribution of reward (users' feedback) is extremely unbalanced. In this paper, we introduce model-based RL techniques to reduce the training data requirement and execute more accurate strategy updates. The experimental results based on a real e-commerce platform demonstrate the effectiveness of the proposed framework.

preprint2018arXiv

A Survey on Dialogue Systems: Recent Advances and New Frontiers

Dialogue systems have attracted more and more attention. Recent advances in dialogue systems are overwhelmingly driven by deep learning techniques, which have been employed to enhance a wide range of big data applications such as computer vision, natural language processing, and recommender systems. For dialogue systems, deep learning can leverage a massive amount of data to learn meaningful feature representations and response generation strategies, while requiring a minimum amount of hand-crafting. In this article, we give an overview of these recent advances in dialogue systems from various perspectives and discuss some possible research directions. In particular, we generally divide existing dialogue systems into task-oriented and non-task-oriented models, then detail how deep learning techniques help them with representative algorithms, and finally discuss some appealing research directions that can bring dialogue system research into a new frontier.