Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
35works
0followers
15topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

35 published item(s)

preprint2026arXiv

Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs

Reinforcement learning (RL) has achieved remarkable success in LLM reasoning, but whether it can also improve direct recall of parametric knowledge remains an open question. We study this question in a controlled zero-shot, one-hop, closed-book QA setting with no chain-of-thought, training only on binary correctness rewards and applying fact-level train-test deduplication to ensure gains reflect improved recall rather than reasoning or memorization. Across three model families and multiple factual QA benchmarks, RL yields ~27% average relative gains, surpassing both training- and inference-time baselines alike. Mechanistically, RL primarily redistributes probability mass over existing knowledge rather than acquiring new facts, moving correct answers from the low-probability tail into reliable greedy generations. Our data-attribution study reveals that the hardest examples are the most informative: those whose answers never appear in 128 pre-RL samples (only ~18% of training data) drive ~83% of the gain, since rare correct rollouts still emerge during training and get reinforced. Together, these findings broaden the role of RL beyond reasoning, repositioning it as a tool for unlocking rather than acquiring latent parametric knowledge.

preprint2026arXiv

The Evolution of Thought: Tracking LLM Overthinking via Reasoning Dynamics Analysis

Test-time scaling via explicit reasoning trajectories significantly boosts large language model (LLM) performance but often triggers overthinking. To explore this, we analyze reasoning through two lenses: Reasoning Length Dynamics, which reveals a compensatory trade-off between thinking and answer content length that eventually leads to thinking redundancy, and Reasoning Semantic Dynamics, which identifies semantic convergence and repetitive oscillations. These dynamics uncover an instance-specific Reasoning Completion Point (RCP), beyond which computation continues without further performance gain. Since the RCP varies across instances, we propose a Reasoning Completion Point Detector (RCPD), an inference-time early-exit method that identifies the RCP by monitoring the rank dynamics of termination tokens (e.g., </think>). Across AIME and GPQA benchmarks using Qwen3 and DeepSeek-R1, RCPD reduces token usage by up to 44% while preserving accuracy, offering a principled approach to efficient test-time scaling.

preprint2022arXiv

CHEX: CHannel EXploration for CNN Model Compression

Channel pruning has been broadly recognized as an effective technique to reduce the computation and memory cost of deep convolutional neural networks. However, conventional pruning methods have limitations in that: they are restricted to pruning process only, and they require a fully pre-trained large model. Such limitations may lead to sub-optimal model quality as well as excessive memory and training cost. In this paper, we propose a novel Channel Exploration methodology, dubbed as CHEX, to rectify these problems. As opposed to pruning-only strategy, we propose to repeatedly prune and regrow the channels throughout the training process, which reduces the risk of pruning important channels prematurely. More exactly: From intra-layer&#39;s aspect, we tackle the channel pruning problem via a well known column subset selection (CSS) formulation. From inter-layer&#39;s aspect, our regrowing stages open a path for dynamically re-allocating the number of channels across all the layers under a global channel sparsity constraint. In addition, all the exploration process is done in a single training from scratch without the need of a pre-trained large model. Experimental results demonstrate that CHEX can effectively reduce the FLOPs of diverse CNN architectures on a variety of computer vision tasks, including image classification, object detection, instance segmentation, and 3D vision. For example, our compressed ResNet-50 model on ImageNet dataset achieves 76% top1 accuracy with only 25% FLOPs of the original ResNet-50 model, outperforming previous state-of-the-art channel pruning methods. The checkpoints and code are available at here .

preprint2022arXiv

Debiasing Learning for Membership Inference Attacks Against Recommender Systems

Learned recommender systems may inadvertently leak information about their training data, leading to privacy violations. We investigate privacy threats faced by recommender systems through the lens of membership inference. In such attacks, an adversary aims to infer whether a user&#39;s data is used to train the target recommender. To achieve this, previous work has used a shadow recommender to derive training data for the attack model, and then predicts the membership by calculating difference vectors between users&#39; historical interactions and recommended items. State-of-the-art methods face two challenging problems: (1) training data for the attack model is biased due to the gap between shadow and target recommenders, and (2) hidden states in recommenders are not observational, resulting in inaccurate estimations of difference vectors. To address the above limitations, we propose a Debiasing Learning for Membership Inference Attacks against recommender systems (DL-MIA) framework that has four main components: (1) a difference vector generator, (2) a disentangled encoder, (3) a weight estimator, and (4) an attack model. To mitigate the gap between recommenders, a variational auto-encoder (VAE) based disentangled encoder is devised to identify recommender invariant and specific features. To reduce the estimation bias, we design a weight estimator, assigning a truth-level score for each difference vector to indicate estimation accuracy. We evaluate DL-MIA against both general recommenders and sequential recommenders on three real-world datasets. Experimental results show that DL-MIA effectively alleviates training and estimation biases simultaneously, and achieves state-of-the-art attack performance.

preprint2022arXiv

Effective Model Sparsification by Scheduled Grow-and-Prune Methods

Deep neural networks (DNNs) are effective in solving many real-world problems. Larger DNN models usually exhibit better quality (e.g., accuracy) but their excessive computation results in long inference time. Model sparsification can reduce the computation and memory cost while maintaining model quality. Most existing sparsification algorithms unidirectionally remove weights, while others randomly or greedily explore a small subset of weights in each layer for pruning. The limitations of these algorithms reduce the level of achievable sparsity. In addition, many algorithms still require pre-trained dense models and thus suffer from large memory footprint. In this paper, we propose a novel scheduled grow-and-prune (GaP) methodology without having to pre-train a dense model. It addresses the shortcomings of the previous works by repeatedly growing a subset of layers to dense and then pruning them back to sparse after some training. Experiments show that the models pruned using the proposed methods match or beat the quality of the highly optimized dense models at 80% sparsity on a variety of tasks, such as image classification, objective detection, 3D object part segmentation, and translation. They also outperform other state-of-the-art (SOTA) methods for model sparsification. As an example, a 90% non-uniform sparse ResNet-50 model obtained via GaP achieves 77.9% top-1 accuracy on ImageNet, improving the previous SOTA results by 1.5%. Code available at: https://github.com/boone891214/GaP.

preprint2022arXiv

Graph Neural Networks in Recommender Systems: A Survey

With the explosive growth of online information, recommender systems play a key role to alleviate such information overload. Due to the important application value of recommender systems, there have always been emerging works in this field. In recommender systems, the main challenge is to learn the effective user/item representations from their interactions and side information (if any). Recently, graph neural network (GNN) techniques have been widely utilized in recommender systems since most of the information in recommender systems essentially has graph structure and GNN has superiority in graph representation learning. This article aims to provide a comprehensive review of recent research efforts on GNN-based recommender systems. Specifically, we provide a taxonomy of GNN-based recommendation models according to the types of information used and recommendation tasks. Moreover, we systematically analyze the challenges of applying GNN on different types of data and discuss how existing works in this field address these challenges. Furthermore, we state new perspectives pertaining to the development of this field. We collect the representative papers along with their open-source implementations in https://github.com/wusw14/GNN-in-RS.

preprint2022arXiv

Neural Re-ranking in Multi-stage Recommender Systems: A Review

As the final stage of the multi-stage recommender system (MRS), re-ranking directly affects user experience and satisfaction by rearranging the input ranking lists, and thereby plays a critical role in MRS. With the advances in deep learning, neural re-ranking has become a trending topic and been widely applied in industrial applications. This review aims at integrating re-ranking algorithms into a broader picture, and paving ways for more comprehensive solutions for future research. For this purpose, we first present a taxonomy of current methods on neural re-ranking. Then we give a description of these methods along with the historic development according to their objectives. The network structure, personalization, and complexity are also discussed and compared. Next, we provide benchmarks of the major neural re-ranking models and quantitatively analyze their re-ranking performance. Finally, the review concludes with a discussion on future prospects of this field. A list of papers discussed in this review, the benchmark datasets, our re-ranking library LibRerank, and detailed parameter settings are publicly available at https://github.com/LibRerank-Community/LibRerank.

preprint2022arXiv

Recommendation Unlearning

Recommender systems provide essential web services by learning users&#39; personal preferences from collected data. However, in many cases, systems also need to forget some training data. From the perspective of privacy, several privacy regulations have recently been proposed, requiring systems to eliminate any impact of the data whose owner requests to forget. From the perspective of utility, if a system&#39;s utility is damaged by some bad data, the system needs to forget these data to regain utility. From the perspective of usability, users can delete noise and incorrect entries so that a system can provide more useful recommendations. While unlearning is very important, it has not been well-considered in existing recommender systems. Although there are some researches have studied the problem of machine unlearning in the domains of image and text data, existing methods can not been directly applied to recommendation as they are unable to consider the collaborative information. In this paper, we propose RecEraser, a general and efficient machine unlearning framework tailored to recommendation task. The main idea of RecEraser is to partition the training set into multiple shards and train a constituent model for each shard. Specifically, to keep the collaborative information of the data, we first design three novel data partition algorithms to divide training data into balanced groups based on their similarity. Then, considering that different shard models do not uniformly contribute to the final prediction, we further propose an adaptive aggregation method to improve the global model utility. Experimental results on three public benchmarks show that RecEraser can not only achieve efficient unlearning, but also outperform the state-of-the-art unlearning methods in terms of model utility. The source code can be found at https://github.com/chenchongthu/Recommendation-Unlearning

preprint2022arXiv

Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning

Weight pruning in deep neural networks (DNNs) can reduce storage and computation cost, but struggles to bring practical speedup to the model inference time. Tensor-cores can significantly boost the throughput of GPUs on dense computation, but exploiting tensor-cores for sparse DNNs is very challenging. Compared to existing CUDA-cores, tensor-cores require higher data reuse and matrix-shaped instruction granularity, both difficult to yield from sparse DNN kernels. Existing pruning approaches fail to balance the demands of accuracy and efficiency: random sparsity preserves the model quality well but prohibits tensor-core acceleration, while highly-structured block-wise sparsity can exploit tensor-cores but suffers from severe accuracy loss. In this work, we propose a novel sparse pattern, Shuffled Block-wise sparsity (Shfl-BW), designed to efficiently utilize tensor-cores while minimizing the constraints on the weight structure. Our insight is that row- and column-wise permutation provides abundant flexibility for the weight structure, while introduces negligible overheads using our GPU kernel designs. We optimize the GPU kernels for Shfl-BW in linear and convolution layers. Evaluations show that our techniques can achieve the state-of-the-art speed-accuracy trade-offs on GPUs. For example, with small accuracy loss, we can accelerate the computation-intensive layers of Transformer by 1.81, 4.18 and 1.90 times on NVIDIA V100, T4 and A100 GPUs respectively at 75% sparsity.

preprint2022arXiv

XDM: Improving Sequential Deep Matching with Unclicked User Behaviors for Recommender System

Deep learning-based sequential recommender systems have recently attracted increasing attention from both academia and industry. Most of industrial Embedding-Based Retrieval (EBR) system for recommendation share the similar ideas with sequential recommenders. Among them, how to comprehensively capture sequential user interest is a fundamental problem. However, most existing sequential recommendation models take as input clicked or purchased behavior sequences from user-item interactions. This leads to incomprehensive user representation and sub-optimal model performance, since they ignore the complete user behavior exposure data, i.e., items impressed yet unclicked by users. In this work, we attempt to incorporate and model those unclicked item sequences using a new learning approach in order to explore better sequential recommendation technique. An efficient triplet metric learning algorithm is proposed to appropriately learn the representation of unclicked items. Our method can be simply integrated with existing sequential recommendation models by a confidence fusion network and further gain better user representation. The offline experimental results based on real-world E-commerce data demonstrate the effectiveness and verify the importance of unclicked items in sequential recommendation. Moreover we deploy our new model (named XDM) into EBR of recommender system at Taobao, outperforming the deployed previous generation SDM.

preprint2021arXiv

Contrastive Learning for Sequential Recommendation

Sequential recommendation methods play a crucial role in modern recommender systems because of their ability to capture a user&#39;s dynamic interest from her/his historical interactions. Despite their success, we argue that these approaches usually rely on the sequential prediction task to optimize the huge amounts of parameters. They usually suffer from the data sparsity problem, which makes it difficult for them to learn high-quality user representations. To tackle that, inspired by recent advances of contrastive learning techniques in the computer version, we propose a novel multi-task model called \textbf{C}ontrastive \textbf{L}earning for \textbf{S}equential \textbf{Rec}ommendation~(\textbf{CL4SRec}). CL4SRec not only takes advantage of the traditional next item prediction task but also utilizes the contrastive learning framework to derive self-supervision signals from the original user behavior sequences. Therefore, it can extract more meaningful user patterns and further encode the user representation effectively. In addition, we propose three data augmentation approaches to construct self-supervision signals. Extensive experiments on four public datasets demonstrate that CL4SRec achieves state-of-the-art performance over existing baselines by inferring better user representations.

preprint2021arXiv

Explore User Neighborhood for Real-time E-commerce Recommendation

Recommender systems play a vital role in modern online services, such as Amazon and Taobao. Traditional personalized methods, which focus on user-item (UI) relations, have been widely applied in industrial settings, owing to their efficiency and effectiveness. Despite their success, we argue that these approaches ignore local information hidden in similar users. To tackle this problem, user-based methods exploit similar user relations to make recommendations in a local perspective. Nevertheless, traditional user-based methods, like userKNN and matrix factorization, are intractable to be deployed in the real-time applications since such transductive models have to be recomputed or retrained with any new interaction. To overcome this challenge, we propose a framework called self-complementary collaborative filtering~(SCCF) which can make recommendations with both global and local information in real time. On the one hand, it utilizes UI relations and user neighborhood to capture both global and local information. On the other hand, it can identify similar users for each user in real time by inferring user representations on the fly with an inductive model. The proposed framework can be seamlessly incorporated into existing inductive UI approach and benefit from user neighborhood with little additional computation. It is also the first attempt to apply user-based methods in real-time settings. The effectiveness and efficiency of SCCF are demonstrated through extensive offline experiments on four public datasets, as well as a large scale online A/B test in Taobao.

preprint2021arXiv

Graph Attention Collaborative Similarity Embedding for Recommender System

We present Graph Attention Collaborative Similarity Embedding (GACSE), a new recommendation framework that exploits collaborative information in the user-item bipartite graph for representation learning. Our framework consists of two parts: the first part is to learn explicit graph collaborative filtering information such as user-item association through embedding propagation with attention mechanism, and the second part is to learn implicit graph collaborative information such as user-user similarities and item-item similarities through auxiliary loss. We design a new loss function that combines BPR loss with adaptive margin and similarity loss for the similarities learning. Extensive experiments on three benchmarks show that our model is consistently better than the latest state-of-the-art models.

preprint2021arXiv

NGAT4Rec: Neighbor-Aware Graph Attention Network For Recommendation

Learning informative representations (aka. embeddings) of users and items is the core of modern recommender systems. Previous works exploit user-item relationships of one-hop neighbors in the user-item interaction graph to improve the quality of representation. Recently, the research of Graph Neural Network (GNN) for recommendation considers the implicit collaborative information of multi-hop neighbors to enrich the representation. However, most works of GNN for recommendation systems do not consider the relational information which implies the expression differences of different neighbors in the neighborhood explicitly. The influence of each neighboring item to the representation of the user&#39;s preference can be represented by the correlation between the item and neighboring items of the user. Symmetrically, for a given item, the correlation between one neighboring user and neighboring users can reflect the strength of signal about the item&#39;s characteristic. To modeling the implicit correlations of neighbors in graph embedding aggregating, we propose a Neighbor-Aware Graph Attention Network for recommendation task, termed NGAT4Rec. It employs a novel neighbor-aware graph attention layer that assigns different neighbor-aware attention coefficients to different neighbors of a given node by computing the attention among these neighbors pairwisely. Then NGAT4Rec aggregates the embeddings of neighbors according to the corresponding neighbor-aware attention coefficients to generate next layer embedding for every node. Furthermore, we combine more neighbor-aware graph attention layer to gather the influential signals from multi-hop neighbors. We remove feature transformation and nonlinear activation that proved to be useless on collaborative filtering. Extensive experiments on three benchmark datasets show that our model outperforms various state-of-the-art models consistently.

preprint2021arXiv

Towards Long-term Fairness in Recommendation

As Recommender Systems (RS) influence more and more people in their daily life, the issue of fairness in recommendation is becoming more and more important. Most of the prior approaches to fairness-aware recommendation have been situated in a static or one-shot setting, where the protected groups of items are fixed, and the model provides a one-time fairness solution based on fairness-constrained optimization. This fails to consider the dynamic nature of the recommender systems, where attributes such as item popularity may change over time due to the recommendation policy and user engagement. For example, products that were once popular may become no longer popular, and vice versa. As a result, the system that aims to maintain long-term fairness on the item exposure in different popularity groups must accommodate this change in a timely fashion. Novel to this work, we explore the problem of long-term fairness in recommendation and accomplish the problem through dynamic fairness learning. We focus on the fairness of exposure of items in different groups, while the division of the groups is based on item popularity, which dynamically changes over time in the recommendation process. We tackle this problem by proposing a fairness-constrained reinforcement learning algorithm for recommendation, which models the recommendation problem as a Constrained Markov Decision Process (CMDP), so that the model can dynamically adjust its recommendation policy to make sure the fairness requirement is always satisfied when the environment changes. Experiments on several real-world datasets verify our framework&#39;s superiority in terms of recommendation performance, short-term fairness, and long-term fairness.

preprint2021arXiv

Variation Control and Evaluation for Generative SlateRecommendations

Slate recommendation generates a list of items as a whole instead of ranking each item individually, so as to better model the intra-list positional biases and item relations. In order to deal with the enormous combinatorial space of slates, recent work considers a generative solution so that a slate distribution can be directly modeled. However, we observe that such approaches -- despite their proved effectiveness in computer vision -- suffer from a trade-off dilemma in recommender systems: when focusing on reconstruction, they easily over-fit the data and hardly generate satisfactory recommendations; on the other hand, when focusing on satisfying the user interests, they get trapped in a few items and fail to cover the item variation in slates. In this paper, we propose to enhance the accuracy-based evaluation with slate variation metrics to estimate the stochastic behavior of generative models. We illustrate that instead of reaching to one of the two undesirable extreme cases in the dilemma, a valid generative solution resides in a narrow &#34;elbow&#34; region in between. And we show that item perturbation can enforce slate variation and mitigate the over-concentration of generated slates, which expand the &#34;elbow&#34; performance to an easy-to-find region. We further propose to separate a pivot selection phase from the generation process so that the model can apply perturbation before generation. Empirical results show that this simple modification can provide even better variance with the same level of accuracy compared to post-generation perturbation methods.

preprint2020arXiv

A Unified DNN Weight Compression Framework Using Reweighted Optimization Methods

To address the large model size and intensive computation requirement of deep neural networks (DNNs), weight pruning techniques have been proposed and generally fall into two categories, i.e., static regularization-based pruning and dynamic regularization-based pruning. However, the former method currently suffers either complex workloads or accuracy degradation, while the latter one takes a long time to tune the parameters to achieve the desired pruning rate without accuracy loss. In this paper, we propose a unified DNN weight pruning framework with dynamically updated regularization terms bounded by the designated constraint, which can generate both non-structured sparsity and different kinds of structured sparsity. We also extend our method to an integrated framework for the combination of different DNN compression tasks.

preprint2020arXiv

CLEAR: Contrastive Learning for Sentence Representation

Pre-trained language models have proven their unique powers in capturing implicit language features. However, most pre-training approaches focus on the word-level training objective, while sentence-level objectives are rarely studied. In this paper, we propose Contrastive LEArning for sentence Representation (CLEAR), which employs multiple sentence-level augmentation strategies in order to learn a noise-invariant sentence representation. These augmentations include word and span deletion, reordering, and substitution. Furthermore, we investigate the key reasons that make contrastive learning effective through numerous experiments. We observe that different sentence augmentations during pre-training lead to different performance improvements on various downstream tasks. Our approach is shown to outperform multiple existing methods on both SentEval and GLUE benchmarks.

preprint2020arXiv

Computation on Sparse Neural Networks: an Inspiration for Future Hardware

Neural network models are widely used in solving many challenging problems, such as computer vision, personalized recommendation, and natural language processing. Those models are very computationally intensive and reach the hardware limit of the existing server and IoT devices. Thus, finding better model architectures with much less amount of computation while maximally preserving the accuracy is a popular research topic. Among various mechanisms that aim to reduce the computation complexity, identifying the zero values in the model weights and in the activations to avoid computing them is a promising direction. In this paper, we summarize the current status of the research on the computation of sparse neural networks, from the perspective of the sparse algorithms, the software frameworks, and the hardware accelerations. We observe that the search for the sparse structure can be a general methodology for high-quality model explorations, in addition to a strategy for high-efficiency model execution. We discuss the model accuracy influenced by the number of weight parameters and the structure of the model. The corresponding models are called to be located in the weight dominated and structure dominated regions, respectively. We show that for practically complicated problems, it is more beneficial to search large and sparse models in the weight dominated region. In order to achieve the goal, new approaches are required to search for proper sparse structures, and new sparse training hardware needs to be developed to facilitate fast iterations of sparse models.

preprint2020arXiv

Dynamics of an imprecise stochastic Holling II one-predator two-prey system with jumps

Groups in ecology are often affected by sudden environmental perturbations. Parameters of stochastic models are often imprecise due to various uncertainties. In this paper, we formulate a stochastic Holling II one-predator two-prey system with jumps and interval parameters. Firstly, we prove the existence and uniqueness of the positive solution. Moreover, the sufficient conditions for the extinction and persistence in the mean of the solution are obtained.

preprint2020arXiv

Dynamics of an imprecise stochastic Lotka Volterra food chain chemostat model with Levy jumps

Population dynamics are often affected by sudden environmental perturbations. Parameters of stochastic models are often imprecise due to various uncertainties. In this paper, we formulate a stochastic Lotka Volterra food chain chemostat model that includes Levy jumps and interval parameters. Firstly, we prove the existence and uniqueness of the positive solution. Moreover, the threshold between extinction and persistence in the mean of the microorganism is obtained. Finally, some simulations are carried out to demonstrate our theoretical results.

preprint2020arXiv

Dynamics of an imprecise stochastic multimolecular biochemical reaction model with Lévy jumps

Population dynamics are often affected by sudden environmental perturbations. Parameters of stochastic models are often imprecise due to various uncertainties. In this paper, we formulate a stochastic multimolecular biochemical reaction model that includes Lévy jumps and interval parameters. Firstly, we prove the existence and uniqueness of the positive solution. Moreover, the threshold between extinction and persistence of the reaction is obtained. Finally, some simulations are carried out to demonstrate our theoretical results.

preprint2020arXiv

Flat-top interleavers based on single MMIs

We demonstrate for the first time flat-top interleavers based on cascaded Mach-Zehnder interferometers (MZIs) which use only single multimode interferometers (MMIs) as power splitters. Our previous designs were based on 4-stage cascades of MZIs, where we used single MMIs and double MMIs to achieve 85:15 splitting ratio and 31:69 splitting ratio respectively. This time, we propose instead a greatly simplified 2-stage configuration using only single MMIs, including a standard 50:50 MMI, and two tapered MMIs to achieve 71:29 and 92:08 splitting ratios. We have designed the interleaver based on its geometrical representation on the Bloch sphere, then confirmed by efficient 2D simulations of the building blocks and of the whole structure, based on the eigenmode expansion method. We show how important is to take into account the phase relations between the outputs of all MMIs in order to make a working design. We have successfully fabricated devices with different channel spacing on our micron-scale silicon photonics platform, and measurement results confirmed their expected flat-top operation on a broad band. Using only single MMI splitters we can not only greatly outperform the bandwidth achieved by standard directional couplers, but we can also ensure much higher robustness to fabrication errors, also compared to previous demonstrations based on double MMIs. Indeed, when compared to those previous attempts, the new results prove tapered MMIs to be the most robust approach to achieve arbitrary splitting ratios.

preprint2020arXiv

Learning in the Frequency Domain

Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.

preprint2020arXiv

Learning Personalized Risk Preferences for Recommendation

The rapid growth of e-commerce has made people accustomed to shopping online. Before making purchases on e-commerce websites, most consumers tend to rely on rating scores and review information to make purchase decisions. With this information, they can infer the quality of products to reduce the risk of purchase. Specifically, items with high rating scores and good reviews tend to be less risky, while items with low rating scores and bad reviews might be risky to purchase. On the other hand, the purchase behaviors will also be influenced by consumers&#39; tolerance of risks, known as the risk attitudes. Economists have studied risk attitudes for decades. These studies reveal that people are not always rational enough when making decisions, and their risk attitudes may vary in different circumstances. Most existing works over recommendation systems do not consider users&#39; risk attitudes in modeling, which may lead to inappropriate recommendations to users. For example, suggesting a risky item to a risk-averse person or a conservative item to a risk-seeking person may result in the reduction of user experience. In this paper, we propose a novel risk-aware recommendation framework that integrates machine learning and behavioral economics to uncover the risk mechanism behind users&#39; purchasing behaviors. Concretely, we first develop statistical methods to estimate the risk distribution of each item and then draw the Nobel-award winning Prospect Theory into our model to learn how users choose from probabilistic alternatives that involve risks, where the probabilities of the outcomes are uncertain. Experiments on several e-commerce datasets demonstrate that our approach can achieve better performance than many classical recommendation approaches, and further analyses also verify the advantages of risk-aware recommendation beyond accuracy.

preprint2020arXiv

MLPerf Inference Benchmark

Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark&#39;s flexibility and adaptability.

preprint2020arXiv

MTBRN: Multiplex Target-Behavior Relation Enhanced Network for Click-Through Rate Prediction

Click-through rate (CTR) prediction is a critical task for many industrial systems, such as display advertising and recommender systems. Recently, modeling user behavior sequences attracts much attention and shows great improvements in the CTR field. Existing works mainly exploit attention mechanism based on embedding product when considering relations between user behaviors and target item. However, this methodology lacks of concrete semantics and overlooks the underlying reasons driving a user to click on a target item. In this paper, we propose a new framework named Multiplex Target-Behavior Relation enhanced Network (MTBRN) to leverage multiplex relations between user behaviors and target item to enhance CTR prediction. Multiplex relations consist of meaningful semantics, which can bring a better understanding on users&#39; interests from different perspectives. To explore and model multiplex relations, we propose to incorporate various graphs (e.g., knowledge graph and item-item similarity graph) to construct multiple relational paths between user behaviors and target item. Then Bi-LSTM is applied to encode each path in the path extractor layer. A path fusion network and a path activation network are devised to adaptively aggregate and finally learn the representation of all paths for CTR prediction. Extensive offline and online experiments clearly verify the effectiveness of our framework.

preprint2020arXiv

Privileged Features Distillation at Taobao Recommendations

Features play an important role in the prediction tasks of e-commerce recommendations. To guarantee the consistency of off-line training and on-line serving, we usually utilize the same features that are both available. However, the consistency in turn neglects some discriminative features. For example, when estimating the conversion rate (CVR), i.e., the probability that a user would purchase the item if she clicked it, features like dwell time on the item detailed page are informative. However, CVR prediction should be conducted for on-line ranking before the click happens. Thus we cannot get such post-event features during serving. We define the features that are discriminative but only available during training as the privileged features. Inspired by the distillation techniques which bridge the gap between training and inference, in this work, we propose privileged features distillation (PFD). We train two models, i.e., a student model that is the same as the original one and a teacher model that additionally utilizes the privileged features. Knowledge distilled from the more accurate teacher is transferred to the student to improve its accuracy. During serving, only the student part is extracted and it relies on no privileged features. We conduct experiments on two fundamental prediction tasks at Taobao recommendations, i.e., click-through rate (CTR) at coarse-grained ranking and CVR at fine-grained ranking. By distilling the interacted features that are prohibited during serving for CTR and the post-event features for CVR, we achieve significant improvements over their strong baselines. During the on-line A/B tests, the click metric is improved by +5.0% in the CTR task. And the conversion metric is improved by +2.3% in the CVR task. Besides, by addressing several issues of training PFD, we obtain comparable training speed as the baselines without any distillation.

preprint2020arXiv

Ranking Enhanced Dialogue Generation

How to effectively utilize the dialogue history is a crucial problem in multi-turn dialogue generation. Previous works usually employ various neural network architectures (e.g., recurrent neural networks, attention mechanisms, and hierarchical structures) to model the history. However, a recent empirical study by Sankar et al. has shown that these architectures lack the ability of understanding and modeling the dynamics of the dialogue history. For example, the widely used architectures are insensitive to perturbations of the dialogue history, such as words shuffling, utterances missing, and utterances reordering. To tackle this problem, we propose a Ranking Enhanced Dialogue generation framework in this paper. Despite the traditional representation encoder and response generation modules, an additional ranking module is introduced to model the ranking relation between the former utterance and consecutive utterances. Specifically, the former utterance and consecutive utterances are treated as query and corresponding documents, and both local and global ranking losses are designed in the learning process. In this way, the dynamics in the dialogue history can be explicitly captured. To evaluate our proposed models, we conduct extensive experiments on three public datasets, i.e., bAbI, PersonaChat, and JDC. Experimental results show that our models produce better responses in terms of both quantitative measures and human judgments, as compared with the state-of-the-art dialogue generation models. Furthermore, we give some detailed experimental analysis to show where and how the improvements come from.

preprint2020arXiv

Regulator-based risk statistics for portfolios

Risk statistic is a critical factor not only for risk analysis but also for financial application. However, the traditional risk statistics may fail to describe the characteristics of regulator-based risk. In this paper, we consider the regulator-based risk statistics for portfolios. By further developing the properties related to regulator-based risk statistics, we are able to derive dual representation for such risk.

preprint2020arXiv

Regulator-based risk statistics with scenario analysis

As regulators pay more attentions to losses rather than gains, we are able to derive a new class of risk statistics, named regulator-based risk statistics with scenario analysis in this paper. This new class of risk statistics can be considered as a kind of risk extension of risk statistics introduced by Kou et al. \cite{11}, and also data-based versions of loss-based risk measures introduced by Cont et al. \cite{5} and Sun et al. \cite{12}.

preprint2020arXiv

Skeleton Based Action Recognition using a Stacked Denoising Autoencoder with Constraints of Privileged Information

Recently, with the availability of cost-effective depth cameras coupled with real-time skeleton estimation, the interest in skeleton-based human action recognition is renewed. Most of the existing skeletal representation approaches use either the joint location or the dynamics model. Differing from the previous studies, we propose a new method called Denoising Autoencoder with Temporal and Categorical Constraints (DAE_CTC)} to study the skeletal representation in a view of skeleton reconstruction. Based on the concept of learning under privileged information, we integrate action categories and temporal coordinates into a stacked denoising autoencoder in the training phase, to preserve category and temporal feature, while learning the hidden representation from a skeleton. Thus, we are able to improve the discriminative validity of the hidden representation. In order to mitigate the variation resulting from temporary misalignment, a new method of temporal registration, called Locally-Warped Sequence Registration (LWSR), is proposed for registering the sequences of inter- and intra-class actions. We finally represent the sequences using a Fourier Temporal Pyramid (FTP) representation and perform classification using a combination of LWSR registration, FTP representation, and a linear Support Vector Machine (SVM). The experimental results on three action data sets, namely MSR-Action3D, UTKinect-Action, and Florence3D-Action, show that our proposal performs better than many existing methods and comparably to the state of the art.

preprint2020arXiv

Understanding Echo Chambers in E-commerce Recommender Systems

Personalized recommendation benefits users in accessing contents of interests effectively. Current research on recommender systems mostly focuses on matching users with proper items based on user interests. However, significant efforts are missing to understand how the recommendations influence user preferences and behaviors, e.g., if and how recommendations result in \textit{echo chambers}. Extensive efforts have been made in examining the phenomenon in online media and social network systems. Meanwhile, there are growing concerns that recommender systems might lead to the self-reinforcing of user&#39;s interests due to narrowed exposure of items, which may be the potential cause of echo chamber. In this paper, we aim to analyze the echo chamber phenomenon in Alibaba Taobao -- one of the largest e-commerce platforms in the world. Echo chamber means the effect of user interests being reinforced through repeated exposure to similar contents. Based on the definition, we examine the presence of echo chamber in two steps. First, we explore whether user interests have been reinforced. Second, we check whether the reinforcement results from the exposure of similar contents. Our evaluations are enhanced with robust metrics, including cluster validity and statistical significance. Experiments are performed on extensive collections of real-world data consisting of user clicks, purchases, and browse logs from Alibaba Taobao. Evidence suggests the tendency of echo chamber in user click behaviors, while it is relatively mitigated in user purchase behaviors. Insights from the results guide the refinement of recommendation algorithms in real-world e-commerce systems.

preprint2019arXiv

SDM: Sequential Deep Matching Model for Online Large-scale Recommender System

Capturing users&#39; precise preferences is a fundamental problem in large-scale recommender system. Currently, item-based Collaborative Filtering (CF) methods are common matching approaches in industry. However, they are not effective to model dynamic and evolving preferences of users. In this paper, we propose a new sequential deep matching (SDM) model to capture users&#39; dynamic preferences by combining short-term sessions and long-term behaviors. Compared with existing sequence-aware recommendation methods, we tackle the following two inherent problems in real-world applications: (1) there could exist multiple interest tendencies in one session. (2) long-term preferences may not be effectively fused with current session interests. Long-term behaviors are various and complex, hence those highly related to the short-term session should be kept for fusion. We propose to encode behavior sequences with two corresponding components: multi-head self-attention module to capture multiple types of interests and long-short term gated fusion module to incorporate long-term preferences. Successive items are recommended after matching between sequential user behavior vector and item embedding vectors. Offline experiments on real-world datasets show the superior performance of the proposed SDM. Moreover, SDM has been successfully deployed on online large-scale recommender system at Taobao and achieves improvements in terms of a range of commercial metrics.

preprint2019arXiv

Thermal-null medium (TNM): a novel material to achieve feasible thermodynamics devices beyond conventional challenges

Recently, heat manipulation has gained the attention of scientific community due to its several applications. In this letter, based on transformation thermodynamic (TT) methodology, a novel material, which is called thermal-null medium (TNM), is proposed that enables us to design various thermal functionalities such as thermal bending devices, arbitrary shape heat concentrators and omnidirectional thermal cloaks. In contrary to the conventional TT-based conductivities, which are inhomogeneous and anisotropic, TNMs are homogeneous and easy to realize. In addition, the attained TNMs are independent of the device shape. That is if the geometry of the desired device is changed, there is no need to recalculate the necessitating conductivities. This feature of TNM will make it suitable for scenarios where re-configurability is of utmost importance. Several numerical simulations are carried out to demonstrate the TNM capability and its applications in directional bending devices, heat concentrators and thermal cloaks. The proposed TNM could open a new avenue for potential applications in solar thermal panels and thermal-electric devices.