Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
33works
0followers
21topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

33 published item(s)

preprint2026arXiv

Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents

The implicit policy of maintaining relatively stable acceptance rates at top AI conferences, despite exponentially growing submissions, introduces a critical structural vulnerability. This position paper characterizes a new systemic threat we term Agentic Denominator Gaming, in which a malicious actor deploys AI agents to generate and submit a large volume of superficially plausible but low-quality papers. Crucially, their objective is not the acceptance of low-quality papers, but rather to inflate the submission denominator and overwhelm reviewing capacity. Under a relatively stable acceptance rate, this dilution can systematically increase the publication probability of a small, targeted set of legitimate papers. We analyze the practical feasibility of this threat and its broader consequences, including intensified reviewer burnout, degraded review quality, and the emergence of industrialized automated agent mills. Finally, we propose and evaluate a range of mitigation strategies, and argue that durable protection will require system-level policy and incentive reforms, rather than relying primarily on technical detection alone.

preprint2025arXiv

Antarctic TianMu Staring Observation Project I: Overview and Implementation of the Prototype Telescope

Wide-field rapid sky surveys serve as critical observational methods for time-domain astronomical research. The Antarctic region, with several months of continuous dark nights annually, is an ideal site for time-domain astronomical observations. The Antarctic TianMu Staring Observation Project aims to deploy a fleet of small telescopes, adopting an array observation model to conduct time-domain optical observations in Antarctica, featuring wide-sky coverage, high-cadence sampling, long-period staring, and simultaneous multi-band measurements. Considering the severe challenges optical telescopes face in Antarctica, including extremely low temperatures, unattended operation, and limited power supply and network transmission, we have designed and developed the Antarctic TianMu prototype telescope based on drift-scan charge-coupled device technology. In October 2022, our prototype (with an aperture of 18 cm), named AT-Proto was transported to Zhongshan Station in Antarctica aboard China's 39th Antarctic Research Expedition. It has since operated stably and reliably in the frigid environment for over two years, demonstrating the significant advantages of this technology in polar astronomical observations. The experimental observation results of AT-Proto provide a solid foundation for the subsequent construction of a time-domain astronomy observation array in Antarctica.

preprint2023arXiv

Adaptive Control Strategy for Quadruped Robots in Actuator Degradation Scenarios

Quadruped robots have strong adaptability to extreme environments but may also experience faults. Once these faults occur, robots must be repaired before returning to the task, reducing their practical feasibility. One prevalent concern among these faults is actuator degradation, stemming from factors like device aging or unexpected operational events. Traditionally, addressing this problem has relied heavily on intricate fault-tolerant design, which demands deep domain expertise from developers and lacks generalizability. Learning-based approaches offer effective ways to mitigate these limitations, but a research gap exists in effectively deploying such methods on real-world quadruped robots. This paper introduces a pioneering teacher-student framework rooted in reinforcement learning, named Actuator Degradation Adaptation Transformer (ADAPT), aimed at addressing this research gap. This framework produces a unified control strategy, enabling the robot to sustain its locomotion and perform tasks despite sudden joint actuator faults, relying exclusively on its internal sensors. Empirical evaluations on the Unitree A1 platform validate the deployability and effectiveness of Adapt on real-world quadruped robots, and affirm the robustness and practicality of our approach.

preprint2022arXiv

A Graph-Enhanced Click Model for Web Search

To better exploit search logs and model users' behavior patterns, numerous click models are proposed to extract users' implicit interaction feedback. Most traditional click models are based on the probabilistic graphical model (PGM) framework, which requires manually designed dependencies and may oversimplify user behaviors. Recently, methods based on neural networks are proposed to improve the prediction accuracy of user behaviors by enhancing the expressive ability and allowing flexible dependencies. However, they still suffer from the data sparsity and cold-start problems. In this paper, we propose a novel graph-enhanced click model (GraphCM) for web search. Firstly, we regard each query or document as a vertex, and propose novel homogeneous graph construction methods for queries and documents respectively, to fully exploit both intra-session and inter-session information for the sparsity and cold-start problems. Secondly, following the examination hypothesis, we separately model the attractiveness estimator and examination predictor to output the attractiveness scores and examination probabilities, where graph neural networks and neighbor interaction techniques are applied to extract the auxiliary information encoded in the pre-constructed homogeneous graphs. Finally, we apply combination functions to integrate examination probabilities and attractiveness scores into click predictions. Extensive experiments conducted on three real-world session datasets show that GraphCM not only outperforms the state-of-art models, but also achieves superior performance in addressing the data sparsity and cold-start problems.

preprint2022arXiv

Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning

Deriving a good variable selection strategy in branch-and-bound is essential for the efficiency of modern mixed-integer programming (MIP) solvers. With MIP branching data collected during the previous solution process, learning to branch methods have recently become superior over heuristics. As branch-and-bound is naturally a sequential decision making task, one should learn to optimize the utility of the whole MIP solving process instead of being myopic on each step. In this work, we formulate learning to branch as an offline reinforcement learning (RL) problem, and propose a long-sighted hybrid search scheme to construct the offline MIP dataset, which values the long-term utilities of branching decisions. During the policy training phase, we deploy a ranking-based reward assignment scheme to distinguish the promising samples from the long-term or short-term view, and train the branching model named Branch Ranking via offline policy learning. Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Rankink is more efficient and robust, and can better generalize to large scales of MIP instances compared to the widely used heuristics and state-of-the-art learning-based branching models.

preprint2022arXiv

Context-aware Reranking with Utility Maximization for Recommendation

As a critical task for large-scale commercial recommender systems, reranking has shown the potential of improving recommendation results by uncovering mutual influence among items. Reranking rearranges items in the initial ranking lists from the previous ranking stage to better meet users' demands. However, rather than considering the context of initial lists as most existing methods do, an ideal reranking algorithm should consider the counterfactual context -- the position and the alignment of the items in the reranked lists. In this work, we propose a novel pairwise reranking framework, Context-aware Reranking with Utility Maximization for recommendation (CRUM), which maximizes the overall utility after reranking efficiently. Specifically, we first design a utility-oriented evaluator, which applies Bi-LSTM and graph attention mechanism to estimate the listwise utility via the counterfactual context modeling. Then, under the guidance of the evaluator, we propose a pairwise reranker model to find the most suitable position for each item by swapping misplaced item pairs. Extensive experiments on two benchmark datasets and a proprietary real-world dataset demonstrate that CRUM significantly outperforms the state-of-the-art models in terms of both relevance-based metrics and utility-based metrics.

preprint2022arXiv

DropNAS: Grouped Operation Dropout for Differentiable Architecture Search

Neural architecture search (NAS) has shown encouraging results in automating the architecture design. Recently, DARTS relaxes the search process with a differentiable formulation that leverages weight-sharing and SGD where all candidate operations are trained simultaneously. Our empirical results show that such procedure results in the co-adaption problem and Matthew Effect: operations with fewer parameters would be trained maturely earlier. This causes two problems: firstly, the operations with more parameters may never have the chance to express the desired function since those with less have already done the job; secondly, the system will punish those underperforming operations by lowering their architecture parameter, and they will get smaller loss gradients, which causes the Matthew Effect. In this paper, we systematically study these problems and propose a novel grouped operation dropout algorithm named DropNAS to fix the problems with DARTS. Extensive experiments demonstrate that DropNAS solves the above issues and achieves promising performance. Specifically, DropNAS achieves 2.26% test error on CIFAR-10, 16.39% on CIFAR-100 and 23.4% on ImageNet (with the same training hyperparameters as DARTS for a fair comparison). It is also observed that DropNAS is robust across variants of the DARTS search space. Code is available at https://github.com/wiljohnhong/DropNAS.

preprint2022arXiv

Efficient Policy Space Response Oracles

Policy Space Response Oracle methods (PSRO) provide a general solution to learn Nash equilibrium in two-player zero-sum games but suffer from two drawbacks: (1) the computation inefficiency due to the need for consistent meta-game evaluation via simulations, and (2) the exploration inefficiency due to finding the best response against a fixed meta-strategy at every epoch. In this work, we propose Efficient PSRO (EPSRO) that largely improves the efficiency of the above two steps. Central to our development is the newly-introduced subroutine of no-regret optimization on the unrestricted-restricted (URR) game. By solving URR at each epoch, one can evaluate the current game and compute the best response in one forward pass without the need for meta-game simulations. Theoretically, we prove that the solution procedures of EPSRO offer a monotonic improvement on the exploitability, which none of existing PSRO methods possess. Furthermore, we prove that the no-regret optimization has a regret bound of $\mathcal{O}(\sqrt{T\log{[(k^2+k)/2]}})$, where $k$ is the size of restricted policy set. Most importantly, a desirable property of EPSRO is that it is parallelizable, this allows for highly efficient exploration in the policy space that induces behavioral diversity. We test EPSRO on three classes of games, and report a 50x speedup in wall-time and 10x data efficiency while maintaining similar exploitability as existing PSRO methods on Kuhn and Leduc Poker games.

preprint2022arXiv

Generative Adversarial Exploration for Reinforcement Learning

Exploration is crucial for training the optimal reinforcement learning (RL) policy, where the key is to discriminate whether a state visiting is novel. Most previous work focuses on designing heuristic rules or distance metrics to check whether a state is novel without considering such a discrimination process that can be learned. In this paper, we propose a novel method called generative adversarial exploration (GAEX) to encourage exploration in RL via introducing an intrinsic reward output from a generative adversarial network, where the generator provides fake samples of states that help discriminator identify those less frequently visited states. Thus the agent is encouraged to visit those states which the discriminator is less confident to judge as visited. GAEX is easy to implement and of high training efficiency. In our experiments, we apply GAEX into DQN and the DQN-GAEX algorithm achieves convincing performance on challenging exploration problems, including the game Venture, Montezuma's Revenge and Super Mario Bros, without further fine-tuning on complicate learning algorithms. To our knowledge, this is the first work to employ GAN in RL exploration problems.

preprint2022arXiv

Learn over Past, Evolve for Future: Search-based Time-aware Recommendation with Sequential Behavior Data

The personalized recommendation is an essential part of modern e-commerce, where user's demands are not only conditioned by their profile but also by their recent browsing behaviors as well as periodical purchases made some time ago. In this paper, we propose a novel framework named Search-based Time-Aware Recommendation (STARec), which captures the evolving demands of users over time through a unified search-based time-aware model. More concretely, we first design a search-based module to retrieve a user's relevant historical behaviors, which are then mixed up with her recent records to be fed into a time-aware sequential network for capturing her time-sensitive demands. Besides retrieving relevant information from her personal history, we also propose to search and retrieve similar user's records as an additional reference. All these sequential records are further fused to make the final recommendation. Beyond this framework, we also develop a novel label trick that uses the previous labels (i.e., user's feedbacks) as the input to better capture the user's browsing pattern. We conduct extensive experiments on three real-world commercial datasets on click-through-rate prediction tasks against state-of-the-art methods. Experimental results demonstrate the superiority and efficiency of our proposed framework and techniques. Furthermore, results of online experiments on a daily item recommendation platform of Company X show that STARec gains average performance improvement of around 6% and 1.5% in its two main item recommendation scenarios on CTR metric respectively.

preprint2022arXiv

Multi-Level Interaction Reranking with User Behavior History

As the final stage of the multi-stage recommender system (MRS), reranking directly affects users' experience and satisfaction, thus playing a critical role in MRS. Despite the improvement achieved in the existing work, three issues are yet to be solved. First, users' historical behaviors contain rich preference information, such as users' long and short-term interests, but are not fully exploited in reranking. Previous work typically treats items in history equally important, neglecting the dynamic interaction between the history and candidate items. Second, existing reranking models focus on learning interactions at the item level while ignoring the fine-grained feature-level interactions. Lastly, estimating the reranking score on the ordered initial list before reranking may lead to the early scoring problem, thereby yielding suboptimal reranking performance. To address the above issues, we propose a framework named Multi-level Interaction Reranking (MIR). MIR combines low-level cross-item interaction and high-level set-to-list interaction, where we view the candidate items to be reranked as a set and the users' behavior history in chronological order as a list. We design a novel SLAttention structure for modeling the set-to-list interactions with personalized long-short term interests. Moreover, feature-level interactions are incorporated to capture the fine-grained influence among items. We design MIR in such a way that any permutation of the input items would not change the output ranking, and we theoretically prove it. Extensive experiments on three public and proprietary datasets show that MIR significantly outperforms the state-of-the-art models using various ranking and utility metrics.

preprint2022arXiv

Multi-Scale User Behavior Network for Entire Space Multi-Task Learning

Modelling the user's multiple behaviors is an essential part of modern e-commerce, whose widely adopted application is to jointly optimize click-through rate (CTR) and conversion rate (CVR) predictions. Most of existing methods overlook the effect of two key characteristics of the user's behaviors: for each item list, (i) contextual dependence refers to that the user's behaviors on any item are not purely determinated by the item itself but also are influenced by the user's previous behaviors (e.g., clicks, purchases) on other items in the same sequence; (ii) multiple time scales means that users are likely to click frequently but purchase periodically. To this end, we develop a new multi-scale user behavior network named Hierarchical rEcurrent Ranking On the Entire Space (HEROES) which incorporates the contextual information to estimate the user multiple behaviors in a multi-scale fashion. Concretely, we introduce a hierarchical framework, where the lower layer models the user's engagement behaviors while the upper layer estimates the user's satisfaction behaviors. The proposed architecture can automatically learn a suitable time scale for each layer to capture the dynamic user's behavioral patterns. Besides the architecture, we also introduce the Hawkes process to form a novel recurrent unit which can not only encode the items' features in the context but also formulate the excitation or discouragement from the user's previous behaviors. We further show that HEROES can be extended to build unbiased ranking systems through combinations with the survival analysis technique. Extensive experiments over three large-scale industrial datasets demonstrate the superiority of our model compared with the state-of-the-art methods.

preprint2022arXiv

Multi-View Graph Representation for Programming Language Processing: An Investigation into Algorithm Detection

Program representation, which aims at converting program source code into vectors with automatically extracted features, is a fundamental problem in programming language processing (PLP). Recent work tries to represent programs with neural networks based on source code structures. However, such methods often focus on the syntax and consider only one single perspective of programs, limiting the representation power of models. This paper proposes a multi-view graph (MVG) program representation method. MVG pays more attention to code semantics and simultaneously includes both data flow and control flow as multiple views. These views are then combined and processed by a graph neural network (GNN) to obtain a comprehensive program representation that covers various aspects. We thoroughly evaluate our proposed MVG approach in the context of algorithm detection, an important and challenging subfield of PLP. Specifically, we use a public dataset POJ-104 and also construct a new challenging dataset ALG-109 to test our method. In experiments, MVG outperforms previous methods significantly, demonstrating our model's strong capability of representing source code.

preprint2022arXiv

On Effective Scheduling of Model-based Reinforcement Learning

Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it is still unclear how to appropriately schedule the important hyperparameters to achieve adequate performance, such as the real data ratio for policy optimization in Dyna-style model-based algorithms. In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance. Inspired by the analysis, we propose a framework named AutoMBPO to automatically schedule the real data ratio as well as other hyperparameters in training model-based policy optimization (MBPO) algorithm, a representative running case of model-based methods. On several continuous control tasks, the MBPO instance trained with hyperparameters scheduled by AutoMBPO can significantly surpass the original one, and the real data ratio schedule found by AutoMBPO shows consistency with our theoretical analysis.

preprint2022arXiv

Towards Making the Most of BERT in Neural Machine Translation

GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (CTNMT) that is the key to integrate the pre-trained LMs to neural machine translation (NMT). Our proposed CTNMT consists of three techniques: a) asymptotic distillation to ensure that the NMT model can retain the previous pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our experiments in machine translation show CTNMT gains of up to 3 BLEU score on the WMT14 English-German language pair which even surpasses the previous state-of-the-art pre-training aided NMT by 1.4 BLEU score. While for the large WMT14 English-French task with 40 millions of sentence-pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU score. The code and model can be downloaded from https://github.com/bytedance/neurst/ tree/master/examples/ctnmt.

preprint2022arXiv

Who to Watch Next: Two-side Interactive Networks for Live Broadcast Recommendation

With the prevalence of live broadcast business nowadays, a new type of recommendation service, called live broadcast recommendation, is widely used in many mobile e-commerce Apps. Different from classical item recommendation, live broadcast recommendation is to automatically recommend user anchors instead of items considering the interactions among triple-objects (i.e., users, anchors, items) rather than binary interactions between users and items. Existing methods based on binary objects, ranging from early matrix factorization to recently emerged deep learning, obtain objects' embeddings by mapping from pre-existing features. Directly applying these techniques would lead to limited performance, as they are failing to encode collaborative signals among triple-objects. In this paper, we propose a novel TWo-side Interactive NetworkS (TWINS) for live broadcast recommendation. In order to fully use both static and dynamic information on user and anchor sides, we combine a product-based neural network with a recurrent neural network to learn the embedding of each object. In addition, instead of directly measuring the similarity, TWINS effectively injects the collaborative effects into the embedding process in an explicit manner by modeling interactive patterns between the user's browsing history and the anchor's broadcast history in both item and anchor aspects. Furthermore, we design a novel co-retrieval technique to select key items among massive historic records efficiently. Offline experiments on real large-scale data show the superior performance of the proposed TWINS, compared to representative methods; and further results of online experiments on Diantao App show that TWINS gains average performance improvement of around 8% on ACTR metric, 3% on UCTR metric, 3.5% on UCVR metric.

preprint2021arXiv

Deterministic Time-Bin Entanglement between a Single Photon and an Atomic Ensemble

Hybrid matter-photon entanglement is the building block for quantum networks. It is very favorable if the entanglement can be prepared with a high probability. In this paper, we report the deterministic creation of entanglement between an atomic ensemble and a single photon by harnessing Rydberg blockade. We design a scheme that creates entanglement between a single photon's temporal modes and the Rydberg levels that host a collective excitation, using a process of cyclical retrieving and patching. The hybrid entanglement is tested via retrieving the atomic excitation as a second photon and performing correlation measurements, which suggest an entanglement fidelity of 87.8%. Our source of matter-photon entanglement will enable the entangling of remote quantum memories with much higher efficiency.

preprint2021arXiv

Entanglement of two quantum memories via fibers over dozens of kilometres

Quantum internet will enable a number of revolutionary applications. It relies on entanglement of remote quantum memories over long distances. Despite enormous progresses so far, the maximal physical separation achieved between two nodes is 1.3 km, and challenges for long distance remain. Here we make a significant step forward by entangling two atomic ensembles in one lab via photon transmission through metropolitan-scale fibers. We use cavity enhancement to create bright atom-photon entanglement, and harness quantum frequency conversion to shift the atomic wavelength to telecom. We realize entanglement over 22 km field-deployed fibers via two-photon interference, and entanglement over 50 km coiled fibers via single-photon interference. Our experiment can be extended to physically separated nodes with similar distance as a functional segment for atomic quantum networks, thus paving the way towards establishing atomic entanglement over many nodes and over much longer distance.

preprint2021arXiv

Towards Generalized Implementation of Wasserstein Distance in GANs

Wasserstein GANs (WGANs), built upon the Kantorovich-Rubinstein (KR) duality of Wasserstein distance, is one of the most theoretically sound GAN models. However, in practice it does not always outperform other variants of GANs. This is mostly due to the imperfect implementation of the Lipschitz condition required by the KR duality. Extensive work has been done in the community with different implementations of the Lipschitz constraint, which, however, is still hard to satisfy the restriction perfectly in practice. In this paper, we argue that the strong Lipschitz constraint might be unnecessary for optimization. Instead, we take a step back and try to relax the Lipschitz constraint. Theoretically, we first demonstrate a more general dual form of the Wasserstein distance called the Sobolev duality, which relaxes the Lipschitz constraint but still maintains the favorable gradient property of the Wasserstein distance. Moreover, we show that the KR duality is actually a special case of the Sobolev duality. Based on the relaxed duality, we further propose a generalized WGAN training scheme named Sobolev Wasserstein GAN (SWGAN), and empirically demonstrate the improvement of SWGAN over existing methods with extensive experiments.

preprint2020arXiv

A Deep Recurrent Survival Model for Unbiased Ranking

Position bias is a critical problem in information retrieval when dealing with implicit yet biased user feedback data. Unbiased ranking methods typically rely on causality models and debias the user feedback through inverse propensity weighting. While practical, these methods still suffer from two major problems. First, when inferring a user click, the impact of the contextual information, such as documents that have been examined, is often ignored. Second, only the position bias is considered but other issues resulted from user browsing behaviors are overlooked. In this paper, we propose an end-to-end Deep Recurrent Survival Ranking (DRSR), a unified framework to jointly model user's various behaviors, to (i) consider the rich contextual information in the ranking list; and (ii) address the hidden issues underlying user behaviors, i.e., to mine observe pattern in queries without any click (non-click queries), and to model tracking logs which cannot truly reflect the user browsing intents (untrusted observation). Specifically, we adopt a recurrent neural network to model the contextual information and estimates the conditional likelihood of user feedback at each position. We then incorporate survival analysis techniques with the probability chain rule to mathematically recover the unbiased joint probability of one user's various behaviors. DRSR can be easily incorporated with both point-wise and pair-wise learning objectives. The extensive experiments over two large-scale industrial datasets demonstrate the significant performance gains of our model comparing with the state-of-the-arts.

preprint2020arXiv

An Efficient Neighborhood-based Interaction Model for Recommendation on Heterogeneous Graph

There is an influx of heterogeneous information network (HIN) based recommender systems in recent years since HIN is capable of characterizing complex graphs and contains rich semantics. Although the existing approaches have achieved performance improvement, while practical, they still face the following problems. On one hand, most existing HIN-based methods rely on explicit path reachability to leverage path-based semantic relatedness between users and items, e.g., metapath-based similarities. These methods are hard to use and integrate since path connections are sparse or noisy, and are often of different lengths. On the other hand, other graph-based methods aim to learn effective heterogeneous network representations by compressing node together with its neighborhood information into single embedding before prediction. This weakly coupled manner in modeling overlooks the rich interactions among nodes, which introduces an early summarization issue. In this paper, we propose an end-to-end Neighborhood-based Interaction Model for Recommendation (NIRec) to address the above problems. Specifically, we first analyze the significance of learning interactions in HINs and then propose a novel formulation to capture the interactive patterns between each pair of nodes through their metapath-guided neighborhoods. Then, to explore complex interactions between metapaths and deal with the learning complexity on large-scale networks, we formulate interaction in a convolutional way and learn efficiently with fast Fourier transform. The extensive experiments on four different types of heterogeneous graphs demonstrate the performance gains of NIRec comparing with state-of-the-arts. To the best of our knowledge, this is the first work providing an efficient neighborhood-based interaction model in the HIN-based recommendations.

preprint2020arXiv

AutoFIS: Automatic Feature Interaction Selection in Factorization Models for Click-Through Rate Prediction

Learning feature interactions is crucial for click-through rate (CTR) prediction in recommender systems. In most existing deep learning models, feature interactions are either manually designed or simply enumerated. However, enumerating all feature interactions brings large memory and computation cost. Even worse, useless interactions may introduce noise and complicate the training process. In this work, we propose a two-stage algorithm called Automatic Feature Interaction Selection (AutoFIS). AutoFIS can automatically identify important feature interactions for factorization models with computational cost just equivalent to training the target model to convergence. In the \emph{search stage}, instead of searching over a discrete set of candidate feature interactions, we relax the choices to be continuous by introducing the architecture parameters. By implementing a regularized optimizer over the architecture parameters, the model can automatically identify and remove the redundant feature interactions during the training process of the model. In the \emph{re-train stage}, we keep the architecture parameters serving as an attention unit to further boost the performance. Offline experiments on three large-scale datasets (two public benchmarks, one private) demonstrate that AutoFIS can significantly improve various FM based models. AutoFIS has been deployed onto the training platform of Huawei App Store recommendation service, where a 10-day online A/B test demonstrated that AutoFIS improved the DeepFM model by 20.3\% and 20.1\% in terms of CTR and CVR respectively.

preprint2020arXiv

GeneraLight: Improving Environment Generalization of Traffic Signal Control via Meta Reinforcement Learning

The heavy traffic congestion problem has always been a concern for modern cities. To alleviate traffic congestion, researchers use reinforcement learning (RL) to develop better traffic signal control (TSC) algorithms in recent years. However, most RL models are trained and tested in the same traffic flow environment, which results in a serious overfitting problem. Since the traffic flow environment in the real world keeps varying, these models can hardly be applied due to the lack of generalization ability. Besides, the limited number of accessible traffic flow data brings extra difficulty in testing the generalization ability of the models. In this paper, we design a novel traffic flow generator based on Wasserstein generative adversarial network to generate sufficient diverse and quality traffic flows and use them to build proper training and testing environments. Then we propose a meta-RL TSC framework GeneraLight to improve the generalization ability of TSC models. GeneraLight boosts the generalization performance by combining the idea of flow clustering and model-agnostic meta-learning. We conduct extensive experiments on multiple real-world datasets to show the superior performance of GeneraLight on generalizing to different traffic flows.

preprint2020arXiv

GIKT: A Graph-based Interaction Model for Knowledge Tracing

With the rapid development in online education, knowledge tracing (KT) has become a fundamental problem which traces students' knowledge status and predicts their performance on new questions. Questions are often numerous in online education systems, and are always associated with much fewer skills. However, the previous literature fails to involve question information together with high-order question-skill correlations, which is mostly limited by data sparsity and multi-skill problems. From the model perspective, previous models can hardly capture the long-term dependency of student exercise history, and cannot model the interactions between student-questions, and student-skills in a consistent way. In this paper, we propose a Graph-based Interaction model for Knowledge Tracing (GIKT) to tackle the above probems. More specifically, GIKT utilizes graph convolutional network (GCN) to substantially incorporate question-skill correlations via embedding propagation. Besides, considering that relevant questions are usually scattered throughout the exercise history, and that question and skill are just different instantiations of knowledge, GIKT generalizes the degree of students' master of the question to the interactions between the student's current state, the student's history related exercises, the target question, and related skills. Experiments on three datasets demonstrate that GIKT achieves the new state-of-the-art performance, with at least 1% absolute AUC improvement.

preprint2020arXiv

Infomax Neural Joint Source-Channel Coding via Adversarial Bit Flip

Although Shannon theory states that it is asymptotically optimal to separate the source and channel coding as two independent processes, in many practical communication scenarios this decomposition is limited by the finite bit-length and computational power for decoding. Recently, neural joint source-channel coding (NECST) is proposed to sidestep this problem. While it leverages the advancements of amortized inference and deep learning to improve the encoding and decoding process, it still cannot always achieve compelling results in terms of compression and error correction performance due to the limited robustness of its learned coding networks. In this paper, motivated by the inherent connections between neural joint source-channel coding and discrete representation learning, we propose a novel regularization method called Infomax Adversarial-Bit-Flip (IABF) to improve the stability and robustness of the neural joint source-channel coding scheme. More specifically, on the encoder side, we propose to explicitly maximize the mutual information between the codeword and data; while on the decoder side, the amortized reconstruction is regularized within an adversarial framework. Extensive experiments conducted on various real-world datasets evidence that our IABF can achieve state-of-the-art performances on both compression and error correction benchmarks and outperform the baselines by a significant margin.

preprint2020arXiv

Interactive Recommender System via Knowledge Graph-enhanced Reinforcement Learning

Interactive recommender system (IRS) has drawn huge attention because of its flexible recommendation strategy and the consideration of optimal long-term user experiences. To deal with the dynamic user preference and optimize accumulative utilities, researchers have introduced reinforcement learning (RL) into IRS. However, RL methods share a common issue of sample efficiency, i.e., huge amount of interaction data is required to train an effective recommendation policy, which is caused by the sparse user responses and the large action space consisting of a large number of candidate items. Moreover, it is infeasible to collect much data with explorative policies in online environments, which will probably harm user experience. In this work, we investigate the potential of leveraging knowledge graph (KG) in dealing with these issues of RL methods for IRS, which provides rich side information for recommendation decision making. Instead of learning RL policies from scratch, we make use of the prior knowledge of the item correlation learned from KG to (i) guide the candidate selection for better candidate item retrieval, (ii) enrich the representation of items and user states, and (iii) propagate user preferences among the correlated items over KG to deal with the sparsity of user feedback. Comprehensive experiments have been conducted on two real-world datasets, which demonstrate the superiority of our approach with significant improvements against state-of-the-arts.

preprint2020arXiv

Large-Scale Optimal Transport via Adversarial Training with Cycle-Consistency

Recent advances in large-scale optimal transport have greatly extended its application scenarios in machine learning. However, existing methods either not explicitly learn the transport map or do not support general cost function. In this paper, we propose an end-to-end approach for large-scale optimal transport, which directly solves the transport map and is compatible with general cost function. It models the transport map via stochastic neural networks and enforces the constraint on the marginal distributions via adversarial training. The proposed framework can be further extended towards learning Monge map or optimal bijection via adopting cycle-consistency constraint(s). We verify the effectiveness of the proposed method and demonstrate its superior performance against existing methods with large-scale real-world applications, including domain adaptation, image-to-image translation, and color transfer.

preprint2020arXiv

Long-time dynamics of classical Patlak-Keller-Segel equation

When the spatial dimension $n =2$, it has been well-known that a global mild solution to classical Patlak-Keller-Segel equation (PKS equation for short) exists if and only if its initial total mass is not in supercritical regime. However, to study long-time behavior of a global mild solution to $2$D PKS equation usually requires finite-free-energy and finite-second-moment assumptions on initial data. In this article, we introduce a novel argument to push and stretch a space-time strip. By this way, we gain $L^1$-compactness of PKS equation expressed under similarity variables. As a consequence, we obtain global dynamics of 2D PKS equation in subcritical regime with no additional assumptions. As for the higher dimensional case in which the spatial dimension $n \geq 3$, we also characterize the long-time asymptotics of global mild solutions to PKS equation. With a finite-total-mass assumption on density of cells, any global mild solution to PKS equation will approach a self-similar profile when time is large, provided that there is a sequence of time going to infinity on which $L^\infty$-norm of the density of cells converges to zero. The self-similar profile in the higher dimensional case is given by the function $M\mathcal{G}_n$, where $M$ is the total mass of cells and $\mathcal{G}_n$ denotes the standard $n$-dimensional Gaussian probability density. Convergence rates to self-similar profiles are also discussed in any dimensions. Particularly in the higher dimensional case, the general convergence rate for $L^1$-initial data can be improved if the initial data has a finite second moment. In fact, when time is large and $n \geq 3$, we provide, in an optimal way, a higher-order approximation of global mild solutions to PKS equation if the initial density has a finite second moment. All convergence rates studied in this article are under the $L^p$-norm with $p \in [1,\infty]$.

preprint2020arXiv

Lowest Degree Decomposition of Complex Networks

The heterogeneous structure implies that a very few nodes may play the critical role in maintaining structural and functional properties of a large-scale network. Identifying these vital nodes is one of the most important tasks in network science, which allow us to better conduct successful social advertisements, immunize a network against epidemics, discover drug target candidates and essential proteins, and prevent cascading breakdowns in power grids, financial markets and ecological systems. Inspired by the nested nature of real networks, we propose a decomposition method where at each step the nodes with the lowest degree are pruned. We have strictly proved that this so-called lowest degree decomposition (LDD) is a subdivision of the famous k-core decomposition. Extensive numerical analyses on epidemic spreading, synchronization and nonlinear mutualistic dynamics show that the LDD can more accurately find out the most influential spreaders, the most efficient controllers and the most vulnerable species than k-core decomposition and other well-known indices. The present method only makes use of local topological information, and thus has high potential to become a powerful tool for network analysis.

preprint2020arXiv

Multi-Agent Interactions Modeling with Correlated Policies

In multi-agent systems, complex interacting behaviors arise due to the high correlations among agents. However, previous work on modeling multi-agent interactions from demonstrations is primarily constrained by assuming the independence among policies and their reward structures. In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework with explicit modeling of correlated policies by approximating opponents' policies, which can recover agents' policies that can regenerate similar interactions. Consequently, we develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL), which allows for decentralized training and execution. Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators and outperforms state-of-the-art multi-agent imitation learning methods. Our code is available at \url{https://github.com/apexrl/CoDAIL}.

preprint2020arXiv

User Behavior Retrieval for Click-Through Rate Prediction

Click-through rate (CTR) prediction plays a key role in modern online personalization services. In practice, it is necessary to capture user's drifting interests by modeling sequential user behaviors to build an accurate CTR prediction model. However, as the users accumulate more and more behavioral data on the platforms, it becomes non-trivial for the sequential models to make use of the whole behavior history of each user. First, directly feeding the long behavior sequence will make online inference time and system load infeasible. Second, there is much noise in such long histories to fail the sequential model learning. The current industrial solutions mainly truncate the sequences and just feed recent behaviors to the prediction model, which leads to a problem that sequential patterns such as periodicity or long-term dependency are not embedded in the recent several behaviors but in far back history. To tackle these issues, in this paper we consider it from the data perspective instead of just designing more sophisticated yet complicated models and propose User Behavior Retrieval for CTR prediction (UBR4CTR) framework. In UBR4CTR, the most relevant and appropriate user behaviors will be firstly retrieved from the entire user history sequence using a learnable search method. These retrieved behaviors are then fed into a deep model to make the final prediction instead of simply using the most recent ones. It is highly feasible to deploy UBR4CTR into industrial model pipeline with low cost. Experiments on three real-world large-scale datasets demonstrate the superiority and efficacy of our proposed framework and models.

preprint2019arXiv

Identifying significant edges via neighborhood information

Heterogeneous nature of real networks implies that different edges play different roles in network structure and functions, and thus to identify significant edges is of high value in both theoretical studies and practical applications. We propose the so-called second-order neighborhood (SN) index to quantify an edge's significance in a network. We compare SN index with many other benchmark methods based on 15 real networks via edge percolation. Results show that the proposed SN index outperforms other well-known methods.