Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
35works
0followers
22topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

35 published item(s)

preprint2026arXiv

AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework

AI inference is becoming a persistent and geographically distributed source of electricity demand. Unlike many traditional electrical loads, inference workloads can sometimes be executed away from the user-facing service location, provided that latency, state locality, capacity, and regulatory constraints remain acceptable. This paper studies when such digital relocation of computation can be interpreted as latency-constrained relocation of electricity demand. We develop an energy-geography framework for geo-distributed AI inference. The framework models a three-layer architecture of clients, service nodes, and compute nodes, and formulates inference placement as a constrained optimization problem over electricity prices, marginal carbon intensity, power usage effectiveness, compute capacity, network latency, and migration frictions. The key object is the energy-latency frontier: the marginal cost and carbon benefit unlocked by relaxing inference latency budgets. The paper makes four contributions. First, it distinguishes physical electricity transmission from digital relocation of electricity-consuming computation. Second, it formulates a geo-distributed inference placement model with feasibility masks and migration frictions. Third, it introduces operational metrics, including relocatable inference demand, energy return on latency, carbon return on latency, and a relocation break-even condition. Fourth, it provides a transparent stylized simulation over representative global compute regions to show how heterogeneous latency tolerance separates workloads into local, regional, and energy-oriented execution layers. The results show that latency relaxation expands feasible geography, while migration frictions, egress costs, state locality, legal constraints, and capacity limits can sharply reduce realized benefits.

preprint2026arXiv

Are VLMs Seeing or Just Saying? Uncovering the Illusion of Visual Re-examination

Vision-Language Models (VLMs) often produce self-reflective statements like "let me check the figure again" during reasoning. Do such statements trigger genuine visual re-examination, or are they merely learned textual patterns? We investigate this via VisualSwap, an image-swap probing framework: after a model reasons over an image, we replace it with a visually similar but semantically different one and test whether the model notices. We introduce VS-Bench, 800 image pairs curated from MathVista, MathVerse, MathVision, and MMMU-Pro. Experiments on Qwen3-VL, Kimi-VL, and ERNIE-VL reveal a striking failure: models overwhelmingly miss the swap, with accuracy dropping by up to 60%. Counterintuitively, thinking models are nearly 3x more vulnerable than their instructed counterparts, and scaling offers no mitigation. Multi-turn user instructions restore visual grounding, but self-generated reflective statements during continuous generation do not. Attention analysis explains why: user instructions substantially elevate attention to visual tokens, whereas self-reflection does not. Current VLMs tend to say rather than actually see when claiming to perform visual re-examination. Our code and dataset are available at the project page: https://visualswap.github.io

preprint2026arXiv

Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning

The central challenge of AI for Science is not reasoning alone, but the ability to create computational methods in an open-ended scientific world. Existing LLM-based agents rely on static, pre-defined tool libraries, a paradigm that fundamentally fails in scientific domains where tools are sparse, heterogeneous, and intrinsically incomplete. In this paper, we propose Test-Time Tool Evolution (TTE), a new paradigm that enables agents to synthesize, verify, and evolve executable tools during inference. By transforming tools from fixed resources into problem-driven artifacts, TTE overcomes the rigidity and long-tail limitations of static tool libraries. To facilitate rigorous evaluation, we introduce SciEvo, a benchmark comprising 1,590 scientific reasoning tasks supported by 925 automatically evolved tools. Extensive experiments show that TTE achieves state-of-the-art performance in both accuracy and tool efficiency, while enabling effective cross-domain adaptation of computational tools. The code and benchmark have been released at https://github.com/lujiaxuan0520/Test-Time-Tool-Evol.

preprint2026arXiv

Data-centric Prompt Tuning for Dynamic Graphs

Dynamic graphs have attracted increasing attention due to their ability to model complex and evolving relationships in real-world scenarios. Traditional approaches typically pre-train models using dynamic link prediction and directly apply the resulting node temporal embeddings to specific downstream tasks. However, the significant differences among downstream tasks often lead to performance degradation, especially under few-shot settings. Prompt tuning has emerged as an effective solution to this problem. Existing prompting methods are often strongly coupled with specific model architectures or pretraining tasks, which makes it difficult to adapt to recent or future model designs. Moreover, their exclusive focus on modifying node or temporal features while neglecting spatial structural information leads to limited expressiveness and degraded performance. To address these limitations, we propose DDGPrompt, a data-centric prompting framework designed to effectively refine pre-trained node embeddings at the input data level, enabling better adaptability to diverse downstream tasks. We first define a unified node expression feature matrix that aggregates all relevant temporal and structural information of each node, ensuring compatibility with a wide range of dynamic graph models. Then, we introduce three prompt matrices (temporal bias, edge weight, and feature mask) to adjust the feature matrix completely, achieving task-specific adaptation of node embeddings. We evaluate DDGPrompt under a strict few-shot setting on four public dynamic graph datasets. Experimental results demonstrate that our method significantly outperforms traditional methods and prompting approaches in scenarios with limited labels and cold-start conditions.

preprint2026arXiv

Latent Action Reparameterization for Efficient Agent Inference

Large language model (LLM) agents often rely on long sequences of low-level textual actions, resulting in large effective decision horizons and high inference cost. While prior work has focused on improving inference efficiency through system-level optimizations or prompt engineering, we argue that a key bottleneck lies in the representation of the action space itself. We propose Latent Action Reparameterization (LAR), a framework that learns a compact latent action space in which each latent action corresponds to a multi-step semantic behavior. By reparameterizing agent actions into latent units, LAR enables decision making over a shorter effective horizon while preserving the expressiveness of the original action space. Unlike hand-crafted macros or hierarchical controllers, latent actions are learned from agent trajectories and integrated directly into the model, allowing both planning and execution to operate over abstract action representations. Across a range of LLM-based agent benchmarks, LAR significantly reduces the effective action horizon and improves inference efficiency under fixed compute budgets. As a consequence, our approach achieves substantial reductions in action tokens and corresponding wall-clock inference time, while maintaining or improving task success rates. These results suggest that action representation learning is a critical and underexplored factor in scaling efficient LLM agent inference, complementary to advances in model architecture and hardware.

preprint2026arXiv

RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents

The rise of multi-modal large language models (MLLMs) is shifting remote sensing (RS) intelligence from "see" to "action", as OpenClaw-style frameworks enable agents to autonomously operate massive RS image-processing tools for complex tasks. Existing RS agents adopt a passive selection paradigm for tool invocation, relying on either full tool registration (Flat) or retrieval-augmented generation (RAG). However, in the massive and multi-source heterogeneous RS tool ecosystem, such passive mechanisms struggle to dynamically balance "context load" and "toolset completeness" throughout task reasoning, thus exhibiting inherent limitations: full tool registration triggers context space deficits during long-horizon tasks, whereas RAG retrieval may omit critical tools in essential steps. To overcome these bottlenecks, this paper redefines tool selection by arguing that the agent should act as an active explorer within the tool space. Based on this perspective, we propose RS-Claw, a novel RS agent architecture. By leveraging Skill encapsulation technology at the tool end, this architecture hierarchically structures tool descriptions, enabling the agent to execute on-demand sequential decision-making: initially selecting relevant skill branches by reading only tool summaries, then dynamically loading detailed descriptions, and ultimately achieving precise invocation. This active paradigm not only significantly liberates the agent's context space but also effectively ensures the accurate hit rate of critical tools during long-horizon reasoning. Systematic experiments on the Earth-Bench benchmark demonstrate that RS-Claw's active exploration mechanism effectively filters semantic noise and substantially frees up reasoning space, achieving an input token compression ratio of up to 86%, and comprehensively outperforming existing Flat and RAG baselines across complex reasoning evaluations.

preprint2026arXiv

Single-orientation Crystalline Domains of Active Brownian Particles Lead to Collective Motions

Active Brownian particles, even without attractive and anisotropic inter-particle interactions, can form a high-density phase featuring structure-ordered domains as well as collective motion regions under thermal noise. However, the mechanism, particularly the relationship between the motion and structure, remains unclear. In this study, we show that the motion-correlation regions are spatially coincident with the single-orientation crystalline domains. Each domain translates or rotates as a whole due to the net active force or torque acting upon it, allowing relative motions between these crystalline domains. The particles at domain boundaries usually have the active forces pointing inward, which helps to stabilize these structure-ordered domains and their corresponding collective motion regions.

preprint2026arXiv

Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation

Large language model (LLM) multi-agent systems typically rely on rigid orchestration, committing either to flat per-query routing or to hand-engineered task decomposition, so decomposition depth, worker choice, and inference budget are not jointly optimized under one objective. We introduce Uno-Orchestra, a unified orchestration policy that selectively decomposes a task and dispatches each subtask to an admissible (model, primitive) pair, with both decisions learned together from curated RL trajectories grounded in real worker interactions. Against 22 baselines on a 13-benchmark suite spanning math, code, knowledge, long-context, and agentic tool-use, Uno-Orchestra reaches 77.0% macro pass@1, roughly 16% above the strongest workflow baseline, at roughly an order of magnitude lower per-query cost, advancing the accuracy-efficiency frontier of selective delegation.

preprint2024arXiv

Deep Learning-Based Knowledge Injection for Metaphor Detection: A Comprehensive Review

Metaphor as an advanced cognitive modality works by extracting familiar concepts in the target domain in order to understand vague and abstract concepts in the source domain. This helps humans to quickly understand and master new domains and thus adapt to changing environments. With the continuous development of metaphor research in the natural language community, many studies using knowledge-assisted models to detect textual metaphors have emerged in recent years. Compared to not using knowledge, systems that introduce various kinds of knowledge achieve greater performance gains and reach SOTA in a recent study. Based on this, the goal of this paper is to provide a comprehensive review of research advances in the application of deep learning for knowledge injection in metaphor detection tasks. We will first systematically summarize and generalize the mainstream knowledge and knowledge injection principles. Then, the datasets, evaluation metrics, and benchmark models used in metaphor detection tasks are examined. Finally, we explore the current issues facing knowledge injection methods and provide an outlook on future research directions.

preprint2023arXiv

SAS: Self-Augmentation Strategy for Language Model Pre-training

The core of self-supervised learning for pre-training language models includes pre-training task design as well as appropriate data augmentation. Most data augmentations in language model pre-training are context-independent. A seminal contextualized augmentation was recently proposed in ELECTRA and achieved state-of-the-art performance by introducing an auxiliary generation network (generator) to produce contextualized data augmentation for the training of a main discrimination network (discriminator). This design, however, introduces extra computation cost of the generator and a need to adjust the relative capability between the generator and the discriminator. In this paper, we propose a self-augmentation strategy (SAS) where a single network is utilized for both regular pre-training and contextualized data augmentation for the training in later epochs. Essentially, this strategy eliminates a separate generator and uses the single network to jointly conduct two pre-training tasks with MLM (Masked Language Modeling) and RTD (Replaced Token Detection) heads. It avoids the challenge to search for an appropriate size of the generator, which is critical to the performance as evidenced in ELECTRA and its subsequent variant models. In addition, SAS is a general strategy that can be seamlessly combined with many new techniques emerging recently or in the future, such as the disentangled attention mechanism from DeBERTa. Our experiments show that SAS is able to outperform ELECTRA and other state-of-the-art models in the GLUE tasks with similar or less computation cost.

preprint2022arXiv

An Adaptive Contrastive Learning Model for Spike Sorting

Brain-computer interfaces (BCIs), is ways for electronic devices to communicate directly with the brain. For most medical-type brain-computer interface tasks, the activity of multiple units of neurons or local field potentials is sufficient for decoding. But for BCIs used in neuroscience research, it is important to separate out the activity of individual neurons. With the development of large-scale silicon technology and the increasing number of probe channels, artificially interpreting and labeling spikes is becoming increasingly impractical. In this paper, we propose a novel modeling framework: Adaptive Contrastive Learning Model that learns representations from spikes through contrastive learning based on the maximizing mutual information loss function as a theoretical basis. Based on the fact that data with similar features share the same labels whether they are multi-classified or binary-classified. With this theoretical support, we simplify the multi-classification problem into multiple binary-classification, improving both the accuracy and the runtime efficiency. Moreover, we also introduce a series of enhancements for the spikes, while solving the problem that the classification effect is affected because of the overlapping spikes.

preprint2022arXiv

Be Confident! Towards Trustworthy Graph Neural Networks via Confidence Calibration

Despite Graph Neural Networks (GNNs) have achieved remarkable accuracy, whether the results are trustworthy is still unexplored. Previous studies suggest that many modern neural networks are over-confident on the predictions, however, surprisingly, we discover that GNNs are primarily in the opposite direction, i.e., GNNs are under-confident. Therefore, the confidence calibration for GNNs is highly desired. In this paper, we propose a novel trustworthy GNN model by designing a topology-aware post-hoc calibration function. Specifically, we first verify that the confidence distribution in a graph has homophily property, and this finding inspires us to design a calibration GNN model (CaGCN) to learn the calibration function. CaGCN is able to obtain a unique transformation from logits of GNNs to the calibrated confidence for each node, meanwhile, such transformation is able to preserve the order between classes, satisfying the accuracy-preserving property. Moreover, we apply the calibration GNN to self-training framework, showing that more trustworthy pseudo labels can be obtained with the calibrated confidence and further improve the performance. Extensive experiments demonstrate the effectiveness of our proposed model in terms of both calibration and accuracy.

preprint2022arXiv

Evaluating Modules in Graph Contrastive Learning

The recent emergence of contrastive learning approaches facilitates the application on graph representation learning (GRL), introducing graph contrastive learning (GCL) into the literature. These methods contrast semantically similar and dissimilar sample pairs to encode the semantics into node or graph embeddings. However, most existing works only performed \textbf{model-level} evaluation, and did not explore the combination space of modules for more comprehensive and systematic studies. For effective \textbf{module-level} evaluation, we propose a framework that decomposes GCL models into four modules: (1) a \textbf{sampler} to generate anchor, positive and negative data samples (nodes or graphs); (2) an \textbf{encoder} and a \textbf{readout} function to get sample embeddings; (3) a \textbf{discriminator} to score each sample pair (anchor-positive and anchor-negative); and (4) an \textbf{estimator} to define the loss function. Based on this framework, we conduct controlled experiments over a wide range of architectural designs and hyperparameter settings on node and graph classification tasks. Specifically, we manage to quantify the impact of a single module, investigate the interaction between modules, and compare the overall performance with current model architectures. Our key findings include a set of module-level guidelines for GCL, e.g., simple samplers from LINE and DeepWalk are strong and robust; an MLP encoder associated with Sum readout could achieve competitive performance on graph classification. Finally, we release our implementations and results as OpenGCL, a modularized toolkit that allows convenient reproduction, standard model and module evaluation, and easy extension. OpenGCL is available at \url{https://github.com/thunlp/OpenGCL}.

preprint2022arXiv

Gain and loss induced topological insulating phase in a non Hermitian electrical circuit

There have been considerable efforts devoted to the study of topological phases in certain non-Hermitian systems that possess real eigenfrequencies in the presence of gain and loss. However, it is challenging to experimentally realize such non-Hermitian topological insulators in either quantum or photonic systems, due to the difficulties in introducing controlled gain and loss. On the other hand, the wide choices of active circuit components provide us with unprecedented convenience and flexibility in engineering non-Hermitian topological insulators in electrical circuits. Here, we report experimental realization of a one-dimensional (1D) non-Hermitian topological circuit which exhibits topologically protected edge state purely induced by gain and loss. We show that by tuning the value of the positive/negative resistors in the circuit, our system can switch between different topological phase regions. The topological edge states and interface states are observed at the circuit edge and at the interface between a trivial and nontrivial circuit, which are manifested by a prominent impedance peak at the mid-gap frequency topologically robust to variations of circuit parameters. Our work opens a new gateway towards actively controllable topological systems.

preprint2022arXiv

GReS: Graphical Cross-domain Recommendation for Supply Chain Platform

Supply Chain Platforms (SCPs) provide downstream industries with numerous raw materials. Compared with traditional e-commerce platforms, data in SCPs is more sparse due to limited user interests. To tackle the data sparsity problem, one can apply Cross-Domain Recommendation (CDR) which improves the recommendation performance of the target domain with the source domain information. However, applying CDR to SCPs directly ignores the hierarchical structure of commodities in SCPs, which reduce the recommendation performance. To leverage this feature, in this paper, we take the catering platform as an example and propose GReS, a graphical cross-domain recommendation model. The model first constructs a tree-shaped graph to represent the hierarchy of different nodes of dishes and ingredients, and then applies our proposed Tree2vec method combining GCN and BERT models to embed the graph for recommendations. Experimental results on a commercial dataset show that GReS significantly outperforms state-of-the-art methods in Cross-Domain Recommendation for Supply Chain Platforms.

preprint2022arXiv

GUIM -- General User and Item Embedding with Mixture of Representation in E-commerce

Our goal is to build general representation (embedding) for each user and each product item across Alibaba's businesses, including Taobao and Tmall which are among the world's biggest e-commerce websites. The representation of users and items has been playing a critical role in various downstream applications, including recommendation system, search, marketing, demand forecasting and so on. Inspired from the BERT model in natural language processing (NLP) domain, we propose a GUIM (General User Item embedding with Mixture of representation) model to achieve the goal with massive, structured, multi-modal data including the interactions among hundreds of millions of users and items. We utilize mixture of representation (MoR) as a novel representation form to model the diverse interests of each user. In addition, we use the InfoNCE from contrastive learning to avoid intractable computational costs due to the numerous size of item (token) vocabulary. Finally, we propose a set of representative downstream tasks to serve as a standard benchmark to evaluate the quality of the learned user and/or item embeddings, analogous to the GLUE benchmark in NLP domain. Our experimental results in these downstream tasks clearly show the comparative value of embeddings learned from our GUIM model.

preprint2022arXiv

Improving Contrastive Learning of Sentence Embeddings with Case-Augmented Positives and Retrieved Negatives

Following SimCSE, contrastive learning based methods have achieved the state-of-the-art (SOTA) performance in learning sentence embeddings. However, the unsupervised contrastive learning methods still lag far behind the supervised counterparts. We attribute this to the quality of positive and negative samples, and aim to improve both. Specifically, for positive samples, we propose switch-case augmentation to flip the case of the first letter of randomly selected words in a sentence. This is to counteract the intrinsic bias of pre-trained token embeddings to frequency, word cases and subwords. For negative samples, we sample hard negatives from the whole dataset based on a pre-trained language model. Combining the above two methods with SimCSE, our proposed Contrastive learning with Augmented and Retrieved Data for Sentence embedding (CARDS) method significantly surpasses the current SOTA on STS benchmarks in the unsupervised setting.

preprint2022arXiv

KDD CUP 2022 Wind Power Forecasting Team 88VIP Solution

KDD CUP 2022 proposes a time-series forecasting task on spatial dynamic wind power dataset, in which the participants are required to predict the future generation given the historical context factors. The evaluation metrics contain RMSE and MAE. This paper describes the solution of Team 88VIP, which mainly comprises two types of models: a gradient boosting decision tree to memorize the basic data patterns and a recurrent neural network to capture the deep and latent probabilistic transitions. Ensembling these models contributes to tackle the fluctuation of wind power, and training submodels targets on the distinguished properties in heterogeneous timescales of forecasting, from minutes to days. In addition, feature engineering, imputation techniques and the design of offline evaluation are also described in details. The proposed solution achieves an overall online score of -45.213 in Phase 3.

preprint2022arXiv

Nonseparable Symplectic Neural Networks

Predicting the behaviors of Hamiltonian systems has been drawing increasing attention in scientific machine learning. However, the vast majority of the literature was focused on predicting separable Hamiltonian systems with their kinematic and potential energy terms being explicitly decoupled while building data-driven paradigms to predict nonseparable Hamiltonian systems that are ubiquitous in fluid dynamics and quantum mechanics were rarely explored. The main computational challenge lies in the effective embedding of symplectic priors to describe the inherently coupled evolution of position and momentum, which typically exhibits intricate dynamics. To solve the problem, we propose a novel neural network architecture, Nonseparable Symplectic Neural Networks (NSSNNs), to uncover and embed the symplectic structure of a nonseparable Hamiltonian system from limited observation data. The enabling mechanics of our approach is an augmented symplectic time integrator to decouple the position and momentum energy terms and facilitate their evolution. We demonstrated the efficacy and versatility of our method by predicting a wide range of Hamiltonian systems, both separable and nonseparable, including chaotic vortical flows. We showed the unique computational merits of our approach to yield long-term, accurate, and robust predictions for large-scale Hamiltonian systems by rigorously enforcing symplectomorphism.

preprint2022arXiv

Projection-free Graph-based Classifier Learning using Gershgorin Disc Perfect Alignment

In semi-supervised graph-based binary classifier learning, a subset of known labels $\hat{x}_i$ are used to infer unknown labels, assuming that the label signal $\mathbf{x}$ is smooth with respect to a similarity graph specified by a Laplacian matrix. When restricting labels $x_i$ to binary values, the problem is NP-hard. While a conventional semi-definite programming relaxation (SDR) can be solved in polynomial time using, for example, the alternating direction method of multipliers (ADMM), the complexity of projecting a candidate matrix $\mathbf{M}$ onto the positive semi-definite (PSD) cone ($\mathbf{M} \succeq 0$) per iteration remains high. In this paper, leveraging a recent linear algebraic theory called Gershgorin disc perfect alignment (GDPA), we propose a fast projection-free method by solving a sequence of linear programs (LP) instead. Specifically, we first recast the SDR to its dual, where a feasible solution $\mathbf{H} \succeq 0$ is interpreted as a Laplacian matrix corresponding to a balanced signed graph minus the last node. To achieve graph balance, we split the last node into two, each retains the original positive / negative edges, resulting in a new Laplacian $\bar{\mathbf{H}}$. We repose the SDR dual for solution $\bar{\mathbf{H}}$, then replace the PSD cone constraint $\bar{\mathbf{H}} \succeq 0$ with linear constraints derived from GDPA -- sufficient conditions to ensure $\bar{\mathbf{H}}$ is PSD -- so that the optimization becomes an LP per iteration. Finally, we extract predicted labels from converged solution $\bar{\mathbf{H}}$. Experiments show that our algorithm enjoyed a $28\times$ speedup over the next fastest scheme while achieving comparable label prediction performance.

preprint2022arXiv

Space4HGNN: A Novel, Modularized and Reproducible Platform to Evaluate Heterogeneous Graph Neural Network

Heterogeneous Graph Neural Network (HGNN) has been successfully employed in various tasks, but we cannot accurately know the importance of different design dimensions of HGNNs due to diverse architectures and applied scenarios. Besides, in the research community of HGNNs, implementing and evaluating various tasks still need much human effort. To mitigate these issues, we first propose a unified framework covering most HGNNs, consisting of three components: heterogeneous linear transformation, heterogeneous graph transformation, and heterogeneous message passing layer. Then we build a platform Space4HGNN by defining a design space for HGNNs based on the unified framework, which offers modularized components, reproducible implementations, and standardized evaluation for HGNNs. Finally, we conduct experiments to analyze the effect of different designs. With the insights found, we distill a condensed design space and verify its effectiveness.

preprint2022arXiv

Spatial Autoregressive Coding for Graph Neural Recommendation

Graph embedding methods including traditional shallow models and deep Graph Neural Networks (GNNs) have led to promising applications in recommendation. Nevertheless, shallow models especially random-walk-based algorithms fail to adequately exploit neighbor proximity in sampled subgraphs or sequences due to their optimization paradigm. GNN-based algorithms suffer from the insufficient utilization of high-order information and easily cause over-smoothing problems when stacking too much layers, which may deteriorate the recommendations of low-degree (long-tail) items, limiting the expressiveness and scalability. In this paper, we propose a novel framework SAC, namely Spatial Autoregressive Coding, to solve the above problems in a unified way. To adequately leverage neighbor proximity and high-order information, we design a novel spatial autoregressive paradigm. Specifically, we first randomly mask multi-hop neighbors and embed the target node by integrating all other surrounding neighbors with an explicit multi-hop attention. Then we reinforce the model to learn a neighbor-predictive coding for the target node by contrasting the coding and the masked neighbors' embedding, equipped with a new hard negative sampling strategy. To learn the minimal sufficient representation for the target-to-neighbor prediction task and remove the redundancy of neighbors, we devise Neighbor Information Bottleneck by maximizing the mutual information between target predictive coding and the masked neighbors' embedding, and simultaneously constraining those between the coding and surrounding neighbors' embedding. Experimental results on both public recommendation datasets and a real scenario web-scale dataset Douyin-Friend-Recommendation demonstrate the superiority of SAC compared with state-of-the-art methods.

preprint2021arXiv

CoRe: An Efficient Coarse-refined Training Framework for BERT

In recent years, BERT has made significant breakthroughs on many natural language processing tasks and attracted great attentions. Despite its accuracy gains, the BERT model generally involves a huge number of parameters and needs to be trained on massive datasets, so training such a model is computationally very challenging and time-consuming. Hence, training efficiency should be a critical issue. In this paper, we propose a novel coarse-refined training framework named CoRe to speed up the training of BERT. Specifically, we decompose the training process of BERT into two phases. In the first phase, by introducing fast attention mechanism and decomposing the large parameters in the feed-forward network sub-layer, we construct a relaxed BERT model which has much less parameters and much lower model complexity than the original BERT, so the relaxed model can be quickly trained. In the second phase, we transform the trained relaxed BERT model into the original BERT and further retrain the model. Thanks to the desired initialization provided by the relaxed model, the retraining phase requires much less training steps, compared with training an original BERT model from scratch with a random initialization. Experimental results show that the proposed CoRe framework can greatly reduce the training time without reducing the performance.

preprint2021arXiv

Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework

Semi-supervised learning on graphs is an important problem in the machine learning area. In recent years, state-of-the-art classification methods based on graph neural networks (GNNs) have shown their superiority over traditional ones such as label propagation. However, the sophisticated architectures of these neural models will lead to a complex prediction mechanism, which could not make full use of valuable prior knowledge lying in the data, e.g., structurally correlated nodes tend to have the same class. In this paper, we propose a framework based on knowledge distillation to address the above issues. Our framework extracts the knowledge of an arbitrary learned GNN model (teacher model), and injects it into a well-designed student model. The student model is built with two simple prediction mechanisms, i.e., label propagation and feature transformation, which naturally preserves structure-based and feature-based prior knowledge, respectively. In specific, we design the student model as a trainable combination of parameterized label propagation and feature transformation modules. As a result, the learned student can benefit from both prior knowledge and the knowledge in GNN teachers for more effective predictions. Moreover, the learned student model has a more interpretable prediction process than GNNs. We conduct experiments on five public benchmark datasets and employ seven GNN models including GCN, GAT, APPNP, SAGE, SGC, GCNII and GLP as the teacher models. Experimental results show that the learned student model can consistently outperform its corresponding teacher model by 1.4% - 4.7% on average. Code and data are available at https://github.com/BUPT-GAMMA/CPF

preprint2021arXiv

Incorporating Vision Bias into Click Models for Image-oriented Search Engine

Most typical click models assume that the probability of a document to be examined by users only depends on position, such as PBM and UBM. It works well in various kinds of search engines. However, in a search engine where massive candidate documents display images as responses to the query, the examination probability should not only depend on position. The visual appearance of an image-oriented document also plays an important role in its opportunity to be examined. In this paper, we assume that vision bias exists in an image-oriented search engine as another crucial factor affecting the examination probability aside from position. Specifically, we apply this assumption to classical click models and propose an extended model, to better capture the examination probabilities of documents. We use regression-based EM algorithm to predict the vision bias given the visual features extracted from candidate documents. Empirically, we evaluate our model on a dataset developed from a real-world online image-oriented search engine, and demonstrate that our proposed model can achieve significant improvements over its baseline model in data fitness and sparsity handling.

preprint2021arXiv

Re-rank Coarse Classification with Local Region Enhanced Features for Fine-Grained Image Recognition

Fine-grained image recognition is very challenging due to the difficulty of capturing both semantic global features and discriminative local features. Meanwhile, these two features are not easy to be integrated, which are even conflicting when used simultaneously. In this paper, a retrieval-based coarse-to-fine framework is proposed, where we re-rank the TopN classification results by using the local region enhanced embedding features to improve the Top1 accuracy (based on the observation that the correct category usually resides in TopN results). To obtain the discriminative regions for distinguishing the fine-grained images, we introduce a weakly-supervised method to train a box generating branch with only image-level labels. In addition, to learn more effective semantic global features, we design a multi-level loss over an automatically constructed hierarchical category structure. Experimental results show that our method achieves state-of-the-art performance on three benchmarks: CUB-200-2011, Stanford Cars, and FGVC Aircraft. Also, visualizations and analysis are provided for better understanding.

preprint2020arXiv

Adaptive Graph Encoder for Attributed Graph Embedding

Attributed graph embedding, which learns vector representations from graph topology and node features, is a challenging task for graph analysis. Recently, methods based on graph convolutional networks (GCNs) have made great progress on this task. However,existing GCN-based methods have three major drawbacks. Firstly,our experiments indicate that the entanglement of graph convolutional filters and weight matrices will harm both the performance and robustness. Secondly, we show that graph convolutional filters in these methods reveal to be special cases of generalized Laplacian smoothing filters, but they do not preserve optimal low-pass characteristics. Finally, the training objectives of existing algorithms are usually recovering the adjacency matrix or feature matrix, which are not always consistent with real-world applications. To address these issues, we propose Adaptive Graph Encoder (AGE), a novel attributed graph embedding framework. AGE consists of two modules: (1) To better alleviate the high-frequency noises in the node features, AGE first applies a carefully-designed Laplacian smoothing filter. (2) AGE employs an adaptive encoder that iteratively strengthens the filtered features for better node embeddings. We conduct experiments using four public benchmark datasets to validate AGE on node clustering and link prediction tasks. Experimental results show that AGE consistently outperforms state-of-the-art graph embedding methods considerably on these tasks.

preprint2020arXiv

Few-shot Knowledge Transfer for Fine-grained Cartoon Face Generation

In this paper, we are interested in generating fine-grained cartoon faces for various groups. We assume that one of these groups consists of sufficient training data while the others only contain few samples. Although the cartoon faces of these groups share similar style, the appearances in various groups could still have some specific characteristics, which makes them differ from each other. A major challenge of this task is how to transfer knowledge among groups and learn group-specific characteristics with only few samples. In order to solve this problem, we propose a two-stage training process. First, a basic translation model for the basic group (which consists of sufficient data) is trained. Then, given new samples of other groups, we extend the basic model by creating group-specific branches for each new group. Group-specific branches are updated directly to capture specific appearances for each group while the remaining group-shared parameters are updated indirectly to maintain the distribution of intermediate feature space. In this manner, our approach is capable to generate high-quality cartoon faces for various groups.

preprint2020arXiv

Graph Metric Learning via Gershgorin Disc Alignment

We propose a fast general projection-free metric learning framework, where the minimization objective $\min_{\textbf{M} \in \mathcal{S}} Q(\textbf{M})$ is a convex differentiable function of the metric matrix $\textbf{M}$, and $\textbf{M}$ resides in the set $\mathcal{S}$ of generalized graph Laplacian matrices for connected graphs with positive edge weights and node degrees. Unlike low-rank metric matrices common in the literature, $\mathcal{S}$ includes the important positive-diagonal-only matrices as a special case in the limit. The key idea for fast optimization is to rewrite the positive definite cone constraint in $\mathcal{S}$ as signal-adaptive linear constraints via Gershgorin disc alignment, so that the alternating optimization of the diagonal and off-diagonal terms in $\textbf{M}$ can be solved efficiently as linear programs via Frank-Wolfe iterations. We prove that the Gershgorin discs can be aligned perfectly using the first eigenvector $\textbf{v}$ of $\textbf{M}$, which we update iteratively using Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) with warm start as diagonal / off-diagonal terms are optimized. Experiments show that our efficiently computed graph metric matrices outperform metrics learned using competing methods in terms of classification tasks.

preprint2020arXiv

Improving Sequence Modeling Ability of Recurrent Neural Networks via Sememes

Sememes, the minimum semantic units of human languages, have been successfully utilized in various natural language processing applications. However, most existing studies exploit sememes in specific tasks and few efforts are made to utilize sememes more fundamentally. In this paper, we propose to incorporate sememes into recurrent neural networks (RNNs) to improve their sequence modeling ability, which is beneficial to all kinds of downstream tasks. We design three different sememe incorporation methods and employ them in typical RNNs including LSTM, GRU and their bidirectional variants. In evaluation, we use several benchmark datasets involving PTB and WikiText-2 for language modeling, SNLI for natural language inference and another two datasets for sentiment analysis and paraphrase detection. Experimental results show evident and consistent improvement of our sememe-incorporated models compared with vanilla RNNs, which proves the effectiveness of our sememe incorporation methods. Moreover, we find the sememe-incorporated models have higher robustness and outperform adversarial training in defending adversarial attack. All the code and data of this work can be obtained at https://github.com/thunlp/SememeRNN.

preprint2020arXiv

Joint Demosaicking / Rectification of Fisheye Camera Images using Multi-color Graph Laplacian Regularization

To compose a 360 image from a rig with multiple fisheye cameras, a conventional processing pipeline first performs demosaicking on each fisheye camera's Bayer-patterned grid, then translates demosaicked pixels from the camera grid to a rectified image grid---thus performing two image interpolation steps in sequence. Hence interpolation errors can accumulate, and acquisition noise in the captured pixels can pollute neighbors in two consecutive processing stages. In this paper, we propose a joint processing framework that performs demosaicking and grid-to-grid mapping simultaneously---thus limiting noise pollution to one interpolation. Specifically, we first obtain a reverse mapping function from a regular on-grid location in the rectified image to an irregular off-grid location in the camera's Bayer-patterned image. For each pair of adjacent pixels in the rectified grid, we estimate its gradient using the pair's neighboring pixel gradients in three colors in the Bayer-patterned grid. We construct a similarity graph based on the estimated gradients, and interpolate pixels in the rectified grid directly via graph Laplacian regularization (GLR). Experiments show that our joint method outperforms several competing local methods that execute demosaicking and rectification in sequence, by up to 0.52 dB in PSNR and 0.086 in SSIM on the publicly available dataset, and by up to 5.53dB in PSNR and 0.411 in SSIM on the in-house constructed dataset.

preprint2020arXiv

MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space

As an essential step towards computer creativity, automatic poetry generation has gained increasing attention these years. Though recent neural models make prominent progress in some criteria of poetry quality, generated poems still suffer from the problem of poor diversity. Related literature researches show that different factors, such as life experience, historical background, etc., would influence composition styles of poets, which considerably contributes to the high diversity of human-authored poetry. Inspired by this, we propose MixPoet, a novel model that absorbs multiple factors to create various styles and promote diversity. Based on a semi-supervised variational autoencoder, our model disentangles the latent space into some subspaces, with each conditioned on one influence factor by adversarial training. In this way, the model learns a controllable latent variable to capture and mix generalized factor-related properties. Different factor mixtures lead to diverse styles and hence further differentiate generated poems from each other. Experiment results on Chinese poetry demonstrate that MixPoet improves both diversity and quality against three state-of-the-art models.

preprint2020arXiv

Self-organized synchronization of phonon lasers

Self-organized synchronization is a ubiquitous collective phenomenon, in which each unit adjusts their rhythms to achieve synchrony through mutual interactions. The optomechanical systems, due to their inherently engineerable nonlinearities, provide an ideal platform to study self-organized synchronization. Here, we demonstrate the self-organized synchronization of phonon lasers in a two-membrane-in-the-middle optomechanical system. The probe of individual membrane enables to monitor the real-time transient dynamics of synchronization, which reveal that the system enters into the synchronization regime via torus birth bifurcation line. The phase-locking phenomenon and the transition between in-phase and anti-phase regimes are directly observed. Moreover, such a system greatly facilitate the controllable synchronous states, and consequently a phononic memory is realized by tuning the system parameters. This result is an important step towards the future studies of many-body collective behaviors in multiresonator optomechanics with long distances, and might find potential applications in quantum information processing and complex networks.

preprint2020arXiv

The helicity uniqueness conjecture in 3D hydrodynamics

We prove that the helicity is the only regular Casimir function for the coadjoint action of the volume-preserving diffeomorphism group $\text{SDiff}(M)$ on smooth exact divergence-free vector fields on a closed three-dimensional manifold $M$. More precisely, any regular $C^1$ functional defined on the space of $C^\infty$ (more generally, $C^k$, $k\ge 4$) exact divergence-free vector fields and invariant under arbitrary volume-preserving diffeomorphisms can be expressed as a $C^1$ function of the helicity. This gives a complete description of Casimirs for adjoint and coadjoint actions of $\text{SDiff}(M)$ in 3D and completes the proof of Arnold-Khesin's 1998 conjecture for a manifold $M$ with trivial first homology group. Our proofs make use of different tools from the theory of dynamical systems, including normal forms for divergence-free vector fields, the Poincaré-Birkhoff theorem, and a division lemma for vector fields with hyperbolic zeros.

preprint2019arXiv

Graph Sampling for Matrix Completion Using Recurrent Gershgorin Disc Shift

Matrix completion algorithms fill missing entries in a large matrix given a subset of observed samples. However, how to best pre-select informative matrix entries given a sampling budget is largely unaddressed. In this paper, we propose a fast sample selection strategy for matrix completion from a graph signal processing perspective. Specifically, we first regularize the matrix reconstruction objective using a dual graph signal smoothness prior, resulting in a system of linear equations for solution. We then select appropriate samples to maximize the smallest eigenvalue $λ_{\min}$ of the coefficient matrix, thus maximizing the stability of the linear system. To efficiently solve this combinatorial problem, we derive a greedy sampling strategy, leveraging on Gershgorin circle theorem, that iteratively selects one sample (equivalent to shifting one Gershgorin disc) at a time corresponding to the largest magnitude entry in the first eigenvector of a modified graph Laplacian matrix. Our algorithm benefits computationally from warm start as the first eigenvectors of incremented Laplacian matrices are computed recurrently for more samples. To achieve computation scalability when sampling large matrices, we further rewrite the coefficient matrix as a sum of two separate components, each of which exhibits block-diagonal structure that we exploit for alternating block-wise sampling. Extensive experiments on both synthetic and real-world datasets show that our graph sampling algorithm substantially outperforms existing sampling schemes for matrix completion and reduces the completion error, when combined with a range of modern matrix completion algorithms.