Researcher profile

Can Xu

Can Xu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
19 works
0 followers
13 topics
4 close collaborators

Published work

19 published items

preprint · 2026 · arXiv

Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models

Recent advances in synergizing large reasoning models (LRMs) with retrieval-augmented generation (RAG) have shown promising results, yet two critical challenges remain: (1) reasoning models typically operate from a single, unchallenged perspective, limiting their ability to conduct deep, self-correcting reasoning over external documents, and (2) existing training paradigms rely excessively on outcome-oriented rewards, which provide insufficient signal for shaping the complex, multi-step reasoning process. To address these issues, we propose a Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and Verifier engage in reasoning on retrieved evidence and critiquing each other's logic while being guided by a process-aware advantage that requires no external scoring model. This reward combines explicit observational signals with internal model uncertainty to jointly optimize reasoning fidelity and verification rigor. Experiments on multiple benchmarks demonstrate the effectiveness of our method.

preprint · 2023 · arXiv

Extracting optical parameters of Cu-Mn-Fe spinel oxide nanoparticles for optimizing air-stable, high-efficiency solar selective coatings

High-temperature Cu-Mn-Fe spinel-oxide nanoparticle solar selective absorber coatings are investigated experimentally and theoretically. A reliable, general approach to evaluate absorption coefficient spectra from the optical measurements of the nanoparticle-pigmented coatings is developed based on solving the inverse problem using the four-flux radiative method. The derived absorption properties of the nanoparticle (NP) materials can be directly applied to predict the solar absorptance, optimize the nanoparticle-pigmented coatings, and analyze the thermal degradation, and the predictions agree well with the experimental results. The analysis reveals that the Cu-Mn-Fe spinel oxides are fundamentally indirect-bandgap materials with bandgaps ranging from 1.7 to 2.1 eV, while iron-free CuMn2O4 is a direct-bandgap material with Eg = 1.84 eV. With the same coating thickness and nanoparticle load, the solar absorptance ranks in the order of Mn2O3 < MnFe2O4 < CuFe2O4 < CuFeMnO4 < CuMn2O4. The optimized spray-coated iron-free CuMn2O4 NP-pigmented coating demonstrates a high solar absorptance of 97%, a low emittance of 55%, a high optical-to-thermal energy conversion efficiency of ~93.5% under 1000x solar concentration at 750°C, and long-term endurance upon thermal cycling between 750°C and room temperature in air. The optical parameter analysis approach can be easily extended to other material systems to facilitate the search for and optimization of high-temperature pigmented solar selective coatings.

preprint · 2023 · arXiv

Spinel Cu-Mn-Cr Oxide Nanoparticle-Pigmented Solar Selective Coatings Maintaining >94% Efficiency at 750 degrees C

High-temperature concentrating solar power (CSP) systems are capable of harvesting and storing solar energy as heat towards cost-effective dispatchable solar electricity. The solar selective coating is a critical component for boosting their efficiency by maximizing solar absorptance and minimizing thermal emittance losses. However, maintaining a high solar-thermal conversion efficiency >90% for long-term operation at >750 degrees C remains a significant challenge. Herein, we report spray-coated spinel Cu-Mn-Cr oxide nanoparticle-pigmented solar selective coatings on Inconel tube sections maintaining >94% efficiency at 750 degrees C and >92.5% at 800 degrees C under 1000x solar concentration after 60 simulated day-night thermal cycles in air, each cycle comprising 12h at 750 degrees C/800 degrees C and 12h cooling to 25 degrees C. The solar spectral selectivity is intrinsic to the band-to-band and d-d transitions of non-stoichiometric spinel Cu-Mn-Cr oxide nanoparticles by balancing the lattice site inversion of Cu2+ and Mn3+ on tetrahedral vs. octahedral sites. This feature offers a large fabrication tolerance in nanoparticle volume fraction and coating thickness, facilitating low-cost and scalable spray-coated high-efficiency solar selective absorbers for high-temperature CSP systems.

preprint · 2022 · arXiv

Contextual Fine-to-Coarse Distillation for Coarse-grained Response Selection in Open-Domain Conversations

We study the problem of coarse-grained response selection in retrieval-based dialogue systems. The problem is as important as fine-grained response selection, but is less explored in the existing literature. In this paper, we propose a Contextual Fine-to-Coarse (CFC) distilled model for coarse-grained response selection in open-domain conversations. In our CFC model, dense representations of the query, candidate response and corresponding context are learned based on the multi-tower architecture, and more expressive knowledge learned from the one-tower architecture (fine-grained) is distilled into the multi-tower architecture (coarse-grained) to enhance the performance of the retriever. To evaluate the performance of our proposed model, we construct two new datasets based on the Reddit comments dump and Twitter corpus. Extensive experimental results on the two datasets show that the proposed methods achieve a significant improvement over all evaluation metrics compared with traditional baseline methods.

preprint · 2022 · arXiv

Learning to Ground Visual Objects for Visual Dialog

Visual dialog is challenging since it needs to answer a series of coherent questions based on understanding the visual environment. How to ground related visual objects is one of the key problems. Previous studies utilize the question and history to attend to the image and achieve satisfactory performance; however, these methods are not sufficient to locate related visual objects without any guidance. The inappropriate grounding of visual objects hinders the performance of visual dialog models. In this paper, we propose a novel approach to Learn to Ground visual objects for visual dialog, which employs a novel visual objects grounding mechanism where both prior and posterior distributions over visual objects are used to facilitate visual objects grounding. Specifically, a posterior distribution over visual objects is inferred from both context (history and questions) and answers, and it ensures the appropriate grounding of visual objects during the training process. Meanwhile, a prior distribution, which is inferred from context only, is used to approximate the posterior distribution so that appropriate visual objects can be grounded even without answers during the inference process. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that our approach improves on previous strong models in both generative and discriminative settings by a significant margin.

preprint · 2022 · arXiv

LFGCF: Light Folksonomy Graph Collaborative Filtering for Tag-Aware Recommendation

Tag-aware recommendation is the task of predicting a personalized list of items for a user based on their tagging behaviors. It is crucial for many applications with tagging capabilities like last.fm or movielens. Recently, many efforts have been devoted to improving Tag-aware recommendation systems (TRS) with Graph Convolutional Networks (GCN), which have become the new state of the art for general recommendation. However, some solutions are directly inherited from GCN without justification, which makes it difficult to alleviate the sparsity, ambiguity, and redundancy issues introduced by tags, thus adding to the difficulty of training and degrading recommendation performance. In this work, we aim to simplify the design of GCN to make it more concise for TRS. We propose a novel tag-aware recommendation model named Light Folksonomy Graph Collaborative Filtering (LFGCF), which only includes the essential GCN components. Specifically, LFGCF first constructs Folksonomy Graphs from the records of users assigning tags and items being tagged. Then we leverage the simple design of aggregation to learn the high-order representations on Folksonomy Graphs and use the weighted sum of the embeddings learned at several layers for information updating. We share tag embeddings to bridge the information gap between users and items. Besides, a regularization function named TransRT is proposed to better depict user preferences and item features. Extensive hyperparameter experiments and ablation studies on three real-world datasets show that LFGCF uses fewer parameters and significantly outperforms most baselines for the tag-aware top-N recommendations.

preprint · 2022 · arXiv

Multimodal Dialogue Response Generation

Responding with images has been recognized as an important capability for an intelligent conversational agent. Yet existing works focus only on multimodal dialogue models that depend on retrieval-based methods, neglecting generation methods. To fill in the gaps, we first present a multimodal dialogue generation model, which takes the dialogue history as input, then generates a textual sequence or an image as response. Learning such a model often requires multimodal dialogues containing both texts and images, which are difficult to obtain. Motivated by the challenge in practice, we consider multimodal dialogue generation under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of text-only dialogues and text-image pairs respectively, and then all the parameters can be well fitted using the limited training examples. Extensive experiments demonstrate our method achieves state-of-the-art results in both automatic and human evaluation, and can generate informative text and high-resolution image responses.

preprint · 2022 · arXiv

On spanning tree edge denpendences of graphs

Let $τ(G)$ and $τ_G(e)$ be the number of spanning trees of a connected graph $G$ and the number of spanning trees of $G$ containing edge $e$. The ratio $d_{G}(e)=τ_{G}(e)/τ(G)$ is called the spanning tree edge density of $e$, or simply density of $e$. The maximum density $\mbox{dep}(G)=\max\limits_{e\in E(G)}d_{G}(e)$ is called the spanning tree edge dependence of $G$, or simply dependence of $G$. Given a rational number $p/q\in (0,1)$, if there exists a graph $G$ and an edge $e\in E(G)$ such that $d_{G}(e)=p/q$, then we say the density $p/q$ is constructible. More specifically, if there exists a graph $G$ such that $\mbox{dep}(G)=p/q$, then we say the dependence $p/q$ is constructible. In 2002, Ferrara, Gould, and Suffel raised the open problem of which rational densities and dependences are constructible. In 2016, Kahl provided constructions that show all rational densities and dependences are constructible. Moreover, he showed that all rational densities are constructible even if $G$ is restricted to bipartite graphs or planar graphs. He thus conjectured that all rational dependences are also constructible even if $G$ is restricted to bipartite graphs (Conjecture 1), or planar graphs (Conjecture 2). In this paper, using combinatorial and electric network approaches, we first show that all rational dependences are constructible via bipartite graphs, which confirms the first conjecture of Kahl. Secondly, we show that all rational dependences are constructible for planar multigraphs, which confirms Kahl's second conjecture for planar multigraphs. However, for (simple) planar graphs, we disprove the second conjecture of Kahl by showing that the dependence of any planar graph is larger than $\frac{1}{3}$. On the other hand, we construct a family of planar graphs that show all rational dependences $p/q>\frac{1}{2}$ are constructible via planar graphs.

preprint · 2022 · arXiv

PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks

This paper focuses on data augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose a Prompt-based Data Augmentation model (PromDA) which only trains a small-scale Soft Prompt (i.e., a set of trainable vectors) in the frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of generated synthetic data. In addition, PromDA generates synthetic data via two different views and filters out the low-quality data using NLU models. Experiments on four benchmarks show that synthetic data produced by PromDA successfully boosts the performance of NLU models, which consistently outperform several competitive baseline models, including a state-of-the-art semi-supervised model using unlabeled in-domain data. The synthetic data from PromDA are also complementary to unlabeled in-domain data. The NLU models can be further improved when they are combined for training.

preprint · 2022 · arXiv

Recency Dropout for Recurrent Recommender Systems

Recurrent recommender systems have been successful in capturing the temporal dynamics in users' activity trajectories. However, recurrent neural networks (RNNs) are known to have difficulty learning long-term dependencies. As a consequence, RNN-based recommender systems tend to overly focus on short-term user interests. This is referred to as the recency bias, which could negatively affect the long-term user experience as well as the health of the ecosystem. In this paper, we introduce the recency dropout technique, a simple yet effective data augmentation technique to alleviate the recency bias in recurrent recommender systems. We demonstrate the effectiveness of recency dropout in various experimental settings including a simulation study, offline experiments, as well as live experiments on a large-scale industrial recommendation platform.

preprint · 2022 · arXiv

Stylized Knowledge-Grounded Dialogue Generation via Disentangled Template Rewriting

Current Knowledge-Grounded Dialogue Generation (KDG) models specialize in producing rational and factual responses. However, to establish long-term relationships with users, the KDG model needs the capability to generate responses in a desired style or attribute. Thus, we study a new problem: Stylized Knowledge-Grounded Dialogue Generation (SKDG). It presents two challenges: (1) How to train an SKDG model where no <context, knowledge, stylized response> triples are available. (2) How to cohere with context and preserve the knowledge when generating a stylized response. In this paper, we propose a novel disentangled template rewriting (DTR) method which generates responses via combining disentangled style templates (from a monolingual stylized corpus) and content templates (from a KDG corpus). The entire framework is end-to-end differentiable and learned without supervision. Extensive experiments on two benchmarks indicate that DTR achieves a significant improvement on all evaluation metrics compared with previous state-of-the-art stylized dialogue generation methods. Besides, DTR achieves comparable performance with the state-of-the-art KDG methods in the standard KDG evaluation setting.

preprint · 2022 · arXiv

TegTok: Augmenting Text Generation via Task-specific and Open-world Knowledge

Generating natural and informative texts has been a long-standing problem in NLP. Much effort has been dedicated to incorporating pre-trained language models (PLMs) with various open-world knowledge, such as knowledge graphs or wiki pages. However, their ability to access and manipulate the task-specific knowledge is still limited on downstream tasks, as this type of knowledge is usually not well covered in PLMs and is hard to acquire. To address the problem, we propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework. Our model selects knowledge entries from two types of knowledge sources through dense retrieval and then injects them into the input encoding and output decoding stages respectively on the basis of PLMs. With the help of these two types of knowledge, our model can learn what and how to generate. Experiments on two text generation tasks, dialogue generation and question generation, and on two datasets show that our method achieves better performance than various baseline models.

preprint · 2022 · arXiv

Tiered synchronization in coupled oscillator populations with interaction delays and higher-order interactions

We study synchronization in large populations of coupled phase oscillators with time delays and higher-order interactions. With each of these effects individually giving rise to bistability between incoherence and synchronization via a subcriticality at the onset of synchronization and the development of a saddle node, we find that their combination yields another mechanism behind bistability, where supercriticality at onset may be maintained and instead the formation of two saddle nodes creates tiered synchronization, i.e., bistability between a weakly synchronized state and a strongly synchronized state. We demonstrate these findings by first deriving the low dimensional dynamics of the system and examining the system bifurcations using a stability and steady-state analysis.

preprint · 2022 · arXiv

Towards Robust Ranker for Text Retrieval

A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind -- learning from moderate negatives and/or serving as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noises caused by a well-trained retriever and non-ideal negatives sampled for a highly capable ranker. Thereby, we propose multiple retrievers as negative generators to improve the ranker's robustness, where i) involving extensive out-of-distribution label noises renders the ranker robust against each noise distribution, and ii) diverse hard negatives from a joint distribution are relatively close to the ranker's negative distribution, leading to more challenging and thus more effective training. To evaluate our robust ranker (dubbed R$^2$anker), we conduct experiments in various settings on the popular passage retrieval benchmark, including BM25-reranking, full-ranking, retriever distillation, etc. The empirical results verify the new state-of-the-art effectiveness of our model.

preprint · 2021 · arXiv

Learning Matching Representations for Individualized Organ Transplantation Allocation

Organ transplantation is often the last resort for treating end-stage illness, but the probability of a successful transplantation depends greatly on compatibility between donors and recipients. Current medical practice relies on coarse rules for donor-recipient matching, but is short of domain knowledge regarding the complex factors underlying organ compatibility. In this paper, we formulate the problem of learning data-driven rules for organ matching using observational data for organ allocations and transplant outcomes. This problem departs from the standard supervised learning setup in that it involves matching the two feature spaces (i.e., donors and recipients), and requires estimating transplant outcomes under counterfactual matches not observed in the data. To address these problems, we propose a model based on representation learning to predict donor-recipient compatibility; our model learns representations that cluster donor features, and applies donor-invariant transformations to recipient features to predict outcomes for a given donor-recipient feature instance. Experiments on semi-synthetic and real-world datasets show that our model outperforms state-of-the-art allocation methods and policies executed by human experts.

preprint · 2020 · arXiv

Bifurcation analysis and structural stability of simplicial oscillator populations

We present an analytical description for the collective dynamics of oscillator ensembles with higher-order coupling encoded by simplicial structure, which serves as an illustrative and insightful paradigm for brain function and information storage. The novel dynamics of the system, including abrupt desynchronization and multistability, are rigorously characterized and the critical points that correspond to a continuum of first-order phase transitions are found to satisfy universal scaling properties. More importantly, the underlying bifurcation mechanism giving rise to multiple clusters with arbitrary ensemble size is characterized using a rigorous spectral analysis of the stable cluster states. As a consequence of $SO_2$ group symmetry, we show that the continuum of abrupt desynchronization transitions results from the instability of a collective mode under the nontrivial antisymmetric manifold in the high dimensional phase space.

preprint · 2020 · arXiv

Low-Resource Knowledge-Grounded Dialogue Generation

Responding with knowledge has been recognized as an important capability for an intelligent conversational agent. Yet knowledge-grounded dialogues, as training data for learning such a response generation model, are difficult to obtain. Motivated by the challenge in practice, we consider knowledge-grounded dialogue generation under a natural assumption that only limited training examples are available. In such a low-resource setting, we devise a disentangled response decoder in order to isolate parameters that depend on knowledge-grounded dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of ungrounded dialogues and unstructured documents, while the remaining small set of parameters can be well fitted using the limited training examples. Evaluation results on two benchmarks indicate that with only 1/8 of the training data, our model can achieve state-of-the-art performance and generalize well on out-of-domain knowledge.

preprint · 2020 · arXiv

Spectrum of extensive multiclusters in the Kuramoto model with higher-order interactions

Globally coupled ensembles of phase oscillators serve as useful tools for modeling synchronization and collective behavior in a variety of applications. As interest in the effects of simplicial interactions (i.e., non-additive, higher-order interactions between three or more units) continues to grow, we study an extension of the Kuramoto model where oscillators are coupled via three-way interactions that exhibits novel dynamical properties including clustering, multistability, and abrupt desynchronization transitions. Here we provide a rigorous description of the stability of various multicluster states by studying their spectral properties in the thermodynamic limit. Not unlike the classical Kuramoto model, a natural frequency distribution with infinite support yields a population of drifting oscillators, which in turn guarantees that a portion of the spectrum is located on the imaginary axis, resulting in neutrally stable or unstable solutions. On the other hand, a natural frequency distribution with finite support allows for a fully phase-locked state, whose spectrum is real and may be linearly stable or unstable.

preprint · 2019 · arXiv

Low-Resource Response Generation with Template Prior

We study open domain response generation with limited message-response pairs. The problem exists in real-world applications but is less explored in existing work. Since the paired data are no longer enough to train a neural generation model, we consider leveraging the large scale of unpaired data that are much easier to obtain, and propose response generation with both paired and unpaired data. The generation model is defined by an encoder-decoder architecture with templates as prior, where the templates are estimated from the unpaired data as a neural hidden semi-Markov model. By this means, response generation learned from the small paired data can be aided by the semantic and syntactic knowledge in the large unpaired data. To balance the effect of the prior and the input message on response generation, we propose learning the whole generation model with an adversarial approach. Empirical studies on question response generation and sentiment response generation indicate that when only a few pairs are available, our model can significantly outperform several state-of-the-art response generation models in terms of both automatic and human evaluation.