Source author record

Xike Xie

Xike Xie appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Artificial Intelligence; Databases; Distributed, Parallel, and Cluster Computing; Machine Learning; Computation and Language; eess.SP; Information Retrieval; math.NA; Numerical Analysis; Social and Information Networks

Catalog footprint

What is connected

7 works

10 topics

4 close collaborators

preprint 2026 arXiv

TRACE: Tourism Recommendation with Accountable Citation Evidence

Tourism is a high-stakes setting for conversational recommender systems (CRS): a plausible-sounding suggestion can waste real money and trip time once a traveler acts on it. Existing CRS benchmarks primarily evaluate systems with a single Recall@k score over entity mentions, and tourism-specific resources add spatial or knowledge-graph context, yet none of them couple multi-turn recommendation with verbatim review-span evidence and rejection recovery. This leaves an evaluation gap for tourism recommendation that is simultaneously trustworthy, verifiable, and adaptive: recommend the right point of interest (POI) for multi-aspect preferences (such as cuisine, price, atmosphere, walking distance), justify each suggestion with verifiable evidence from prior visitors so the traveler can act without trial and error, and recover when the first recommendation is rejected mid-dialogue. We introduce TRACE, where each item is a multi-turn tourism recommendation dialogue with review-span citations and explicit rejection turns: 10,000 dialogues over 2,400 Yelp POIs and 34,208 reviews across eight U.S. cities, paired with 14 retrieval, planning, and LLM baselines, along with 25 metrics organized under Accuracy, Grounding, and Recovery. Across these baselines, TRACE reveals the Three-Competency Gap: LLM Zero-Shot leads in closed-set Recall@1 and rejection recovery but cites less densely than retrievers; non-LLM retrievers achieve surface-verbatim grounding but with low accuracy; Multi-Review Synthesis fails at recovery. The Grounding Score agrees with human citation precision (Spearman rho=+0.80, p<10^-20), and paired t-tests reproduce the per-baseline ranking (p<0.01 on the dominant contrasts). TRACE reframes accountable tourism recommendation as a joint target (right POI, verifiable evidence, adaptive repair) rather than a single-axis leaderboard.
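The abstract refers to Recall@k over entity mentions as the usual single-score evaluation. As a rough illustration of that metric only (not TRACE's evaluation code), the sketch below computes Recall@k for one dialogue turn. The function name `recall_at_k` and the POI identifiers are hypothetical.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top-k of a ranking.

    ranked_items: list of item ids, best first (hypothetical POI ids here).
    relevant_items: set of ground-truth item ids for this turn.
    """
    if not relevant_items:
        return 0.0  # convention assumed here: no ground truth means no credit
    top_k = set(ranked_items[:k])
    return len(top_k & set(relevant_items)) / len(relevant_items)


# Hypothetical example: the right POI is ranked second.
ranking = ["poi_3", "poi_7", "poi_1"]
truth = {"poi_7"}
print(recall_at_k(ranking, truth, 1))
print(recall_at_k(ranking, truth, 2))
```

A score of this kind says nothing about whether the suggestion was justified with evidence or repaired after a rejection, which is the gap the abstract describes.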

preprint 2023 arXiv

Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning

Interactive data exploration (IDE) is an effective way of comprehending big data, whose volume and complexity are beyond human abilities. The main goal of IDE is to discover user interest regions from a database through multiple rounds of user labelling. Existing IDEs adopt an active-learning framework, where users iteratively discriminate or label the interestingness of selected tuples. The process of data exploration can be viewed as the process of training a classifier, which determines whether a database tuple is interesting to a user. An efficient exploration thus takes very few iterations of user labelling to reach the data region of interest. In this work, we consider data exploration as a process of few-shot learning, where the classifier is learned with only a few training examples, or exploration iterations. To this end, we propose a learning-to-explore framework, based on meta-learning, which learns how to learn a classifier with automatically generated meta-tasks, so that the exploration process can be much shortened. Extensive experiments on real datasets show that our proposal outperforms existing explore-by-example solutions in terms of accuracy and efficiency.
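To make the explore-by-example loop concrete, here is a deliberately simplified sketch. It is not the paper's meta-learning method. It assumes a single numeric attribute and a user whose interest is a threshold (every value at or above some unknown cut-off is interesting), so the "classifier" is just that cut-off and each round asks the user to label the most informative tuple. The names `explore_threshold` and `is_interesting` are hypothetical.

```python
def explore_threshold(values, is_interesting):
    """Locate the interesting region with few user labels.

    values: numeric attribute values of the tuples to explore.
    is_interesting: callback standing in for the user; returns True or False.
    Assumes interest is monotone: if v is interesting, so is every larger value.
    Returns (interesting_values, number_of_labels_requested).
    """
    values = sorted(values)
    # Invariant: values[:lo] are known uninteresting, values[hi:] known interesting.
    lo, hi = 0, len(values)
    labels_requested = 0
    while lo < hi:
        mid = (lo + hi) // 2  # the tuple whose label halves the uncertain range
        labels_requested += 1
        if is_interesting(values[mid]):
            hi = mid
        else:
            lo = mid + 1
    return values[lo:], labels_requested


# Hypothetical user who cares about values of 37 and above.
region, asked = explore_threshold(range(100), lambda v: v >= 37)
print(len(region), asked)
```

Under this monotone assumption the number of labels grows only logarithmically with the number of tuples. Real interest regions are rarely this simple, which is why the abstract turns to a learned classifier.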

preprint 2022 arXiv

Clustering-based Partitioning for Large Web Graphs

Graph partitioning plays a vital role in distributed large-scale web graph analytics, such as PageRank and label propagation. The quality and scalability of the partitioning strategy have a strong impact on such communication- and computation-intensive applications, since it drives the communication cost and the workload balance among distributed computing nodes. Recently, the streaming model has shown promise in optimizing graph partitioning. However, existing streaming partitioning strategies either lack adequate quality or fall short in scaling with a large number of partitions. In this work, we explore the property of web graph clustering and propose a novel restreaming algorithm for vertex-cut partitioning. We investigate a series of techniques, which are pipelined as three steps: streaming clustering, cluster partitioning, and partition transformation. Moreover, these techniques can be adapted to a parallel mechanism for further acceleration of partitioning. Experiments on real datasets and real systems show that our algorithm outperforms state-of-the-art vertex-cut partitioning methods in large-scale web graph processing. Surprisingly, the runtime cost of our method can be an order of magnitude lower than that of one-pass streaming partitioning algorithms when the number of partitions is large.
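The abstract ties partitioning quality to communication cost and workload balance without naming its metrics. Two measures commonly used for vertex-cut partitioning are the replication factor (average number of partitions that hold a copy of each vertex) and the edge load balance. The sketch below assumes those two measures and a hand-written toy assignment. It does not reproduce the paper's algorithm, and the function names are hypothetical.

```python
from collections import defaultdict


def replication_factor(assignment):
    """Average number of partitions in which each vertex appears.

    assignment: dict mapping an edge (u, v) to its partition id.
    In a vertex-cut, a vertex is replicated on every partition that
    holds at least one of its edges, so lower values mean less
    synchronization traffic between partitions.
    """
    replicas = defaultdict(set)
    for (u, v), part in assignment.items():
        replicas[u].add(part)
        replicas[v].add(part)
    return sum(len(parts) for parts in replicas.values()) / len(replicas)


def edge_load_balance(assignment, num_partitions):
    """Largest partition's edge count divided by the average edge count."""
    loads = [0] * num_partitions
    for part in assignment.values():
        loads[part] += 1
    return max(loads) / (len(assignment) / num_partitions)


# Toy graph with four edges split across two partitions (made-up data).
toy = {(0, 1): 0, (1, 2): 0, (2, 0): 1, (2, 3): 1}
print(replication_factor(toy))
print(edge_load_balance(toy, 2))
```

In the toy assignment, vertices 0 and 2 each appear on both partitions, which is the kind of replication a clustering-aware partitioner tries to reduce.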

preprint 2022 arXiv

GX-Plug: a Middleware for Plugging Accelerators to Distributed Graph Processing

Recently, research communities have highlighted the necessity of formulating a scalability continuum for large-scale graph processing, which gains the scale-out benefits of distributed graph systems and the scale-up benefits of high-performance accelerators. To this end, we propose a middleware, called GX-Plug, for the ease of integrating the merits of both. As a middleware, GX-Plug is versatile in supporting different runtime environments, computation models, and programming models. Moreover, to improve the middleware performance, we study a series of techniques, including pipeline shuffle, synchronization caching and skipping, and workload balancing, for intra-, inter-, and beyond-iteration optimizations, respectively. Experiments show that our middleware efficiently plugs accelerators into representative distributed graph systems, e.g., GraphX and PowerGraph, with up to a 20x acceleration ratio.

preprint 2021 arXiv

STUaNet: Understanding uncertainty in spatiotemporal collective human mobility

The high dynamics and heterogeneous interactions in complicated urban systems have raised the issue of uncertainty quantification in spatiotemporal human mobility, to support critical decision-making in risk-aware web applications such as urban event prediction, where fluctuations are of significant interest. Given that uncertainty quantifies the potential variations around prediction results, traditional learning schemes always lack uncertainty labels, and conventional uncertainty quantification approaches mostly rely upon statistical estimations with Bayesian Neural Networks or ensemble methods. However, these approaches have never involved any spatiotemporal evolution of uncertainties under various contexts, and they also suffer from the poor efficiency of statistical uncertainty estimation, which requires training models multiple times. To provide high-quality uncertainty quantification for spatiotemporal forecasting, we propose an uncertainty learning mechanism to simultaneously estimate internal data quality and quantify external uncertainty regarding various contextual interactions. To address the lack of uncertainty labels, we propose a hierarchical data turbulence scheme where we can actively inject controllable uncertainty for guidance, and hence provide insights into both uncertainty quantification and weakly supervised learning. Finally, we re-calibrate and boost the prediction performance by devising a gated-based bridge to adaptively leverage the learned uncertainty into predictions. Extensive experiments on three real-world spatiotemporal mobility sets have corroborated the superiority of our proposed model in terms of both forecasting and uncertainty quantification.

preprint 2020 arXiv

Inductive Link Prediction for Nodes Having Only Attribute Information

Predicting the link between two nodes is a fundamental problem for graph data analytics. In attributed graphs, both the structure and attribute information can be utilized for link prediction. Most existing studies focus on transductive link prediction where both nodes are already in the graph. However, many real-world applications require inductive prediction for new nodes having only attribute information. It is more challenging since the new nodes do not have structure information and cannot be seen during the model training. To solve this problem, we propose a model called DEAL, which consists of three components: two node embedding encoders and one alignment mechanism. The two encoders aim to output the attribute-oriented node embedding and the structure-oriented node embedding, and the alignment mechanism aligns the two types of embeddings to build the connections between the attributes and links. Our model DEAL is versatile in the sense that it works for both inductive and transductive link prediction. Extensive experiments on several benchmark datasets show that our proposed model significantly outperforms existing inductive link prediction methods, and also outperforms the state-of-the-art methods on transductive link prediction.

preprint 2020 arXiv

RiskOracle: A Minute-level Citywide Traffic Accident Forecasting Framework

Real-time traffic accident forecasting is increasingly important for public safety and urban management (e.g., real-time safe route planning and emergency response deployment). Previous work on accident forecasting is often performed at the hour level, utilizing existing neural networks with static region-wise correlations taken into account. However, forecasting remains challenging as the granularity of the forecasting step becomes finer, owing to the highly dynamic nature of the road network and the inherent rareness of accident records in one training sample, which lead to biased results and the zero-inflated issue. In this work, we propose a novel framework, RiskOracle, to improve the prediction granularity to the minute level. Specifically, we first transform the zero-risk values in labels to fit the training network. Then, we propose the Differential Time-varying Graph neural network (DTGN) to capture the immediate changes of traffic status and dynamic inter-subregion correlations. Furthermore, we adopt multi-task and region selection schemes to highlight citywide most-likely accident subregions, bridging the gap between biased risk values and sporadic accident distribution. Extensive experiments on two real-world datasets demonstrate the effectiveness and scalability of our RiskOracle framework.
