Researcher profile

Lexing Xie

Lexing Xie contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
15works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

15 published item(s)

preprint2022arXiv

Determinantal Point Process Likelihoods for Sequential Recommendation

Sequential recommendation is a popular task in academic research and close to real-world application scenarios, where the goal is to predict the next action(s) of the user based on his/her previous sequence of actions. In the training process of recommender systems, the loss function plays an essential role in guiding the optimization of recommendation models to generate accurate suggestions for users. However, most existing sequential recommendation techniques focus on designing algorithms or neural network architectures, and few efforts have been made to tailor loss functions that fit naturally into the practical application scenario of sequential recommender systems. Ranking-based losses, such as cross-entropy and Bayesian Personalized Ranking (BPR) are widely used in the sequential recommendation area. We argue that such objective functions suffer from two inherent drawbacks: i) the dependencies among elements of a sequence are overlooked in these loss formulations; ii) instead of balancing accuracy (quality) and diversity, only generating accurate results has been over emphasized. We therefore propose two new loss functions based on the Determinantal Point Process (DPP) likelihood, that can be adaptively applied to estimate the subsequent item or items. The DPP-distributed item set captures natural dependencies among temporal actions, and a quality vs. diversity decomposition of the DPP kernel pushes us to go beyond accuracy-oriented loss functions. Experimental results using the proposed loss functions on three real-world datasets show marked improvements over state-of-the-art sequential recommendation methods in both quality and diversity metrics.

preprint2022arXiv

Efficient Non-parametric Bayesian Hawkes Processes

In this paper, we develop an efficient nonparametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the finite support assumption of the Hawkes process, we efficiently sample random branching structures and thus, we split the Hawkes process into clusters of Poisson processes. We derive two algorithms -- a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization -- and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show our methods to be able to infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity for different content types such as music or pets videos.

preprint2022arXiv

Mapping Topics in 100,000 Real-life Moral Dilemmas

Moral dilemmas play an important role in theorizing both about ethical norms and moral psychology. Yet thought experiments borrowed from the philosophical literature often lack the nuances and complexity of real life. We leverage 100,000 threads -- the largest collection to date -- from Reddit's r/AmItheAsshole to examine the features of everyday moral dilemmas. Combining topic modeling with evaluation from both expert and crowd-sourced workers, we discover 47 finer-grained, meaningful topics and group them into five meta-categories. We show that most dilemmas combine at least two topics, such as family and money. We also observe that the pattern of topic co-occurrence carries interesting information about the structure of everyday moral concerns: for example, the generation of moral dilemmas from nominally neutral topics, and interaction effects in which final verdicts do not line up with the moral concerns in the original stories in any simple way. Our analysis demonstrates the utility of a fine-grained data-driven approach to online moral dilemmas, and provides a valuable resource for researchers aiming to explore the intersection of practical and theoretical ethics.

preprint2022arXiv

Smallset Timelines: A Visual Representation of Data Preprocessing Decisions

Data preprocessing is a crucial stage in the data analysis pipeline, with both technical and social aspects to consider. Yet, the attention it receives is often lacking in research practice and dissemination. We present the Smallset Timeline, a visualisation to help reflect on and communicate data preprocessing decisions. A "Smallset" is a small selection of rows from the original dataset containing instances of dataset alterations. The Timeline is comprised of Smallset snapshots representing different points in the preprocessing stage and captions to describe the alterations visualised at each point. Edits, additions, and deletions to the dataset are highlighted with colour. We develop the R software package, smallsets, that can create Smallset Timelines from R and Python data preprocessing scripts. Constructing the figure asks practitioners to reflect on and revise decisions as necessary, while sharing it aims to make the process accessible to a diverse range of audiences. We present two case studies to illustrate use of the Smallset Timeline for visualising preprocessing decisions. Case studies include software defect data and income survey benchmark data, in which preprocessing affects levels of data loss and group fairness in prediction tasks, respectively. We envision Smallset Timelines as a go-to data provenance tool, enabling better documentation and communication of preprocessing tasks at large.

preprint2022arXiv

Whose Advantage? Measuring Attention Dynamics across YouTube and Twitter on Controversial Topics

The ideological asymmetries have been recently observed in contested online spaces, where conservative voices seem to be relatively more pronounced even though liberals are known to have the population advantage on digital platforms. Most prior research, however, focused on either one single platform or one single political topic. Whether an ideological group garners more attention across platforms and/or topics, and how the attention dynamics evolve over time, have not been explored. In this work, we present a quantitative study that links collective attention across two social platforms -- YouTube and Twitter, centered on online activities surrounding popular videos of three controversial political topics including Abortion, Gun control, and Black Lives Matter over 16 months. We propose several sets of video-centric metrics to characterize how online attention is accumulated for different ideological groups. We find that neither side is on a winning streak: left-leaning videos are overall more viewed, more engaging, but less tweeted than right-leaning videos. The attention time series unfold quicker for left-leaning videos, but span a longer time for right-leaning videos. Network analysis on the early adopters and tweet cascades show that the information diffusion for left-leaning videos tends to involve centralized actors; while that for right-leaning videos starts earlier in the attention lifecycle. In sum, our findings go beyond the static picture of ideological asymmetries in digital spaces and provide a set of methods to quantify attention dynamics across different social platforms.

preprint2021arXiv

AttentionFlow: Visualising Influence in Networks of Time Series

The collective attention on online items such as web pages, search terms, and videos reflects trends that are of social, cultural, and economic interest. Moreover, attention trends of different items exhibit mutual influence via mechanisms such as hyperlinks or recommendations. Many visualisation tools exist for time series, network evolution, or network influence; however, few systems connect all three. In this work, we present AttentionFlow, a new system to visualise networks of time series and the dynamic influence they have on one another. Centred around an ego node, our system simultaneously presents the time series on each node using two visual encodings: a tree ring for an overview and a line chart for details. AttentionFlow supports interactions such as overlaying time series of influence and filtering neighbours by time or flux. We demonstrate AttentionFlow using two real-world datasets, VevoMusic and WikiTraffic. We show that attention spikes in songs can be explained by external events such as major awards, or changes in the network such as the release of a new song. Separate case studies also demonstrate how an artist's influence changes over their career, and that correlated Wikipedia traffic is driven by cultural interests. More broadly, AttentionFlow can be generalised to visualise networks of time series on physical infrastructures such as road networks, or natural phenomena such as weather and geological measurements.

preprint2021arXiv

Radflow: A Recurrent, Aggregated, and Decomposable Model for Networks of Time Series

We propose a new model for networks of time series that influence each other. Graph structures among time series are found in diverse domains, such as web traffic influenced by hyperlinks, product sales influenced by recommendation, or urban transport volume influenced by road networks and weather. There has been recent progress in graph modeling and in time series forecasting, respectively, but an expressive and scalable approach for a network of series does not yet exist. We introduce Radflow, a novel model that embodies three key ideas: a recurrent neural network to obtain node embeddings that depend on time, the aggregation of the flow of influence from neighboring nodes with multi-head attention, and the multi-layer decomposition of time series. Radflow naturally takes into account dynamic networks where nodes and edges change over time, and it can be used for prediction and data imputation tasks. On real-world datasets ranging from a few hundred to a few hundred thousand nodes, we observe that Radflow variants are the best performing model across a wide range of settings. The recurrent component in Radflow also outperforms N-BEATS, the state-of-the-art time series model. We show that Radflow can learn different trends and seasonal patterns, that it is robust to missing nodes and edges, and that correlated temporal patterns among network neighbors reflect influence strength. We curate WikiTraffic, the largest dynamic network of time series with 366K nodes and 22M time-dependent links spanning five years. This dataset provides an open benchmark for developing models in this area, with applications that include optimizing resources for the web. More broadly, Radflow has the potential to improve forecasts in correlated time series networks such as the stock market, and impute missing measurements in geographically dispersed networks of natural phenomena.

preprint2021arXiv

UNIPoint: Universally Approximating Point Processes Intensities

Point processes are a useful mathematical tool for describing events over time, and so there are many recent approaches for representing and learning them. One notable open question is how to precisely describe the flexibility of point process models and whether there exists a general model that can represent all point processes. Our work bridges this gap. Focusing on the widely used event intensity function representation of point processes, we provide a proof that a class of learnable functions can universally approximate any valid intensity function. The proof connects the well known Stone-Weierstrass Theorem for function approximation, the uniform density of non-negative continuous functions using a transfer functions, the formulation of the parameters of a piece-wise continuous functions as a dynamic system, and a recurrent neural network implementation for capturing the dynamics. Using these insights, we design and implement UNIPoint, a novel neural point process model, using recurrent neural networks to parameterise sums of basis function upon each event. Evaluations on synthetic and real world datasets show that this simpler representation performs better than Hawkes process variants and more complex neural network-based approaches. We expect this result will provide a practical basis for selecting and tuning models, as well as furthering theoretical work on representational complexity and learnability.

preprint2020arXiv

ASNets: Deep Learning for Generalised Planning

In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets). The ASNet is a neural network architecture that exploits the relational structure of (P)PDDL planning problems to learn a common set of weights that can be applied to any problem in a domain. By mimicking the actions chosen by a traditional, non-learning planner on a handful of small problems in a domain, ASNets are able to learn a generalised reactive policy that can quickly solve much larger instances from the domain. This work extends the ASNet architecture to make it more expressive, while still remaining invariant to a range of symmetries that exist in PPDDL problems. We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study. Finally, we show that sparsity-inducing regularisation can produce ASNets that are compact enough for humans to understand, yielding insights into how the structure of ASNets allows them to generalise across a domain.

preprint2020arXiv

Comparative Document Summarisation via Classification

This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as a binary classification amongst different groups, we derive objectives based on the notion of maximum mean discrepancy, as well as a simple yet effective gradient-based optimisation strategy. Our new formulation allows scalable evaluations of comparative summarisation as a classification task, both automatically and via crowd-sourcing. To this end, we evaluate comparative summarisation methods on a newly curated collection of controversial news topics over 13 months. We observe that gradient-based optimisation outperforms discrete and baseline approaches in 14 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimisation elicit 7% more accurate classification from human workers than discrete optimisation. Our result contrasts with recent literature on submodular data subset selection that favours discrete optimisation. We posit that our formulation of comparative summarisation will prove useful in a diverse range of use cases such as comparing content sources, authors, related topics, or distinct view points.

preprint2020arXiv

Describing and Predicting Online Items with Reshare Cascades via Dual Mixture Self-exciting Processes

It is well-known that online behavior is long-tailed, with most cascaded actions being short and a few being very long. A prominent drawback in generative models for online events is the inability to describe unpopular items well. This work addresses these shortcomings by proposing dual mixture self-exciting processes to jointly learn from groups of cascades. We first start from the observation that maximum likelihood estimates for content virality and influence decay are separable in a Hawkes process. Next, our proposed model, which leverages a Borel mixture model and a kernel mixture model, jointly models the unfolding of a heterogeneous set of cascades. When applied to cascades of the same online items, the model directly characterizes their spread dynamics and supplies interpretable quantities, such as content virality and content influence decay, as well as methods for predicting the final content popularities. On two retweet cascade datasets -- one relating to YouTube videos and the second relating to controversial news articles -- we show that our models capture the differences between online items at the granularity of items, publishers and categories. In particular, we are able to distinguish between far-right, conspiracy, controversial and reputable online news articles based on how they diffuse through social media, achieving an F1 score of 0.945. On holdout datasets, we show that the dual mixture model provides, for reshare diffusion cascades especially unpopular ones, better generalization performance and, for online items, accurate item popularity predictions.

preprint2020arXiv

Estimating Attention Flow in Online Video Networks

Online videos have shown tremendous increase in Internet traffic. Most video hosting sites implement recommender systems, which connect the videos into a directed network and conceptually act as a source of pathways for users to navigate. At present, little is known about how human attention is allocated over such large-scale networks, and about the impacts of the recommender systems. In this paper, we first construct the Vevo network -- a YouTube video network with 60,740 music videos interconnected by the recommendation links, and we collect their associated viewing dynamics. This results in a total of 310 million views every day over a period of 9 weeks. Next, we present large-scale measurements that connect the structure of the recommendation network and the video attention dynamics. We use the bow-tie structure to characterize the Vevo network and we find that its core component (23.1% of the videos), which occupies most of the attention (82.6% of the views), is made out of videos that are mainly recommended among themselves. This is indicative of the links between video recommendation and the inequality of attention allocation. Finally, we address the task of estimating the attention flow in the video recommendation network. We propose a model that accounts for the network effects for predicting video popularity, and we show it consistently outperforms the baselines. This model also identifies a group of artists gaining attention because of the recommendation network. Altogether, our observations and our models provide a new set of tools to better understand the impacts of recommender systems on collective social attention.

preprint2020arXiv

Modeling Information Cascades with Self-exciting Processes via Generalized Epidemic Models

Epidemic models and self-exciting processes are two types of models used to describe diffusion phenomena online and offline. These models were originally developed in different scientific communities, and their commonalities are under-explored. This work establishes, for the first time, a general connection between the two model classes via three new mathematical components. The first is a generalized version of stochastic Susceptible-Infected-Recovered (SIR) model with arbitrary recovery time distributions; the second is the relationship between the (latent and arbitrary) recovery time distribution, recovery hazard function, and the infection kernel of self-exciting processes; the third includes methods for simulating, fitting, evaluating and predicting the generalized process. On three large Twitter diffusion datasets, we conduct goodness-of-fit tests and holdout log-likelihood evaluation of self-exciting processes with three infection kernels --- exponential, power-law and Tsallis Q-exponential. We show that the modeling performance of the infection kernels varies with respect to the temporal structures of diffusions, and also with respect to user behavior, such as the likelihood of being bots. We further improve the prediction of popularity by combining two models that are identified as complementary by the goodness-of-fit tests.

preprint2020arXiv

Transform and Tell: Entity-Aware News Image Captioning

We propose an end-to-end model which generates captions for images embedded in news articles. News images present two key challenges: they rely on real-world knowledge, especially about named entities; and they typically have linguistically rich captions that include uncommon words. We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism. We tackle the second challenge with a state-of-the-art transformer language model that uses byte-pair-encoding to generate captions as a sequence of word parts. On the GoodNews dataset, our model outperforms the previous state of the art by a factor of four in CIDEr score (13 to 54). This performance gain comes from a unique combination of language models, word representation, image embeddings, face embeddings, object embeddings, and improvements in neural network design. We also introduce the NYTimes800k dataset which is 70% larger than GoodNews, has higher article quality, and includes the locations of images within articles as an additional contextual cue.

preprint2020arXiv

Variation across Scales: Measurement Fidelity under Twitter Data Sampling

A comprehensive understanding of data quality is the cornerstone of measurement studies in social media research. This paper presents in-depth measurements on the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades). By constructing complete tweet streams, we show that Twitter rate limit message is an accurate indicator for the volume of missing tweets. Sampling also differs significantly across timescales. While the hourly sampling rate is influenced by the diurnal rhythm in different time zones, the millisecond level sampling is heavily affected by the implementation choices. For Twitter entities such as users, we find the Bernoulli process with a uniform rate approximates the empirical distributions well. It also allows us to estimate the true ranking with the observed sample data. For networks on Twitter, their structures are altered significantly and some components are more likely to be preserved. For retweet cascades, we observe changes in distributions of tweet inter-arrival time and user influence, which will affect models that rely on these features. This work calls attention to noises and potential biases in social data, and provides a few tools to measure Twitter sampling effects.