Researcher profile

Xue-Qi Cheng

Xue-Qi Cheng contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 - Emerging
14 works
0 followers
9 topics
4 close collaborators


Published work

14 published items

preprint 2016 arXiv

Prediction of "Forwarding Whom" Behavior in Information Diffusion

The follow-ship network among users underlies the diffusion dynamics of messages on online social networks. Generally, the structure of the underlying social network determines the visibility of messages and the diffusion process. In this paper, we study the forwarding behavior of individuals, taking Sina Weibo as an example. We investigate multiple exposures in information diffusion and the "forwarding whom" problem associated with multiple exposures. Finally, we model and predict the "forwarding whom" behavior of individuals, combining structural, temporal, historical, and content features. Experimental results demonstrate that our method achieves a high accuracy of 91.3%.

preprint 2015 arXiv

Learning user-specific latent influence and susceptibility from information cascades

Predicting cascade dynamics has important implications for understanding information propagation and launching viral marketing. Previous works mainly adopt a pair-wise approach, modeling the propagation probability between pairs of users using n^2 independent parameters for n users. Consequently, these models suffer from a severe overfitting problem, especially for pairs of users without direct interactions, which limits their prediction accuracy. Here we propose to model the cascade dynamics by learning two low-dimensional user-specific vectors from observed cascades, capturing each user's influence and susceptibility respectively. This model requires far fewer parameters and can thus combat the overfitting problem. Moreover, this model can naturally capture context-dependent factors such as the cumulative effect in information propagation. Extensive experiments on a synthetic dataset and a large-scale microblogging dataset demonstrate that this model outperforms the existing pair-wise models at predicting cascade dynamics, cascade size, and "who will be retweeted".
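The parameter saving described in this abstract can be illustrated with a minimal sketch. The function names, the dimension `d`, and in particular the link function `1 - exp(-score)` are assumptions made for illustration; the abstract only states that each user gets one influence vector and one susceptibility vector.

```python
import math

def pairwise_parameter_count(n):
    """One independent parameter per ordered pair of users."""
    return n * n

def latent_parameter_count(n, d):
    """Two d-dimensional vectors (influence, susceptibility) per user."""
    return 2 * n * d

def propagation_probability(influence_u, susceptibility_v):
    """Hypothetical link: map the inner product of the two vectors into [0, 1)."""
    score = sum(i * s for i, s in zip(influence_u, susceptibility_v))
    return 1.0 - math.exp(-max(score, 0.0))

print(pairwise_parameter_count(1000), latent_parameter_count(1000, 10))
```

For 1000 users and 10 dimensions, the latent model needs 20,000 parameters rather than 1,000,000, and a probability can be computed even for pairs that never interacted.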

preprint 2015 arXiv

Modeling and Predicting Popularity Dynamics of Microblogs using Self-Excited Hawkes Processes

The ability to model and predict the popularity dynamics of individual user-generated items on online media has important implications in a wide range of areas. In this paper, we propose a probabilistic model using a Self-Excited Hawkes Process (SEHP) to characterize the process through which individual microblogs gain their popularity. This model explicitly captures the triggering effect of each forwarding, distinguishing itself from the reinforced Poisson process based model, where all previous forwardings are simply aggregated as a single triggering effect. We validate the proposed model by applying it to Sina Weibo, the most popular microblogging network in China. Experimental results demonstrate that the SEHP model consistently outperforms the model based on the reinforced Poisson process.
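The idea that each forwarding contributes its own triggering effect can be sketched with a textbook Hawkes intensity. The exponential kernel and the parameter names `mu`, `alpha`, and `beta` are assumptions; the abstract does not say which kernel the SEHP model uses.

```python
import math

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Base rate mu plus one exponentially decaying kick per earlier forwarding."""
    return mu + sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )

forwardings = [0.0, 0.5, 0.6]                    # hypothetical forwarding times
rate_soon = hawkes_intensity(0.7, forwardings)   # shortly after a burst
rate_later = hawkes_intensity(5.0, forwardings)  # long after the burst
print(rate_soon > rate_later)
```

Because every past event enters the sum separately, a recent burst of forwardings raises the rate more than the same number of old ones.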

preprint 2014 arXiv

IMRank: Influence Maximization via Finding Self-Consistent Ranking

Influence maximization, fundamental for word-of-mouth marketing and viral marketing, aims to find a set of seed nodes maximizing influence spread on a social network. Early methods mainly fall into two paradigms with certain benefits and drawbacks: (1) greedy algorithms, which select seed nodes one by one, give guaranteed accuracy but rely on an accurate approximation of influence spread at high computational cost; (2) heuristic algorithms, which estimate influence spread using efficient heuristics, have low computational cost but unstable accuracy. We first point out that greedy algorithms are essentially finding a self-consistent ranking, where nodes' ranks are consistent with their ranking-based marginal influence spread. This insight motivates us to develop an iterative ranking framework, IMRank, to efficiently solve the influence maximization problem under the independent cascade model. Starting from an initial ranking, e.g., one obtained from an efficient heuristic algorithm, IMRank finds a self-consistent ranking by reordering nodes iteratively in terms of their ranking-based marginal influence spread computed according to the current ranking. We also prove that IMRank definitely converges to a self-consistent ranking starting from any initial ranking. Furthermore, within this framework, a last-to-first allocating strategy and a generalization of this strategy are proposed to improve the efficiency of estimating ranking-based marginal influence spread for a given ranking. In this way, IMRank achieves both remarkable efficiency and high accuracy by simultaneously leveraging the benefits of greedy algorithms and heuristic algorithms. As demonstrated by extensive experiments on large-scale real-world social networks, IMRank always achieves high accuracy comparable to greedy algorithms, with computational cost reduced dramatically, running about 10 to 100 times faster than other scalable heuristics.
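For context, the independent cascade model named in this abstract can be simulated as below. This is not IMRank; it is a sketch of the Monte Carlo spread estimate that greedy methods typically rely on, with a hypothetical graph representation (node mapped to a list of `(neighbor, probability)` pairs).

```python
import random

def simulate_ic(graph, seeds, rng):
    """One independent-cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)

def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average activated-set size over repeated simulations."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs

toy = {"a": [("b", 0.5)], "b": [("c", 0.5)]}
print(estimate_spread(toy, ["a"]))
```

Repeating such simulations for every candidate seed is what makes greedy selection expensive, which is the cost the abstract says IMRank reduces.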

preprint 2014 arXiv

Temporal scaling in information propagation

For the study of information propagation, one fundamental problem is uncovering universal laws governing the dynamics of information propagation. This problem, from the microscopic perspective, is formulated as estimating the probability that a piece of information propagates from one individual to another. Such a propagation probability generally depends on two major classes of factors: the intrinsic attractiveness of information and the interactions between individuals. Although the temporal effect of attractiveness is widely studied, the temporal laws underlying individual interactions remain unclear, causing inaccurate prediction of information propagation on evolving social networks. In this report, we empirically study the dynamics of information propagation, using a dataset from a population-scale social media website. We discover a temporal scaling in information propagation: the probability that a message propagates between two individuals decays with the time elapsed since their latest interaction, obeying a power-law rule. Leveraging the scaling law, we further propose a temporal model to estimate future propagation probabilities between individuals, reducing the error rate of information propagation prediction from 6.7% to 2.6% and improving viral marketing with 9.7% incremental customers.
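The power-law decay described here can be sketched as follows. The `scale` and `exponent` values are placeholders, not fitted values from the paper, and the clamp to 1.0 is an added safeguard.

```python
def decayed_probability(latency, scale=0.1, exponent=1.0):
    """Propagation probability after `latency` time units since the last interaction."""
    if latency <= 0:
        raise ValueError("latency must be positive")
    # Power law: scale * latency^(-exponent), capped so it stays a probability.
    return min(1.0, scale * latency ** (-exponent))

for latency in (1.0, 2.0, 4.0, 8.0):
    print(latency, decayed_probability(latency))
```

A defining property of a power law is scale invariance: doubling the latency changes the probability by the same factor regardless of the starting latency.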

preprint 2013 arXiv

Cumulative Effect in Information Diffusion: A Comprehensive Empirical Study on Microblogging Network

The cumulative effect in social contagions underlies many studies on the spread of innovation, behaviors, and influence. However, few large-scale empirical studies have been conducted to validate the existence of the cumulative effect in information diffusion on social networks. In this paper, using the population-scale dataset from the largest Chinese microblogging website, we conduct a comprehensive study on the cumulative effect in information diffusion. We base our study on the diffusion network of each message, where nodes are the involved users and links are the following relationships among them. We find that multiple exposures to the same message indeed increase the probability of forwarding it. However, additional exposures do not further improve the chance of forwarding once the number of exposures passes its peak at two. This finding questions the cumulative effect hypothesis in information diffusion. Furthermore, to clarify the forwarding preference among users, we investigate both the structural motifs of the diffusion network and the temporal pattern of the information diffusion process among users. The patterns provide vital insight for understanding the variation of message popularity and explain the characteristics of diffusion networks.

preprint 2012 arXiv

Conquering the rating bound problem in neighborhood-based collaborative filtering: a function recovery approach

As an important tool for information filtering in the era of the socialized web, recommender systems have witnessed rapid development in the last decade. Benefiting from better interpretability, neighborhood-based collaborative filtering techniques, such as the item-based collaborative filtering adopted by Amazon, have achieved great success in many practical recommender systems. However, the neighborhood-based collaborative filtering method suffers from the rating bound problem, i.e., the rating that this method estimates for a target item is bounded by the observed ratings of all its neighboring items. Therefore, it cannot accurately estimate the unobserved rating on a target item if its ground truth rating is actually higher (lower) than the highest (lowest) rating over all items in its neighborhood. In this paper, we address this problem by formalizing rating estimation as a task of recovering a scalar rating function. With a linearity assumption, we infer all the ratings by optimizing the low-order norm, e.g., the $l_{1/2}$-norm, of the second derivative of the target scalar function, while keeping its observed ratings unchanged. Experimental results on three real datasets, namely Douban, Goodreads and MovieLens, demonstrate that the proposed approach can overcome the rating bound problem well. In particular, it improves the accuracy of rating estimation by 37% over the conventional neighborhood-based methods.
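The rating bound problem follows directly from how a neighborhood estimate is formed. The sketch below uses a plain similarity-weighted average with non-negative weights, which is one common form of neighborhood-based estimation and an assumption here; the ratings and similarities are made-up values.

```python
def neighborhood_estimate(neighbor_ratings, similarities):
    """Similarity-weighted average of the ratings on neighboring items."""
    total = sum(similarities)
    return sum(r * s for r, s in zip(neighbor_ratings, similarities)) / total

ratings = [3, 4, 4]          # observed ratings on neighboring items
weights = [0.9, 0.5, 0.2]    # hypothetical non-negative similarities
estimate = neighborhood_estimate(ratings, weights)
print(min(ratings) <= estimate <= max(ratings))
```

A weighted average with non-negative weights always lies between the smallest and largest neighbor rating, so a true rating of 5 could never be recovered from these neighbors.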

preprint 2012 arXiv

Modeling the clustering in citation networks

For the study of citation networks, a challenging problem is modeling the high clustering. Existing studies indicate that a promising way to model the high clustering is a copying strategy, i.e., a paper copies the references of its neighbor as its own references. However, this line of models greatly underestimates the number of triangles observed in real citation networks and thus cannot model the high clustering well. In this paper, we point out that existing models fail because they do not capture the connecting patterns among existing papers. By leveraging the knowledge indicated by such connecting patterns, we further propose a new model for the high clustering in citation networks. Experiments on two real-world citation networks, one from a specialized research area and one from a multidisciplinary research area, demonstrate that our model can reproduce not only the power-law degree distribution, as traditional models do, but also the number of triangles, the high clustering coefficient and the size distribution of co-citation clusters observed in these real networks.
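The copying strategy that this abstract critiques can be sketched as a simple generator, together with a triangle counter to measure clustering. The specific rules (one uniformly random target, an independent copy probability per reference) are illustrative assumptions and are not the model proposed in the paper.

```python
import random
from itertools import combinations

def copying_model(n, copy_prob=0.5, seed=0):
    """Each new paper cites a random earlier paper and copies some of its references."""
    rng = random.Random(seed)
    refs = {0: set()}
    for new in range(1, n):
        target = rng.randrange(new)
        cited = {target}
        for r in refs[target]:
            if rng.random() < copy_prob:
                cited.add(r)
        refs[new] = cited
    return refs

def count_triangles(refs):
    """Count triangles in the undirected version of the citation graph."""
    adj = {u: set(vs) for u, vs in refs.items()}
    for u, vs in refs.items():
        for v in vs:
            adj[v].add(u)
    total = 0
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                total += 1
    return total // 3  # each triangle is seen once from each of its three nodes

print(count_triangles(copying_model(200)))
```

Every copied reference closes one triangle between the new paper, its target, and the copied reference, which is why copying produces clustering at all.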

preprint 2011 arXiv

Distinguishing manipulated stocks via trading network analysis

Manipulation is an important issue for both developed and emerging stock markets. For the study of manipulation, it is critical to analyze investor behavior in the stock market. In this paper, we analyze the full transaction records of over a hundred stocks in a one-year period. For each stock, a trading network is constructed to characterize the relations among its investors. In trading networks, nodes represent investors, a directed link connects a stock seller to a buyer with the total trade size as the weight of the link, and the node strength is the sum of all edge weights of a node. For all these trading networks, we find that the node degree and node strength both have tails following a power-law distribution. Compared with non-manipulated stocks, manipulated stocks have a higher lower bound of the power-law tail, a higher average degree of the trading network and a lower correlation between the price return and the seller-buyer ratio. These findings may help us to detect manipulated stocks.
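The trading network construction described above can be sketched in a few lines. The trade tuple layout `(seller, buyer, size)` and the investor labels are assumptions made for illustration.

```python
from collections import defaultdict

def build_trading_network(trades):
    """Aggregate trades into directed seller -> buyer links weighted by total size."""
    weights = defaultdict(float)
    for seller, buyer, size in trades:
        weights[(seller, buyer)] += size
    return dict(weights)

def node_strength(weights):
    """Sum of the weights of all links touching each investor."""
    strength = defaultdict(float)
    for (seller, buyer), w in weights.items():
        strength[seller] += w
        strength[buyer] += w
    return dict(strength)

trades = [("A", "B", 100), ("A", "B", 50), ("B", "C", 30)]
network = build_trading_network(trades)
print(node_strength(network))
```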

preprint 2011 arXiv

Exploring the structural regularities in networks

In this paper, we consider the problem of exploring structural regularities of networks by dividing the nodes of a network into groups such that the members of each group have similar patterns of connections to other groups. Specifically, we propose a general statistical model to describe network structure. In this model, the group is viewed as a hidden or unobserved quantity and is learned by fitting the observed network data using the expectation-maximization algorithm. Compared with existing models, the most prominent strength of our model is its high flexibility. This strength enables it to possess the advantages of existing models and overcome their shortcomings in a unified way. As a result, not only can broad types of structure be detected without prior knowledge of what type of intrinsic regularities exist in the network, but the type of identified structure can also be directly learned from data. Moreover, by differentiating outgoing edges from incoming edges, our model can detect several types of structural regularities beyond competing models. Tests on a number of real-world and artificial networks demonstrate that our model outperforms the state-of-the-art model at shedding light on the structural features of networks, including overlapping community structure, multipartite structure and several other types of structure that are beyond the capability of existing models.

preprint 2010 arXiv

Bridgeness: A Local Index on Edge Significance in Maintaining Global Connectivity

Edges in a network can be divided into two kinds according to their different roles: some enhance locality, like the ones inside a cluster, while others contribute to global connectivity, like the ones connecting two clusters. A recent study by Onnela et al. uncovered the weak ties effect in mobile communication. In this article, we provide complementary results on document networks: the edges connecting nodes that are less similar in content are more significant in maintaining global connectivity. We propose an index named bridgeness to quantify the edge significance in maintaining connectivity, which depends only on local information of the network topology. We compare bridgeness with content similarity and some other structural indices according to an edge percolation process. Experimental results on document networks show that bridgeness outperforms content similarity in characterizing edge significance. Furthermore, extensive numerical results on disparate networks indicate that bridgeness is also better than some well-known indices of edge significance, including the Jaccard coefficient, degree product and betweenness centrality.
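Two of the baseline indices named in this abstract, the Jaccard coefficient and the degree product, are simple to compute for an edge. The sketch below uses their common neighborhood-based definitions, which may differ in detail from those used in the paper; the bridgeness index itself is not reproduced here, and the toy graph is hypothetical.

```python
def edge_jaccard(adj, x, y):
    """Shared neighbors of x and y divided by the union of their neighborhoods."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

def degree_product(adj, x, y):
    """Product of the degrees of the two endpoints."""
    return len(adj[x]) * len(adj[y])

# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(edge_jaccard(adj, "c", "d"), edge_jaccard(adj, "a", "b"))
```

The edge c-d, which holds the two clusters together, has no shared neighbors and so a Jaccard coefficient of zero, while an edge inside a triangle scores higher.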

preprint 2010 arXiv

Covariance, correlation matrix and the multi-scale community structure of networks

Empirical studies show that real-world networks often exhibit multiple scales of topological descriptions. However, how to identify the intrinsic multiple scales of networks is still an open problem. In this article, we consider detecting the multi-scale community structure of a network from the perspective of dimension reduction. From this perspective, a covariance matrix of the network is defined to uncover the multi-scale community structure through the translation and rotation transformations. It is proved that the covariance matrix is the unbiased version of the well-known modularity matrix. We then point out that the translation and rotation transformations fail to deal with heterogeneous networks, which are very common in nature and society. To address this problem, a correlation matrix is proposed by introducing the rescaling transformation into the covariance matrix. Extensive tests on real-world and artificial networks demonstrate that the correlation matrix significantly outperforms the covariance matrix, and equivalently the modularity matrix, at identifying the multi-scale community structure of a network. This work provides a novel perspective on the identification of community structure, and thus various dimension reduction methods might be used for this task. By introducing the correlation matrix, we further conclude that the rescaling transformation, as well as the translation and rotation transformations, is crucial to identifying the multi-scale community structure of a network.
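For reference, the well-known modularity matrix mentioned in this abstract can be computed as below, using its standard definition B_ij = A_ij - k_i k_j / (2m) for an undirected graph. This sketch does not reproduce the covariance or correlation matrices defined in the paper.

```python
def modularity_matrix(adj_matrix):
    """Standard modularity matrix of an undirected graph given as a 0/1 matrix."""
    n = len(adj_matrix)
    degrees = [sum(row) for row in adj_matrix]
    two_m = sum(degrees)  # twice the number of edges
    return [
        [adj_matrix[i][j] - degrees[i] * degrees[j] / two_m for j in range(n)]
        for i in range(n)
    ]

# Path graph on three nodes: 0 - 1 - 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
for row in modularity_matrix(path):
    print(row)
```

Each row of the modularity matrix sums to zero, because the expected number of edges under the degree-preserving null model matches the observed degree.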

preprint 2010 arXiv

Spectral methods for the detection of network community structure: a comparative analysis

Spectral analysis has been successfully applied to the detection of community structure in networks, based variously on the adjacency matrix, the standard Laplacian matrix, the normalized Laplacian matrix, the modularity matrix, the correlation matrix and several other variants of these matrices. However, comparisons between these spectral methods are rarely reported. More importantly, it is still unclear which matrix is more appropriate for the detection of community structure. This paper answers the question by evaluating the effectiveness of these five matrices on benchmark networks with heterogeneous distributions of node degree and community size. Test results demonstrate that the normalized Laplacian matrix and the correlation matrix significantly outperform the other three matrices at identifying the community structure of networks. This indicates that it is crucial to take into account the heterogeneous distribution of node degree when using spectral analysis for the detection of community structure. In addition, to our surprise, the modularity matrix exhibits very similar performance to the adjacency matrix, which indicates that the modularity matrix does not gain the desired benefits from using the configuration model as a reference network that accounts for node degree heterogeneity.
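Two of the matrices compared in this abstract can be built from an adjacency matrix as sketched below, using their standard definitions (L = D - A and I - D^(-1/2) A D^(-1/2)). The sketch assumes an undirected graph with no isolated nodes and no self-loops.

```python
import math

def spectral_matrices(adj_matrix):
    """Return (standard Laplacian, normalized Laplacian) for a 0/1 adjacency matrix."""
    n = len(adj_matrix)
    degrees = [sum(row) for row in adj_matrix]
    laplacian = [
        [(degrees[i] if i == j else 0) - adj_matrix[i][j] for j in range(n)]
        for i in range(n)
    ]
    normalized = [
        [
            (1.0 if i == j else 0.0)
            - adj_matrix[i][j] / math.sqrt(degrees[i] * degrees[j])
            for j in range(n)
        ]
        for i in range(n)
    ]
    return laplacian, normalized

path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
laplacian, normalized = spectral_matrices(path)
print(laplacian)
```

The normalized form divides each entry by the square root of the endpoint degrees, which is the kind of degree correction the abstract finds important on heterogeneous networks.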

preprint 2010 arXiv

Uncovering the community structure associated with the diffusion dynamics of networks

As two main focuses of the study of complex networks, the community structure and the dynamics on networks have both attracted much attention in various scientific fields. However, it is still an open question how the community structure is associated with the dynamics on complex networks. In this paper, by investigating the diffusion process taking place on networks, we demonstrate that the intrinsic community structure of networks can be revealed by the stable local equilibrium states of the diffusion process. Furthermore, we show that such community structure can be directly identified through the optimization of the conductance of the network, which measures how easily diffusion occurs among different communities. Tests on benchmark networks indicate that the conductance optimization method significantly outperforms the modularity optimization methods at identifying the community structure of networks. Applications to real-world networks also demonstrate the effectiveness of the conductance optimization method. This work provides insights into the multiple topological scales of complex networks, and the obtained community structure naturally reflects the diffusion capability of the underlying network.
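Conductance, the quantity optimized in this abstract, can be sketched for a single group of nodes using the common graph-theoretic definition: edges leaving the group divided by the smaller of the two total degrees. The paper's exact formulation may differ, and the toy graph is hypothetical.

```python
def conductance(adj, group):
    """Cut size divided by the smaller volume (total degree) of the two sides."""
    group = set(group)
    cut = sum(1 for u in group for v in adj[u] if v not in group)
    vol_in = sum(len(adj[u]) for u in group)
    vol_out = sum(len(adj[u]) for u in adj if u not in group)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0

# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(conductance(adj, {"a", "b", "c"}), conductance(adj, {"a", "b"}))
```

Splitting along the bridge gives a lower conductance than cutting through a triangle, matching the intuition that diffusion crosses a good community boundary less easily.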