Source author record

Hera Y. He

Hera Y. He appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Applications Computation math.NA physics.soc-ph Social and Information Networks

Catalog footprint

What is connected

2works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2015arXiv

SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

Social networking websites allow users to create and share content. Big information cascades of post resharing can form as users of these sites reshare others' posts with their friends and followers. One of the central challenges in understanding such cascading behaviors is in forecasting information outbreaks, where a single post becomes widely popular by being reshared by many users. In this paper, we focus on predicting the final number of reshares of a given post. We build on the theory of self-exciting point processes to develop a statistical model that allows us to make accurate predictions. Our model requires no training or expensive feature engineering. It results in a simple and efficiently computable formula that allows us to answer questions, in real-time, such as: Given a post's resharing history so far, what is our current estimate of its final number of reshares? Is the post resharing cascade past the initial stage of explosive growth? And, which posts will be the most reshared in the future? We validate our model using one month of complete Twitter data and demonstrate a strong improvement in predictive accuracy over existing approaches. Our model gives only 15% relative error in predicting final size of an average information cascade after observing it for just one hour.

preprint2014arXiv

Optimal mixture weights in multiple importance sampling

In multiple importance sampling we combine samples from a finite list of proposal distributions. When those proposal distributions are used to create control variates, it is possible (Owen and Zhou, 2000) to bound the ratio of the resulting variance to that of the unknown best proposal distribution in our list. The minimax regret arises by taking a uniform mixture of proposals, but that is conservative when there are many components. In this paper we optimize the mixture component sampling rates to gain further efficiency. We show that the sampling variance of mixture importance sampling with control variates is jointly convex in the mixture probabilities and control variate regression coefficients. We also give a sequential importance sampling algorithm to estimate the optimal mixture from the sample data.