Source author record

Dah Ming Chiu

Dah Ming Chiu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

11 works
8 topics
4 close collaborators

Published work

11 published items

preprint, 2016, arXiv

Who are Like-minded: Mining User Interest Similarity in Online Social Networks

In this paper, we mine and learn to predict how similar a pair of users' interests towards videos are, based on demographic (age, gender and location) and social (friendship, interaction and group membership) information of these users. We use the video access patterns of active users as ground truth (a form of benchmark). We adopt tag-based user profiling to establish this ground truth, and justify why it is used instead of video-based methods, or many latent topic models such as LDA and Collaborative Filtering approaches. We then show the effectiveness of the different demographic and social features, and their combinations and derivatives, in predicting user interest similarity, based on different machine-learning methods for combining multiple features. We propose a hybrid tree-encoded linear model for combining the features, and show that it outperforms other linear and tree-based models. Our methods can be used to predict user interest similarity when the ground truth is not available, e.g. for new users, or inactive users whose interests may have changed from old access data, and are useful for video recommendation. Our study is based on a rich dataset from Tencent, a popular service provider of social networks, video services, and various other services in China.

preprint, 2015, arXiv

A Population Model for the Academic Ecosystem

In recent times, the academic ecosystem has seen tremendous growth in the number of authors and publications. While most temporal studies in this area focus on the evolution of co-author and citation network structure, this systemic inflation has received very little attention. In this paper, we address this issue by proposing a population model for academia, derived from publication records in the Computer Science domain. We use a generalized branching process as an overarching framework, which enables us to describe the evolution and composition of the research community in a systematic manner. Further, the observed patterns allow us to shed light on researchers' lifecycle encompassing arrival, academic life expectancy, activity, productivity and offspring distribution in the ecosystem. We believe such a study will help develop better bibliometric indices which account for the inflation, and also provide insights into sustainable and efficient resource management for academia.

preprint, 2015, arXiv

Modeling and Analysis of Scholar Mobility on Scientific Landscape

The scientific literature to date can be thought of as a partially revealed landscape, where scholars continue to unveil hidden knowledge by exploring novel research topics. How do scholars explore the scientific landscape, i.e., choose research topics to work on? We propose an agent-based model of topic mobility behavior where scholars migrate across research topics on the space of science following different strategies, seeking different utilities. We use this model to study whether strategies widely used in the current scientific community can provide a balance between individual scientific success and the efficiency and diversity of the whole academic society. Through extensive simulations, we provide insights into the roles of different strategies, such as choosing topics according to research potential or popularity. Our model provides a conceptual framework and a computational approach to analyze scholars' behavior and its impact on scientific production. We also discuss how such an agent-based modeling approach can be integrated with big real-world scholarly data.

preprint, 2014, arXiv

Modeling Dynamics of Online Video Popularity

Large Internet video delivery systems serve millions of videos to tens of millions of users on a daily basis, via Video-on-Demand and live streaming. Video popularity evolves over time. It represents the workload, as well as the business value, of the video to the overall system. The ability to predict video popularity is very helpful for improving service quality and operating efficiency. Previous studies adopted simple models for video popularity, or directly adopted patterns from measurement studies. In this paper, we develop a stochastic fluid model that tries to capture two hidden processes that give rise to different patterns of a given video's popularity evolution: the information spreading process, and the user reaction process. Specifically, these processes model how the video is recommended to the user, the video's inherent attractiveness, and users' reaction rate, and yield specific popularity evolution patterns. We then validate our model by matching the predictions of the model with observed patterns from our collaborator, a large content provider in China. This model thus gives us the insight to explain both the common and the differing video popularity evolution patterns.

preprint, 2014, arXiv

MYE: Missing Year Estimation in Academic Social Networks

In bibliometric studies, a common challenge is how to deal with incorrect or incomplete data. However, given a large volume of data, there often exist certain relationships between the data items that can allow us to recover missing data items and correct erroneous data. In this paper, we study a particular problem of this sort: estimating the missing year information associated with publications (and hence authors' years of active publication). We first propose a simple algorithm that only makes use of the "direct" information, such as paper citation/reference relationships or paper-author relationships. The result of this simple algorithm is used as a benchmark for comparison. Our goal is to develop algorithms that increase both the coverage (the percentage of missing year papers recovered) and accuracy (mean absolute error of the estimated year to the real year). We propose some advanced algorithms that extend inference by information propagation. For each algorithm, we propose three versions according to the given academic social network type: a) Homogeneous (only contains paper citation links), b) Bipartite (only contains paper-author relations), and c) Heterogeneous (both paper citation and paper-author relations). We carry out experiments on three public data sets (MSR Libra, DBLP and APS), and evaluate the algorithms using K-fold cross-validation. We show that the advanced algorithms can improve both coverage and accuracy.
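As a rough illustration of the idea in the abstract above, the following Python sketch estimates a missing year from direct citation links only and computes the two evaluation measures the abstract defines (coverage and mean absolute error). The bounding rule (a paper is no older than what it cites and no newer than what cites it), the midpoint estimate, and all function names are assumptions made for this example; the abstract does not specify the authors' actual algorithm.

```python
from statistics import mean


def estimate_missing_years(known_years, citations, missing):
    """Estimate years for papers in `missing` from direct citation links.

    known_years: dict mapping paper id -> known publication year
    citations:   list of (citing_paper, cited_paper) pairs
    missing:     iterable of paper ids whose year is unknown
    Assumption (not from the source): a paper is at least as recent as
    every paper it cites and at most as recent as every paper citing it.
    """
    estimates = {}
    for paper in missing:
        # Years of papers this paper cites give a lower bound.
        lower = [known_years[cited] for citing, cited in citations
                 if citing == paper and cited in known_years]
        # Years of papers citing this paper give an upper bound.
        upper = [known_years[citing] for citing, cited in citations
                 if cited == paper and citing in known_years]
        if lower and upper:
            estimates[paper] = (max(lower) + min(upper)) / 2
        elif lower:
            estimates[paper] = max(lower)
        elif upper:
            estimates[paper] = min(upper)
        # With no usable neighbours the paper stays uncovered.
    return estimates


def coverage(estimates, missing):
    """Fraction of missing-year papers that received an estimate."""
    return len(estimates) / len(missing)


def mean_absolute_error(estimates, true_years):
    """Mean absolute difference between estimated and true years."""
    return mean(abs(estimates[p] - true_years[p]) for p in estimates)
```

Papers with no dated neighbours are left out of the result, which is what lowers coverage; propagating estimates through several hops, as the abstract's "advanced algorithms" do, would be one way to reach them.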

preprint, 2014, arXiv

Smart Streaming for Online Video Services

Bandwidth consumption is a significant concern for online video service providers. Practical video streaming systems usually use some form of HTTP streaming (progressive download) to let users download the video at a faster rate than the video bitrate. Since users may quit before viewing the complete video, however, much of the downloaded video will be "wasted". To the extent that users' departure behavior can be predicted, we develop smart streaming that can be used to improve user QoE with limited server bandwidth or save bandwidth cost with unlimited server bandwidth. Through measurement, we extract certain user behavior properties for implementing such smart streaming, and demonstrate its advantage using a prototype implementation as well as simulations.

preprint, 2014, arXiv

The Academic Social Network

Through academic publications, the authors of these publications form a social network. Instead of sharing casual thoughts and photos (as in Facebook), authors pick co-authors and reference papers written by other authors. Thanks to various efforts (such as Microsoft Libra and DBLP), the data necessary for analyzing the academic social network is becoming more available on the Internet. What type of information and queries would be useful for users to find out, beyond the search queries already available from services such as Google Scholar? In this paper, we explore this question by defining a variety of ranking metrics on different entities: authors, publication venues and institutions. We go beyond traditional metrics such as paper counts, citations and h-index. Specifically, we define metrics such as influence, connections and exposure for authors. An author gains influence by receiving more citations, but also citations from influential authors. An author increases his/her connections by co-authoring with other authors, and especially with other authors with high connections. An author receives exposure by publishing in selective venues where publications received high citations in the past, and the selectivity of these venues also depends on the influence of the authors who publish there. We discuss the computational aspects of these metrics, and the similarity between different metrics. With additional information on author-institution relationships, we are able to study institution rankings based on the corresponding authors' rankings for each type of metric as well as different domains. We are prepared to demonstrate these ideas with a web site (http://pubstat.org) built from millions of publications and authors.
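The influence metric described above is recursive: citations count for more when they come from influential authors. One common way to compute a score of that kind is a PageRank-style power iteration over the author citation graph, sketched below in Python. This is an assumption chosen for illustration, not the paper's definition; the function name, the damping value and the input layout are all hypothetical.

```python
def influence_scores(cites, damping=0.85, iterations=50):
    """PageRank-style influence over an author citation graph.

    cites: dict mapping citing author -> {cited author: citation count}
    Returns a dict mapping author -> score; scores sum to 1.
    Assumption (not from the source): each author splits their own
    score among cited authors in proportion to citation counts.
    """
    authors = set(cites)
    for targets in cites.values():
        authors.update(targets)
    n = len(authors)
    score = {a: 1.0 / n for a in authors}
    for _ in range(iterations):
        # Every author receives a small baseline share.
        new = {a: (1.0 - damping) / n for a in authors}
        for a in authors:
            targets = cites.get(a, {})
            total = sum(targets.values())
            if total == 0:
                # Authors who cite nobody spread their score evenly.
                for b in authors:
                    new[b] += damping * score[a] / n
            else:
                for b, count in targets.items():
                    new[b] += damping * score[a] * count / total
        score = new
    return score


if __name__ == "__main__":
    example = {
        "alice": {"carol": 3},
        "bob": {"carol": 1, "alice": 1},
        "carol": {},
    }
    for author, value in sorted(influence_scores(example).items()):
        print(author, round(value, 3))
```

In the small example, carol is cited by both other authors and alice only by bob, so carol ends up with the highest score and bob with the lowest.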

preprint, 2013, arXiv

Fake View Analytics in Online Video Services

Online video-on-demand (VoD) services invariably maintain a view count for each video they serve, and it has become an important currency for various stakeholders, from viewers, to content owners, advertisers, and the online service providers themselves. There is often significant financial incentive to use a robot (or a botnet) to artificially create fake views. How can we detect the fake views? Can we detect them (and stop them) using online algorithms as they occur? What is the extent of fake views with current VoD service providers? These are the questions we study in the paper. We develop some algorithms and show that they are quite effective for this problem.

preprint, 2011, arXiv

Exploring Network Economics

In this paper, we explore what network economics is all about, focusing on the interesting topics brought about by the Internet. Our intent is to make this a brief survey, useful as an outline for a course on this topic, with an extended list of references. We try to make it as intuitive and readable as possible. We also deliberately try to be critical at times, and hope our interpretation of the topic will spark interest in further discussion among those doing research in the same field.

preprint, 2011, arXiv

Reciprocating Preferences Stablize Matching: College Admissions Revisited

In considering the college admissions problem, almost fifty years ago, Gale and Shapley came up with a simple abstraction based on preferences of students and colleges. They introduced the concepts of stability and optimality, and proposed the deferred acceptance (DA) algorithm, which is proven to lead to a stable and optimal solution. This algorithm is simple and computationally efficient. Furthermore, subsequent studies have shown that the DA algorithm is also strategy-proof, which means that, when the algorithm is played out as a mechanism for matching two sides (e.g. colleges and students), the parties (colleges or students) have no incentive to act other than according to their true preferences. Yet, in practical college admission systems, the DA algorithm is often not adopted. Instead, an algorithm known as the Boston Mechanism (BM) or one of its variants is widely adopted. In BM, colleges accept students without deferral (considering other colleges' decisions), which is exactly the opposite of Gale-Shapley's DA algorithm. To explain and rationalize this reality, we introduce the notion of reciprocating preference to capture the influence of a student's interest on a college's decision. This model is inspired by the actual mechanism used to match students to universities in Hong Kong. The notion of reciprocating preference defines a class of matching algorithms, allowing different degrees of reciprocating preferences by the students and colleges. DA and BM are but two extreme cases (with zero and a hundred percent reciprocation) of this set. This model extends the notion of stability and optimality as well. As in Gale-Shapley's original paper, we discuss how the analogy can be carried over to the stable marriage problem, thus demonstrating the model's general applicability.
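For readers unfamiliar with the deferred acceptance algorithm mentioned above, here is a minimal Python sketch of the textbook student-proposing version with college capacities. It illustrates only the classic Gale-Shapley procedure; it does not implement the Boston Mechanism or the paper's reciprocating-preference model, and the function name and input layout are assumptions made for this example.

```python
def deferred_acceptance(student_prefs, college_prefs, capacity):
    """Student-proposing deferred acceptance (textbook Gale-Shapley).

    student_prefs: dict student -> list of colleges, most preferred first
    college_prefs: dict college -> list of students, most preferred first
    capacity:      dict college -> number of seats
    Returns a dict college -> list of admitted students.
    """
    # Position of each student in each college's ranking.
    rank = {c: {s: i for i, s in enumerate(prefs)}
            for c, prefs in college_prefs.items()}
    next_choice = {s: 0 for s in student_prefs}
    held = {c: [] for c in college_prefs}
    free = list(student_prefs)
    while free:
        s = free.pop()
        prefs = student_prefs[s]
        if next_choice[s] >= len(prefs):
            continue  # list exhausted; the student stays unmatched
        c = prefs[next_choice[s]]
        next_choice[s] += 1
        if s not in rank[c]:
            free.append(s)  # the college does not rank this student
            continue
        # Tentatively hold the applicant, best-ranked first.
        held[c].append(s)
        held[c].sort(key=rank[c].__getitem__)
        if len(held[c]) > capacity[c]:
            # Over capacity: reject the lowest-ranked held student.
            free.append(held[c].pop())
    return held


if __name__ == "__main__":
    students = {"s1": ["A", "B"], "s2": ["A", "B"], "s3": ["B", "A"]}
    colleges = {"A": ["s2", "s1", "s3"], "B": ["s1", "s3", "s2"]}
    seats = {"A": 1, "B": 2}
    print(deferred_acceptance(students, colleges, seats))
```

Acceptances stay tentative until no student is left to propose, which is the "deferral" that the Boston Mechanism, as described in the abstract, does without.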

preprint, 2010, arXiv

Mathematical Modeling of Competition in Sponsored Search Market

Sponsored search mechanisms have drawn much attention from both the academic community and industry in recent years since the seminal papers of [13] and [14]. However, most of the existing literature concentrates on mechanism design and analysis within the scope of only one search engine in the market. In this paper we propose a mathematical framework for modeling the interaction of publishers, advertisers and end users in a competitive market. We first consider the monopoly market model and provide optimal solutions for both the ex ante and ex post cases, which represent the long-term and short-term revenues of search engines respectively. We then analyze the strategic behaviors of end users and advertisers under duopoly and prove the existence of an equilibrium in which both search engines co-exist from an ex post perspective. To show the more general ex ante results, we carry out extensive simulations under different parameter settings. Our analysis and observations in this work can provide useful insight for regulating the sponsored search market and protecting the interests of advertisers and end users.