Source author record

David Phipps

David Phipps appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Information Retrieval Social and Information Networks Computation and Language Cryptography and Security cs.CY

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2013arXiv

The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions

Weibo and other popular Chinese microblogging sites are well known for exercising internal censorship, to comply with Chinese government requirements. This research seeks to quantify the mechanisms of this censorship: how fast and how comprehensively posts are deleted.Our analysis considered 2.38 million posts gathered over roughly two months in 2012, with our attention focused on repeatedly visiting "sensitive" users. This gives us a view of censorship events within minutes of their occurrence, albeit at a cost of our data no longer representing a random sample of the general Weibo population. We also have a larger 470 million post sampling from Weibo's public timeline, taken over a longer time period, that is more representative of a random sample. We found that deletions happen most heavily in the first hour after a post has been submitted. Focusing on original posts, not reposts/retweets, we observed that nearly 30% of the total deletion events occur within 5- 30 minutes. Nearly 90% of the deletions happen within the first 24 hours. Leveraging our data, we also considered a variety of hypotheses about the mechanisms used by Weibo for censorship, such as the extent to which Weibo's censors use retrospective keyword-based censorship, and how repost/retweet popularity interacts with censorship. We also used natural language processing techniques to analyze which topics were more likely to be censored.

preprint2012arXiv

Language Without Words: A Pointillist Model for Natural Language Processing

This paper explores two separate questions: Can we perform natural language processing tasks without a lexicon?; and, Should we? Existing natural language processing techniques are either based on words as units or use units such as grams only for basic classification tasks. How close can a machine come to reasoning about the meanings of words and phrases in a corpus without using any lexicon, based only on grams? Our own motivation for posing this question is based on our efforts to find popular trends in words and phrases from online Chinese social media. This form of written Chinese uses so many neologisms, creative character placements, and combinations of writing systems that it has been dubbed the "Martian Language." Readers must often use visual queues, audible queues from reading out loud, and their knowledge and understanding of current events to understand a post. For analysis of popular trends, the specific problem is that it is difficult to build a lexicon when the invention of new ways to refer to a word or concept is easy and common. For natural language processing in general, we argue in this paper that new uses of language in social media will challenge machines' abilities to operate with words as the basic unit of understanding, not only in Chinese but potentially in other languages.

preprint2012arXiv

Tracking and Quantifying Censorship on a Chinese Microblogging Site

We present measurements and analysis of censorship on Weibo, a popular microblogging site in China. Since we were limited in the rate at which we could download posts, we identified users likely to participate in sensitive topics and recursively followed their social contacts. We also leveraged new natural language processing techniques to pick out trending topics despite the use of neologisms, named entities, and informal language usage in Chinese social media. We found that Weibo dynamically adapts to the changing interests of its users through multiple layers of filtering. The filtering includes both retroactively searching posts by keyword or repost links to delete them, and rejecting posts as they are posted. The trend of sensitive topics is short-lived, suggesting that the censorship is effective in stopping the "viral" spread of sensitive issues. We also give evidence that sensitive topics in Weibo only scarcely propagate beyond a core of sensitive posters.

David Phipps

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions

Language Without Words: A Pointillist Model for Natural Language Processing

Tracking and Quantifying Censorship on a Chinese Microblogging Site