
Using temporal IDF for efficient novelty detection in text streams

Novelty detection in text streams is a challenging task that arises in many scenarios, from email thread filtering to RSS news feed recommendation on a smartphone. An efficient novelty detection algorithm can save the user considerable time and resources when browsing content that is relevant but has usually been seen before. Most recent research on detecting novel documents in text streams builds on either geometric distances or distributional similarities; the former typically perform better but are much slower because each incoming document must be compared with all previously seen ones. In this paper, we propose a new approach to novelty detection in text streams. We describe a resource-aware mechanism that can handle massive text streams such as those produced today by the surge of social media and the emergence of the Web as the main source of information. We capitalize on the historical Inverse Document Frequency (IDF), which is known to capture term specificity well, and we show that it can be used successfully at the document level as a measure of document novelty. This lets us avoid similarity comparisons with previous documents in the text stream, which scales better and leads to faster execution times. Moreover, as the collection of documents evolves over time, we use a temporal variant of IDF not only to maintain an efficient representation of what has already been seen but also to decay the document frequencies as time goes by. We evaluate the proposed approach on a real-world dataset of news articles created for this task. The results show that the proposed method outperforms all of the baselines while operating efficiently in terms of time complexity and memory usage, both of which are of great importance in a mobile setting.
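The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea: score a document from the IDF of its terms against decayed document frequencies, with no comparison to earlier documents. The class name `TemporalIDFNovelty`, the mean-IDF aggregation, the add-one smoothing and the exponential `decay` factor are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import defaultdict


class TemporalIDFNovelty:
    """Hypothetical streaming novelty scorer based on decayed IDF."""

    def __init__(self, decay=0.9):
        # Assumption: document frequencies shrink by a constant factor
        # each time a new document arrives (exponential decay).
        self.decay = decay
        self.df = defaultdict(float)  # decayed document frequency per term
        self.n_docs = 0.0             # decayed count of documents seen

    def idf(self, term):
        # Add-one smoothing keeps unseen terms finite (an assumption).
        return math.log((self.n_docs + 1.0) / (self.df.get(term, 0.0) + 1.0))

    def score(self, tokens):
        # Assumption: document novelty is the mean IDF of its distinct terms.
        terms = set(tokens)
        if not terms:
            return 0.0
        return sum(self.idf(t) for t in terms) / len(terms)

    def update(self, tokens):
        # Decay everything seen so far, then count the new document.
        # This eager loop is linear in the vocabulary size; a real system
        # would likely decay lazily using per-term timestamps.
        for term in list(self.df):
            self.df[term] *= self.decay
        self.n_docs = self.n_docs * self.decay + 1.0
        for term in set(tokens):
            self.df[term] += 1.0


detector = TemporalIDFNovelty(decay=0.9)
detector.update(["storm", "hits", "coast"])

repeat_score = detector.score(["storm", "hits", "coast"])
novel_score = detector.score(["election", "results"])
print(novel_score > repeat_score)  # unseen terms score as more novel
```

Scoring touches only the terms of the incoming document, which is what removes the need to compare against every earlier document. Because of the decay, a topic that has not appeared for a while gradually regains a high score.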

Preprint, 2014, arXiv, Open access
