Source author record

Eamonn J. Keogh

Eamonn J. Keogh appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning

Catalog footprint

What is connected

4works

1topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress

Time series anomaly detection has been a perennially important topic in data science, with papers dating back to the 1950s. However, in recent years there has been an explosion of interest in this topic, much of it driven by the success of deep learning in other domains and for other time series tasks. Most of these papers test on one or more of a handful of popular benchmark datasets, created by Yahoo, Numenta, NASA, etc. In this work we make a surprising claim. The majority of the individual exemplars in these datasets suffer from one or more of four flaws. Because of these four flaws, we believe that many published comparisons of anomaly detection algorithms may be unreliable, and more importantly, much of the apparent progress in recent years may be illusionary. In addition to demonstrating these claims, with this paper we introduce the UCR Time Series Anomaly Archive. We believe that this resource will perform a similar role as the UCR Time Series Classification Archive, by providing the community with a benchmark that allows meaningful comparisons between approaches and a meaningful gauge of overall progress.

preprint2022arXiv

FastDTW is approximate and Generally Slower than the Algorithm it Approximates

Many time series data mining problems can be solved with repeated use of distance measure. Examples of such tasks include similarity search, clustering, classification, anomaly detection and segmentation. For over two decades it has been known that the Dynamic Time Warping (DTW) distance measure is the best measure to use for most tasks, in most domains. Because the classic DTW algorithm has quadratic time complexity, many ideas have been introduced to reduce its amortized time, or to quickly approximate it. One of the most cited approximate approaches is FastDTW. The FastDTW algorithm has well over a thousand citations and has been explicitly used in several hundred research efforts. In this work, we make a surprising claim. In any realistic data mining application, the approximate FastDTW is much slower than the exact DTW. This fact clearly has implications for the community that uses this algorithm: allowing it to address much larger datasets, get exact results, and do so in less time.

preprint2022arXiv

When is Early Classification of Time Series Meaningful?

Since its introduction two decades ago, there has been increasing interest in the problem of early classification of time series. This problem generalizes classic time series classification to ask if we can classify a time series subsequence with sufficient accuracy and confidence after seeing only some prefix of a target pattern. The idea is that the earlier classification would allow us to take immediate action, in a domain in which some practical interventions are possible. For example, that intervention might be sounding an alarm or applying the brakes in an automobile. In this work, we make a surprising claim. In spite of the fact that there are dozens of papers on early classification of time series, it is not clear that any of them could ever work in a real-world setting. The problem is not with the algorithms per se but with the vague and underspecified problem description. Essentially all algorithms make implicit and unwarranted assumptions about the problem that will ensure that they will be plagued by false positives and false negatives even if their results suggested that they could obtain near-perfect results. We will explain our findings with novel insights and experiments and offer recommendations to the community.

preprint2019arXiv

Time Series Classification: Lessons Learned in the (Literal) Field while Studying Chicken Behavior

Poultry farms are a major contributor to the human food chain. However, around the world, there have been growing concerns about the quality of life for the livestock in poultry farms; and increasingly vocal demands for improved standards of animal welfare. Recent advances in sensing technologies and machine learning allow the possibility of monitoring birds, and employing the lessons learned to improve the welfare for all birds. This task superficially appears to be easy, yet, studying behavioral patterns involves collecting enormous amounts of data, justifying the term Big Data. Before the big data can be used for analytical purposes to tease out meaningful, well-conserved behavioral patterns, the collected data needs to be pre-processed. The pre-processing refers to processes for cleansing and preparing data so that it is in the format ready to be analyzed by downstream algorithms, such as classification and clustering algorithms. However, as we shall demonstrate, efficient pre-processing of chicken big data is both non-trivial and crucial towards success of further analytics.

Eamonn J. Keogh

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress

FastDTW is approximate and Generally Slower than the Algorithm it Approximates

When is Early Classification of Time Series Meaningful?

Time Series Classification: Lessons Learned in the (Literal) Field while Studying Chicken Behavior