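A Big List of Time-Series Anomaly Detection Tools and Datasets
Author | rob-med
Editor | 小极
Source | https://github.com/rob-med/awesome-TS-anomaly-detection
Original post | https://zhuanlan.zhihu.com/p/57432180
[Introduction] This is a curated list of time-series anomaly detection tools and datasets, covering anomaly detection software, related software, and benchmark datasets. GitHub: https://github.com/rob-med/awesome-TS-anomaly-detection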
Anomaly Detection Software
Name | Language | Pitch | License |
---|---|---|---|
Numenta's Nupic | C++ | Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM). | AGPL |
Etsy's Skyline | Python | Skyline is a real-time anomaly detection system, built to enable passive monitoring of hundreds of thousands of metrics. | MIT |
Twitter's AnomalyDetection | R | AnomalyDetection is an open-source R package for detecting anomalies that is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. | GPL |
Netflix's Surus | Java | Robust Anomaly Detection (RAD) - An implementation of the Robust PCA. | Apache-2.0 |
Lytics Anomalyzer | Go | Anomalyzer implements a suite of statistical tests that yield the probability that a given set of numeric input, typically a time series, contains anomalous behavior. | Apache-2.0 |
Yahoo's EGADS | Java | EGADS is a library that contains a number of anomaly detection techniques applicable to many use-cases in a single package, with Java as its only dependency. | GPL |
LinkedIn's luminol | Python | Luminol is a lightweight Python library for time series data analysis. The two major functionalities it supports are anomaly detection and correlation. It can be used to investigate possible causes of anomalies. | Apache-2.0 |
Ele.me's banshee | Go | Anomaly detection system for periodic metrics. | MIT |
Mentat's datastream.io | Python | An open-source framework for real-time anomaly detection using Python, Elasticsearch and Kibana. | Apache-2.0 |
Donut | Python | Donut is an unsupervised anomaly detection algorithm for seasonal KPIs, based on Variational Autoencoders. | - |
NASA's Telemanom | Python | A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions. | custom |
banpei | Python | Outlier detection (Hotelling's theory) and change-point detection (singular spectrum transformation) for time series. | MIT |
CAD | Python | Contextual Anomaly Detection (CAD) for real-time detection on streaming data (winner algorithm of the 2016 NAB competition). | AGPL |
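
The banpei entry above lists outlier detection based on Hotelling's theory. Below is a minimal sketch of the univariate version. It assumes the whole series is used to estimate the mean and standard deviation, whereas a real detector would typically fit them on known-normal data. Each point is scored by its squared standardized deviation, which follows a chi-square distribution with one degree of freedom under a normality assumption. Points above a chi-square quantile are flagged. The function names and the 6.63 threshold are illustrative assumptions, not banpei's API.

```python
import statistics


def hotelling_scores(values):
    """Univariate Hotelling anomaly score ((x - mean) / std) ** 2 for each point."""
    mean = statistics.fmean(values)
    # Assumption: statistics are estimated on the full series, outliers included.
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [((x - mean) / std) ** 2 for x in values]


def detect_outliers(values, threshold=6.63):
    """Return indices whose score exceeds the threshold.

    6.63 is roughly the 99th percentile of a chi-square distribution with
    one degree of freedom; it is an illustrative choice, not a library default.
    """
    return [i for i, s in enumerate(hotelling_scores(values)) if s > threshold]


series = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50, 10, 11]
print(detect_outliers(series))  # [9]
```

Because the spike itself inflates the estimated standard deviation, fitting the mean and variance on a clean reference window usually gives sharper scores.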
Related Software
This section includes some time-series software for anomaly detection-related tasks, such as forecasting and labeling.
Forecasting
Name | Language | Pitch | License |
---|---|---|---|
Facebook's Prophet | Python/R | Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. | BSD |
PyFlux | Python | The library offers a broad range of modern time series models, as well as flexible inference options (frequentist and Bayesian) that can be applied to these models. | BSD 3-Clause |
Pyramid | Python | A port of R's auto.arima with a scikit-learn-friendly interface. | MIT |
SaxPy | Python | General implementation of SAX, as well as HOTSAX for anomaly detection. | GPLv2.0 |
tslearn | Python | tslearn is a Python package that provides machine learning tools for the analysis of time series. This package builds on the scikit-learn, numpy and scipy libraries. | BSD 2-Clause |
seglearn | Python | Seglearn is a Python package for machine learning on time series or sequences. It provides an integrated pipeline for segmentation, feature extraction, feature processing, and final estimator. | BSD 3-Clause |
Tigramite | Python | Tigramite is a Python package for causal time series analysis. It allows one to efficiently reconstruct causal graphs from high-dimensional time series datasets and to model the obtained causal dependencies for causal mediation and prediction analyses. | GPLv3.0 |
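
SaxPy (listed above) implements SAX (Symbolic Aggregate approXimation). The following is a rough sketch of the idea and is not SaxPy's API. The series is z-normalized and reduced with Piecewise Aggregate Approximation (PAA). Each segment mean is then mapped to a letter using breakpoints that split the standard normal distribution into equiprobable regions. Several choices are assumptions made to keep the example short: the function names, the 4-letter alphabet, and the requirement that the series length divide evenly into segments. HOTSAX's discord search for anomalies is not shown.

```python
from statistics import NormalDist, fmean, pstdev


def znormalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = fmean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    n = len(series)
    if n % segments:
        raise ValueError("series length must be divisible by the number of segments")
    size = n // segments
    return [fmean(series[i * size:(i + 1) * size]) for i in range(segments)]


def sax(series, segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of length `segments`."""
    k = len(alphabet)
    # k - 1 breakpoints that cut N(0, 1) into k equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    word = []
    for value in paa(znormalize(series), segments):
        # Count how many breakpoints the segment mean lies above.
        index = sum(value > b for b in breakpoints)
        word.append(alphabet[index])
    return "".join(word)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], 4))  # abcd
```

Discord search such as HOTSAX operates on words like these, computed over sliding windows, to prioritize which subsequences to compare.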
Labeling
Name | Language | Pitch | License |
---|---|---|---|
Microsoft's Taganomaly | R (dockerized web app) | Simple tool for tagging time series data. Works for univariate and multivariate data, provides a reference anomaly prediction using Twitter's AnomalyDetection package. | MIT |
Baidu's Curve | Python | Curve is an open-source tool to help label anomalies on time-series data. | Apache-2.0 |
Benchmark Datasets
Numenta's NAB
NAB is a novel benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It comprises over 50 labeled real-world and artificial time-series data files, plus a novel scoring mechanism designed for real-time applications.
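NAB's scoring rewards detectors that fire early inside a labeled anomaly window and penalizes false positives. The toy function below only illustrates that idea. It is not NAB's scoring formula, and the following values are hypothetical choices for the example: the sigmoid steepness (5), the false-positive penalty (0.1), and the miss penalty (1.0). Windows are assumed to be non-overlapping (start, end) timestamp pairs.

```python
import math


def window_score(detections, windows, fp_penalty=0.1, miss_penalty=1.0):
    """Toy early-detection score (hypothetical, not NAB's official scoring).

    detections: timestamps at which the detector fired.
    windows: non-overlapping, inclusive (start, end) anomaly windows.
    """
    score = 0.0
    matched = set()
    for start, end in windows:
        inside = [t for t in detections if start <= t <= end]
        if not inside:
            score -= miss_penalty  # the anomaly was missed entirely
            continue
        # Extra detections in the same window are neither rewarded nor penalized.
        matched.update(inside)
        first = min(inside)
        # Relative position: -1.0 at the window start, 0.0 at the window end.
        y = (first - end) / max(end - start, 1)
        # Sigmoid weight: about 0.99 at the start, exactly 0 at the end.
        score += 2.0 / (1.0 + math.exp(5.0 * y)) - 1.0
    false_positives = [t for t in detections if t not in matched]
    score -= fp_penalty * len(false_positives)
    return score


print(window_score([12], [(10, 20)]) > window_score([18], [(10, 20)]))  # True
```

The sigmoid makes the reward fall off smoothly as a detection moves toward the end of the window, which mirrors the real-time intuition that a late alarm is worth less than an early one.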
Yahoo's Webscope S5
The dataset consists of real and synthetic time-series with tagged anomaly points. The dataset tests the detection accuracy of various anomaly-types including outliers and change-points.
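With point labels such as those in Webscope S5, a common starting point is point-wise precision, recall and F1. The sketch below assumes that the tagged anomaly points and the detector's output have already been reduced to lists of row indices. Loading the dataset files is not shown. Point-wise matching is strict: a detection one step away from a labeled change-point counts as both a false positive and a miss.

```python
def point_metrics(predicted, labeled):
    """Point-wise precision, recall and F1 for collections of flagged indices."""
    predicted, labeled = set(predicted), set(labeled)
    tp = len(predicted & labeled)  # indices flagged by both detector and labels
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(labeled) if labeled else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# precision 2/3, recall 0.5, F1 4/7
print(point_metrics([3, 7, 9], [3, 9, 15, 20]))
```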