Anomaly Detection using Unsupervised methods

Some anomalies can be detected by Using both unsupervised and supervised methods

Univariate Outlier detection

Boxplot

Boxplot to detect an anomaly

Histogram

using a histogram to detect an anomaly

Distribution based approach

using a distribution based approach to detect an anomaly

Multimodal distribution

Guassian Mixture model

Multivariate Outlier detection

Reason we need to use a multivariate detection methods

Histogram based outlier score (HBOS)

HBOS Advantages vs disdavantages

Neighborhood methods

KNN

A is detected as an anomaly but not B

Local Outlier Factor (LOF)

Using LOF to detect an anomaly

Connectivity Outlier Factor (COF)

Detecting isolated outliers

Different Neighborhood approaches to use in different situations

Advantages vs disadvantages of Neighborhood approaches

One-class classification

One-class SVM

One-class SVM

Clustering

Cluster Approach

DBSCAN

Advantages vs disdavantages of clustering approaches

Approaches for High-Dimensional Data

In higher dimensions the similarity between two similar people is decreased and increased for irrelevant people - Curse of dimensionality

In high dimensions, distance metrics such as Eculidean distance and neighborhood concept does not make sense

Solutions for Anomaly detection in High-dimensional data

  • Dimensions Reduction Techniques
    • PCA
    • Matrix / Tensor Factorization
    • Autoencoder
  • Angle-based outlier detection
  • Ensemble Approaches
    • Isolation Forest
    • Feature Bagging

PCA

Matrix Factorization

Tensor Factorization

Using Tensor Factorization to capture temporal features

Using Tensor Factorization with other methods to find anomalies

Autoencoder

Using reconstruction errors to detect anomalies

Angle based Outlier detection

Examine the variance over the angles

Ensemble

Isolation Forest

Using Average deptth to Isolate Outliers

Feature Bagging

Comparison of Various approaches

Benchmarking Methods available in pyod packages

Factors to consider for Anomaly detection