```
# libraries
import pytimetk as tk
import pandas as pd
import numpy as np

# Import Data
m4_daily_df = tk.load_dataset('m4_daily', parse_dates = ['date'])
```

# Anomaly Detection

Anomaly detection in time series analysis is a crucial process for identifying unusual patterns that deviate from expected behavior. These anomalies can signify critical, often unforeseen events in time series data. Effective anomaly detection helps in maintaining the quality and reliability of data, ensuring accurate forecasting and decision-making. The challenge lies in distinguishing between true anomalies and natural fluctuations, which demands sophisticated analytical techniques and a deep understanding of the underlying time series patterns. As a result, anomaly detection is an essential component of time series analysis, driving the proactive management of risks and opportunities in dynamic environments.

Pytimetk uses the following methods to determine anomalies in time series data:

**Decomposition of Time Series:** The first step is to decompose the time series into several components. Commonly, this includes the **trend**, **seasonality**, and **remainder** (or residual) components. The trend represents the underlying pattern or direction in the data over time. Seasonality captures recurring patterns or cycles over a specific period, such as daily, weekly, monthly, etc.

The remainder (or residual) is what's left after the trend and seasonal components have been removed from the original time series.

**Generating Remainders:** After decomposition, the remainder component is extracted. This component reflects the part of the time series that cannot be explained by the trend and seasonal components.

The idea is that while trend and seasonality represent predictable and thus “normal” patterns, the remainder is where anomalies are most likely to manifest.
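The decompose-then-inspect-the-remainder idea can be sketched with plain pandas. This is an illustrative toy decomposition (centered rolling-mean trend, weekday-mean seasonality), not pytimetk's internal implementation, but it shows why an injected spike survives in the remainder:

```python
import numpy as np
import pandas as pd

# Toy series: linear trend + weekly seasonality + noise + one injected spike
rng = np.random.default_rng(0)
n = 364
dates = pd.date_range("2015-01-01", periods=n, freq="D")
t = np.arange(n)
y = pd.Series(
    0.05 * t + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, n),
    index=dates
)
y.iloc[180] += 10  # injected anomaly

# Crude decomposition: centered rolling mean as trend,
# weekday means of the detrended series as seasonality
trend = y.rolling(7, center=True, min_periods=1).mean()
seasonal = (y - trend).groupby(dates.dayofweek).transform("mean")
remainder = y - trend - seasonal

# The spike barely moves trend/seasonal but dominates the remainder
print(remainder.abs().idxmax())  # the date of the injected anomaly
```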

There are two common techniques for seasonal decomposition: STL and Twitter.

**STL** (Seasonal and Trend Decomposition) is a versatile and robust method for decomposing time series. STL works very well in circumstances where a long-term trend is present. The Loess algorithm typically does a very good job at detecting the trend. However, in circumstances where the seasonal component is more dominant than the trend, Twitter tends to perform better.

The **Twitter** method is a decomposition method similar to that used in Twitter's AnomalyDetection package. The Twitter method works identically to STL for removing the seasonal component. The main difference is in removing the trend, which is performed by removing the median of the data rather than fitting a smoother. The median works well when a long-term trend is less dominant than the short-term seasonal component. This is because the smoother tends to overfit the anomalies.
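The contrast between the two detrending ideas can be shown with a toy example (illustrative only, not pytimetk's internals; a rolling mean stands in for the Loess smoother): the smoother "chases" a spike and absorbs part of it into the trend, while subtracting the median leaves the full spike in the residual for the anomaly test.

```python
import numpy as np
import pandas as pd

# Toy series with a flat level of ~100 and a single spike
rng = np.random.default_rng(1)
y = pd.Series(rng.normal(100, 1, 200))
y.iloc[100] += 15  # anomaly

# STL-style idea: subtract a fitted smoother (rolling mean stand-in)
smoother_detrended = y - y.rolling(5, center=True, min_periods=1).mean()

# Twitter-style idea: subtract the median of the data
median_detrended = y - y.median()

# The smoother absorbs part of the spike into the "trend",
# shrinking it in the residual; the median leaves it intact
print(round(float(smoother_detrended.iloc[100]), 1),
      round(float(median_detrended.iloc[100]), 1))
```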

# 1 Anomaly Detection in Pytimetk

This section will demonstrate how to use the set of `anomalize` functions in pytimetk:

- `anomalize()`
- `plot_anomalies()`
- `plot_anomalies_decomp()`
- `plot_anomalies_cleaned()`

## 1.1 Setup

To set up, import the necessary packages and the `m4_daily_df` dataset.

Let's first demonstrate with a single time series. We'll filter `m4_daily_df` for `id` = `D10` and `date` within the year 2015.

```
# Data filtering
df = (
    m4_daily_df
        .query("id == 'D10'")
        .query("date.dt.year == 2015")
)
```

We can plot this data to see the trend:

```
# Plot data
tk.plot_timeseries(
    data         = df,
    date_column  = 'date',
    value_column = 'value'
)
```

## 1.2 Seasonal Decomposition & Remainder

First we perform seasonal decomposition on the data and generate remainders using `anomalize()`.

Use `help(tk.anomalize)` to review additional helpful documentation.

```
# Anomalize
anomalize_df = tk.anomalize(
    data         = df,
    date_column  = 'date',
    value_column = 'value',
    period       = 7,
    iqr_alpha    = 0.05, # using the default
    clean_alpha  = 0.75, # using the default
    clean        = "min_max"
)

anomalize_df.glimpse()
```

```
<class 'pandas.core.frame.DataFrame'>: 365 rows of 12 columns
date: datetime64[ns] [Timestamp('2015-01-01 00:00:00'), ...
observed: float64 [2351.0, 2302.7, 2300.7, 2341.2, 2 ...
seasonal: float64 [14.163009085035995, -17.341946034 ...
seasadj: float64 [2336.836990914964, 2320.041946034 ...
trend: float64 [2323.900317851228, 2322.996460334 ...
remainder: float64 [12.93667306373618, -2.95451429904 ...
anomaly: object ['No', 'No', 'No', 'No', 'No', 'No ...
anomaly_score: float64 [19.42215274680143, 35.31334010958 ...
anomaly_direction: int64 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
recomposed_l1: float64 [2179.860403909094, 2147.451591271 ...
recomposed_l2: float64 [2560.9839015845087, 2528.57508894 ...
observed_clean: float64 [2351.0, 2302.7, 2300.7, 2341.2, 2 ...
```
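One way to read these output columns (our interpretation of the printed schema, not pytimetk's source code): the remainder limits are recomposed with the trend and seasonal components to form the `recomposed_l1`/`recomposed_l2` bands, and a point is flagged as an anomaly when `observed` falls outside them. A minimal numpy sketch with an illustrative IQR multiplier:

```python
import numpy as np

# Toy series: trend + seasonality + noise + one injected spike
rng = np.random.default_rng(2)
n = 50
trend = np.linspace(100, 110, n)
seasonal = 5 * np.sin(2 * np.pi * np.arange(n) / 7)
observed = trend + seasonal + rng.normal(0, 0.5, n)
observed[25] += 20  # injected anomaly

# Remainder limits from an IQR rule (the multiplier of 3 is illustrative)
remainder = observed - trend - seasonal
q1, q3 = np.percentile(remainder, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 3 * iqr, q3 + 3 * iqr

# Recompose the bands and flag points falling outside them
recomposed_l1 = trend + seasonal + lo
recomposed_l2 = trend + seasonal + hi
anomaly = (observed < recomposed_l1) | (observed > recomposed_l2)
print(np.where(anomaly)[0])  # only the injected point
```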

## 1.3 Plot Seasonal Decomposition

We plot the seasonal decomposition to get a visual representation.

Use `help(tk.plot_anomalies_decomp)` to review additional helpful documentation.

```
# Plot seasonal decomposition
tk.plot_anomalies_decomp(
    data        = anomalize_df,
    date_column = 'date',
    engine      = 'plotly',
    title       = 'Seasonal Decomposition'
)
```

## 1.4 Plot Anomalies

Next we can plot the anomalies using `tk.plot_anomalies()`.

Use `help(tk.plot_anomalies)` to review additional helpful documentation.

```
# Plot anomalies
tk.plot_anomalies(
    data        = anomalize_df,
    date_column = 'date',
    engine      = 'plotly',
    title       = 'Plot Anomaly Bands'
)
```

## 1.5 Plot Cleaned Anomalies

Finally, we can also see a plot of the data with cleaned anomalies using `plot_anomalies_cleaned()`.

Use `help(tk.plot_anomalies_cleaned)` to review additional helpful documentation.

```
# Plot cleaned anomalies
tk.plot_anomalies_cleaned(
    data        = anomalize_df,
    date_column = 'date'
)
```

## 1.6 Changing Parameters

Some important parameters to highlight in the `anomalize()` function include `iqr_alpha`.

`iqr_alpha` controls the threshold for detecting outliers. It is the significance level used in the interquartile range (IQR) method for outlier detection. The default value is 0.05, which corresponds to a 5% significance level. A lower significance level will result in a higher threshold, which means fewer outliers will be detected. A higher significance level will result in a lower threshold, which means more outliers will be detected.
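This threshold behavior can be sketched with a generic IQR rule (a toy version; the exact mapping from `iqr_alpha` to an IQR multiplier inside pytimetk is not shown here). A larger multiplier, analogous to a lower alpha, raises the threshold and flags fewer points:

```python
import numpy as np

def iqr_outliers(x, multiplier):
    """Flag points outside [Q1 - m * IQR, Q3 + m * IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - multiplier * iqr) | (x > q3 + multiplier * iqr)

rng = np.random.default_rng(42)
x = rng.normal(0, 1, 1000)
x[::100] += 8  # ten obvious spikes

# Stricter threshold (like a lower iqr_alpha) flags fewer points
strict = iqr_outliers(x, multiplier=3.0)
loose  = iqr_outliers(x, multiplier=1.5)
print(strict.sum(), loose.sum())
```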

Let's visualize the effect of changing the `iqr_alpha` parameter.

### Changing `iqr_alpha`

First, let's get a dataframe with multiple values for `iqr_alpha`:

```
# Anomalized data with multiple iqr_alpha values

# - Alpha values
iqr_alpha_values = [0.05, 0.10, 0.15, 0.20]

# - Empty dataframes list
dfs = []

for alpha in iqr_alpha_values:

    # - Run anomalize function
    anomalize_df = tk.anomalize(
        data         = df,
        date_column  = 'date',
        value_column = 'value',
        period       = 7,
        iqr_alpha    = alpha
    )

    # - Add the iqr_alpha column
    anomalize_df['iqr_alpha'] = f'iqr_alpha value of {alpha}'

    # - Append to the list
    dfs.append(anomalize_df)

# - Concatenate all dataframes
final_df = pd.concat(dfs)
```

Now we can visualize the anomalies:

```
# Visualize
(
    final_df
        .groupby('iqr_alpha')
        .plot_anomalies(
            date_column = 'date',
            engine      = 'plotly',
            facet_ncol  = 2
        )
)
```

# 2 More Coming Soon…

We are in the early stages of development, but the potential for `pytimetk` in Python is already clear. 🐍

- Please ⭐ us on GitHub (it takes 2 seconds and means a lot).
- To make requests, please see our Project Roadmap (GH Issue #2).
- Want to contribute? See our contributing guide here.