ts_summary

ts_summary(data, date_column, threads=1, show_progress=True, engine='pandas')

Computes summary statistics for time series data, either for the entire dataset or grouped by a specific column.

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| data | pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy | The data to summarize, either a Pandas DataFrame or a Pandas DataFrameGroupBy object. | required |
| date_column | str | The name of the column in the DataFrame that contains the dates. This column is used to compute the summary statistics. | required |
| engine | str | The engine used to compute the summary, either "pandas" or "polars". With "polars", the function uses the polars library internally, which can be faster than "pandas" for large datasets. | 'pandas' |

Returns

Type Description
pd.DataFrame The ts_summary function returns a summary of time series data. If grouped data is provided, the returned data will contain the grouping columns first. The summary includes the following statistics:
- date_n: The number of observations in the time series.
- date_tz: The time zone of the time series.
- date_start: The first date in the time series.
- date_end: The last date in the time series.
- freq_inferred_unit: The inferred frequency of the time series from pandas.
- freq_median_timedelta: The median time difference between consecutive observations in the time series.
- freq_median_scale: The median time difference between consecutive observations in the time series, scaled to a common unit.
- freq_median_unit: The unit of the median time difference between consecutive observations in the time series.
- diff_min: The minimum time difference between consecutive observations in the time series as a timedelta.
- diff_q25: The 25th percentile of the time difference between consecutive observations in the time series as a timedelta.
- diff_median: The median time difference between consecutive observations in the time series as a timedelta.
- diff_mean: The mean time difference between consecutive observations in the time series as a timedelta.
- diff_q75: The 75th percentile of the time difference between consecutive observations in the time series as a timedelta.
- diff_max: The maximum time difference between consecutive observations in the time series as a timedelta.
- diff_min_seconds: The minimum time difference between consecutive observations in the time series in seconds.
- diff_q25_seconds: The 25th percentile of the time difference between consecutive observations in the time series in seconds.
- diff_median_seconds: The median time difference between consecutive observations in the time series in seconds.
- diff_mean_seconds: The mean time difference between consecutive observations in the time series in seconds.
- diff_q75_seconds: The 75th percentile of the time difference between consecutive observations in the time series in seconds.
- diff_max_seconds: The maximum time difference between consecutive observations in the time series in seconds.
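
To make the meaning of the diff_* and freq_inferred_unit columns concrete, the sketch below recomputes them with plain pandas for the seven dates used in the Examples section. It is an illustration only and not pytimetk's implementation; the names diffs, summary, and summary_seconds are chosen here for the example.

```python
import pandas as pd

dates = pd.to_datetime([
    "2023-10-02", "2023-10-03", "2023-10-04", "2023-10-05",
    "2023-10-06", "2023-10-09", "2023-10-10",
])

# Time differences between consecutive observations.
# The first entry has no predecessor (NaT), so it is dropped.
diffs = dates.to_series().diff().dropna()

# Illustrative recomputation of the diff_* statistics as timedeltas.
summary = {
    "diff_min": diffs.min(),
    "diff_q25": diffs.quantile(0.25),
    "diff_median": diffs.median(),
    "diff_mean": diffs.mean(),
    "diff_q75": diffs.quantile(0.75),
    "diff_max": diffs.max(),
}

# The *_seconds columns are the same statistics expressed in seconds.
summary_seconds = {
    f"{name}_seconds": value.total_seconds() for name, value in summary.items()
}

# Frequency inferred by pandas; weekdays with a weekend gap infer as business days.
freq = pd.infer_freq(dates)

print(summary["diff_mean"])                   # 1 days 08:00:00
print(summary_seconds["diff_mean_seconds"])   # 115200.0
print(freq)                                   # B
```

The six gaps are 1, 1, 1, 1, 3, and 1 days, so the mean is 8/6 days (1 day 8 hours, or 115200 seconds), which agrees with the diff_mean and diff_mean_seconds values shown in the first example output.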

Notes

Performance

This function can use parallel processing to speed up computation on large datasets with many time series groups:

- Parallel processing has overhead and may not be faster on small datasets.
- To use parallel processing, set threads = -1 to use all available processors.
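
The trade-off described above can be sketched with the standard library alone. The example below summarizes each group either serially or in a thread pool. It is a hypothetical illustration of per-group parallelism, not how pytimetk implements threads; the helpers summarize_group and grouped_summary are invented for this sketch, and only the threads = 1 and threads = -1 conventions are taken from this page.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def summarize_group(item):
    """Hypothetical helper: a few gap statistics for one (name, dates) pair."""
    name, dates = item
    diffs = dates.sort_values().diff().dropna()
    return {
        "group": name,
        "date_n": len(dates),
        "diff_median": diffs.median(),
        "diff_max": diffs.max(),
    }


def grouped_summary(df, group_col, date_col, threads=1):
    """Hypothetical helper: summarize every group, optionally in parallel."""
    groups = [(name, g[date_col]) for name, g in df.groupby(group_col)]
    if threads == 1:
        # Serial path: no pool start-up or scheduling overhead.
        rows = [summarize_group(item) for item in groups]
    else:
        # Assumption for this sketch: -1 means "use all available processors".
        workers = os.cpu_count() if threads == -1 else threads
        with ThreadPoolExecutor(max_workers=workers) as pool:
            rows = list(pool.map(summarize_group, groups))  # keeps group order
    return pd.DataFrame(rows)


df = pd.DataFrame({
    "symbol": ["A"] * 4 + ["B"] * 4,
    "date": pd.to_datetime(
        ["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-06"] * 2
    ),
})

serial = grouped_summary(df, "symbol", "date", threads=1)
parallel = grouped_summary(df, "symbol", "date", threads=2)
print(serial.equals(parallel))
```

Both paths produce the same table. With only two tiny groups, the pool's set-up cost outweighs any gain, which is the overhead the note above warns about; parallelism tends to pay off only when there are many groups with substantial work each.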

Examples

import pytimetk as tk
import pandas as pd

dates = pd.to_datetime(["2023-10-02", "2023-10-03", "2023-10-04", "2023-10-05", "2023-10-06", "2023-10-09", "2023-10-10"]) 
df = pd.DataFrame(dates, columns = ["date"])

df.ts_summary(date_column = 'date')

| | date_n | date_tz | date_start | date_end | freq_inferred_unit | freq_median_timedelta | freq_median_scale | freq_median_unit | diff_min | diff_q25 | diff_median | diff_mean | diff_q75 | diff_max | diff_min_seconds | diff_q25_seconds | diff_median_seconds | diff_mean_seconds | diff_q75_seconds | diff_max_seconds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | None | 2023-10-02 | 2023-10-10 | B | 1 days | 1.0 | D | 1 days | 1 days | 1 days | 1 days 08:00:00 | 1 days | 3 days | 86400.0 | 86400.0 | 86400.0 | 115200.0 | 86400.0 | 259200.0 |

# Grouped ts_summary
df = tk.load_dataset('stocks_daily', parse_dates = ['date'])
 
df.groupby('symbol').ts_summary(date_column = 'date') 

| | symbol | date_n | date_tz | date_start | date_end | freq_inferred_unit | freq_median_timedelta | freq_median_scale | freq_median_unit | diff_min | ... | diff_median | diff_mean | diff_q75 | diff_max | diff_min_seconds | diff_q25_seconds | diff_median_seconds | diff_mean_seconds | diff_q75_seconds | diff_max_seconds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAPL | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | AMZN | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | GOOG | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | META | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | NFLX | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | NVDA | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |

6 rows × 21 columns

# Parallelized grouped ts_summary 
(
    df 
        .groupby('symbol') 
        .ts_summary(
            date_column = 'date', 
            threads = 2, 
            show_progress = True
        ) 
)

| | symbol | date_n | date_tz | date_start | date_end | freq_inferred_unit | freq_median_timedelta | freq_median_scale | freq_median_unit | diff_min | ... | diff_median | diff_mean | diff_q75 | diff_max | diff_min_seconds | diff_q25_seconds | diff_median_seconds | diff_mean_seconds | diff_q75_seconds | diff_max_seconds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAPL | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | AMZN | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | GOOG | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | META | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | NFLX | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |
| 0 | NVDA | 2699 | None | 2013-01-02 | 2023-09-21 | B | 1 days | 1.0 | D | 1 days | ... | 1 days | 1 days 10:49:00.845070422 | 1 days | 4 days | 86400.0 | 86400.0 | 86400.0 | 125340.84507 | 86400.0 | 345600.0 |

6 rows × 21 columns