Computes summary statistics for a time series data, either for the entire dataset or grouped by a specific column.
Parameters
Name
Type
Description
Default
data
pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy
The data parameter can be either a Pandas DataFrame or a Pandas DataFrameGroupBy object. It represents the data that you want to summarize.
required
date_column
str
The date_column parameter is a string that specifies the name of the column in the DataFrame that contains the dates. This column will be used to compute summary statistics for the time series data.
required
engine
str
The engine parameter is used to specify the engine to use for augmenting lags. It can be either โpandasโ or โpolarsโ. - The default value is โpandasโ. - When โpolarsโ, the function will internally use the polars library. This can be faster than using โpandasโ for large datasets.
'pandas'
Returns
Type
Description
pd.DataFrame
The ts_summary function returns a summary of time series data. The summary includes the following statistics: - If grouped data is provided, the returned data will contain the grouping columns first. - date_n: The number of observations in the time series. - date_tz: The time zone of the time series. - date_start: The first date in the time series. - date_end: The last date in the time series. - freq_inferred_unit: The inferred frequency of the time series from pandas. - freq_median_timedelta: The median time difference between consecutive observations in the time series. - freq_median_scale: The median time difference between consecutive observations in the time series, scaled to a common unit. - freq_median_unit: The unit of the median time difference between consecutive observations in the time series. - diff_min: The minimum time difference between consecutive observations in the time series as a timedelta. - diff_q25: The 25th percentile of the time difference between consecutive observations in the time series as a timedelta. - diff_median: The median time difference between consecutive observations in the time series as a timedelta. - diff_mean: The mean time difference between consecutive observations in the time series as a timedelta. - diff_q75: The 75th percentile of the time difference between consecutive observations in the time series as a timedelta. - diff_max: The maximum time difference between consecutive observations in the time series as a timedelta. - diff_min_seconds: The minimum time difference between consecutive observations in the time series in seconds. - diff_q25_seconds: The 25th percentile of the time difference between consecutive observations in the time series in seconds. - diff_median_seconds: The median time difference between consecutive observations in the time series in seconds. - diff_mean_seconds: The mean time difference between consecutive observations in the time series in seconds. - diff_q75_seconds: The 75th percentile of the time difference between consecutive observations in the time series in seconds. - diff_max_seconds: The maximum time difference between consecutive observations in the time series in seconds.
Notes
Performance
This function uses parallel processing to speed up computation for large datasets with many time series groups:
Parallel processing has overhead and may not be faster on small datasets.
To use parallel processing, set threads = -1 to use all available processors.