import pandas as pd
import pytimetk as tk
df = tk.load_dataset('bike_sales_sample', parse_dates = ['order_date'])
augment_timeseries_signature
augment_timeseries_signature(data, date_column, reduce_memory=False, engine='pandas')
The function augment_timeseries_signature takes a DataFrame and a date column as input and returns the original DataFrame with 29 date and time based features added as new columns. Each new column name is prefixed with the name of the date_column (for example, order_date_year).
Parameters
Name | Type | Description | Default |
---|---|---|---|
data | pd.DataFrame | The data parameter is a pandas DataFrame that contains the time series data. | required |
date_column | str | The date_column parameter is a string that represents the name of the date column in the data DataFrame. | required |
reduce_memory | bool | The reduce_memory parameter is used to specify whether to reduce the memory usage of the DataFrame by converting int and float columns to smaller byte sizes and str columns to categorical data. This reduces memory for large data but may impact the resolution of float values and will change str to categorical. Default is False. | False |
engine | str | The engine parameter is used to specify the engine to use for augmenting datetime features. It can be either “pandas” or “polars”. The default value is “pandas”. When “polars”, the function will internally use the polars library for feature generation, which is generally faster than “pandas” for large datasets. | 'pandas' |
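To make the reduce_memory description more concrete, here is a minimal pandas-only sketch of the kind of conversion it describes: numeric columns are downcast to smaller types and string columns become categoricals. The helper name shrink_frame is hypothetical, and this is not pytimetk's internal implementation; it only illustrates why float resolution can change and why str columns come back as categorical.

```python
import pandas as pd

def shrink_frame(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that illustrates downcasting; not pytimetk's own code."""
    out = data.copy()
    for col in out.columns:
        dtype = out[col].dtype
        if pd.api.types.is_integer_dtype(dtype):
            # int64 -> smallest signed integer type that holds the values
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(dtype):
            # float64 -> float32 where the values allow it (resolution may drop)
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_object_dtype(dtype) or pd.api.types.is_string_dtype(dtype):
            # text columns -> categorical
            out[col] = out[col].astype("category")
    return out

example = pd.DataFrame({
    "quantity": [1, 2, 3],
    "price": [6070.0, 5970.0, 2770.0],
    "state": ["NY", "NY", "KS"],
})
small = shrink_frame(example)
print(small.dtypes)
```

Comparing example.memory_usage(deep=True) with small.memory_usage(deep=True) on a larger frame is one way to judge whether the trade-off is worthwhile.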
Returns
Type | Description |
---|---|
pd.DataFrame | A Pandas DataFrame with 29 datetime features added to it. |
- _index_num: An int64 feature that captures the entire datetime as a numeric value to the second
- _year: The year of the datetime
- _year_iso: The ISO year of the datetime
- _yearstart: Logical (0,1) indicating if first day of year (defined by frequency)
- _yearend: Logical (0,1) indicating if last day of year (defined by frequency)
- _leapyear: Logical (0,1) indicating if the date belongs to a leap year
- _half: Half year of the date: Jan-Jun = 1, Jul-Dec = 2
- _quarter: Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, Jul-Sep = 3, Oct-Dec = 4
- _quarteryear: The year and quarter of the date combined, e.g. 2011Q1
- _quarterstart: Logical (0,1) indicating if first day of quarter (defined by frequency)
- _quarterend: Logical (0,1) indicating if last day of quarter (defined by frequency)
- _month: The month of the datetime
- _month_lbl: The month label of the datetime
- _monthstart: Logical (0,1) indicating if first day of month (defined by frequency)
- _monthend: Logical (0,1) indicating if last day of month (defined by frequency)
- _yweek: The week ordinal of the year
- _mweek: The week ordinal of the month
- _wday: The number of the day of the week with Monday=1, Sunday=7
- _wday_lbl: The day of the week label
- _mday: The day of the month
- _qday: The day of the quarter
- _yday: The ordinal day of year
- _weekend: Logical (0,1) indicating if the day is a weekend
- _hour: The hour of the datetime
- _minute: The minutes of the datetime
- _second: The seconds of the datetime
- _msecond: The microseconds of the datetime
- _nsecond: The nanoseconds of the datetime
- _am_pm: Half of the day, AM = ante meridiem, PM = post meridiem
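As a rough illustration of what a few of these features mean, the sketch below computes a handful of them by hand with plain pandas. The helper name mini_signature is hypothetical and the formulas are assumptions drawn from the descriptions above (for instance, that a weekend means Saturday or Sunday); it is not how pytimetk computes them internally.

```python
import pandas as pd

def mini_signature(data: pd.DataFrame, date_column: str) -> pd.DataFrame:
    """Hypothetical helper: hand-compute a few signature-style features."""
    out = data.copy()
    dates = out[date_column].dt
    name = date_column
    # Whole seconds since the Unix epoch
    out[f"{name}_index_num"] = (
        (out[date_column] - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
    )
    out[f"{name}_year"] = dates.year
    # Jan-Jun = 1, Jul-Dec = 2
    out[f"{name}_half"] = (dates.month - 1) // 6 + 1
    out[f"{name}_quarter"] = dates.quarter
    # pandas counts Monday as 0, so shift to Monday = 1 ... Sunday = 7
    out[f"{name}_wday"] = dates.dayofweek + 1
    # Assumption: weekend means Saturday or Sunday
    out[f"{name}_weekend"] = (dates.dayofweek >= 5).astype(int)
    out[f"{name}_am_pm"] = (dates.hour < 12).map({True: "am", False: "pm"})
    return out

sample = pd.DataFrame(
    {"order_date": pd.to_datetime(["2011-01-07 00:00:00", "2011-07-09 15:30:00"])}
)
result = mini_signature(sample, "order_date")
print(result.T)
```

The first row (Friday, 2011-01-07) can be checked against the order_date_index_num, order_date_wday and order_date_am_pm values in the example output on this page.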
Examples
# Adds 29 new time series features as columns to the original DataFrame (pandas engine)
(
    df
        .augment_timeseries_signature(date_column='order_date', engine='pandas')
        .glimpse()
)
<class 'pandas.core.frame.DataFrame'>: 2466 rows of 42 columns
order_id: int64 [1, 1, 2, 2, 3, 3, 3, 3, 3, ...
order_line: int64 [1, 2, 1, 2, 1, 2, 3, 4, 5, ...
order_date: datetime64[ns] [Timestamp('2011-01-07 00:00 ...
quantity: int64 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
price: int64 [6070, 5970, 2770, 5970, 106 ...
total_price: int64 [6070, 5970, 2770, 5970, 106 ...
model: object ['Jekyll Carbon 2', 'Trigger ...
category_1: object ['Mountain', 'Mountain', 'Mo ...
category_2: object ['Over Mountain', 'Over Moun ...
frame_material: object ['Carbon', 'Carbon', 'Alumin ...
bikeshop_name: object ['Ithaca Mountain Climbers', ...
city: object ['Ithaca', 'Ithaca', 'Kansas ...
state: object ['NY', 'NY', 'KS', 'KS', 'KY ...
order_date_index_num: int64 [1294358400, 1294358400, 129 ...
order_date_year: int64 [2011, 2011, 2011, 2011, 201 ...
order_date_year_iso: UInt32 [2011, 2011, 2011, 2011, 201 ...
order_date_yearstart: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_yearend: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_leapyear: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_half: int64 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_quarter: int64 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_quarteryear: object ['2011Q1', '2011Q1', '2011Q1 ...
order_date_quarterstart: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_quarterend: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_month: int64 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_month_lbl: object ['January', 'January', 'Janu ...
order_date_monthstart: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_monthend: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_yweek: UInt32 [1, 1, 2, 2, 2, 2, 2, 2, 2, ...
order_date_mweek: int64 [1, 1, 2, 2, 2, 2, 2, 2, 2, ...
order_date_wday: int64 [5, 5, 1, 1, 1, 1, 1, 1, 1, ...
order_date_wday_lbl: object ['Friday', 'Friday', 'Monday ...
order_date_mday: int64 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_qday: int64 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_yday: int64 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_weekend: int64 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_hour: int64 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_minute: int64 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_second: int64 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_msecond: int64 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_nsecond: int64 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_am_pm: object ['am', 'am', 'am', 'am', 'am ...
# Adds 29 new time series features as columns to the original DataFrame (polars engine)
(
    df
        .augment_timeseries_signature(date_column='order_date', engine='polars')
        .glimpse()
)
<class 'pandas.core.frame.DataFrame'>: 2466 rows of 42 columns
order_id: int64 [1, 1, 2, 2, 3, 3, 3, 3, 3, ...
order_line: int64 [1, 2, 1, 2, 1, 2, 3, 4, 5, ...
order_date: datetime64[ns] [Timestamp('2011-01-07 00:00 ...
quantity: int64 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
price: int64 [6070, 5970, 2770, 5970, 106 ...
total_price: int64 [6070, 5970, 2770, 5970, 106 ...
model: object ['Jekyll Carbon 2', 'Trigger ...
category_1: object ['Mountain', 'Mountain', 'Mo ...
category_2: object ['Over Mountain', 'Over Moun ...
frame_material: object ['Carbon', 'Carbon', 'Alumin ...
bikeshop_name: object ['Ithaca Mountain Climbers', ...
city: object ['Ithaca', 'Ithaca', 'Kansas ...
state: object ['NY', 'NY', 'KS', 'KS', 'KY ...
order_date_index_num: float64 [1294358400.0, 1294358400.0, ...
order_date_year: int32 [2011, 2011, 2011, 2011, 201 ...
order_date_year_iso: int32 [2011, 2011, 2011, 2011, 201 ...
order_date_yearstart: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_yearend: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_leapyear: int8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_half: int32 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_quarter: uint32 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_quarteryear: object ['2011Q1', '2011Q1', '2011Q1 ...
order_date_quarterstart: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_quarterend: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_month: uint32 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_month_lbl: object ['January', 'January', 'Janu ...
order_date_monthstart: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_monthend: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_yweek: uint32 [1, 1, 2, 2, 2, 2, 2, 2, 2, ...
order_date_mweek: uint32 [1, 1, 2, 2, 2, 2, 2, 2, 2, ...
order_date_wday: uint32 [5, 5, 1, 1, 1, 1, 1, 1, 1, ...
order_date_wday_lbl: object ['Friday', 'Friday', 'Monday ...
order_date_mday: uint32 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_qday: int64 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_yday: uint32 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_weekend: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_hour: uint32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_minute: uint32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_second: uint32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_msecond: uint32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_nsecond: uint32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_am_pm: object ['am', 'am', 'am', 'am', 'am ...
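The two outputs above contain the same values but different dtypes (for example, order_date_index_num is int64 with the pandas engine and float64 with the polars engine). If the results of both engines need to be compared or concatenated, one option is to cast the numeric feature columns to a common dtype first. The sketch below uses a hypothetical helper, align_feature_dtypes, on small toy frames that only mimic the dtypes shown above; it is not part of pytimetk.

```python
import pandas as pd

def align_feature_dtypes(data: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Hypothetical helper: cast numeric feature columns to int64 for comparison."""
    out = data.copy()
    for col in out.columns:
        # Only touch numeric columns generated from the date column
        if col.startswith(prefix) and pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].astype("int64")
    return out

# Toy frames that mimic the dtypes in the pandas and polars outputs
pandas_like = pd.DataFrame({
    "order_date_index_num": pd.Series([1294358400, 1294617600], dtype="int64"),
    "order_date_month": pd.Series([1, 1], dtype="int64"),
    "order_date_yweek": pd.Series([1, 2], dtype="UInt32"),
})
polars_like = pd.DataFrame({
    "order_date_index_num": pd.Series([1294358400.0, 1294617600.0], dtype="float64"),
    "order_date_month": pd.Series([1, 1], dtype="uint32"),
    "order_date_yweek": pd.Series([1, 2], dtype="uint32"),
})

left = align_feature_dtypes(pandas_like, prefix="order_date_")
right = align_feature_dtypes(polars_like, prefix="order_date_")

# Raises an AssertionError if values or dtypes still differ
pd.testing.assert_frame_equal(left, right)
print(left.dtypes)
```

Casting to int64 assumes the feature columns contain no missing values; columns with missing dates would need a nullable dtype such as "Int64" instead.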