import pandas as pd
import pytimetk as tk
df = tk.load_dataset('bike_sales_sample', parse_dates=['order_date'])
augment_timeseries_signature
augment_timeseries_signature(
    data,
    date_column,
    reduce_memory=False,
    engine='pandas',
)
The function augment_timeseries_signature takes a DataFrame and a date column as input and returns the original DataFrame with 29 date and time based features appended as new columns. Each new column is named by prefixing the feature name with the date_column name (for example, order_date_year).
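To make the naming rule concrete, the sketch below builds the 29 output column names for a given date column. It is plain Python rather than pytimetk code: SIGNATURE_SUFFIXES and signature_column_names are hypothetical names introduced here for illustration, and the suffixes are copied from the Returns section of this page.

```python
# Feature suffixes as listed in the Returns section of this page
SIGNATURE_SUFFIXES = [
    "index_num", "year", "year_iso", "yearstart", "yearend", "leapyear",
    "half", "quarter", "quarteryear", "quarterstart", "quarterend",
    "month", "month_lbl", "monthstart", "monthend",
    "yweek", "mweek", "wday", "wday_lbl", "mday", "qday", "yday", "weekend",
    "hour", "minute", "second", "msecond", "nsecond", "am_pm",
]


def signature_column_names(date_column):
    """Hypothetical helper: the column names expected for a given date column."""
    return [f"{date_column}_{suffix}" for suffix in SIGNATURE_SUFFIXES]


names = signature_column_names("order_date")
print(len(names))   # 29
print(names[:3])    # ['order_date_index_num', 'order_date_year', 'order_date_year_iso']
```

With the 13 original columns of the bike sales sample, 29 added columns give the 42 columns reported in the examples.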
Parameters
Name | Type | Description | Default |
---|---|---|---|
data | DataFrame or GroupBy (pandas or polars) | Tabular time series data. Grouped inputs are processed per group before the signature columns are appended. | required |
date_column | str | Name of the date column in data. | required |
reduce_memory | bool | If True, reduces the memory usage of the DataFrame by converting int and float columns to smaller types and str columns to categorical. This saves memory on large data but may reduce float precision and will change str columns to categorical. | False |
engine | str | Engine used to compute the datetime features, either 'pandas' or 'polars'. With 'polars', the function uses the polars library internally for feature generation, which is generally faster than 'pandas' on large datasets. | 'pandas' |
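The kind of conversion that reduce_memory describes can be sketched with plain pandas. This illustrates downcasting and categorical conversion in general and is not pytimetk's internal implementation; the small frame and the name df_small are made up for the example.

```python
import pandas as pd

# Small made-up frame; the column names are illustrative only
df_small = pd.DataFrame({
    "quantity": [1, 2, 3],
    "price": [1.5, 2.5, 3.5],
    "city": ["Ithaca", "Ithaca", "Miami"],
})

reduced = df_small.copy()
# Downcast numeric columns to the smallest type that holds the values
reduced["quantity"] = pd.to_numeric(reduced["quantity"], downcast="integer")
reduced["price"] = pd.to_numeric(reduced["price"], downcast="float")
# Store repeated strings as categorical codes
reduced["city"] = reduced["city"].astype("category")

print(df_small.dtypes)
print(reduced.dtypes)
```

The trade-off is the one noted in the table: smaller float types carry less precision, and downstream code that expects str columns will see categoricals instead.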
Returns
Name | Type | Description |
---|---|---|
 | DataFrame | Data with 29 datetime features appended. The return type matches the input backend. |

The appended features are:

- _index_num: An int64 feature that captures the entire datetime as a numeric value to the second
- _year: The year of the datetime
- _year_iso: The ISO year of the datetime
- _yearstart: Logical (0,1) indicating if first day of year (defined by frequency)
- _yearend: Logical (0,1) indicating if last day of year (defined by frequency)
- _leapyear: Logical (0,1) indicating if the date belongs to a leap year
- _half: Half year of the date: Jan-Jun = 1, Jul-Dec = 2
- _quarter: Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, Jul-Sep = 3, Oct-Dec = 4
- _quarteryear: Quarter of the date + relative year
- _quarterstart: Logical (0,1) indicating if first day of quarter (defined by frequency)
- _quarterend: Logical (0,1) indicating if last day of quarter (defined by frequency)
- _month: The month of the datetime
- _month_lbl: The month label of the datetime
- _monthstart: Logical (0,1) indicating if first day of month (defined by frequency)
- _monthend: Logical (0,1) indicating if last day of month (defined by frequency)
- _yweek: The week ordinal of the year
- _mweek: The week ordinal of the month
- _wday: The number of the day of the week with Monday=1, Sunday=7
- _wday_lbl: The day of the week label
- _mday: The day of the datetime
- _qday: The day of the relative quarter
- _yday: The ordinal day of year
- _weekend: Logical (0,1) indicating if the day is a weekend
- _hour: The hour of the datetime
- _minute: The minutes of the datetime
- _second: The seconds of the datetime
- _msecond: The microseconds of the datetime
- _nsecond: The nanoseconds of the datetime
- _am_pm: Half of the day, AM = ante meridiem, PM = post meridiem
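As a rough illustration of what several of these features mean, the sketch below derives a handful of them for a single timestamp using only the Python standard library. It is not how pytimetk computes them: signature_sketch is a hypothetical helper, the timestamp is treated as UTC, and the weekend rule (Saturday or Sunday) is an assumption.

```python
import calendar
from datetime import datetime


def signature_sketch(ts, prefix):
    """Hypothetical helper: a few signature-style features for one naive timestamp."""
    quarter = (ts.month - 1) // 3 + 1
    iso_year, _iso_week, iso_wday = ts.isocalendar()  # iso_wday: Monday=1 ... Sunday=7
    return {
        # Seconds since the Unix epoch, treating ts as UTC
        f"{prefix}_index_num": calendar.timegm(ts.timetuple()),
        f"{prefix}_year": ts.year,
        f"{prefix}_year_iso": iso_year,
        f"{prefix}_leapyear": int(calendar.isleap(ts.year)),
        f"{prefix}_half": 1 if ts.month <= 6 else 2,
        f"{prefix}_quarter": quarter,
        f"{prefix}_quarteryear": f"{ts.year}Q{quarter}",
        f"{prefix}_wday": iso_wday,
        f"{prefix}_yday": ts.timetuple().tm_yday,
        # Assumption: weekend means Saturday or Sunday
        f"{prefix}_weekend": int(iso_wday >= 6),
        f"{prefix}_am_pm": "am" if ts.hour < 12 else "pm",
    }


features = signature_sketch(datetime(2011, 1, 7), "order_date")
for name, value in features.items():
    print(name, value)
```

For 2011-01-07, a Friday, this gives an index_num of 1294358400, a wday of 5 and a quarteryear of 2011Q1.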
Examples
# Adds 29 new time series features as columns to the original DataFrame (pandas engine)
(
    df
        .augment_timeseries_signature(date_column='order_date', engine='pandas')
        .glimpse()
)
<class 'pandas.core.frame.DataFrame'>: 2466 rows of 42 columns
order_id: int64 [1, 1, 2, 2, 3, 3, 3, 3, 3, ...
order_line: int64 [1, 2, 1, 2, 1, 2, 3, 4, 5, ...
order_date: datetime64[ns] [Timestamp('2011-01-07 00:00 ...
quantity: int64 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
price: int64 [6070, 5970, 2770, 5970, 106 ...
total_price: int64 [6070, 5970, 2770, 5970, 106 ...
model: object ['Jekyll Carbon 2', 'Trigger ...
category_1: object ['Mountain', 'Mountain', 'Mo ...
category_2: object ['Over Mountain', 'Over Moun ...
frame_material: object ['Carbon', 'Carbon', 'Alumin ...
bikeshop_name: object ['Ithaca Mountain Climbers', ...
city: object ['Ithaca', 'Ithaca', 'Kansas ...
state: object ['NY', 'NY', 'KS', 'KS', 'KY ...
order_date_index_num: int64 [1294358400, 1294358400, 129 ...
order_date_year: int32 [2011, 2011, 2011, 2011, 201 ...
order_date_year_iso: UInt32 [2011, 2011, 2011, 2011, 201 ...
order_date_yearstart: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_yearend: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_leapyear: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_half: int64 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_quarter: int32 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_quarteryear: object ['2011Q1', '2011Q1', '2011Q1 ...
order_date_quarterstart: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_quarterend: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_month: int32 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_month_lbl: object ['January', 'January', 'Janu ...
order_date_monthstart: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_monthend: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_yweek: UInt32 [1, 1, 2, 2, 2, 2, 2, 2, 2, ...
order_date_mweek: int32 [1, 1, 2, 2, 2, 2, 2, 2, 2, ...
order_date_wday: int32 [5, 5, 1, 1, 1, 1, 1, 1, 1, ...
order_date_wday_lbl: object ['Friday', 'Friday', 'Monday ...
order_date_mday: int32 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_qday: int64 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_yday: int32 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_weekend: int64 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_hour: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_minute: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_second: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_msecond: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_nsecond: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_am_pm: object ['am', 'am', 'am', 'am', 'am ...
# Adds 29 new time series features as columns to the original DataFrame (polars engine)
# The input is a pandas DataFrame, so the result is returned as pandas as well
(
    df
        .augment_timeseries_signature(date_column='order_date', engine='polars')
        .glimpse()
)
<class 'pandas.core.frame.DataFrame'>: 2466 rows of 42 columns
order_id: int64 [1, 1, 2, 2, 3, 3, 3, 3, 3, ...
order_line: int64 [1, 2, 1, 2, 1, 2, 3, 4, 5, ...
order_date: datetime64[ns] [Timestamp('2011-01-07 00:00 ...
quantity: int64 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
price: int64 [6070, 5970, 2770, 5970, 106 ...
total_price: int64 [6070, 5970, 2770, 5970, 106 ...
model: object ['Jekyll Carbon 2', 'Trigger ...
category_1: object ['Mountain', 'Mountain', 'Mo ...
category_2: object ['Over Mountain', 'Over Moun ...
frame_material: object ['Carbon', 'Carbon', 'Alumin ...
bikeshop_name: object ['Ithaca Mountain Climbers', ...
city: object ['Ithaca', 'Ithaca', 'Kansas ...
state: object ['NY', 'NY', 'KS', 'KS', 'KY ...
order_date_index_num: int64 [1294358400, 1294358400, 129 ...
order_date_year: int32 [2011, 2011, 2011, 2011, 201 ...
order_date_year_iso: UInt32 [2011, 2011, 2011, 2011, 201 ...
order_date_yearstart: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_yearend: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_leapyear: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_half: int64 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_quarter: int32 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_quarteryear: object ['2011Q1', '2011Q1', '2011Q1 ...
order_date_quarterstart: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_quarterend: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_month: int32 [1, 1, 1, 1, 1, 1, 1, 1, 1, ...
order_date_month_lbl: object ['January', 'January', 'Janu ...
order_date_monthstart: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_monthend: uint8 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_yweek: UInt32 [1, 1, 2, 2, 2, 2, 2, 2, 2, ...
order_date_mweek: int32 [1, 1, 2, 2, 2, 2, 2, 2, 2, ...
order_date_wday: int32 [5, 5, 1, 1, 1, 1, 1, 1, 1, ...
order_date_wday_lbl: object ['Friday', 'Friday', 'Monday ...
order_date_mday: int32 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_qday: int64 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_yday: int32 [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_weekend: int64 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_hour: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_minute: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_second: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_msecond: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_nsecond: int32 [0, 0, 0, 0, 0, 0, 0, 0, 0, ...
order_date_am_pm: object ['am', 'am', 'am', 'am', 'am ...
# Polars DataFrame using the tk accessor
import polars as pl

pl_df = pl.from_pandas(df)

pl_df.tk.augment_timeseries_signature(date_column='order_date')
shape: (2_466, 42)
order_id | order_line | order_date | quantity | price | total_price | model | category_1 | category_2 | frame_material | bikeshop_name | city | state | order_date_index_num | order_date_year | order_date_year_iso | order_date_yearstart | order_date_yearend | order_date_leapyear | order_date_half | order_date_quarter | order_date_quarteryear | order_date_quarterstart | order_date_quarterend | order_date_month | order_date_month_lbl | order_date_monthstart | order_date_monthend | order_date_yweek | order_date_mweek | order_date_wday | order_date_wday_lbl | order_date_mday | order_date_qday | order_date_yday | order_date_weekend | order_date_hour | order_date_minute | order_date_second | order_date_msecond | order_date_nsecond | order_date_am_pm |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
i64 | i64 | datetime[ns] | i64 | i64 | i64 | str | str | str | str | str | str | str | i64 | i32 | u32 | u8 | u8 | u8 | i64 | i32 | str | u8 | u8 | i32 | str | u8 | u8 | u32 | i32 | i32 | str | i32 | i64 | i32 | i64 | i32 | i32 | i32 | i32 | i32 | str |
1 | 1 | 2011-01-07 00:00:00 | 1 | 6070 | 6070 | "Jekyll Carbon 2" | "Mountain" | "Over Mountain" | "Carbon" | "Ithaca Mountain Climbers" | "Ithaca" | "NY" | 1294358400 | 2011 | 2011 | 0 | 0 | 0 | 1 | 1 | "2011Q1" | 0 | 0 | 1 | "January" | 0 | 0 | 1 | 1 | 5 | "Friday" | 7 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |
1 | 2 | 2011-01-07 00:00:00 | 1 | 5970 | 5970 | "Trigger Carbon 2" | "Mountain" | "Over Mountain" | "Carbon" | "Ithaca Mountain Climbers" | "Ithaca" | "NY" | 1294358400 | 2011 | 2011 | 0 | 0 | 0 | 1 | 1 | "2011Q1" | 0 | 0 | 1 | "January" | 0 | 0 | 1 | 1 | 5 | "Friday" | 7 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |
2 | 1 | 2011-01-10 00:00:00 | 1 | 2770 | 2770 | "Beast of the East 1" | "Mountain" | "Trail" | "Aluminum" | "Kansas City 29ers" | "Kansas City" | "KS" | 1294617600 | 2011 | 2011 | 0 | 0 | 0 | 1 | 1 | "2011Q1" | 0 | 0 | 1 | "January" | 0 | 0 | 2 | 2 | 1 | "Monday" | 10 | 10 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |
2 | 2 | 2011-01-10 00:00:00 | 1 | 5970 | 5970 | "Trigger Carbon 2" | "Mountain" | "Over Mountain" | "Carbon" | "Kansas City 29ers" | "Kansas City" | "KS" | 1294617600 | 2011 | 2011 | 0 | 0 | 0 | 1 | 1 | "2011Q1" | 0 | 0 | 1 | "January" | 0 | 0 | 2 | 2 | 1 | "Monday" | 10 | 10 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |
3 | 1 | 2011-01-10 00:00:00 | 1 | 10660 | 10660 | "Supersix Evo Hi-Mod Team" | "Road" | "Elite Road" | "Carbon" | "Louisville Race Equipment" | "Louisville" | "KY" | 1294617600 | 2011 | 2011 | 0 | 0 | 0 | 1 | 1 | "2011Q1" | 0 | 0 | 1 | "January" | 0 | 0 | 2 | 2 | 1 | "Monday" | 10 | 10 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
321 | 3 | 2011-12-22 00:00:00 | 1 | 1410 | 1410 | "CAAD8 105" | "Road" | "Elite Road" | "Aluminum" | "Miami Race Equipment" | "Miami" | "FL" | 1324512000 | 2011 | 2011 | 0 | 0 | 0 | 2 | 4 | "2011Q4" | 0 | 0 | 12 | "December" | 0 | 0 | 51 | 4 | 4 | "Thursday" | 22 | 83 | 356 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |
322 | 1 | 2011-12-28 00:00:00 | 1 | 1250 | 1250 | "Synapse Disc Tiagra" | "Road" | "Endurance Road" | "Aluminum" | "Phoenix Bi-peds" | "Phoenix" | "AZ" | 1325030400 | 2011 | 2011 | 0 | 0 | 0 | 2 | 4 | "2011Q4" | 0 | 0 | 12 | "December" | 0 | 0 | 52 | 4 | 3 | "Wednesday" | 28 | 89 | 362 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |
322 | 2 | 2011-12-28 00:00:00 | 1 | 2660 | 2660 | "Bad Habit 2" | "Mountain" | "Trail" | "Aluminum" | "Phoenix Bi-peds" | "Phoenix" | "AZ" | 1325030400 | 2011 | 2011 | 0 | 0 | 0 | 2 | 4 | "2011Q4" | 0 | 0 | 12 | "December" | 0 | 0 | 52 | 4 | 3 | "Wednesday" | 28 | 89 | 362 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |
322 | 3 | 2011-12-28 00:00:00 | 1 | 2340 | 2340 | "F-Si 1" | "Mountain" | "Cross Country Race" | "Aluminum" | "Phoenix Bi-peds" | "Phoenix" | "AZ" | 1325030400 | 2011 | 2011 | 0 | 0 | 0 | 2 | 4 | "2011Q4" | 0 | 0 | 12 | "December" | 0 | 0 | 52 | 4 | 3 | "Wednesday" | 28 | 89 | 362 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |
322 | 4 | 2011-12-28 00:00:00 | 1 | 5860 | 5860 | "Synapse Hi-Mod Dura Ace" | "Road" | "Endurance Road" | "Carbon" | "Phoenix Bi-peds" | "Phoenix" | "AZ" | 1325030400 | 2011 | 2011 | 0 | 0 | 0 | 2 | 4 | "2011Q4" | 0 | 0 | 12 | "December" | 0 | 0 | 52 | 4 | 3 | "Wednesday" | 28 | 89 | 362 | 0 | 0 | 0 | 0 | 0 | 0 | "am" |