augment_timeseries_signature

augment_timeseries_signature(
    data,
    date_column,
    reduce_memory=False,
    engine='pandas',
)

The function augment_timeseries_signature takes a DataFrame and a date column and returns the original DataFrame with 29 date- and time-based features appended as new columns. Each new column is named by joining the date_column name with the feature suffix (for example, order_date_year).
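The naming rule can be sketched in plain Python. This is only an illustration of the convention described on this page, not pytimetk's implementation; the helper `signature_column_names` is hypothetical, and the suffix list is copied from the Returns section.

```python
# Suffixes copied from the Returns list on this page (29 entries).
SIGNATURE_SUFFIXES = [
    "_index_num", "_year", "_year_iso", "_yearstart", "_yearend", "_leapyear",
    "_half", "_quarter", "_quarteryear", "_quarterstart", "_quarterend",
    "_month", "_month_lbl", "_monthstart", "_monthend", "_yweek", "_mweek",
    "_wday", "_wday_lbl", "_mday", "_qday", "_yday", "_weekend",
    "_hour", "_minute", "_second", "_msecond", "_nsecond", "_am_pm",
]


def signature_column_names(date_column: str) -> list:
    """Hypothetical helper: the column names expected for a given date column."""
    return [f"{date_column}{suffix}" for suffix in SIGNATURE_SUFFIXES]


names = signature_column_names("order_date")
print(len(names))  # 29
print(names[0])    # order_date_index_num
```

For the bike sales example further down, the 13 original columns plus these 29 give the 42 columns reported in the output.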

Parameters

Name Type Description Default
data DataFrame or GroupBy (pandas or polars) Tabular time series data. Grouped inputs are processed per group before the signature columns are appended. Accepts both pandas and polars inputs. required
date_column str Name of the date column in data from which the features are generated. required
reduce_memory bool If True, reduces the memory usage of the DataFrame by converting int and float columns to smaller types and str columns to categorical. This saves memory on large data but may reduce float precision and will change str columns to categorical. False
engine str Engine used to generate the datetime features, either “pandas” (the default) or “polars”. With “polars”, the function uses the polars library internally for feature generation, which is generally faster than “pandas” on large datasets. 'pandas'

Returns

Name Type Description
DataFrame Data with 29 datetime features appended. The return type matches the input backend.
- _index_num: An int64 feature that captures the entire datetime as a numeric value to the second
- _year: The year of the datetime
- _year_iso: The iso year of the datetime
- _yearstart: Logical (0,1) indicating if first day of year (defined by frequency)
- _yearend: Logical (0,1) indicating if last day of year (defined by frequency)
- _leapyear: Logical (0,1) indicating if the date belongs to a leap year
- _half: Half year of the date: Jan-Jun = 1, Jul-Dec = 2
- _quarter: Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, Jul-Sep = 3, Oct-Dec = 4
- _quarteryear: The year and quarter of the date combined into a label, e.g. 2011Q1
- _quarterstart: Logical (0,1) indicating if first day of quarter (defined by frequency)
- _quarterend: Logical (0,1) indicating if last day of quarter (defined by frequency)
- _month: The month of the datetime
- _month_lbl: The month label of the datetime
- _monthstart: Logical (0,1) indicating if first day of month (defined by frequency)
- _monthend: Logical (0,1) indicating if last day of month (defined by frequency)
- _yweek: The week ordinal of the year
- _mweek: The week ordinal of the month
- _wday: The number of the day of the week with Monday=1, Sunday=7
- _wday_lbl: The day of the week label
- _mday: The day of the datetime
- _qday: The day of the quarter, counted from the first day of the quarter
- _yday: The ordinal day of year
- _weekend: Logical (0,1) indicating if the day is a weekend
- _hour: The hour of the datetime
- _minute: The minutes of the datetime
- _second: The seconds of the datetime
- _msecond: The microseconds of the datetime
- _nsecond: The nanoseconds of the datetime
- _am_pm: Half of the day, AM = ante meridiem, PM = post meridiem
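To make a few of these definitions concrete, the sketch below recomputes some of the features for a single timestamp using only the Python standard library. It is an illustration, not pytimetk's code: the function name `signature_sketch` is hypothetical, a naive datetime is assumed to be in UTC for the index number, and the week of year is assumed to be the ISO week.

```python
import calendar
from datetime import datetime


def signature_sketch(ts: datetime) -> dict:
    """Illustrative re-computation of a few signature features for one timestamp."""
    quarter = (ts.month - 1) // 3 + 1
    quarter_start = datetime(ts.year, 3 * (quarter - 1) + 1, 1)
    iso_year, iso_week, iso_wday = ts.isocalendar()
    return {
        # Seconds since 1970-01-01; assumes a naive timestamp is UTC.
        "index_num": calendar.timegm(ts.timetuple()),
        "year": ts.year,
        "year_iso": iso_year,
        "leapyear": int(calendar.isleap(ts.year)),
        "half": 1 if ts.month <= 6 else 2,
        "quarter": quarter,
        "quarteryear": f"{ts.year}Q{quarter}",
        "yweek": iso_week,   # assumption: ISO week number
        "wday": iso_wday,    # Monday=1 ... Sunday=7
        "mday": ts.day,
        "qday": (ts - quarter_start).days + 1,
        "yday": ts.timetuple().tm_yday,
        "am_pm": "am" if ts.hour < 12 else "pm",
    }


first = signature_sketch(datetime(2011, 1, 7))
print(first["index_num"], first["quarteryear"], first["wday"])  # 1294358400 2011Q1 5

late = signature_sketch(datetime(2011, 12, 22))
print(late["half"], late["yweek"], late["qday"], late["yday"])  # 2 51 83 356
```

The printed values agree with the rows for 2011-01-07 and 2011-12-22 in the example output on this page.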

Examples

import pandas as pd
import pytimetk as tk

df = tk.load_dataset('bike_sales_sample', parse_dates=['order_date'])
# Adds 29 new time series features as columns to the original DataFrame (pandas engine)
(
    df
        .augment_timeseries_signature(date_column='order_date', engine='pandas')
        .glimpse()
)
<class 'pandas.core.frame.DataFrame'>: 2466 rows of 42 columns
order_id:                 int64             [1, 1, 2, 2, 3, 3, 3, 3, 3,  ...
order_line:               int64             [1, 2, 1, 2, 1, 2, 3, 4, 5,  ...
order_date:               datetime64[ns]    [Timestamp('2011-01-07 00:00 ...
quantity:                 int64             [1, 1, 1, 1, 1, 1, 1, 1, 1,  ...
price:                    int64             [6070, 5970, 2770, 5970, 106 ...
total_price:              int64             [6070, 5970, 2770, 5970, 106 ...
model:                    object            ['Jekyll Carbon 2', 'Trigger ...
category_1:               object            ['Mountain', 'Mountain', 'Mo ...
category_2:               object            ['Over Mountain', 'Over Moun ...
frame_material:           object            ['Carbon', 'Carbon', 'Alumin ...
bikeshop_name:            object            ['Ithaca Mountain Climbers', ...
city:                     object            ['Ithaca', 'Ithaca', 'Kansas ...
state:                    object            ['NY', 'NY', 'KS', 'KS', 'KY ...
order_date_index_num:     int64             [1294358400, 1294358400, 129 ...
order_date_year:          int32             [2011, 2011, 2011, 2011, 201 ...
order_date_year_iso:      UInt32            [2011, 2011, 2011, 2011, 201 ...
order_date_yearstart:     uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_yearend:       uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_leapyear:      uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_half:          int64             [1, 1, 1, 1, 1, 1, 1, 1, 1,  ...
order_date_quarter:       int32             [1, 1, 1, 1, 1, 1, 1, 1, 1,  ...
order_date_quarteryear:   object            ['2011Q1', '2011Q1', '2011Q1 ...
order_date_quarterstart:  uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_quarterend:    uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_month:         int32             [1, 1, 1, 1, 1, 1, 1, 1, 1,  ...
order_date_month_lbl:     object            ['January', 'January', 'Janu ...
order_date_monthstart:    uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_monthend:      uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_yweek:         UInt32            [1, 1, 2, 2, 2, 2, 2, 2, 2,  ...
order_date_mweek:         int32             [1, 1, 2, 2, 2, 2, 2, 2, 2,  ...
order_date_wday:          int32             [5, 5, 1, 1, 1, 1, 1, 1, 1,  ...
order_date_wday_lbl:      object            ['Friday', 'Friday', 'Monday ...
order_date_mday:          int32             [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_qday:          int64             [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_yday:          int32             [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_weekend:       int64             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_hour:          int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_minute:        int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_second:        int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_msecond:       int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_nsecond:       int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_am_pm:         object            ['am', 'am', 'am', 'am', 'am ...
# Adds the same 29 features using the polars engine; the input is a pandas DataFrame, so a pandas DataFrame is returned
(
    df
        .augment_timeseries_signature(date_column='order_date', engine='polars')
        .glimpse()
)
<class 'pandas.core.frame.DataFrame'>: 2466 rows of 42 columns
order_id:                 int64             [1, 1, 2, 2, 3, 3, 3, 3, 3,  ...
order_line:               int64             [1, 2, 1, 2, 1, 2, 3, 4, 5,  ...
order_date:               datetime64[ns]    [Timestamp('2011-01-07 00:00 ...
quantity:                 int64             [1, 1, 1, 1, 1, 1, 1, 1, 1,  ...
price:                    int64             [6070, 5970, 2770, 5970, 106 ...
total_price:              int64             [6070, 5970, 2770, 5970, 106 ...
model:                    object            ['Jekyll Carbon 2', 'Trigger ...
category_1:               object            ['Mountain', 'Mountain', 'Mo ...
category_2:               object            ['Over Mountain', 'Over Moun ...
frame_material:           object            ['Carbon', 'Carbon', 'Alumin ...
bikeshop_name:            object            ['Ithaca Mountain Climbers', ...
city:                     object            ['Ithaca', 'Ithaca', 'Kansas ...
state:                    object            ['NY', 'NY', 'KS', 'KS', 'KY ...
order_date_index_num:     int64             [1294358400, 1294358400, 129 ...
order_date_year:          int32             [2011, 2011, 2011, 2011, 201 ...
order_date_year_iso:      UInt32            [2011, 2011, 2011, 2011, 201 ...
order_date_yearstart:     uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_yearend:       uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_leapyear:      uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_half:          int64             [1, 1, 1, 1, 1, 1, 1, 1, 1,  ...
order_date_quarter:       int32             [1, 1, 1, 1, 1, 1, 1, 1, 1,  ...
order_date_quarteryear:   object            ['2011Q1', '2011Q1', '2011Q1 ...
order_date_quarterstart:  uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_quarterend:    uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_month:         int32             [1, 1, 1, 1, 1, 1, 1, 1, 1,  ...
order_date_month_lbl:     object            ['January', 'January', 'Janu ...
order_date_monthstart:    uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_monthend:      uint8             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_yweek:         UInt32            [1, 1, 2, 2, 2, 2, 2, 2, 2,  ...
order_date_mweek:         int32             [1, 1, 2, 2, 2, 2, 2, 2, 2,  ...
order_date_wday:          int32             [5, 5, 1, 1, 1, 1, 1, 1, 1,  ...
order_date_wday_lbl:      object            ['Friday', 'Friday', 'Monday ...
order_date_mday:          int32             [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_qday:          int64             [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_yday:          int32             [7, 7, 10, 10, 10, 10, 10, 1 ...
order_date_weekend:       int64             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_hour:          int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_minute:        int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_second:        int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_msecond:       int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_nsecond:       int32             [0, 0, 0, 0, 0, 0, 0, 0, 0,  ...
order_date_am_pm:         object            ['am', 'am', 'am', 'am', 'am ...
# Polars DataFrame using the tk accessor
import polars as pl


pl_df = pl.from_pandas(df)

pl_df.tk.augment_timeseries_signature(date_column='order_date')
shape: (2_466, 42)
order_id order_line order_date quantity price total_price model category_1 category_2 frame_material bikeshop_name city state order_date_index_num order_date_year order_date_year_iso order_date_yearstart order_date_yearend order_date_leapyear order_date_half order_date_quarter order_date_quarteryear order_date_quarterstart order_date_quarterend order_date_month order_date_month_lbl order_date_monthstart order_date_monthend order_date_yweek order_date_mweek order_date_wday order_date_wday_lbl order_date_mday order_date_qday order_date_yday order_date_weekend order_date_hour order_date_minute order_date_second order_date_msecond order_date_nsecond order_date_am_pm
i64 i64 datetime[ns] i64 i64 i64 str str str str str str str i64 i32 u32 u8 u8 u8 i64 i32 str u8 u8 i32 str u8 u8 u32 i32 i32 str i32 i64 i32 i64 i32 i32 i32 i32 i32 str
1 1 2011-01-07 00:00:00 1 6070 6070 "Jekyll Carbon 2" "Mountain" "Over Mountain" "Carbon" "Ithaca Mountain Climbers" "Ithaca" "NY" 1294358400 2011 2011 0 0 0 1 1 "2011Q1" 0 0 1 "January" 0 0 1 1 5 "Friday" 7 7 7 0 0 0 0 0 0 "am"
1 2 2011-01-07 00:00:00 1 5970 5970 "Trigger Carbon 2" "Mountain" "Over Mountain" "Carbon" "Ithaca Mountain Climbers" "Ithaca" "NY" 1294358400 2011 2011 0 0 0 1 1 "2011Q1" 0 0 1 "January" 0 0 1 1 5 "Friday" 7 7 7 0 0 0 0 0 0 "am"
2 1 2011-01-10 00:00:00 1 2770 2770 "Beast of the East 1" "Mountain" "Trail" "Aluminum" "Kansas City 29ers" "Kansas City" "KS" 1294617600 2011 2011 0 0 0 1 1 "2011Q1" 0 0 1 "January" 0 0 2 2 1 "Monday" 10 10 10 0 0 0 0 0 0 "am"
2 2 2011-01-10 00:00:00 1 5970 5970 "Trigger Carbon 2" "Mountain" "Over Mountain" "Carbon" "Kansas City 29ers" "Kansas City" "KS" 1294617600 2011 2011 0 0 0 1 1 "2011Q1" 0 0 1 "January" 0 0 2 2 1 "Monday" 10 10 10 0 0 0 0 0 0 "am"
3 1 2011-01-10 00:00:00 1 10660 10660 "Supersix Evo Hi-Mod Team" "Road" "Elite Road" "Carbon" "Louisville Race Equipment" "Louisville" "KY" 1294617600 2011 2011 0 0 0 1 1 "2011Q1" 0 0 1 "January" 0 0 2 2 1 "Monday" 10 10 10 0 0 0 0 0 0 "am"
321 3 2011-12-22 00:00:00 1 1410 1410 "CAAD8 105" "Road" "Elite Road" "Aluminum" "Miami Race Equipment" "Miami" "FL" 1324512000 2011 2011 0 0 0 2 4 "2011Q4" 0 0 12 "December" 0 0 51 4 4 "Thursday" 22 83 356 0 0 0 0 0 0 "am"
322 1 2011-12-28 00:00:00 1 1250 1250 "Synapse Disc Tiagra" "Road" "Endurance Road" "Aluminum" "Phoenix Bi-peds" "Phoenix" "AZ" 1325030400 2011 2011 0 0 0 2 4 "2011Q4" 0 0 12 "December" 0 0 52 4 3 "Wednesday" 28 89 362 0 0 0 0 0 0 "am"
322 2 2011-12-28 00:00:00 1 2660 2660 "Bad Habit 2" "Mountain" "Trail" "Aluminum" "Phoenix Bi-peds" "Phoenix" "AZ" 1325030400 2011 2011 0 0 0 2 4 "2011Q4" 0 0 12 "December" 0 0 52 4 3 "Wednesday" 28 89 362 0 0 0 0 0 0 "am"
322 3 2011-12-28 00:00:00 1 2340 2340 "F-Si 1" "Mountain" "Cross Country Race" "Aluminum" "Phoenix Bi-peds" "Phoenix" "AZ" 1325030400 2011 2011 0 0 0 2 4 "2011Q4" 0 0 12 "December" 0 0 52 4 3 "Wednesday" 28 89 362 0 0 0 0 0 0 "am"
322 4 2011-12-28 00:00:00 1 5860 5860 "Synapse Hi-Mod Dura Ace" "Road" "Endurance Road" "Carbon" "Phoenix Bi-peds" "Phoenix" "AZ" 1325030400 2011 2011 0 0 0 2 4 "2011Q4" 0 0 12 "December" 0 0 52 4 3 "Wednesday" 28 89 362 0 0 0 0 0 0 "am"