apply_by_time

apply_by_time(data, date_column, freq='D', wide_format=False, fillna=0, reduce_memory=False, **named_funcs)

Apply one or more custom functions to a time series for each time period defined by freq, optionally within groups.

Parameters

Name | Type | Description | Default
--- | --- | --- | ---
data | Union[pd.DataFrame, pd.core.groupby.generic.DataFrameGroupBy] | The data to apply the functions to: either a pandas DataFrame or a pandas DataFrameGroupBy object. | required
date_column | str | Name of the column that contains the dates. | required
freq | str | Frequency at which the data is resampled, given as a time-frequency string such as "D" (daily), "W" (weekly) or "MS" (month start). The default, "D", resamples the data daily. Common aliases: S (secondly), min (minutely), H (hourly), D (daily), W (weekly), M (month end), MS (month start), Q (quarter end), QS (quarter start), Y (year end), YS (year start). These follow pandas offset aliases; pandas 2.2 and later deprecate H, S, M, Q and Y in favour of h, s, ME, QE and YE. | 'D'
wide_format | bool | Whether to return the output in wide format. If True, the output has MultiIndex columns: the first level holds the original columns and the second level holds the group names. | False
fillna | int | Value used to fill missing values in the resulting DataFrame. | 0
reduce_memory | bool | Whether to reduce memory usage by downcasting int and float columns to smaller types and converting str columns to categorical. This helps with large data but may reduce float precision and changes str columns to categorical. | False
**named_funcs | | One or more custom aggregation functions, passed as keyword arguments of the form name = lambda df: df['column1'].corr(df['column2']), where name is the function's name and df is the subset of rows passed to the function for one time period (and group). Each function must return a single value. | {}
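The freq alias controls both how rows are binned and how each bin is labeled. As a plain-pandas sketch (assuming apply_by_time bins dates the way pandas resample does for the same alias, which is consistent with the month-start labels in the examples below), here is toy daily data resampled with "W" and "MS":

```python
import pandas as pd

# Hypothetical toy series: one observation per day from 2011-01-01 (a Saturday) to 2011-01-10
daily = pd.Series(1, index=pd.date_range("2011-01-01", periods=10, freq="D"))

# "W" bins weeks ending on Sunday and labels each bin with that Sunday
weekly = daily.resample("W").sum()

# "MS" labels each monthly bin with the first day of the month
month_start = daily.resample("MS").sum()

print(weekly)       # 2011-01-02: 2, 2011-01-09: 7, 2011-01-16: 1
print(month_start)  # 2011-01-01: 10
```

The same ten days produce three weekly rows but a single month-start row, so the choice of alias directly sets how many rows the result has.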

Returns

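Type | Description
--- | ---
pd.DataFrame | A pandas DataFrame. When the named functions return scalars, the default (long-format) result has one row per time period (and per group when data is a DataFrameGroupBy), containing the date column, any group columns, and one column per named function.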

Examples

import pytimetk as tk
import pandas as pd
    
df = tk.load_dataset('bike_sales_sample', parse_dates = ['order_date'])
    
df.glimpse()
<class 'pandas.core.frame.DataFrame'>: 2466 rows of 13 columns
order_id:        int64             [1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5,  ...
order_line:      int64             [1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 1, 2,  ...
order_date:      datetime64[ns]    [Timestamp('2011-01-07 00:00:00'), Ti ...
quantity:        int64             [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,  ...
price:           int64             [6070, 5970, 2770, 5970, 10660, 3200, ...
total_price:     int64             [6070, 5970, 2770, 5970, 10660, 3200, ...
model:           object            ['Jekyll Carbon 2', 'Trigger Carbon 2 ...
category_1:      object            ['Mountain', 'Mountain', 'Mountain',  ...
category_2:      object            ['Over Mountain', 'Over Mountain', 'T ...
frame_material:  object            ['Carbon', 'Carbon', 'Aluminum', 'Car ...
bikeshop_name:   object            ['Ithaca Mountain Climbers', 'Ithaca  ...
city:            object            ['Ithaca', 'Ithaca', 'Kansas City', ' ...
state:           object            ['NY', 'NY', 'KS', 'KS', 'KY', 'KY',  ...
# Apply by time with a DataFrame object
# Allows access to multiple columns at once
( 
    df[['order_date', 'price', 'quantity']] 
        .apply_by_time(
            
            # Named apply functions
            price_quantity_sum = lambda df: (df['price'] * df['quantity']).sum(),
            price_quantity_mean = lambda df: (df['price'] * df['quantity']).mean(),
            
            # Parameters
            date_column  = 'order_date', 
            freq         = "MS",
            
        )
)
    order_date  price_quantity_sum  price_quantity_mean
0   2011-01-01            483015.0          4600.142857
1   2011-02-01           1162075.0          4611.408730
2   2011-03-01            659975.0          5196.653543
3   2011-04-01           1827140.0          4533.846154
4   2011-05-01            844170.0          4097.912621
5   2011-06-01           1413445.0          4544.839228
6   2011-07-01           1194430.0          4976.791667
7   2011-08-01            679790.0          4961.970803
8   2011-09-01            814720.0          4682.298851
9   2011-10-01            734920.0          3930.053476
10  2011-11-01           1006085.0          4768.175355
11  2011-12-01            473120.0          4186.902655
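The DataFrame example above can be approximated in plain pandas. The helper apply_by_time_sketch below is hypothetical (it is not part of pytimetk and not the library's implementation); it only mirrors the long-format layout shown in the output: one row per period, the date column first, then one column per named function, with missing results replaced by fillna.

```python
import pandas as pd


def apply_by_time_sketch(data, date_column, freq="D", fillna=0, **named_funcs):
    """Hypothetical pandas-only analogue of apply_by_time for a plain DataFrame.

    Not pytimetk's implementation; it only mirrors the long-format layout.
    """
    # Bin rows by period on a DatetimeIndex so each function sees only the other columns
    grouped = data.set_index(date_column).groupby(pd.Grouper(freq=freq))
    # Call every named function once per period; each must return a single value
    out = pd.DataFrame({name: grouped.apply(func) for name, func in named_funcs.items()})
    out.index.name = date_column
    # Replace missing results with the fill value, then restore the date column
    return out.fillna(fillna).reset_index()


# Hypothetical toy order lines (not the bike_sales_sample data)
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2011-01-07", "2011-01-21", "2011-02-03", "2011-02-17"]),
    "price": [100, 200, 300, 400],
    "quantity": [1, 2, 1, 1],
})

result = apply_by_time_sketch(
    orders,
    date_column="order_date",
    freq="MS",
    price_quantity_sum=lambda d: (d["price"] * d["quantity"]).sum(),
    price_quantity_mean=lambda d: (d["price"] * d["quantity"]).mean(),
)
print(result)
```

Because each function receives the whole sub-DataFrame for a period, it can combine several columns (here price and quantity) before reducing to one value.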
# Apply by time with a GroupBy object
( 
    df[['category_1', 'order_date', 'price', 'quantity']] 
        .groupby('category_1')
        .apply_by_time(
            
            # Named functions
            price_quantity_sum = lambda df: (df['price'] * df['quantity']).sum(),
            price_quantity_mean = lambda df: (df['price'] * df['quantity']).mean(),
            
            # Parameters
            date_column  = 'order_date', 
            freq         = "MS",
            
        )
)
    category_1  order_date  price_quantity_sum  price_quantity_mean
0     Mountain  2011-01-01            221490.0          4922.000000
1     Mountain  2011-02-01            660555.0          4374.536424
2     Mountain  2011-03-01            358855.0          5882.868852
3     Mountain  2011-04-01           1075975.0          4890.795455
4     Mountain  2011-05-01            450440.0          4549.898990
5     Mountain  2011-06-01            723040.0          5021.111111
6     Mountain  2011-07-01            767740.0          5444.964539
7     Mountain  2011-08-01            361255.0          5734.206349
8     Mountain  2011-09-01            401125.0          5077.531646
9     Mountain  2011-10-01            377335.0          4439.235294
10    Mountain  2011-11-01            549345.0          5282.163462
11    Mountain  2011-12-01            276055.0          5208.584906
12        Road  2011-01-01            261525.0          4358.750000
13        Road  2011-02-01            501520.0          4965.544554
14        Road  2011-03-01            301120.0          4562.424242
15        Road  2011-04-01            751165.0          4104.726776
16        Road  2011-05-01            393730.0          3679.719626
17        Road  2011-06-01            690405.0          4134.161677
18        Road  2011-07-01            426690.0          4310.000000
19        Road  2011-08-01            318535.0          4304.527027
20        Road  2011-09-01            413595.0          4353.631579
21        Road  2011-10-01            357585.0          3505.735294
22        Road  2011-11-01            456740.0          4268.598131
23        Road  2011-12-01            197065.0          3284.416667
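For grouped data, a comparable pandas-only computation groups on the category column and a Grouper on the date column together. The sketch below uses hypothetical toy data and is not pytimetk's implementation. Pivoting the long result yields MultiIndex columns of (result column, group name), which resembles the wide_format=True layout described in the parameter table, although pytimetk's exact wide output may differ.

```python
import pandas as pd

# Hypothetical toy order lines with a category column (not the bike_sales_sample data)
orders = pd.DataFrame({
    "category_1": ["Mountain", "Road", "Mountain", "Road"],
    "order_date": pd.to_datetime(["2011-01-05", "2011-01-12", "2011-02-08", "2011-02-20"]),
    "price": [100, 200, 300, 400],
    "quantity": [2, 1, 1, 3],
})


def price_quantity_sum(d):
    # Revenue for the rows of one (category, period) combination
    return (d["price"] * d["quantity"]).sum()


# Group by category and by month-start period; pass only the value columns to the function
grouped = orders.groupby(
    ["category_1", pd.Grouper(key="order_date", freq="MS")]
)[["price", "quantity"]]

# Long format: one row per (category, period), ordered by category then date
long_df = grouped.apply(price_quantity_sum).rename("price_quantity_sum").reset_index()

# Wide layout: one row per period, MultiIndex columns of (result column, category)
wide_df = long_df.pivot(index="order_date", columns="category_1")

print(long_df)
print(wide_df)
```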
# Return complex objects
( 
    df[['order_date', 'price', 'quantity']] 
        .apply_by_time(
            
            # Named apply functions
            complex_object = lambda df: [df],
            
            # Parameters
            date_column  = 'order_date', 
            freq         = "MS",
            
        )
)
order_date price quantity
0 2011-01-01 [[6070, 5970, 2770, 5970, 10660, 3200, 12790, ... [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,...
1 2011-02-01 [[8200, 7990, 3200, 4800, 3200, 2130, 1030, 37... [[1, 4, 1, 1, 1, 2, 1, 1, 1, 3, 1, 2, 1, 1, 2,...
2 2011-03-01 [[2660, 3200, 3200, 815, 8200, 9060, 815, 2130... [[1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1,...
3 2011-04-01 [[5330, 4500, 585, 2660, 3200, 2770, 1030, 234... [[1, 1, 1, 3, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 7,...
4 2011-05-01 [[1840, 3200, 7000, 5860, 1030, 3200, 3500, 15... [[1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
5 2011-06-01 [[7990, 4500, 1250, 3730, 1950, 2660, 2340, 19... [[1, 1, 1, 3, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 9,...
6 2011-07-01 [[3200, 2880, 5330, 3200, 585, 5330, 4800, 111... [[2, 3, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 2, 1, 1,...
7 2011-08-01 [[12250, 2130, 7000, 2660, 5860, 3500, 1950, 1... [[2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,...
8 2011-09-01 [[4800, 480, 12790, 6390, 7990, 3500, 3730, 63... [[1, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
9 2011-10-01 [[9060, 12250, 2880, 9060, 4480, 3200, 2340, 2... [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
10 2011-11-01 [[2240, 2660, 3200, 980, 2880, 1750, 2130, 224... [[1, 1, 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 2, 1, 6,...
11 2011-12-01 [[1030, 3200, 870, 1350, 4260, 7460, 2880, 270... [[1, 1, 2, 1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
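The idea of returning a non-scalar object per period can be sketched in plain pandas: a per-period function that returns a list produces an object column whose cells are lists. This is a hypothetical analogue on toy data, not pytimetk's implementation, and the exact layout pytimetk produces for complex objects may differ.

```python
import pandas as pd

# Hypothetical toy order lines (not the bike_sales_sample data)
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2011-01-07", "2011-01-21", "2011-02-03", "2011-02-17"]),
    "price": [100, 200, 300, 400],
})

# Group rows into month-start periods
grouped = orders.groupby(pd.Grouper(key="order_date", freq="MS"))

# list is applied per period, so each cell holds a Python list instead of a scalar
prices_by_month = grouped["price"].apply(list).reset_index()
print(prices_by_month)
```

Object columns like this are convenient for inspecting the raw rows behind each period, but they cannot be used directly in numeric aggregations.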