filter_by_time

filter_by_time(data, date_column, start_date='start', end_date='end', engine='pandas')

Filters a DataFrame or GroupBy object based on a specified date range.

This function filters data in a pandas DataFrame or a pandas GroupBy object by a given date range. It supports various date formats and can handle both DataFrame and GroupBy objects.

Parameters

Name Type Description Default
data pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy The data to be filtered. It can be a pandas DataFrame or a pandas GroupBy object. required
date_column str The name of the column in data that contains date information. This column is used for filtering the data based on the date range. required
start_date str The start date of the filtering range. The format of the date can be YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DD HH, YYYY-MM-DD HH:SS, or YYYY-MM-DD HH:MM:SS. Default: β€˜start’, which will filter from the earliest date in the data. 'start'
end_date str The end date of the filtering range. It supports the same formats as start_date. Default: β€˜end’, which will filter until the latest date in the data. 'end'
engine str The engine to be used for filtering the data. Currently, only β€˜pandas’. = 'pandas'

Returns

Type Description
pd.DataFrame A pandas DataFrame containing the filtered data within the specified date range.

Raises

Type Description
ValueError If the provided date strings do not match any of the supported formats.

Notes

  • The function uses pd.to_datetime to convert the start date (e.g. start_date = β€œ2014” becomes β€œ2014-01-01”).
  • The function internally uses the parse_end_date function to convert the end dates (e.g. end_date = β€œ2014” becomes β€œ2014-12-31”).

Examples

import pytimetk as tk
import pandas as pd
import datetime

m4_daily_df = tk.datasets.load_dataset('m4_daily', parse_dates = ['date'])
# Example 1 - Filter by date

df_filtered = tk.filter_by_time(
    data        = m4_daily_df,
    date_column = 'date',
    start_date  = '2014-07-03',
    end_date    = '2014-07-10'
)

df_filtered
id date value
0 D10 2014-07-03 2076.2
1 D10 2014-07-04 2073.4
2 D10 2014-07-05 2048.7
3 D10 2014-07-06 2048.9
4 D10 2014-07-07 2006.4
5 D10 2014-07-08 2017.6
6 D10 2014-07-09 2019.1
7 D10 2014-07-10 2007.4
# Example 2 - Filter by month.
# Note: This will filter by the first day of the month.

df_filtered = tk.filter_by_time(
    data        = m4_daily_df,
    date_column = 'date',
    start_date  = '2014-07',
    end_date    = '2014-09'
)

df_filtered
id date value
0 D10 2014-07-03 2076.2
1 D10 2014-07-04 2073.4
2 D10 2014-07-05 2048.7
3 D10 2014-07-06 2048.9
4 D10 2014-07-07 2006.4
... ... ... ...
85 D10 2014-09-26 1987.9
86 D10 2014-09-27 1999.8
87 D10 2014-09-28 2000.2
88 D10 2014-09-29 1996.4
89 D10 2014-09-30 2023.5

90 rows Γ— 3 columns

# Example 3 - Filter by year.
# Note: This will filter by the first day of the year.

df_filtered = tk.filter_by_time(
    data        = m4_daily_df,
    date_column = 'date',
    start_date  = '2014',
    end_date    = '2014'
)

df_filtered
id date value
0 D10 2014-07-03 2076.2
1 D10 2014-07-04 2073.4
2 D10 2014-07-05 2048.7
3 D10 2014-07-06 2048.9
4 D10 2014-07-07 2006.4
... ... ... ...
177 D10 2014-12-27 2270.1
178 D10 2014-12-28 2322.0
179 D10 2014-12-29 2327.3
180 D10 2014-12-30 2344.9
181 D10 2014-12-31 2327.8

182 rows Γ— 3 columns

# Example 4 - Filter by day/hour/minute/second
# Here we'll use an hourly dataset, however this will also work for minute/second data

# Load data and format date column appropriately
m4_hourly_df = tk.datasets.load_dataset('m4_hourly', parse_dates = ['date'])

df_filtered = tk.filter_by_time(
    data        = m4_hourly_df,
    date_column = "date",
    start_date  = '2015-07-01 12:00:00',
    end_date    = '2015-07-01 20:00:00'
)

df_filtered
id date value
0 H10 2015-07-01 12:00:00+00:00 513
1 H10 2015-07-01 13:00:00+00:00 512
2 H10 2015-07-01 14:00:00+00:00 506
3 H10 2015-07-01 15:00:00+00:00 500
4 H10 2015-07-01 16:00:00+00:00 490
5 H10 2015-07-01 17:00:00+00:00 484
6 H10 2015-07-01 18:00:00+00:00 467
7 H10 2015-07-01 19:00:00+00:00 446
8 H10 2015-07-01 20:00:00+00:00 434
700 H50 2015-07-01 12:00:00+00:00 39325
701 H50 2015-07-01 13:00:00+00:00 38153
702 H50 2015-07-01 14:00:00+00:00 36829
703 H50 2015-07-01 15:00:00+00:00 35878
704 H50 2015-07-01 16:00:00+00:00 33626
705 H50 2015-07-01 17:00:00+00:00 31014
706 H50 2015-07-01 18:00:00+00:00 28891
707 H50 2015-07-01 19:00:00+00:00 27413
708 H50 2015-07-01 20:00:00+00:00 26291
# Example 5 - Combine year/month/day/hour/minute/second filters
df_filtered = tk.filter_by_time(
    data        = m4_hourly_df,
    date_column = "date",
    start_date  = '2015-07-01',
    end_date    = '2015-07-29'
)

df_filtered
id date value
0 H10 2015-07-01 12:00:00+00:00 513
1 H10 2015-07-01 13:00:00+00:00 512
2 H10 2015-07-01 14:00:00+00:00 506
3 H10 2015-07-01 15:00:00+00:00 500
4 H10 2015-07-01 16:00:00+00:00 490
... ... ... ...
1379 H50 2015-07-29 19:00:00+00:00 30167
1380 H50 2015-07-29 20:00:00+00:00 28894
1381 H50 2015-07-29 21:00:00+00:00 27949
1382 H50 2015-07-29 22:00:00+00:00 27507
1383 H50 2015-07-29 23:00:00+00:00 28020

1368 rows Γ— 3 columns

# Example 6 - Filter a GroupBy object

df_filtered = (
    m4_hourly_df
        .groupby('id')
        .filter_by_time(
            date_column = "date",
            start_date  = '2015-07-01 12:00:00',
            end_date    = '2015-07-01 20:00:00'
        )
)

df_filtered
id date value
0 H10 2015-07-01 12:00:00+00:00 513
1 H10 2015-07-01 13:00:00+00:00 512
2 H10 2015-07-01 14:00:00+00:00 506
3 H10 2015-07-01 15:00:00+00:00 500
4 H10 2015-07-01 16:00:00+00:00 490
5 H10 2015-07-01 17:00:00+00:00 484
6 H10 2015-07-01 18:00:00+00:00 467
7 H10 2015-07-01 19:00:00+00:00 446
8 H10 2015-07-01 20:00:00+00:00 434
700 H50 2015-07-01 12:00:00+00:00 39325
701 H50 2015-07-01 13:00:00+00:00 38153
702 H50 2015-07-01 14:00:00+00:00 36829
703 H50 2015-07-01 15:00:00+00:00 35878
704 H50 2015-07-01 16:00:00+00:00 33626
705 H50 2015-07-01 17:00:00+00:00 31014
706 H50 2015-07-01 18:00:00+00:00 28891
707 H50 2015-07-01 19:00:00+00:00 27413
708 H50 2015-07-01 20:00:00+00:00 26291