filter_by_time

filter_by_time(
    data,
    date_column,
    start_date='start',
    end_date='end',
    engine='pandas',
)

Filters a DataFrame or GroupBy object based on a specified date range.

This function filters data in a pandas DataFrame or a pandas GroupBy object by a given date range. It supports various date formats and can handle both DataFrame and GroupBy objects.

Parameters

Name Type Description Default
data DataFrame or GroupBy(pandas or polars) The data to be filtered. Supports both pandas and polars DataFrames / GroupBy objects. Grouped inputs are processed per group before the final result is returned. required
date_column str The name of the column in data that contains date information. This column is used for filtering the data based on the date range. required
start_date str The start date of the filtering range. The format of the date can be YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DD HH, YYYY-MM-DD HH:SS, or YYYY-MM-DD HH:MM:SS. Default: β€˜start’, which will filter from the earliest date in the data. 'start'
end_date str The end date of the filtering range. It supports the same formats as start_date. Default: β€˜end’, which will filter until the latest date in the data. 'end'
engine str Computation engine. Use 'pandas', 'polars', or 'cudf'. The special value 'auto' infers the engine from the input data. = 'pandas'

Returns

Name Type Description
DataFrame Data containing rows within the specified date range. The concrete type matches the engine used.

Raises

Name Type Description
ValueError If the provided date strings do not match any of the supported formats.

Notes

  • The function uses pd.to_datetime to convert the start date (e.g. start_date = β€œ2014” becomes β€œ2014-01-01”).
  • The function internally uses the parse_end_date function to convert the end dates (e.g. end_date = β€œ2014” becomes β€œ2014-12-31”).

Examples

import pytimetk as tk
import pandas as pd
import datetime

m4_daily_df = tk.datasets.load_dataset('m4_daily', parse_dates = ['date'])
# Example 1 - Filter by date

df_filtered = tk.filter_by_time(
    data        = m4_daily_df,
    date_column = 'date',
    start_date  = '2014-07-03',
    end_date    = '2014-07-10'
)

df_filtered
id date value
0 D10 2014-07-03 2076.2
1 D10 2014-07-04 2073.4
2 D10 2014-07-05 2048.7
3 D10 2014-07-06 2048.9
4 D10 2014-07-07 2006.4
5 D10 2014-07-08 2017.6
6 D10 2014-07-09 2019.1
7 D10 2014-07-10 2007.4
# Example 2 - Filter by month.
# Note: This will filter by the first day of the month.

df_filtered = tk.filter_by_time(
    data        = m4_daily_df,
    date_column = 'date',
    start_date  = '2014-07',
    end_date    = '2014-09'
)

df_filtered
id date value
0 D10 2014-07-03 2076.2
1 D10 2014-07-04 2073.4
2 D10 2014-07-05 2048.7
3 D10 2014-07-06 2048.9
4 D10 2014-07-07 2006.4
... ... ... ...
85 D10 2014-09-26 1987.9
86 D10 2014-09-27 1999.8
87 D10 2014-09-28 2000.2
88 D10 2014-09-29 1996.4
89 D10 2014-09-30 2023.5

90 rows Γ— 3 columns

# Example 3 - Filter by year.
# Note: This will filter by the first day of the year.

df_filtered = tk.filter_by_time(
    data        = m4_daily_df,
    date_column = 'date',
    start_date  = '2014',
    end_date    = '2014'
)

df_filtered
id date value
0 D10 2014-07-03 2076.2
1 D10 2014-07-04 2073.4
2 D10 2014-07-05 2048.7
3 D10 2014-07-06 2048.9
4 D10 2014-07-07 2006.4
... ... ... ...
177 D10 2014-12-27 2270.1
178 D10 2014-12-28 2322.0
179 D10 2014-12-29 2327.3
180 D10 2014-12-30 2344.9
181 D10 2014-12-31 2327.8

182 rows Γ— 3 columns

# Example 4 - Filter by day/hour/minute/second
# Here we'll use an hourly dataset, however this will also work for minute/second data

# Load data and format date column appropriately
m4_hourly_df = tk.datasets.load_dataset('m4_hourly', parse_dates = ['date'])

df_filtered = tk.filter_by_time(
    data        = m4_hourly_df,
    date_column = "date",
    start_date  = '2015-07-01 12:00:00',
    end_date    = '2015-07-01 20:00:00'
)

df_filtered
id date value
0 H10 2015-07-01 12:00:00+00:00 513
1 H10 2015-07-01 13:00:00+00:00 512
2 H10 2015-07-01 14:00:00+00:00 506
3 H10 2015-07-01 15:00:00+00:00 500
4 H10 2015-07-01 16:00:00+00:00 490
5 H10 2015-07-01 17:00:00+00:00 484
6 H10 2015-07-01 18:00:00+00:00 467
7 H10 2015-07-01 19:00:00+00:00 446
8 H10 2015-07-01 20:00:00+00:00 434
700 H50 2015-07-01 12:00:00+00:00 39325
701 H50 2015-07-01 13:00:00+00:00 38153
702 H50 2015-07-01 14:00:00+00:00 36829
703 H50 2015-07-01 15:00:00+00:00 35878
704 H50 2015-07-01 16:00:00+00:00 33626
705 H50 2015-07-01 17:00:00+00:00 31014
706 H50 2015-07-01 18:00:00+00:00 28891
707 H50 2015-07-01 19:00:00+00:00 27413
708 H50 2015-07-01 20:00:00+00:00 26291
# Example 5 - Combine year/month/day/hour/minute/second filters
df_filtered = tk.filter_by_time(
    data        = m4_hourly_df,
    date_column = "date",
    start_date  = '2015-07-01',
    end_date    = '2015-07-29'
)

df_filtered
id date value
0 H10 2015-07-01 12:00:00+00:00 513
1 H10 2015-07-01 13:00:00+00:00 512
2 H10 2015-07-01 14:00:00+00:00 506
3 H10 2015-07-01 15:00:00+00:00 500
4 H10 2015-07-01 16:00:00+00:00 490
... ... ... ...
1379 H50 2015-07-29 19:00:00+00:00 30167
1380 H50 2015-07-29 20:00:00+00:00 28894
1381 H50 2015-07-29 21:00:00+00:00 27949
1382 H50 2015-07-29 22:00:00+00:00 27507
1383 H50 2015-07-29 23:00:00+00:00 28020

1368 rows Γ— 3 columns

# Example 7 - Filter using the polars engine and tk accessor
import polars as pl


pl_df = pl.from_pandas(m4_daily_df)

df_filtered = (
    pl_df
        .tk.filter_by_time(
            date_column = 'date',
            start_date  = '2014-07-03',
            end_date    = '2014-07-10'
        )
)

df_filtered
shape: (8, 3)
id date value
str datetime[ns] f64
"D10" 2014-07-03 00:00:00 2076.2
"D10" 2014-07-04 00:00:00 2073.4
"D10" 2014-07-05 00:00:00 2048.7
"D10" 2014-07-06 00:00:00 2048.9
"D10" 2014-07-07 00:00:00 2006.4
"D10" 2014-07-08 00:00:00 2017.6
"D10" 2014-07-09 00:00:00 2019.1
"D10" 2014-07-10 00:00:00 2007.4
# Example 6 - Filter a GroupBy object

df_filtered = (
    m4_hourly_df
        .groupby('id')
        .filter_by_time(
            date_column = "date",
            start_date  = '2015-07-01 12:00:00',
            end_date    = '2015-07-01 20:00:00'
        )
)

df_filtered
id date value
0 H10 2015-07-01 12:00:00+00:00 513
1 H10 2015-07-01 13:00:00+00:00 512
2 H10 2015-07-01 14:00:00+00:00 506
3 H10 2015-07-01 15:00:00+00:00 500
4 H10 2015-07-01 16:00:00+00:00 490
5 H10 2015-07-01 17:00:00+00:00 484
6 H10 2015-07-01 18:00:00+00:00 467
7 H10 2015-07-01 19:00:00+00:00 446
8 H10 2015-07-01 20:00:00+00:00 434
700 H50 2015-07-01 12:00:00+00:00 39325
701 H50 2015-07-01 13:00:00+00:00 38153
702 H50 2015-07-01 14:00:00+00:00 36829
703 H50 2015-07-01 15:00:00+00:00 35878
704 H50 2015-07-01 16:00:00+00:00 33626
705 H50 2015-07-01 17:00:00+00:00 31014
706 H50 2015-07-01 18:00:00+00:00 28891
707 H50 2015-07-01 19:00:00+00:00 27413
708 H50 2015-07-01 20:00:00+00:00 26291