Filters a DataFrame or GroupBy object based on a specified date range.
This function filters data in a pandas DataFrame or a pandas GroupBy object by a given date range. It supports various date formats and can handle both DataFrame and GroupBy objects.
Parameters
Name
Type
Description
Default
data
DataFrame or GroupBy(pandas or polars)
The data to be filtered. Supports both pandas and polars DataFrames / GroupBy objects. Grouped inputs are processed per group before the final result is returned.
required
date_column
str
The name of the column in data that contains date information. This column is used for filtering the data based on the date range.
required
start_date
str
The start date of the filtering range. The format of the date can be YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DD HH, YYYY-MM-DD HH:SS, or YYYY-MM-DD HH:MM:SS. Default: βstartβ, which will filter from the earliest date in the data.
'start'
end_date
str
The end date of the filtering range. It supports the same formats as start_date. Default: βendβ, which will filter until the latest date in the data.
'end'
engine
str
Computation engine. Use 'pandas', 'polars', or 'cudf'. The special value 'auto' infers the engine from the input data.
= 'pandas'
Returns
Name
Type
Description
DataFrame
Data containing rows within the specified date range. The concrete type matches the engine used.
Raises
Name
Type
Description
ValueError
If the provided date strings do not match any of the supported formats.
Notes
The function uses pd.to_datetime to convert the start date (e.g. start_date = β2014β becomes β2014-01-01β).
The function internally uses the parse_end_date function to convert the end dates (e.g. end_date = β2014β becomes β2014-12-31β).
Examples
import pytimetk as tkimport pandas as pdimport datetimem4_daily_df = tk.datasets.load_dataset('m4_daily', parse_dates = ['date'])
# Example 1 - Filter by datedf_filtered = tk.filter_by_time( data = m4_daily_df, date_column ='date', start_date ='2014-07-03', end_date ='2014-07-10')df_filtered
id
date
value
0
D10
2014-07-03
2076.2
1
D10
2014-07-04
2073.4
2
D10
2014-07-05
2048.7
3
D10
2014-07-06
2048.9
4
D10
2014-07-07
2006.4
5
D10
2014-07-08
2017.6
6
D10
2014-07-09
2019.1
7
D10
2014-07-10
2007.4
# Example 2 - Filter by month.# Note: This will filter by the first day of the month.df_filtered = tk.filter_by_time( data = m4_daily_df, date_column ='date', start_date ='2014-07', end_date ='2014-09')df_filtered
id
date
value
0
D10
2014-07-03
2076.2
1
D10
2014-07-04
2073.4
2
D10
2014-07-05
2048.7
3
D10
2014-07-06
2048.9
4
D10
2014-07-07
2006.4
...
...
...
...
85
D10
2014-09-26
1987.9
86
D10
2014-09-27
1999.8
87
D10
2014-09-28
2000.2
88
D10
2014-09-29
1996.4
89
D10
2014-09-30
2023.5
90 rows Γ 3 columns
# Example 3 - Filter by year.# Note: This will filter by the first day of the year.df_filtered = tk.filter_by_time( data = m4_daily_df, date_column ='date', start_date ='2014', end_date ='2014')df_filtered
id
date
value
0
D10
2014-07-03
2076.2
1
D10
2014-07-04
2073.4
2
D10
2014-07-05
2048.7
3
D10
2014-07-06
2048.9
4
D10
2014-07-07
2006.4
...
...
...
...
177
D10
2014-12-27
2270.1
178
D10
2014-12-28
2322.0
179
D10
2014-12-29
2327.3
180
D10
2014-12-30
2344.9
181
D10
2014-12-31
2327.8
182 rows Γ 3 columns
# Example 4 - Filter by day/hour/minute/second# Here we'll use an hourly dataset, however this will also work for minute/second data# Load data and format date column appropriatelym4_hourly_df = tk.datasets.load_dataset('m4_hourly', parse_dates = ['date'])df_filtered = tk.filter_by_time( data = m4_hourly_df, date_column ="date", start_date ='2015-07-01 12:00:00', end_date ='2015-07-01 20:00:00')df_filtered
id
date
value
0
H10
2015-07-01 12:00:00+00:00
513
1
H10
2015-07-01 13:00:00+00:00
512
2
H10
2015-07-01 14:00:00+00:00
506
3
H10
2015-07-01 15:00:00+00:00
500
4
H10
2015-07-01 16:00:00+00:00
490
5
H10
2015-07-01 17:00:00+00:00
484
6
H10
2015-07-01 18:00:00+00:00
467
7
H10
2015-07-01 19:00:00+00:00
446
8
H10
2015-07-01 20:00:00+00:00
434
700
H50
2015-07-01 12:00:00+00:00
39325
701
H50
2015-07-01 13:00:00+00:00
38153
702
H50
2015-07-01 14:00:00+00:00
36829
703
H50
2015-07-01 15:00:00+00:00
35878
704
H50
2015-07-01 16:00:00+00:00
33626
705
H50
2015-07-01 17:00:00+00:00
31014
706
H50
2015-07-01 18:00:00+00:00
28891
707
H50
2015-07-01 19:00:00+00:00
27413
708
H50
2015-07-01 20:00:00+00:00
26291
# Example 5 - Combine year/month/day/hour/minute/second filtersdf_filtered = tk.filter_by_time( data = m4_hourly_df, date_column ="date", start_date ='2015-07-01', end_date ='2015-07-29')df_filtered
id
date
value
0
H10
2015-07-01 12:00:00+00:00
513
1
H10
2015-07-01 13:00:00+00:00
512
2
H10
2015-07-01 14:00:00+00:00
506
3
H10
2015-07-01 15:00:00+00:00
500
4
H10
2015-07-01 16:00:00+00:00
490
...
...
...
...
1379
H50
2015-07-29 19:00:00+00:00
30167
1380
H50
2015-07-29 20:00:00+00:00
28894
1381
H50
2015-07-29 21:00:00+00:00
27949
1382
H50
2015-07-29 22:00:00+00:00
27507
1383
H50
2015-07-29 23:00:00+00:00
28020
1368 rows Γ 3 columns
# Example 7 - Filter using the polars engine and tk accessorimport polars as plpl_df = pl.from_pandas(m4_daily_df)df_filtered = ( pl_df .tk.filter_by_time( date_column ='date', start_date ='2014-07-03', end_date ='2014-07-10' ))df_filtered