Filters a DataFrame or GroupBy object based on a specified date range.
This function filters data in a pandas DataFrame or a pandas GroupBy object by a given date range. It supports various date formats and can handle both DataFrame and GroupBy objects.
Parameters
Name
Type
Description
Default
data
pd.DataFrame or pd.core.groupby.generic.DataFrameGroupBy
The data to be filtered. It can be a pandas DataFrame or a pandas GroupBy object.
required
date_column
str
The name of the column in data that contains date information. This column is used for filtering the data based on the date range.
required
start_date
str
The start date of the filtering range. The format of the date can be YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DD HH, YYYY-MM-DD HH:SS, or YYYY-MM-DD HH:MM:SS. Default: βstartβ, which will filter from the earliest date in the data.
'start'
end_date
str
The end date of the filtering range. It supports the same formats as start_date. Default: βendβ, which will filter until the latest date in the data.
'end'
engine
str
The engine to be used for filtering the data. Currently, only βpandasβ.
= 'pandas'
Returns
Type
Description
pd.DataFrame
A pandas DataFrame containing the filtered data within the specified date range.
Raises
Type
Description
ValueError
If the provided date strings do not match any of the supported formats.
Notes
The function uses pd.to_datetime to convert the start date (e.g. start_date = β2014β becomes β2014-01-01β).
The function internally uses the parse_end_date function to convert the end dates (e.g. end_date = β2014β becomes β2014-12-31β).
Examples
import pytimetk as tkimport pandas as pdimport datetimem4_daily_df = tk.datasets.load_dataset('m4_daily', parse_dates = ['date'])
# Example 1 - Filter by datedf_filtered = tk.filter_by_time( data = m4_daily_df, date_column ='date', start_date ='2014-07-03', end_date ='2014-07-10')df_filtered
id
date
value
0
D10
2014-07-03
2076.2
1
D10
2014-07-04
2073.4
2
D10
2014-07-05
2048.7
3
D10
2014-07-06
2048.9
4
D10
2014-07-07
2006.4
5
D10
2014-07-08
2017.6
6
D10
2014-07-09
2019.1
7
D10
2014-07-10
2007.4
# Example 2 - Filter by month.# Note: This will filter by the first day of the month.df_filtered = tk.filter_by_time( data = m4_daily_df, date_column ='date', start_date ='2014-07', end_date ='2014-09')df_filtered
id
date
value
0
D10
2014-07-03
2076.2
1
D10
2014-07-04
2073.4
2
D10
2014-07-05
2048.7
3
D10
2014-07-06
2048.9
4
D10
2014-07-07
2006.4
...
...
...
...
85
D10
2014-09-26
1987.9
86
D10
2014-09-27
1999.8
87
D10
2014-09-28
2000.2
88
D10
2014-09-29
1996.4
89
D10
2014-09-30
2023.5
90 rows Γ 3 columns
# Example 3 - Filter by year.# Note: This will filter by the first day of the year.df_filtered = tk.filter_by_time( data = m4_daily_df, date_column ='date', start_date ='2014', end_date ='2014')df_filtered
id
date
value
0
D10
2014-07-03
2076.2
1
D10
2014-07-04
2073.4
2
D10
2014-07-05
2048.7
3
D10
2014-07-06
2048.9
4
D10
2014-07-07
2006.4
...
...
...
...
177
D10
2014-12-27
2270.1
178
D10
2014-12-28
2322.0
179
D10
2014-12-29
2327.3
180
D10
2014-12-30
2344.9
181
D10
2014-12-31
2327.8
182 rows Γ 3 columns
# Example 4 - Filter by day/hour/minute/second# Here we'll use an hourly dataset, however this will also work for minute/second data# Load data and format date column appropriatelym4_hourly_df = tk.datasets.load_dataset('m4_hourly', parse_dates = ['date'])df_filtered = tk.filter_by_time( data = m4_hourly_df, date_column ="date", start_date ='2015-07-01 12:00:00', end_date ='2015-07-01 20:00:00')df_filtered
id
date
value
0
H10
2015-07-01 12:00:00+00:00
513
1
H10
2015-07-01 13:00:00+00:00
512
2
H10
2015-07-01 14:00:00+00:00
506
3
H10
2015-07-01 15:00:00+00:00
500
4
H10
2015-07-01 16:00:00+00:00
490
5
H10
2015-07-01 17:00:00+00:00
484
6
H10
2015-07-01 18:00:00+00:00
467
7
H10
2015-07-01 19:00:00+00:00
446
8
H10
2015-07-01 20:00:00+00:00
434
700
H50
2015-07-01 12:00:00+00:00
39325
701
H50
2015-07-01 13:00:00+00:00
38153
702
H50
2015-07-01 14:00:00+00:00
36829
703
H50
2015-07-01 15:00:00+00:00
35878
704
H50
2015-07-01 16:00:00+00:00
33626
705
H50
2015-07-01 17:00:00+00:00
31014
706
H50
2015-07-01 18:00:00+00:00
28891
707
H50
2015-07-01 19:00:00+00:00
27413
708
H50
2015-07-01 20:00:00+00:00
26291
# Example 5 - Combine year/month/day/hour/minute/second filtersdf_filtered = tk.filter_by_time( data = m4_hourly_df, date_column ="date", start_date ='2015-07-01', end_date ='2015-07-29')df_filtered