Tidy Selectors & Human-Friendly Periods

Why this guide?

Many pytimetk helpers accept tidy selectors for columns and human-readable durations for periods/frequencies. Mastering these inputs keeps your code expressive and resilient as schemas evolve.

1 Setup

import numpy as np
import pandas as pd
import pytimetk as tk
from pytimetk.utils.selection import contains, starts_with, ends_with

We’ll use the bike sales dataset, trimmed to a few relevant columns.

sales = tk.load_dataset("bike_sales_sample", parse_dates=["order_date"])
sales = sales[
    [
        "order_date",
        "category_1",
        "category_2",
        "total_price",
        "quantity",
        "model",
    ]
]
sales.head()

	order_date	category_1	category_2	total_price	quantity	model
0	2011-01-07	Mountain	Over Mountain	6070	1	Jekyll Carbon 2
1	2011-01-07	Mountain	Over Mountain	5970	1	Trigger Carbon 2
2	2011-01-10	Mountain	Trail	2770	1	Beast of the East 1
3	2011-01-10	Mountain	Over Mountain	5970	1	Trigger Carbon 2
4	2011-01-10	Road	Elite Road	10660	1	Supersix Evo Hi-Mod Team

2 Tidy Selectors Basics

Selectors are callables (or patterns) that resolve to concrete column names at runtime. They work anywhere the docs list a ColumnSelector parameter, including plot_timeseries, summarize_by_time, and the augment_* family.
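To make "resolve at runtime" concrete, here is a minimal sketch of the idea in plain Python. `contains_sketch` is a hypothetical stand-in written for this illustration, not pytimetk's implementation, and treating `case=False` as case-insensitive matching is an assumption.

```python
from typing import Callable, List, Sequence

# In this sketch, a selector is any callable that maps the available
# column names to the subset it matches.
Selector = Callable[[Sequence[str]], List[str]]


def contains_sketch(text: str, case: bool = True) -> Selector:
    """Hypothetical stand-in for a `contains`-style helper."""

    def select(columns: Sequence[str]) -> List[str]:
        if case:
            return [c for c in columns if text in c]
        # Assumption: case=False means "ignore case when matching".
        return [c for c in columns if text.lower() in c.lower()]

    return select


price_cols = contains_sketch("price", case=False)

# The selector is defined once and resolved against whatever schema it meets.
schema_v1 = ["order_date", "total_price", "quantity"]
schema_v2 = schema_v1 + ["unit_Price"]  # a column added later

resolved_v1 = price_cols(schema_v1)
resolved_v2 = price_cols(schema_v2)
print(resolved_v1)
print(resolved_v2)
```

Because matching is deferred until the columns are known, the same selector picks up the added column without any change to the calling code.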

2.1 String & list selectors

A plain string or a list of strings selects columns by exact name, just as in pandas:

sales.groupby("category_1").plot_timeseries(
    date_column="order_date",
    value_column="total_price",
    color_column="category_1",
)

2.2 Helper selectors

Use helpers from pytimetk.utils.selection to match columns dynamically:

sales.groupby("category_1").plot_timeseries(
    date_column="order_date",
    value_column=contains("price", case=False),
    color_column="category_1",
)

wide_stats = (
    sales
    .summarize_by_time(
        date_column="order_date",
        value_column=[contains("price"), ends_with("quantity")],
        freq="MS",
        agg_func=["sum", "mean"],
    )
)
wide_stats.head()

	order_date	total_price_sum	total_price_mean	quantity_sum	quantity_mean
0	2011-01-01	483015	4600.142857	128	1.219048
1	2011-02-01	1162075	4611.408730	331	1.313492
2	2011-03-01	659975	5196.653543	174	1.370079
3	2011-04-01	1827140	4533.846154	542	1.344913
4	2011-05-01	844170	4097.912621	302	1.466019
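For intuition, a table of the same shape can be built with plain pandas. This sketch uses a tiny made-up frame rather than the bike sales data and only mirrors the output layout; it is not a description of how summarize_by_time is implemented.

```python
import pandas as pd

# Toy data (made up for this sketch).
toy = pd.DataFrame(
    {
        "order_date": pd.to_datetime(["2011-01-07", "2011-01-10", "2011-02-01"]),
        "total_price": [6070, 2770, 1000],
        "quantity": [1, 1, 2],
    }
)

# Bucket rows by month start ("MS") and apply both aggregations per column.
monthly = (
    toy.groupby(pd.Grouper(key="order_date", freq="MS"))[["total_price", "quantity"]]
    .agg(["sum", "mean"])
)

# Flatten the (column, aggregation) MultiIndex into "<column>_<agg>" names.
monthly.columns = ["_".join(col) for col in monthly.columns]
monthly = monthly.reset_index()
print(monthly)
```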

Under the hood, selectors resolve through tk.resolve_column_selection, so you can even supply regular expressions or custom callables if needed.
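The sketch below shows one way a mixed specification (exact names, compiled regular expressions, and custom callables) could be resolved into a single ordered column list. `resolve_sketch` is a hypothetical function written for this guide; its behavior, including the de-duplication and the error on unknown names, is an assumption and may differ from tk.resolve_column_selection.

```python
import re


def resolve_sketch(spec, columns):
    """Resolve strings, compiled regexes, and callables to column names."""
    items = spec if isinstance(spec, (list, tuple)) else [spec]
    resolved = []
    for item in items:
        if isinstance(item, str):
            # Exact name: must exist in the frame.
            if item not in columns:
                raise KeyError(f"Column not found: {item!r}")
            matches = [item]
        elif isinstance(item, re.Pattern):
            # Compiled regex: keep every column the pattern matches.
            matches = [c for c in columns if item.search(c)]
        elif callable(item):
            # Custom callable: receives the column names, returns a subset.
            matches = list(item(columns))
        else:
            raise TypeError(f"Unsupported selector: {item!r}")
        # Preserve first-seen order and drop duplicates.
        for name in matches:
            if name not in resolved:
                resolved.append(name)
    return resolved


columns = ["order_date", "category_1", "category_2", "total_price", "quantity", "model"]
selected = resolve_sketch(
    [
        "order_date",
        re.compile(r"^category_\d$"),
        lambda cols: [c for c in cols if c.endswith("price")],
    ],
    columns,
)
print(selected)
```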

3 Human-Friendly Periods & Durations

Frequency-oriented helpers (e.g., pad_by_time, future_frame, plot_time_series_boxplot) accept either pandas offset aliases (such as "MS" or "1D") or natural-language strings (such as "45 minutes"). pytimetk converts the latter using tk.parse_human_duration.

print(tk.parse_human_duration("45 minutes"))
print(tk.parse_human_duration("3 months"))

0 days 00:45:00
<DateOffset: months=3>
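The two results above have different types: a fixed-length duration becomes a Timedelta, while a calendar-based one becomes a DateOffset. The sketch below illustrates why that split is useful. `parse_duration_sketch` and its unit tables are hypothetical and cover only a handful of units; this is not the parser pytimetk ships.

```python
import re

import pandas as pd

# Hypothetical unit tables for this sketch.
FIXED_UNITS = {"second": "seconds", "minute": "minutes", "hour": "hours",
               "day": "days", "week": "weeks"}
CALENDAR_UNITS = {"month": "months", "year": "years"}


def parse_duration_sketch(text):
    """Parse strings like '45 minutes' or '3 months'."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"Cannot parse duration: {text!r}")
    count = int(match.group(1))
    unit = match.group(2).lower().rstrip("s")  # "minutes" -> "minute"
    if unit in FIXED_UNITS:
        # Fixed length: always the same number of seconds.
        return pd.Timedelta(**{FIXED_UNITS[unit]: count})
    if unit in CALENDAR_UNITS:
        # Calendar-aware: length depends on the date it is added to.
        return pd.DateOffset(**{CALENDAR_UNITS[unit]: count})
    raise ValueError(f"Unknown unit: {unit!r}")


start = pd.Timestamp("2011-01-31")
plus_30_days = start + parse_duration_sketch("30 days")
plus_1_month = start + parse_duration_sketch("1 month")
print(plus_30_days)  # 30 fixed days later
print(plus_1_month)  # clipped to the last day of February
```

Adding "30 days" lands in early March, whereas adding "1 month" stays on the last day of February, which is usually what a monthly business cadence expects.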

3.1 Using durations in helpers

`pad_by_time`

Aggregate to one row per category and date, then pad each category to a continuous hourly series, filling the padded rows with zeros:

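# Daily revenue per category: one row per (category_1, order_date).
sales_daily = (
    sales.groupby(["category_1", "order_date"], as_index=False)
    .agg(total_price=("total_price", "sum"))
)

# Pad each category onto an hourly grid; padded rows get total_price = 0.
hourly = (
    sales_daily
    .groupby("category_1")
    .pad_by_time(
        date_column="order_date",
        freq="1H",
        fillna=0,
    )
)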
hourly.head()

	category_1	order_date	total_price
0	Mountain	2011-01-07 00:00:00	12040.0
1	Mountain	2011-01-07 01:00:00	0.0
2	Mountain	2011-01-07 02:00:00	0.0
3	Mountain	2011-01-07 03:00:00	0.0
4	Mountain	2011-01-07 04:00:00	0.0
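Padding is easiest to see on a single small series. The following sketch uses plain pandas on made-up data and a daily grid (rather than hourly) to keep the output short; it illustrates the concept and is not pad_by_time's implementation.

```python
import pandas as pd

# Toy series with a two-day gap (made up for this sketch).
toy = pd.DataFrame(
    {
        "order_date": pd.to_datetime(["2011-01-07", "2011-01-10"]),
        "total_price": [12040, 2770],
    }
)

# Reindex onto a regular daily grid between the first and last timestamps;
# rows that did not exist before are created and filled with 0.
padded = (
    toy.set_index("order_date")
    .asfreq("D", fill_value=0)
    .reset_index()
)
print(padded)
```

For grouped data, the same reindexing would be repeated within each group so that every group gets its own continuous grid.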

`future_frame`

Generate 60 additional days while keeping the output separate from the historical data:

future_dates = tk.future_frame(
    data=sales,
    date_column="order_date",
    length_out=60,   # number of new rows
    freq="1D",       # human-friendly specs like "30 minutes" also work
    bind_data=False,
)
future_dates.tail()

	order_date
55	2012-02-22
56	2012-02-23
57	2012-02-24
58	2012-02-25
59	2012-02-26
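Conceptually, the future frame is a regular date sequence that starts one step after the last observed timestamp. A plain-pandas sketch follows; the last observed date is an assumption inferred from the output above, and none of this is future_frame's actual code.

```python
import pandas as pd

# Assumption: the last observed order date, inferred from the output above.
last_observed = pd.Timestamp("2011-12-28")
step = pd.Timedelta(days=1)

# 60 new daily timestamps, starting one step after the history ends.
future_index = pd.date_range(start=last_observed + step, periods=60, freq="D")
future_sketch = pd.DataFrame({"order_date": future_index})
print(future_sketch.tail())
```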

`plot_time_series_boxplot`

Mix selectors and durations to build rolling distributions over arbitrary periods:

box_fig = sales.groupby("category_1").plot_time_series_boxplot(
    date_column="order_date",
    value_column=contains("price"),
    period="6 weeks",
    color_column="category_1",
    smooth=False,
    plotly_dropdown=True,
)
box_fig

4 Takeaways

Selectors keep your code schema-aware: there is no need to update dozens of hard-coded column names by hand when the schema changes.
Human durations reduce friction when you’re thinking in business cadence (“2 weeks”, “6 hours”) instead of frequency codes.
Both patterns work across the pandas, Polars, and cuDF engines, as long as the helper documents support for ColumnSelector or duration inputs.

5 Next Steps

Revisit Guide 01 – Visualization and refactor examples to use selectors where appropriate.
Explore the API reference and search for ColumnSelector or “duration” to discover which helpers accept these flexible inputs.