Tidy Selectors & Human-Friendly Periods

Why this guide?

Many pytimetk helpers accept tidy selectors for columns and human-readable durations for periods/frequencies. Mastering these inputs keeps your code expressive and resilient as schemas evolve.

1 Setup

Code
import numpy as np
import pandas as pd
import pytimetk as tk
from pytimetk.utils.selection import contains, starts_with, ends_with

We’ll use the bike sales dataset, trimmed to a few relevant columns.

Code
sales = tk.load_dataset("bike_sales_sample", parse_dates=["order_date"])
sales = sales[
    [
        "order_date",
        "category_1",
        "category_2",
        "total_price",
        "quantity",
        "model",
    ]
]
sales.head()
   order_date category_1     category_2  total_price  quantity                     model
0  2011-01-07   Mountain  Over Mountain         6070         1           Jekyll Carbon 2
1  2011-01-07   Mountain  Over Mountain         5970         1          Trigger Carbon 2
2  2011-01-10   Mountain          Trail         2770         1       Beast of the East 1
3  2011-01-10   Mountain  Over Mountain         5970         1          Trigger Carbon 2
4  2011-01-10       Road     Elite Road        10660         1  Supersix Evo Hi-Mod Team

2 Tidy Selectors Basics

Selectors are callables (or patterns) that resolve to concrete column names at runtime. They work anywhere you see ColumnSelector in the docs—plot_timeseries, summarize_by_time, augment_*, etc.
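
To make "callables that resolve to concrete column names" tangible, here is a minimal sketch in plain Python. The names contains_sketch and ends_with_sketch are hypothetical stand-ins written for this guide; they illustrate the idea and are not pytimetk's implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for illustration only; not pytimetk's own helpers.
Selector = Callable[[Sequence[str]], List[str]]


def contains_sketch(fragment: str, case: bool = True) -> Selector:
    """Return a selector that keeps column names containing `fragment`."""
    def select(columns: Sequence[str]) -> List[str]:
        if case:
            return [c for c in columns if fragment in c]
        return [c for c in columns if fragment.lower() in c.lower()]
    return select


def ends_with_sketch(suffix: str) -> Selector:
    """Return a selector that keeps column names ending in `suffix`."""
    def select(columns: Sequence[str]) -> List[str]:
        return [c for c in columns if c.endswith(suffix)]
    return select


columns = ["order_date", "category_1", "category_2",
           "total_price", "quantity", "model"]

# A selector stays abstract until it is applied to a real schema.
price_cols = contains_sketch("PRICE", case=False)(columns)
category_cols = contains_sketch("category")(columns)
print(price_cols)     # ['total_price']
print(category_cols)  # ['category_1', 'category_2']
```

Because the match runs against whatever columns exist at call time, a column added later (say, a second price column) would be picked up without touching the calling code.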

2.1 String & list selectors

A plain string (or a list of strings) names columns directly, just as it would in pandas:

Code
sales.groupby("category_1").plot_timeseries(
    date_column="order_date",
    value_column="total_price",
    color_column="category_1",
)

2.2 Helper selectors

Use helpers from pytimetk.utils.selection to match columns dynamically:

Code
sales.groupby("category_1").plot_timeseries(
    date_column="order_date",
    value_column=contains("price", case=False),
    color_column="category_1",
)
Code
wide_stats = (
    sales
    .summarize_by_time(
        date_column="order_date",
        value_column=[contains("price"), ends_with("quantity")],
        freq="MS",
        agg_func=["sum", "mean"],
    )
)
wide_stats.head()
   order_date  total_price_sum  total_price_mean  quantity_sum  quantity_mean
0  2011-01-01           483015       4600.142857           128       1.219048
1  2011-02-01          1162075       4611.408730           331       1.313492
2  2011-03-01           659975       5196.653543           174       1.370079
3  2011-04-01          1827140       4533.846154           542       1.344913
4  2011-05-01           844170       4097.912621           302       1.466019
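
If you know pandas' groupby, the wide table above has a familiar shape. The sketch below reproduces that shape with plain pandas on a tiny made-up frame (the toy values are invented for illustration). It is a rough equivalent for building intuition, not a description of how summarize_by_time works internally.

```python
import pandas as pd

# Tiny made-up frame; values are invented for illustration.
toy = pd.DataFrame({
    "order_date": pd.to_datetime(["2011-01-07", "2011-01-10", "2011-02-03"]),
    "total_price": [6070, 2770, 1000],
    "quantity": [1, 1, 2],
})

# Group rows into month-start ("MS") buckets and apply both aggregations.
monthly = (
    toy.groupby(pd.Grouper(key="order_date", freq="MS"))[["total_price", "quantity"]]
    .agg(["sum", "mean"])
)

# Flatten the (column, aggregation) MultiIndex into names like "total_price_sum".
monthly.columns = ["_".join(pair) for pair in monthly.columns]
monthly = monthly.reset_index()
print(monthly)
```

Each value column yields one output column per aggregation, which is why two columns and two functions give four result columns.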

Under the hood, selectors resolve through tk.resolve_column_selection, so you can even supply regular expressions or custom callables if needed.
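
The following sketch shows one way a resolver could treat strings, compiled regular expressions, callables, and lists of any of these. The function resolve_sketch is hypothetical and written only for this guide; the actual rules of tk.resolve_column_selection may differ, so check the API reference before relying on a specific behavior.

```python
import re
from typing import Any, List, Sequence


def resolve_sketch(columns: Sequence[str], spec: Any) -> List[str]:
    """Hypothetical resolver: expand `spec` into concrete names from `columns`."""
    if isinstance(spec, str):
        # A plain string must name an existing column.
        if spec not in columns:
            raise KeyError(f"column not found: {spec!r}")
        return [spec]
    if isinstance(spec, re.Pattern):
        # A compiled regex keeps every column it matches.
        return [c for c in columns if spec.search(c)]
    if callable(spec):
        # A callable receives the column names and returns the ones to keep.
        return list(spec(columns))
    # Anything else is treated as a list of specs: flatten it and drop
    # duplicates while preserving first-seen order.
    resolved: List[str] = []
    for item in spec:
        for name in resolve_sketch(columns, item):
            if name not in resolved:
                resolved.append(name)
    return resolved


columns = ["order_date", "category_1", "category_2",
           "total_price", "quantity", "model"]
mixed = [
    re.compile(r"^category_\d$"),
    lambda cols: [c for c in cols if c.endswith("price")],
    "quantity",
    "category_1",  # already matched by the regex, so it is not repeated
]
print(resolve_sketch(columns, mixed))
# ['category_1', 'category_2', 'total_price', 'quantity']
```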

3 Human-Friendly Periods & Durations

Frequency-oriented helpers (e.g., pad_by_time, future_frame, plot_time_series_boxplot) accept pandas offsets or natural language strings. pytimetk converts the latter using tk.parse_human_duration.

Code
print(tk.parse_human_duration("45 minutes"))
print(tk.parse_human_duration("3 months"))
0 days 00:45:00
<DateOffset: months=3>
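
The two results have different types for a reason: minutes have a fixed length, while months do not. The sketch below mimics that split with plain pandas objects. parse_duration_sketch is a hypothetical helper covering only a handful of units; it is not the parser pytimetk ships.

```python
import re

import pandas as pd

# Fixed-length units map onto pd.Timedelta keyword arguments.
FIXED_UNITS = {"second": "seconds", "minute": "minutes", "hour": "hours",
               "day": "days", "week": "weeks"}
# Calendar units vary in length, so they map onto pd.DateOffset instead.
CALENDAR_UNITS = {"month": "months", "year": "years"}


def parse_duration_sketch(text: str):
    """Hypothetical parser for strings such as '45 minutes' or '3 months'."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse duration: {text!r}")
    count = int(match.group(1))
    unit = match.group(2).lower()
    if unit.endswith("s"):
        unit = unit[:-1]  # accept both "minute" and "minutes"
    if unit in FIXED_UNITS:
        return pd.Timedelta(**{FIXED_UNITS[unit]: count})
    if unit in CALENDAR_UNITS:
        return pd.DateOffset(**{CALENDAR_UNITS[unit]: count})
    raise ValueError(f"unsupported unit: {unit!r}")


print(parse_duration_sketch("45 minutes"))  # 0 days 00:45:00
print(parse_duration_sketch("3 months"))    # <DateOffset: months=3>

# Calendar-aware arithmetic: one month after January 31 is the end of February.
print(pd.Timestamp("2011-01-31") + parse_duration_sketch("1 month"))
# 2011-02-28 00:00:00
```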

3.1 Using durations in helpers

pad_by_time

Ensure a continuous hourly series and fill padded rows with zeros:

Code
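# Aggregate to one row per category and order date (the source dates are daily).
sales_by_date = (
    sales.groupby(["category_1", "order_date"], as_index=False)
    .agg(total_price=("total_price", "sum"))
)

# Pad each category to an hourly grid; rows created by padding get 0.
hourly = (
    sales_by_date
    .groupby("category_1")
    .pad_by_time(
        date_column="order_date",
        freq="1H",
        fillna=0,
    )
)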
hourly.head()
  category_1           order_date  total_price
0   Mountain  2011-01-07 00:00:00      12040.0
1   Mountain  2011-01-07 01:00:00          0.0
2   Mountain  2011-01-07 02:00:00          0.0
3   Mountain  2011-01-07 03:00:00          0.0
4   Mountain  2011-01-07 04:00:00          0.0
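
Padding is easier to reason about once you see it spelled out. The sketch below does per-group hourly padding with plain pandas on a made-up frame. pad_hourly_sketch is a hypothetical helper written for this guide, and it assumes each group has unique timestamps; it is not how pad_by_time is implemented.

```python
import pandas as pd


def pad_hourly_sketch(df, date_col, group_col, value_col, fill_value=0):
    """Pad each group to a gap-free hourly index between its first and last timestamp."""
    one_hour = pd.Timedelta(hours=1)
    padded_groups = []
    for key, grp in df.groupby(group_col):
        # Every hour from the group's first to last timestamp, inclusive.
        full_index = pd.date_range(
            grp[date_col].min(), grp[date_col].max(), freq=one_hour, name=date_col
        )
        padded = (
            grp.set_index(date_col)[[value_col]]
            .reindex(full_index, fill_value=fill_value)  # new rows get fill_value
            .reset_index()
        )
        padded.insert(0, group_col, key)
        padded_groups.append(padded)
    return pd.concat(padded_groups, ignore_index=True)


# Made-up data: Mountain has a three-hour gap, Road has a single row.
toy = pd.DataFrame({
    "category_1": ["Mountain", "Mountain", "Road"],
    "order_date": pd.to_datetime(
        ["2011-01-07 00:00", "2011-01-07 03:00", "2011-01-07 00:00"]
    ),
    "total_price": [12040, 2770, 10660],
})
padded_toy = pad_hourly_sketch(toy, "order_date", "category_1", "total_price")
print(padded_toy)
```

Note that each group is padded only within its own date range, so the Road group above stays at one row.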

future_frame

Generate 60 additional days while keeping the output separate from the historical data:

Code
future_dates = tk.future_frame(
    data=sales,
    date_column="order_date",
    length_out=60,   # number of new rows
    freq="1D",       # human-friendly specs like "30 minutes" also work
    bind_data=False,
)
future_dates.tail()
order_date
55 2012-02-22
56 2012-02-23
57 2012-02-24
58 2012-02-25
59 2012-02-26
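
For intuition, a future-only frame boils down to a date range that starts one step after the last observed date. Here is a minimal pandas sketch on a made-up history; it is an illustration under that assumption, not future_frame's implementation.

```python
import pandas as pd

# Made-up history ending on 2011-12-28.
history = pd.DataFrame(
    {"order_date": pd.to_datetime(["2011-12-26", "2011-12-27", "2011-12-28"])}
)

step = pd.Timedelta(days=1)
last_date = history["order_date"].max()

# Start one step after the last observation so no historical date is repeated.
future_sketch = pd.DataFrame(
    {"order_date": pd.date_range(start=last_date + step, periods=60, freq=step)}
)
print(future_sketch.tail(2))
```

Keeping the future rows in their own frame makes it straightforward to attach features or predictions later without filtering the history back out.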

plot_time_series_boxplot

Mix selectors and durations to draw one boxplot per period, here summarizing the price distribution in 6-week windows:

Code
box_fig = sales.groupby("category_1").plot_time_series_boxplot(
    date_column="order_date",
    value_column=contains("price"),
    period="6 weeks",
    color_column="category_1",
    smooth=False,
    plotly_dropdown=True,
)
box_fig
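
To see what a per-period boxplot summarizes, you can compute the quartiles yourself. The sketch below bins a made-up series into 42-day (six-week) windows with plain pandas. It assumes the period acts as a fixed bin width starting at the first observed day; how the plotting helper aligns its bins may differ.

```python
import pandas as pd

# Made-up series; values are invented for illustration.
toy = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2011-01-01", "2011-01-15", "2011-01-29", "2011-02-20", "2011-03-01"]
    ),
    "total_price": [100, 200, 300, 400, 600],
})

# 42-day bins; by default the first bin starts at midnight of the first date.
six_week_bins = pd.Grouper(key="order_date", freq="42D")

# describe() reports the quartiles that a boxplot draws as its box.
box_stats = toy.groupby(six_week_bins)["total_price"].describe()[["25%", "50%", "75%"]]
print(box_stats)
```

Each row of box_stats corresponds to one box: the first three observations fall in the window starting 2011-01-01, the last two in the window starting 2011-02-12.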

4 Takeaways

  • Selectors keep your code schema-aware—no need to rename dozens of columns manually.
  • Human durations reduce friction when you’re thinking in business cadence (“2 weeks”, “6 hours”) instead of frequency codes.
  • Both patterns work across the pandas, Polars, and cuDF engines, provided the helper documents support for ColumnSelector or duration inputs.

5 Next Steps

  • Revisit Guide 01 – Visualization and refactor examples to use selectors where appropriate.
  • Explore the API reference and search for ColumnSelector or “duration” to discover which helpers accept these flexible inputs.