Speeding Up Workflows with Polars

Why Polars?

Polars shines on wide datasets and large group-by workloads thanks to its Arrow-based columnar memory model, multithreaded execution, and optional lazy query optimization. pytimetk supports Polars both through the .tk accessor and via engine="polars" on many of its heavier helpers, so you can keep your existing workflows while getting a speed boost.
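
To get a feel for the difference, here is a minimal, machine-dependent timing sketch that runs the same weekly aggregation through both engines on the pandas side. It uses the m4_daily dataset and summarize_by_time, both covered below; treat the numbers as illustrative only.

Code
import timeit

import pytimetk as tk

df = tk.load_dataset("m4_daily", parse_dates=["date"])

# Run the identical weekly mean through each engine; absolute timings
# depend on your hardware and the size of the data.
for engine in ["pandas", "polars"]:
    seconds = timeit.timeit(
        lambda: df.groupby("id").summarize_by_time(
            date_column="date",
            value_column="value",
            freq="W",
            agg_func="mean",
            engine=engine,
        ),
        number=5,
    )
    print(f"{engine}: {seconds:.3f}s for 5 runs")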

1 Setup

Code
import polars as pl
import pytimetk as tk

We’ll use the m4_daily dataset (multiple daily series). Start in pandas, then convert to Polars.

Code
m4_daily_pd = tk.load_dataset("m4_daily", parse_dates=["date"])
m4_daily_pl = pl.from_pandas(m4_daily_pd)

m4_daily_pl
shape: (9_743, 3)
id      date                 value
str     datetime[ns]         f64
"D10"   2014-07-03 00:00:00  2076.2
"D10"   2014-07-04 00:00:00  2073.4
"D10"   2014-07-05 00:00:00  2048.7
"D10"   2014-07-06 00:00:00  2048.9
"D10"   2014-07-07 00:00:00  2006.4
…
"D500"  2012-09-19 00:00:00  9418.8
"D500"  2012-09-20 00:00:00  9365.7
"D500"  2012-09-21 00:00:00  9445.9
"D500"  2012-09-22 00:00:00  9497.9
"D500"  2012-09-23 00:00:00  9545.3

2 Plotting Directly from Polars

Every Polars DataFrame gains a .tk accessor once pytimetk is imported, so you can send Polars data straight into the plotting helpers without bouncing back to pandas.

Code
single_series = m4_daily_pl.filter(pl.col("id") == "D10")

single_series.tk.plot_timeseries(
    date_column="date",
    value_column="value",
    title="Polars-powered plot_timeseries()",
)
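
You aren't limited to one series per plot. Here is a sketch that overlays all four series by color; color_column is the usual plot_timeseries parameter, and we're assuming the Polars accessor forwards it unchanged:

Code
# Overlay every series in one figure, colored by id (assumes the .tk
# accessor passes color_column through like the pandas version does)
m4_daily_pl.tk.plot_timeseries(
    date_column="date",
    value_column="value",
    color_column="id",
    title="All m4_daily series",
)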

3 Time-Based Aggregations with the Polars Engine

When you pass engine="polars", the heavy lifting happens in Polars. Because the input here is a Polars DataFrame, the result comes back as a Polars frame too; convert with .to_pandas() whenever the rest of your stack needs pandas. This is handy for weekly/monthly summaries across many groups.

Code
weekly_summary = (
    m4_daily_pl
    .group_by("id")
    .tk.summarize_by_time(
        date_column="date",
        value_column="value",
        freq="W",
        agg_func="mean",
        engine="polars",
    )
)

weekly_summary.head()
shape: (5, 3)
id     date                 value
str    datetime[ns]         f64
"D10"  2014-07-06 00:00:00  2061.8
"D10"  2014-07-13 00:00:00  2005.828571
"D10"  2014-07-20 00:00:00  1981.085714
"D10"  2014-07-27 00:00:00  1895.185714
"D10"  2014-08-03 00:00:00  1924.457143

4 Rolling Features without Leaving Polars

The same pattern applies to rolling-window computations. Here we build a trailing 7-day mean and standard deviation for each series, computed entirely with the Polars backend.

Code
rolling_features = (
    m4_daily_pl
    .group_by("id")
    .tk.augment_rolling(
        date_column="date",
        value_column="value",
        window=7,
        window_func=["mean", "std"],
        engine="polars",
    )
)

rolling_features.head()
shape: (5, 5)
id     date                 value   value_rolling_mean_win_7  value_rolling_std_win_7
str    datetime[ns]         f64     f64                       f64
"D10"  2014-07-03 00:00:00  2076.2  null                      null
"D10"  2014-07-04 00:00:00  2073.4  null                      null
"D10"  2014-07-05 00:00:00  2048.7  null                      null
"D10"  2014-07-06 00:00:00  2048.9  null                      null
"D10"  2014-07-07 00:00:00  2006.4  null                      null

If you need a pandas DataFrame afterwards, just convert:

Code
rolling_features.to_pandas().head()
    id       date   value  value_rolling_mean_win_7  value_rolling_std_win_7
0  D10 2014-07-03  2076.2                       NaN                      NaN
1  D10 2014-07-04  2073.4                       NaN                      NaN
2  D10 2014-07-05  2048.7                       NaN                      NaN
3  D10 2014-07-06  2048.9                       NaN                      NaN
4  D10 2014-07-07  2006.4                       NaN                      NaN
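
augment_rolling also handles several horizons at once: window accepts a list of sizes, producing one feature column per window. A sketch, assuming the Polars engine mirrors the pandas behavior:

Code
# Trailing 7- and 28-day means in one call, one output column per window
multi_window = (
    m4_daily_pl
    .group_by("id")
    .tk.augment_rolling(
        date_column="date",
        value_column="value",
        window=[7, 28],
        window_func="mean",
        engine="polars",
    )
)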

5 Pure Polars Pipelines

You can stay in Polars end to end:

  1. Prep data with pl.DataFrame operations.
  2. Call .tk helpers that support Polars inputs.
  3. Only convert to pandas at the final step if your next tool requires it.

This keeps the data in a columnar format for as long as possible, unlocking better cache usage and multithreading without changing the pytimetk calls you already use.
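
As a condensed sketch of that pattern, reusing only the calls shown earlier in this guide:

Code
# 1) prep with Polars expressions, 2) feature-engineer via the .tk
# accessor, 3) convert to pandas only at the final hand-off
features_pd = (
    m4_daily_pl
    .filter(pl.col("id").is_in(["D10", "D500"]))
    .group_by("id")
    .tk.augment_rolling(
        date_column="date",
        value_column="value",
        window=7,
        window_func=["mean", "std"],
        engine="polars",
    )
    .to_pandas()
)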

6 Next Steps