Ray Parallelism Quickstart

How to take advantage of pytimetk’s Ray-backed helpers for faster time-series workflows.

Why Ray?

Many of pytimetk’s performance-sensitive helpers (e.g., future_frame, ts_features, and the rolling/expanding utilities) now fan work out via Ray whenever you set threads != 1. Because Ray ships as a core dependency, there is nothing extra to install; there are just two knobs to remember:

  1. Enable parallelism by passing threads=-1 (all cores) or any value > 1.
  2. Disable parallelism by leaving threads=1 (the default) if you want strictly single-threaded execution.
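The threads argument follows the familiar joblib-style convention. A minimal sketch of how such a value might be resolved to a concrete worker count (resolve_threads is a hypothetical helper for illustration, not part of pytimetk's public API):

```python
import os

def resolve_threads(threads: int) -> int:
    """Map a joblib-style threads argument to a worker count.

    threads == -1 -> use every available core
    threads == 1  -> strictly single-threaded (no Ray involved)
    threads > 1   -> exactly that many workers
    """
    if threads == -1:
        return os.cpu_count() or 1
    if threads < 1:
        raise ValueError("threads must be -1 or a positive integer")
    return threads

print(resolve_threads(1))   # the single-threaded default resolves to 1
print(resolve_threads(4))   # a fixed worker count resolves to itself
```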

To keep things predictable, Ray is initialized lazily the first time a helper actually needs it, so the common single-threaded path has zero extra overhead.

Example: ts_features with Ray

The snippet below mirrors the production behavior. Run it from any Python session (no Ray-specific bootstrapping required):

import pandas as pd
import pytimetk as tk
from tsfeatures import acf_features, hurst

# Load a small grouped dataset
df = tk.load_dataset("m4_hourly", parse_dates=["date"])

# Extract a couple of features per id using Ray workers
feature_df = (
    df
        .groupby("id", sort=False)
        .ts_features(
            date_column="date",
            value_column="value",
            features=[acf_features, hurst],
            freq=24,
            threads=-1,          # <-- spin up Ray workers (all cores)
            show_progress=True,
        )
)

print(feature_df.head())

Sample output (the Ray startup log line appears on the first parallel call):

2025-11-07 09:12:21,496 INFO worker.py:2012 -- Started a local Ray instance.
     id     hurst    x_acf1   x_acf10  diff1_acf1  diff1_acf10  diff2_acf1  \
0   H10  0.899455  0.935151  2.857201    0.181541     0.422999   -0.557066   
1  H150  0.464328  0.909548  2.548242    0.316810     0.350988   -0.422249   
2  H410  0.480160  0.803585  1.235837    0.258207     0.222276   -0.216247   
3   H50  0.890642  0.972679  3.225596    0.933112     3.583310    0.433440   

   diff2_acf10  seas_acf1  
0     0.331021   0.869954  
1     0.227795   0.711519  
2     0.219944   0.752024  
3     0.847016   0.900682  

Behind the scenes:

  • The grouped frame is chunked by id.
  • Ray initializes (if it hasn’t already) using the available CPU count.
  • Each chunk runs in parallel, and the results are stitched back together in the original order.
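The chunk-and-stitch pattern itself is independent of Ray. The stdlib sketch below illustrates it with a thread pool standing in for Ray workers (parallel_by_group is an illustrative helper, not pytimetk code): chunks are keyed by id in first-seen order, processed in parallel, and reassembled in that same order.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_by_group(rows, key, fn, max_workers=4):
    """Apply fn to each group of rows, preserving first-seen group order."""
    # 1. Chunk by key, remembering the order in which groups first appear.
    chunks = {}
    for row in rows:
        chunks.setdefault(row[key], []).append(row)
    # 2. Fan the chunks out to workers (Ray remote tasks in pytimetk).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fn, chunks.values())  # map preserves input order
    # 3. Stitch the results back together in the original group order.
    return dict(zip(chunks.keys(), results))

rows = [
    {"id": "H10", "value": 3}, {"id": "H150", "value": 5},
    {"id": "H10", "value": 7}, {"id": "H410", "value": 2},
]
totals = parallel_by_group(rows, "id", lambda chunk: sum(r["value"] for r in chunk))
print(totals)  # {'H10': 10, 'H150': 5, 'H410': 2}
```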

If you need to fall back to single-threaded mode (for example, when debugging in an environment that restricts background processes), set threads=1 and the helper will never touch Ray.

Troubleshooting Tips

  • Memory pressure? Pass a smaller threads value (e.g., threads=2) to cap the number of Ray workers.
  • Jupyter notebooks sometimes keep Ray clusters alive between runs. Call import ray; ray.shutdown() if you need to tear down the cluster manually.
  • Progress bars still work. If you prefer silence, set show_progress=False.