Persist expensive feature engineering once, reuse it everywhere. The feature store bundled with pytimetk ≥ 2.0 lets you register reusable transforms, materialize results to disk (or any pyarrow-compatible object store), and reload them in downstream notebooks, jobs, or ML pipelines.
1 Why Use the Feature Store?
Teams building time-series models (forecasting, anomaly detection, policy simulation) often compute the same feature sets—calendar signatures, lag stacks, rolling stats—across notebooks, pipelines, and model retrains.
The feature store lets them register those transforms once, materialize them to disk or a shared URI, and reload them instantly later. That avoids re-running expensive calculations, keeps metadata/versioning consistent, and makes it easy to assemble feature matrices across multiple transforms for downstream modeling.
2 Benefits
Avoid repeated work – cache signatures, lag stacks, rolling stats, and any custom transform.
Share across teams – store artifacts on a shared file system or S3/GCS/Azure using pyarrow.fs (see the sketch after this list).
Track metadata automatically – every build records parameters, row counts, hashes, timestamps, and version info.
Coordinate writers – optional file locks prevent conflicting writes when multiple jobs run the same pipeline.
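As a rough sketch of the shared-storage setup (hedged: the docs confirm artifacts can live on S3/GCS/Azure via pyarrow.fs, but the bucket URI below and the use of a remote URI as root_path are illustrative assumptions, not confirmed API):

Code
import pytimetk as tk

# Illustrative: point the store at a pyarrow.fs-resolvable location so that
# teammates and scheduled jobs can read the same materialized artifacts.
# The bucket name is hypothetical; credentials are resolved by pyarrow.fs.
shared_store = tk.FeatureStore(root_path="s3://my-team-bucket/feature-store")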
3 Quickstart (Pandas)
Code
import pandas as pd
import pytimetk as tk
from pathlib import Path

sales = tk.load_dataset("bike_sales_sample", parse_dates=["order_date"])

feature_store_root = Path("feature-store-demo").resolve()
store = tk.FeatureStore(root_path=feature_store_root)

store.register(
    "sales_signature",
    lambda df: tk.augment_timeseries_signature(
        df,
        date_column="order_date",
        engine="pandas",
    ),
    default_key_columns=("order_id",),
    description="Calendar features for order history.",
)

signature = store.build("sales_signature", sales, return_engine="pandas")
signature.from_cache, signature.metadata.row_count, signature.metadata.column_count
(True, 2466, 42)
Run the notebook a second time and the store will detect the same data + parameters and serve a cached artifact, which is what the from_cache value of True above reflects:
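A minimal sketch of that second build, reusing the store and sales objects from the quickstart (the assert is illustrative):

Code
# Identical data and parameters, so the store serves the materialized
# artifact instead of recomputing the transform.
cached = store.build("sales_signature", sales, return_engine="pandas")
assert cached.from_cache
cached.metadata.row_count, cached.metadata.column_count  # same counts as before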
Remove a feature set (optionally keeping the cached artifact):
Code
# Remove the Polars feature set via its accessor store.
accessor.store.drop("sales_centered")  # deletes metadata and artifact

# Drop a pandas feature set but keep the cached artifact for reuse.
store.drop("sales_signature", delete_artifact=False)
For remote backends the catalog metadata remains local, while the artifacts are removed from the remote filesystem via pyarrow.fs.
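Because builds are keyed on data + parameters, an artifact kept with delete_artifact=False can plausibly be picked up again after re-registering the same transform. A hedged sketch, reusing the quickstart objects (whether the rebuild actually hits the cache depends on the store's hashing details):

Code
# Re-register the transform dropped above; if the kept artifact still
# matches the same data + parameters, the rebuild may be served from cache.
store.register(
    "sales_signature",
    lambda df: tk.augment_timeseries_signature(
        df, date_column="order_date", engine="pandas",
    ),
    default_key_columns=("order_id",),
    description="Calendar features for order history.",
)
rebuilt = store.build("sales_signature", sales, return_engine="pandas")
rebuilt.from_cache  # expected True if the kept artifact still matches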