PyTimeTK

Time series easier, faster, more fun. Pytimetk.

pytimetk’s Mission: To make time series analysis easier, faster, and more enjoyable in Python.

Please ⭐ us on GitHub (it takes 2 seconds and means a lot).

1 Introducing pytimetk: Simplifying Time Series Analysis for Everyone

Time series analysis is fundamental in many fields, from business forecasting to scientific research. The Python ecosystem offers tools like pandas, but they can be verbose and are not optimized for every operation, especially complex time-based aggregations and visualizations.

Enter pytimetk. Crafted with a blend of ease-of-use and computational efficiency, pytimetk significantly simplifies the process of time series manipulation and visualization. By leveraging the polars backend, you can experience speed improvements ranging from 3X to a whopping 3500X. Let’s dive into a comparative analysis.

Features/Properties	pytimetk	pandas (+matplotlib)
Speed	🚀 3X to 3500X Faster	🐢 Standard
Code Simplicity	🎉 Concise, readable syntax	📜 Often verbose
`plot_timeseries()`	🎨 2 lines, no customization	🎨 16 lines, customization needed
`summarize_by_time()`	🕐 2 lines, 13.4X faster	🕐 6 lines, 2 for-loops
`pad_by_time()`	⛳ 2 lines, fills gaps in timeseries	❌ No equivalent
`anomalize()`	📈 2 lines, detects and corrects anomalies	❌ No equivalent
`augment_timeseries_signature()`	📅 1 line, all calendar features	🕐 30 lines of `dt` extractors
`augment_rolling()`	🏎️ 10X to 3500X faster	🐢 Slow Rolling Operations

As the table shows, pytimetk is not just about speed; it also simplifies your codebase. For example, summarize_by_time() converts a 6-line, double for-loop routine in pandas into a concise 2-line operation. And with the polars engine, you get results 13.4X faster than pandas!

Similarly, plot_timeseries() dramatically streamlines the plotting process, encapsulating what would typically require 16 lines of matplotlib code into a mere 2-line command in pytimetk, without sacrificing customization or quality. And with plotly and plotnine engines, you can create interactive plots and beautiful static visualizations with just a few lines of code.

For calendar features, pytimetk offers augment_timeseries_signature() which cuts down on over 30 lines of pandas dt extractions. For rolling features, pytimetk offers augment_rolling(), which is 10X to 3500X faster than pandas. It also offers pad_by_time() to fill gaps in your time series data, and anomalize() to detect and correct anomalies in your time series data.
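To make the calendar-feature comparison concrete, here is a minimal sketch of the manual pandas `dt` extraction that augment_timeseries_signature() is described as replacing. It uses only pandas. The three-row frame and the output column names are illustrative assumptions, not the names pytimetk itself produces.

```python
import pandas as pd

# Hypothetical three-row frame; only the column name order_date mirrors this README.
df = pd.DataFrame(
    {"order_date": pd.to_datetime(["2011-01-07", "2011-06-15", "2011-12-28"])}
)

# Each calendar feature needs its own dt accessor call in plain pandas.
dates = df["order_date"].dt
calendar_df = df.assign(
    year=dates.year,
    quarter=dates.quarter,
    month=dates.month,
    month_name=dates.month_name(),
    day=dates.day,
    day_of_week=dates.dayofweek,  # Monday = 0
    day_name=dates.day_name(),
    is_month_end=dates.is_month_end,
)

print(calendar_df)
```

Eight features already take eight lines here, which is why a full calendar signature in plain pandas grows quickly.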

Join the revolution in time series analysis. Reduce your code complexity, increase your productivity, and harness the speed that pytimetk brings to your workflows.

Explore more at our pytimetk homepage.

2 🚀 Installation

Install the Latest Stable Version:

pip install pytimetk

Alternatively, install the Development GitHub Version:

pip install --upgrade --force-reinstall git+https://github.com/business-science/pytimetk.git
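After either install command, you may want to confirm which version ended up in your environment. The sketch below uses only the Python standard library; `installed_version` is a hypothetical helper written for this example, not part of pytimetk.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pytimetk")
print(f"pytimetk version: {version}" if version else "pytimetk is not installed")
```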

3 🏁 Quick Start: A Monthly Sales Analysis

This is a simple exercise to showcase the power of summarize_by_time():

Import Libraries & Data

First, import pytimetk as tk. This gets you access to the most important functions. Use tk.load_dataset() to load the “bike_sales_sample” dataset.

About the Bike Sales Sample Dataset

This dataset contains “orderlines” for orders received. The order_date column contains timestamps. We can use this column to perform sales aggregations (e.g. total revenue).

import pytimetk as tk
import pandas as pd

df = tk.load_dataset('bike_sales_sample')
df['order_date'] = pd.to_datetime(df['order_date'])

df

	order_id	order_line	order_date	quantity	price	total_price	model	category_1	category_2	frame_material	bikeshop_name	city	state
0	1	1	2011-01-07	1	6070	6070	Jekyll Carbon 2	Mountain	Over Mountain	Carbon	Ithaca Mountain Climbers	Ithaca	NY
1	1	2	2011-01-07	1	5970	5970	Trigger Carbon 2	Mountain	Over Mountain	Carbon	Ithaca Mountain Climbers	Ithaca	NY
2	2	1	2011-01-10	1	2770	2770	Beast of the East 1	Mountain	Trail	Aluminum	Kansas City 29ers	Kansas City	KS
3	2	2	2011-01-10	1	5970	5970	Trigger Carbon 2	Mountain	Over Mountain	Carbon	Kansas City 29ers	Kansas City	KS
4	3	1	2011-01-10	1	10660	10660	Supersix Evo Hi-Mod Team	Road	Elite Road	Carbon	Louisville Race Equipment	Louisville	KY
...	...	...	...	...	...	...	...	...	...	...	...	...	...
2461	321	3	2011-12-22	1	1410	1410	CAAD8 105	Road	Elite Road	Aluminum	Miami Race Equipment	Miami	FL
2462	322	1	2011-12-28	1	1250	1250	Synapse Disc Tiagra	Road	Endurance Road	Aluminum	Phoenix Bi-peds	Phoenix	AZ
2463	322	2	2011-12-28	1	2660	2660	Bad Habit 2	Mountain	Trail	Aluminum	Phoenix Bi-peds	Phoenix	AZ
2464	322	3	2011-12-28	1	2340	2340	F-Si 1	Mountain	Cross Country Race	Aluminum	Phoenix Bi-peds	Phoenix	AZ
2465	322	4	2011-12-28	1	5860	5860	Synapse Hi-Mod Dura Ace	Road	Endurance Road	Carbon	Phoenix Bi-peds	Phoenix	AZ

2466 rows × 13 columns

Using `summarize_by_time()` for a Sales Analysis

Your company might be interested in sales patterns for various categories of bicycles. We can obtain a grouped monthly sales aggregation by category_1 in two lines of code:

First, use pandas’s groupby() method to group the DataFrame on category_1.
Next, use pytimetk’s summarize_by_time() method to apply the sum function by month start (“MS”), and use wide_format = False to return the dataframe in long format (note that long format is the default). The default engine is "pandas". Selecting engine = "polars" can improve the speed of the function.

The result is the total revenue for Mountain and Road bikes by month.

summary_category_1_df = df \
    .groupby("category_1") \
    .summarize_by_time(
        date_column  = 'order_date', 
        value_column = 'total_price',
        freq         = "MS",
        agg_func     = 'sum',
        wide_format  = False,
        engine       = "polars"
    )

# Quickly examine each column
summary_category_1_df.glimpse()

<class 'pandas.core.frame.DataFrame'>: 24 rows of 3 columns
category_1:       object            ['Mountain', 'Mountain', 'Mountain', ...
order_date:       datetime64[ns]    [Timestamp('2011-01-01 00:00:00'), T ...
total_price_sum:  int64             [221490, 660555, 358855, 1075975, 45 ...
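If you want to sanity-check a result like this without pytimetk, a grouped monthly sum can be sketched in plain pandas with `pd.Grouper`. The five-row frame below is invented for illustration (it is not the bike sales dataset), and the pivot at the end is only an assumption about what a wide layout would look like: one column per category.

```python
import pandas as pd

# Hypothetical miniature of the order-line data (values invented for illustration).
orders = pd.DataFrame(
    {
        "category_1": ["Mountain", "Mountain", "Road", "Mountain", "Road"],
        "order_date": pd.to_datetime(
            ["2011-01-07", "2011-01-10", "2011-01-10", "2011-02-03", "2011-02-20"]
        ),
        "total_price": [6070, 2770, 10660, 5970, 1410],
    }
)

# Long format: one row per (category, month start).
monthly_long = (
    orders
    .groupby(["category_1", pd.Grouper(key="order_date", freq="MS")])["total_price"]
    .sum()
    .rename("total_price_sum")
    .reset_index()
)

# Wide format: one row per month start, one column per category.
monthly_wide = monthly_long.pivot(
    index="order_date", columns="category_1", values="total_price_sum"
)

print(monthly_long)
print(monthly_wide)
```

The long layout is convenient for grouped plotting, while the wide layout is easier to scan by eye when there are only a few groups.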

Visualizing Sales Patterns

Now available: plot_timeseries().

plot_timeseries() is a quick and easy way to make professional time series plots.

With the data summarized by time, we can visualize it with plot_timeseries(). pytimetk functions are groupby()-aware, meaning they detect grouped data and operate on each group separately. This is useful in time series work, where we often deal with hundreds of time series groups.

The default engine is “plotnine” for static plotting. Setting engine = "plotly" returns an interactive plot.

summary_category_1_df \
    .groupby('category_1') \
    .plot_timeseries(
        date_column  = 'order_date',
        value_column = 'total_price_sum',
        smooth_frac  = 0.8,
        engine       = "plotly"
    )

4 📚 Documentation

Next step? Learn more with the pytimetk documentation.

5 🍻 Contributing

Interested in helping us make this the best Python package for time series analysis? We’d love your help.

Follow these instructions to Contribute.

6 🏆 More Coming Soon…

We are in the early stages of development, but the potential for pytimetk in Python is already clear. 🐍

Please ⭐ us on GitHub (it takes 2 seconds and means a lot).
To make requests, please see our Project Roadmap GH Issue #2.
Want to contribute? See our contributing guide here.

7 ⭐️ Star History

Please ⭐ us on GitHub (it takes 2 seconds and means a lot).
