PyTimeTK

Time series easier, faster, more fun. Pytimetk.

Pytimetk's Mission: To make time series analysis easier, faster, and more enjoyable in Python.

Please ⭐ us on GitHub (it takes 2 seconds and means a lot).

1 Introducing pytimetk: Simplifying Time Series Analysis for Everyone

Time series analysis is fundamental in many fields, from business forecasting to scientific research. While the Python ecosystem offers tools like pandas, they can be verbose and are not optimized for every operation, especially complex time-based aggregations and visualizations.

Enter pytimetk. Crafted with a blend of ease-of-use and computational efficiency, pytimetk significantly simplifies the process of time series manipulation and visualization. By leveraging the polars backend, you can experience speed improvements ranging from 3X to a whopping 3500X. Let's dive into a comparative analysis.

| Features/Properties | pytimetk | pandas (+matplotlib) |
| --- | --- | --- |
| Speed | 🚀 3X to 3500X Faster | 🐢 Standard |
| Code Simplicity | 🎉 Concise, readable syntax | 📜 Often verbose |
| plot_timeseries() | 🎨 2 lines, no customization needed | 🎨 16 lines, customization needed |
| summarize_by_time() | 🕐 2 lines, 13.4X faster | 🕐 6 lines, 2 for-loops |
| pad_by_time() | ⛳ 2 lines, fills gaps in timeseries | ❌ No equivalent |
| anomalize() | 📈 2 lines, detects and corrects anomalies | ❌ No equivalent |
| augment_timeseries_signature() | 📅 1 line, all calendar features | 🕐 30 lines of dt extractors |
| augment_rolling() | 🏎️ 10X to 3500X faster | 🐢 Slow Rolling Operations |

As the table shows, pytimetk is not just about speed; it also simplifies your codebase. For example, summarize_by_time() converts a 6-line, double for-loop routine in pandas into a concise 2-line operation. And with the polars engine, you get results 13.4X faster than pandas!

Similarly, plot_timeseries() dramatically streamlines the plotting process, encapsulating what would typically require 16 lines of matplotlib code into a mere 2-line command in pytimetk, without sacrificing customization or quality. And with plotly and plotnine engines, you can create interactive plots and beautiful static visualizations with just a few lines of code.

For calendar features, pytimetk offers augment_timeseries_signature() which cuts down on over 30 lines of pandas dt extractions. For rolling features, pytimetk offers augment_rolling(), which is 10X to 3500X faster than pandas. It also offers pad_by_time() to fill gaps in your time series data, and anomalize() to detect and correct anomalies in your time series data.
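For context, here is a minimal sketch of the plain-pandas baseline that the calendar and rolling helpers are compared against. It is not pytimetk code: the toy DataFrame and the column names are hypothetical, and only a handful of the dt extractors are shown.

```python
import pandas as pd

# Hypothetical toy data: ten daily observations
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=10, freq="D"),
    "value": range(10),
})

# Calendar features in plain pandas: one dt extractor per feature
df["year"] = df["date"].dt.year
df["quarter"] = df["date"].dt.quarter
df["month"] = df["date"].dt.month
df["day_of_week"] = df["date"].dt.dayofweek  # Monday = 0
df["is_month_start"] = df["date"].dt.is_month_start

# Rolling feature in plain pandas: 3-day moving average
df["value_roll_mean_3"] = df["value"].rolling(window=3).mean()

print(df.head())
```

Each additional calendar feature costs another line here, which is the verbosity the one-line helper is meant to remove.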

Join the revolution in time series analysis. Reduce your code complexity, increase your productivity, and harness the speed that pytimetk brings to your workflows.

Explore more at our pytimetk homepage.

2 🚀 Installation

Install the Latest Stable Version:

pip install pytimetk

Alternatively, install the Development GitHub Version:

pip install git+https://github.com/business-science/pytimetk.git
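After installing, one way to confirm that the package is visible to your interpreter is to query its metadata with the standard library. This is only a sketch: the helper name installed_version is hypothetical and not part of pytimetk.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if pytimetk is installed, otherwise None
print(installed_version("pytimetk"))
```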

3 🏁 Quick Start: A Monthly Sales Analysis

This is a simple exercise to showcase the power of summarize_by_time():

Import Libraries & Data

First, import pytimetk as tk. This gets you access to the most important functions. Use tk.load_dataset() to load the "bike_sales_sample" dataset.

About the Bike Sales Sample Dataset

This dataset contains "orderlines" for orders received. The order_date column contains timestamps. We can use this column to perform sales aggregations (e.g. total revenue).

import pytimetk as tk
import pandas as pd

df = tk.load_dataset('bike_sales_sample')
df['order_date'] = pd.to_datetime(df['order_date'])

df   
|  | order_id | order_line | order_date | quantity | price | total_price | model | category_1 | category_2 | frame_material | bikeshop_name | city | state |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 1 | 2011-01-07 | 1 | 6070 | 6070 | Jekyll Carbon 2 | Mountain | Over Mountain | Carbon | Ithaca Mountain Climbers | Ithaca | NY |
| 1 | 1 | 2 | 2011-01-07 | 1 | 5970 | 5970 | Trigger Carbon 2 | Mountain | Over Mountain | Carbon | Ithaca Mountain Climbers | Ithaca | NY |
| 2 | 2 | 1 | 2011-01-10 | 1 | 2770 | 2770 | Beast of the East 1 | Mountain | Trail | Aluminum | Kansas City 29ers | Kansas City | KS |
| 3 | 2 | 2 | 2011-01-10 | 1 | 5970 | 5970 | Trigger Carbon 2 | Mountain | Over Mountain | Carbon | Kansas City 29ers | Kansas City | KS |
| 4 | 3 | 1 | 2011-01-10 | 1 | 10660 | 10660 | Supersix Evo Hi-Mod Team | Road | Elite Road | Carbon | Louisville Race Equipment | Louisville | KY |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2461 | 321 | 3 | 2011-12-22 | 1 | 1410 | 1410 | CAAD8 105 | Road | Elite Road | Aluminum | Miami Race Equipment | Miami | FL |
| 2462 | 322 | 1 | 2011-12-28 | 1 | 1250 | 1250 | Synapse Disc Tiagra | Road | Endurance Road | Aluminum | Phoenix Bi-peds | Phoenix | AZ |
| 2463 | 322 | 2 | 2011-12-28 | 1 | 2660 | 2660 | Bad Habit 2 | Mountain | Trail | Aluminum | Phoenix Bi-peds | Phoenix | AZ |
| 2464 | 322 | 3 | 2011-12-28 | 1 | 2340 | 2340 | F-Si 1 | Mountain | Cross Country Race | Aluminum | Phoenix Bi-peds | Phoenix | AZ |
| 2465 | 322 | 4 | 2011-12-28 | 1 | 5860 | 5860 | Synapse Hi-Mod Dura Ace | Road | Endurance Road | Carbon | Phoenix Bi-peds | Phoenix | AZ |

2466 rows × 13 columns

Using summarize_by_time() for a Sales Analysis

Your company might be interested in sales patterns for various categories of bicycles. We can obtain a grouped monthly sales aggregation by category_1 in two lines of code:

  1. First, use pandas's groupby() method to group the DataFrame on category_1.
  2. Next, use pytimetk's summarize_by_time() method to apply the sum function by month start ("MS"), and set wide_format = False to return the DataFrame in long format (long format is the default). The default engine is "pandas"; selecting engine = "polars" can improve the function's speed.

The result is the total revenue for Mountain and Road bikes by month.

summary_category_1_df = df \
    .groupby("category_1") \
    .summarize_by_time(
        date_column  = 'order_date', 
        value_column = 'total_price',
        freq         = "MS",
        agg_func     = 'sum',
        wide_format  = False,
        engine       = "polars"
    )

# Quickly examine each column
summary_category_1_df.glimpse()
<class 'pandas.core.frame.DataFrame'>: 24 rows of 3 columns
category_1:       object            ['Mountain', 'Mountain', 'Mountain', ...
order_date:       datetime64[ns]    [Timestamp('2011-01-01 00:00:00'), T ...
total_price_sum:  int64             [221490, 660555, 358855, 1075975, 45 ...
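For comparison, a roughly equivalent aggregation can be sketched in plain pandas. This is not how pytimetk implements summarize_by_time(); the small orders DataFrame below is hypothetical stand-in data, and the pivot at the end only illustrates the difference between long and wide layouts.

```python
import pandas as pd

# Hypothetical order lines standing in for the bike sales data
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2011-01-07", "2011-01-10", "2011-01-10", "2011-02-03", "2011-02-15"]
    ),
    "category_1": ["Mountain", "Mountain", "Road", "Road", "Mountain"],
    "total_price": [6070, 2770, 10660, 1410, 2660],
})

# Long format: one row per (category, month start)
monthly_long = (
    orders
    .groupby(["category_1", pd.Grouper(key="order_date", freq="MS")])["total_price"]
    .sum()
    .reset_index(name="total_price_sum")
)

# Wide format: one row per month start, one column per category
monthly_wide = monthly_long.pivot(
    index="order_date", columns="category_1", values="total_price_sum"
)

print(monthly_long)
print(monthly_wide)
```

Long format keeps the group label in its own column, which is convenient for grouped plotting; wide format is easier to scan when there are only a few groups.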

Visualizing Sales Patterns

Now available: plot_timeseries().

plot_timeseries() is a quick and easy way to make professional time series plots.

With the data summarized by time, we can visualize it with plot_timeseries(). pytimetk functions are groupby()-aware: when the data is grouped, they operate on each group separately. This is useful in time series work, where we often deal with hundreds of groups.

The default engine is "plotnine" for static plotting. Setting engine = "plotly" returns an interactive plot.

summary_category_1_df \
    .groupby('category_1') \
    .plot_timeseries(
        date_column  = 'order_date',
        value_column = 'total_price_sum',
        smooth_frac  = 0.8,
        engine       = "plotly"
    )

4 📚 Documentation

Next step? Learn more with the pytimetk documentation.

5 🍻 Contributing

Interested in helping us make this the best Python package for time series analysis? We'd love your help.

Follow these instructions to Contribute.

6 🏆 More Coming Soon…

We are in the early stages of development, but the potential for pytimetk in Python is already obvious. 🐍