1 Quick Start: A Monthly Sales Analysis

This is a simple exercise to showcase the power of our two most popular functions:

  1. summarize_by_time()
  2. plot_timeseries()

1.1 Import Libraries & Data

First, import pytimetk as tk. This gets you access to the most important functions. Use tk.load_dataset() to load the “bike_sales_sample” dataset.

About the Bike Sales Sample Dataset

This dataset contains “orderlines” for orders received. The order_date column contains timestamps. We can use this column to perform sales aggregations (e.g. total revenue).

Code
import pytimetk as tk
import pandas as pd

df = tk.load_dataset('bike_sales_sample')
df['order_date'] = pd.to_datetime(df['order_date'])

df   
      order_id  order_line  order_date  quantity  price  total_price  model                     category_1  category_2          frame_material  bikeshop_name              city         state
0     1         1           2011-01-07  1         6070   6070         Jekyll Carbon 2           Mountain    Over Mountain       Carbon          Ithaca Mountain Climbers   Ithaca       NY
1     1         2           2011-01-07  1         5970   5970         Trigger Carbon 2          Mountain    Over Mountain       Carbon          Ithaca Mountain Climbers   Ithaca       NY
2     2         1           2011-01-10  1         2770   2770         Beast of the East 1       Mountain    Trail               Aluminum        Kansas City 29ers          Kansas City  KS
3     2         2           2011-01-10  1         5970   5970         Trigger Carbon 2          Mountain    Over Mountain       Carbon          Kansas City 29ers          Kansas City  KS
4     3         1           2011-01-10  1         10660  10660        Supersix Evo Hi-Mod Team  Road        Elite Road          Carbon          Louisville Race Equipment  Louisville   KY
...   ...       ...         ...         ...       ...    ...          ...                       ...         ...                 ...             ...                        ...          ...
2461  321       3           2011-12-22  1         1410   1410         CAAD8 105                 Road        Elite Road          Aluminum        Miami Race Equipment       Miami        FL
2462  322       1           2011-12-28  1         1250   1250         Synapse Disc Tiagra       Road        Endurance Road      Aluminum        Phoenix Bi-peds            Phoenix      AZ
2463  322       2           2011-12-28  1         2660   2660         Bad Habit 2               Mountain    Trail               Aluminum        Phoenix Bi-peds            Phoenix      AZ
2464  322       3           2011-12-28  1         2340   2340         F-Si 1                    Mountain    Cross Country Race  Aluminum        Phoenix Bi-peds            Phoenix      AZ
2465  322       4           2011-12-28  1         5860   5860         Synapse Hi-Mod Dura Ace   Road        Endurance Road      Carbon          Phoenix Bi-peds            Phoenix      AZ

2466 rows × 13 columns
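
To make the “orderlines” idea concrete, here is a minimal plain-pandas sketch. It rebuilds only the first five rows printed above (with a subset of the columns) rather than loading the real dataset, and the names orderlines and order_revenue are made up for this illustration. Because each row is one line of an order, the revenue of an order is the sum of its lines.

```python
import pandas as pd

# The first five order lines from the printout above (subset of columns).
orderlines = pd.DataFrame(
    {
        "order_id": [1, 1, 2, 2, 3],
        "order_line": [1, 2, 1, 2, 1],
        "order_date": [
            "2011-01-07", "2011-01-07",
            "2011-01-10", "2011-01-10", "2011-01-10",
        ],
        "total_price": [6070, 5970, 2770, 5970, 10660],
    }
)

# Parse the date strings into timestamps, as done for the full dataset.
orderlines["order_date"] = pd.to_datetime(orderlines["order_date"])

# Each row is one line of an order, so sum the lines to get revenue per order.
order_revenue = orderlines.groupby("order_id")["total_price"].sum()
print(order_revenue)
```

The same sum-over-rows idea is what the time-based aggregation in the next section applies, only keyed by month instead of by order.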

1.2 Using summarize_by_time() for a Sales Analysis

Your company might be interested in sales patterns for various categories of bicycles. We can obtain a grouped monthly sales aggregation by category_1 in two lines of code:

  1. First use pandas’s groupby() method to group the DataFrame on category_1
  2. Next, use pytimetk’s summarize_by_time() method to apply the sum function by month start (“MS”), and use wide_format = False to return the DataFrame in long format (note that long format is the default).

The result is the total revenue for Mountain and Road bikes by month.

Code
summary_category_1_df = df \
    .groupby("category_1") \
    .summarize_by_time(
        date_column  = 'order_date', 
        value_column = 'total_price',
        freq         = "MS",
        agg_func     = 'sum',
        wide_format  = False
    )

# First 5 rows shown
summary_category_1_df.head()
   category_1  order_date  total_price
0  Mountain    2011-01-01  221490
1  Mountain    2011-02-01  660555
2  Mountain    2011-03-01  358855
3  Mountain    2011-04-01  1075975
4  Mountain    2011-05-01  450440
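
For readers who want to see what a grouped month-start aggregation amounts to, the following is a rough plain-pandas sketch. It is not how summarize_by_time() is implemented, only a comparable calculation using groupby() with pd.Grouper. The sample frame holds just the ten order lines printed in section 1.1, so its totals are much smaller than the real monthly figures above. The names sample, long_df and wide_df are made up for this illustration, and the wide layout shown here is a pandas pivot, which may not match the column naming pytimetk uses when wide_format = True.

```python
import pandas as pd

# Only the ten order lines displayed in section 1.1 (three columns kept).
sample = pd.DataFrame(
    {
        "order_date": pd.to_datetime(
            [
                "2011-01-07", "2011-01-07", "2011-01-10", "2011-01-10",
                "2011-01-10", "2011-12-22", "2011-12-28", "2011-12-28",
                "2011-12-28", "2011-12-28",
            ]
        ),
        "category_1": [
            "Mountain", "Mountain", "Mountain", "Mountain", "Road",
            "Road", "Road", "Mountain", "Mountain", "Road",
        ],
        "total_price": [
            6070, 5970, 2770, 5970, 10660, 1410, 1250, 2660, 2340, 5860,
        ],
    }
)

# Long format: one row per (category, month start) pair.
long_df = (
    sample
    .groupby(["category_1", pd.Grouper(key="order_date", freq="MS")])["total_price"]
    .sum()
    .reset_index()
)

# Wide format: one row per month start, one column per category.
wide_df = long_df.pivot(
    index="order_date", columns="category_1", values="total_price"
)

print(long_df)
print(wide_df)
```

Long format tends to be the more convenient shape for further grouped operations, while wide format is easier to scan by eye when there are only a few categories.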

1.3 Visualizing Sales Patterns

Now available: plot_timeseries().

plot_timeseries() is a quick and easy way to visualize time series and make professional-looking plots.

With the data summarized by time, we can visualize it with plot_timeseries(). pytimetk functions are groupby()-aware: when your data is grouped, they apply the operation to each group separately. This is useful in time series work, where we often deal with hundreds of time series groups.

Code
summary_category_1_df \
    .groupby('category_1') \
    .plot_timeseries(
        date_column  = 'order_date',
        value_column = 'total_price',
        smooth_frac  = 0.8
    )
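
The idea of a group-aware operation can be sketched in plain pandas. The example below is an illustration only: it uses toy values and a two-point rolling mean as a stand-in, and does not reproduce whatever smoother plot_timeseries() applies through smooth_frac. The names toy and smoothed are made up here. The point to notice is that the calculation restarts inside each group, so values from group "a" never leak into group "b".

```python
import pandas as pd

# Toy data: two groups stacked in one long-format frame.
toy = pd.DataFrame(
    {
        "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "value": [1, 2, 3, 4, 10, 20, 30, 40],
    }
)

# Apply a two-point rolling mean within each group separately.
toy["smoothed"] = (
    toy.groupby("group")["value"]
    .transform(lambda s: s.rolling(2).mean())
)

print(toy)
```

The first row of each group has no earlier value within that group, so its smoothed entry is missing rather than being averaged with the last row of the previous group.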

2 Next Steps

Check out the Data Visualization Guide next.

3 More Coming Soon…

We are in the early stages of development, but the potential for pytimetk in Python is already obvious. 🐍