Finance Analysis

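Timetk is designed to work with any time series domain. Arguably the most important is Finance. This tutorial showcases how you can perform Financial Investment and Stock Analysis at scale with pytimetk. This applied tutorial covers financial analysis with:

  • tk.plot_timeseries(): Visualizing financial data
  • tk.augment_rolling(): Moving averages
  • tk.augment_rolling_apply(): Rolling correlations and rolling regressions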

Load the following packages before proceeding with this tutorial.

Code
import pytimetk as tk
import pandas as pd
import numpy as np

1 3 Core Properties: Financial Data

Financial data from sources like openbb or yfinance come in OHLCV format and typically include an “adjusted” price (adjusted for stock splits). This data has the 3 core properties of time series:

  1. Timestamp: daily, hourly frequencies
  2. Value: A price (or returns)
  3. Groups: Stock symbols
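
To make the three properties concrete, here is a minimal sketch with a made-up two-symbol table. The symbols "AAA" and "BBB" and all prices are hypothetical; only the column roles mirror the stocks data used in this tutorial.

```python
import pandas as pd

# Hypothetical prices for two made-up symbols (illustration only)
prices_df = pd.DataFrame({
    # 3. Groups: one time series per stock symbol
    "symbol": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    # 1. Timestamp: daily frequency
    "date": pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"] * 2),
    # 2. Value: the adjusted price
    "adjusted": [10.0, 10.5, 10.2, 50.0, 49.0, 51.0],
})

# Each group is its own time series, ordered by its timestamp
for symbol, group in prices_df.groupby("symbol"):
    first, last = group["date"].min().date(), group["date"].max().date()
    print(symbol, first, "->", last, f"({len(group)} rows)")
```

Keeping the data in this long format (one row per symbol and date) is what lets a single groupby('symbol') drive the per-stock analysis in the rest of the tutorial.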

Let’s take a look with the tk.glimpse() function.

Code
stocks_df = tk.load_dataset("stocks_daily", parse_dates = ['date'])

stocks_df.glimpse()
<class 'pandas.core.frame.DataFrame'>: 16194 rows of 8 columns
symbol:    object            ['META', 'META', 'META', 'META', 'META', 'M ...
date:      datetime64[ns]    [Timestamp('2013-01-02 00:00:00'), Timestam ...
open:      float64           [27.440000534057617, 27.8799991607666, 28.0 ...
high:      float64           [28.18000030517578, 28.46999931335449, 28.9 ...
low:       float64           [27.420000076293945, 27.59000015258789, 27. ...
close:     float64           [28.0, 27.770000457763672, 28.7600002288818 ...
volume:    int64             [69846400, 63140600, 72715400, 83781800, 45 ...
adjusted:  float64           [28.0, 27.770000457763672, 28.7600002288818 ...

2 Visualizing Financial Data

Visualizing financial data is critical for:

  1. Quick Insights
  2. Enhanced Decision Making
  3. Performance Monitoring
  4. Ease of Reporting

We can visualize financial data over time with tk.plot_timeseries():

  • An interactive plotly plot is returned by default. A static plot can be returned by setting engine = "plotnine".
  • A blue smoother is added by default. The smoother can be removed with smooth = False.
  • See our Data Visualization Guide for more details.
  • Use help(tk.plot_timeseries) to review additional helpful documentation.

An interactive plotly plot is returned by default. Interactive plots are useful for fast data exploration and for use in web apps (e.g. streamlit, shiny, dash).

Code
# plotly engine
stocks_df \
    .groupby('symbol') \
    .plot_timeseries(
        'date', 'adjusted',
        facet_ncol = 2,
        smooth = True,
        smooth_frac = 0.10,
        width = 900,
        height = 700,
        engine = 'plotly',
    )

You can quickly change to a static plot using the plotnine or matplotlib engines. This returns a professional faceted stock chart useful for financial reports.

Code
# plotnine engine
stocks_df \
    .groupby('symbol') \
    .plot_timeseries(
        'date', 'adjusted',
        facet_ncol = 2,
        smooth = True,
        smooth_frac = 0.10,
        width = 900,
        height = 700,
        engine = 'plotnine'
    )

<Figure Size: (900 x 700)>

3 Technical Indicators

Technical indicators are mathematical calculations based on the price, volume, or open interest of a security or contract used by traders who follow technical analysis. Technical analysis is a method of forecasting the direction of financial market prices through the study of past market data, primarily price, and volume. Technical indicators are most extensively used in the context of the stock market but are also used in other financial markets like forex, commodities, and cryptocurrencies.

Types of Technical Indicators:

  1. Trend Indicators:
    • Moving Averages: Helps smooth out price data to form a single flowing line, identifying the direction of the trend.
    • Moving Average Convergence Divergence (MACD): Shows the relationship between two moving averages of a security’s price.
    • Average True Range (ATR): Measures market volatility.
  2. Momentum Indicators:
    • Relative Strength Index (RSI): Measures the speed and change of price movements on a scale of 0 to 100.
    • Stochastic Oscillator: Compares a security’s closing price to its price range over a specific period.
  3. Volume Indicators:
    • On-Balance Volume (OBV): Uses volume flow to predict changes in stock price.
    • Accumulation/Distribution Line: Looks at the proximity of closing prices to their highs or lows to determine if accumulation or distribution is occurring in the market.
  4. Volatility Indicators:
    • Bollinger Bands: Consist of a middle band being an N-period simple moving average (SMA), an upper band at K times an N-period standard deviation above the middle band, and a lower band at K times an N-period standard deviation below the middle band.
    • Average True Range (ATR): Provides a measure of a market’s volatility.
  5. Market Strength Indicators:
    • Advance/Decline Line: A cumulative running total of the number of advancing stocks minus the number of declining stocks in each period.
    • Market Breadth: Measures the number of securities that have advanced and declined in a specific market or index, giving traders a feel for the market’s overall mood.
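
As an illustration of how a momentum indicator from the list above is computed, here is a minimal plain-pandas sketch of the RSI. It is not a pytimetk function: the helper name rsi_sma is hypothetical, and it uses a simple moving average of gains and losses (a common simplification) rather than Wilder's original smoothing.

```python
import numpy as np
import pandas as pd

def rsi_sma(prices: pd.Series, window: int = 14) -> pd.Series:
    """RSI using simple rolling averages of gains and losses (hypothetical helper)."""
    delta = prices.diff()                   # period-over-period price change
    gains = delta.clip(lower=0)             # keep upward moves, zero out the rest
    losses = (-delta).clip(lower=0)         # keep downward moves as positive numbers
    avg_gain = gains.rolling(window).mean()
    avg_loss = losses.rolling(window).mean()
    rs = avg_gain / avg_loss                # relative strength (inf when there are no losses)
    return 100 - 100 / (1 + rs)             # maps RS onto the 0 to 100 scale

# Synthetic prices: a steady rise followed by a steady fall
prices = pd.Series(np.r_[np.arange(1.0, 31.0), np.arange(30.0, 0.0, -1.0)])
rsi = rsi_sma(prices, window=14)
print(rsi.dropna().round(1).head())
```

A window made only of gains pushes the RSI to 100 and a window made only of losses pushes it to 0, which is why readings near the extremes are read as overbought or oversold.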

Let’s see a few examples of technical indicators in pytimetk.

3.1 Application: Moving Averages, 10-Day and 50-Day

This code template can be used to make and visualize the 10-day and 50-day moving averages of a group of stock symbols.

Code
# Add 2 moving averages (10-day and 50-Day)
sma_df = stocks_df[['symbol', 'date', 'adjusted']] \
    .groupby('symbol') \
    .augment_rolling(
        date_column = 'date',
        value_column = 'adjusted',
        window = [10, 50],
        window_func = ['mean'],
        center = False,
        threads = 1, # Change to -1 to use all available cores
    )

# Visualize 
(sma_df 

    # zoom in on dates
    .query('date >= "2023-01-01"') 

    # Convert to long format
    .melt(
        id_vars = ['symbol', 'date'],
        value_vars = ["adjusted", "adjusted_rolling_mean_win_10", "adjusted_rolling_mean_win_50"]
    ) 

    # Group on symbol and visualize
    .groupby("symbol") 
    .plot_timeseries(
        date_column = 'date',
        value_column = 'value',
        color_column = 'variable',
        smooth = False, 
        facet_ncol = 2,
        width = 900,
        height = 700,
        engine = "plotly"
    )
)
Code
# Visualize with the plotnine engine (reuses sma_df computed above)
(sma_df 

    # zoom in on dates
    .query('date >= "2023-01-01"') 

    # Convert to long format
    .melt(
        id_vars = ['symbol', 'date'],
        value_vars = ["adjusted", "adjusted_rolling_mean_win_10", "adjusted_rolling_mean_win_50"]
    ) 

    # Group on symbol and visualize
    .groupby("symbol") 
    .plot_timeseries(
        date_column = 'date',
        value_column = 'value',
        color_column = 'variable',
        smooth = False, 
        facet_ncol = 2,
        width = 900,
        height = 700,
        engine = "plotnine"
    )
)

<Figure Size: (900 x 700)>

3.2 Application: Bollinger Bands

Bollinger Bands are a volatility indicator commonly used in financial trading. They consist of three lines:

  1. The middle band, which is a simple moving average (usually over 20 periods).
  2. The upper band, calculated as the middle band plus k times the standard deviation of the price (typically, k=2).
  3. The lower band, calculated as the middle band minus k times the standard deviation of the price.

Here’s how you can calculate and plot Bollinger Bands with pytimetk using this code template:

Code
# Bollinger Bands
bollinger_df = stocks_df[['symbol', 'date', 'adjusted']] \
    .groupby('symbol') \
    .augment_rolling(
        date_column = 'date',
        value_column = 'adjusted',
        window = 20,
        window_func = ['mean', 'std'],
        center = False
    ) \
    .assign(
        upper_band = lambda x: x['adjusted_rolling_mean_win_20'] + 2*x['adjusted_rolling_std_win_20'],
        lower_band = lambda x: x['adjusted_rolling_mean_win_20'] - 2*x['adjusted_rolling_std_win_20']
    )


# Visualize
(bollinger_df

    # zoom in on dates
    .query('date >= "2023-01-01"') 

    # Convert to long format
    .melt(
        id_vars = ['symbol', 'date'],
        value_vars = ["adjusted", "adjusted_rolling_mean_win_20", "upper_band", "lower_band"]
    ) 

    # Group on symbol and visualize
    .groupby("symbol") 
    .plot_timeseries(
        date_column = 'date',
        value_column = 'value',
        color_column = 'variable',
        # Adjust colors for Bollinger Bands
        color_palette =["#2C3E50", "#E31A1C", '#18BC9C', '#18BC9C'],
        smooth = False, 
        facet_ncol = 2,
        width = 900,
        height = 700,
        engine = "plotly" 
    )
)
Code
# Visualize with the plotnine engine (reuses bollinger_df computed above)
(bollinger_df

    # zoom in on dates
    .query('date >= "2023-01-01"') 

    # Convert to long format
    .melt(
        id_vars = ['symbol', 'date'],
        value_vars = ["adjusted", "adjusted_rolling_mean_win_20", "upper_band", "lower_band"]
    ) 

    # Group on symbol and visualize
    .groupby("symbol") 
    .plot_timeseries(
        date_column = 'date',
        value_column = 'value',
        color_column = 'variable',
        # Adjust colors for Bollinger Bands
        color_palette =["#2C3E50", "#E31A1C", '#18BC9C', '#18BC9C'],
        smooth = False, 
        facet_ncol = 2,
        width = 900,
        height = 700,
        engine = "plotnine"
    )
)

<Figure Size: (900 x 700)>

4 Returns Analysis

In finance, returns analysis involves evaluating the gains or losses made on an investment relative to the amount of money invested. It’s a critical aspect of investment and portfolio management:

  • Performance: Returns analysis determines the performance and the risk-reward profile of financial assets, portfolios, or investment strategies.
  • Informed Decision Making: Returns analysis allows investors, analysts, and portfolio managers to make informed decisions regarding asset allocation, risk management, and investment strategy.

4.1 Returns Analysis By Time

Returns are NOT static (so analyze them by time)
  1. We can use rolling window calculations with tk.augment_rolling() to compute many rolling features at scale such as rolling mean, std, range (spread).
  2. We can extend our rolling calculations with tk.augment_rolling_apply() to Rolling Correlation and Rolling Regression (to make comparisons over time)
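
To show what rolling mean, std, and range (spread) mean in practice, here is a minimal sketch in plain pandas. It runs on synthetic returns for a single series and is an illustration of the idea, not how tk.augment_rolling() is implemented.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for one asset (values are made up)
rng = np.random.default_rng(42)
returns = pd.Series(
    rng.normal(0.001, 0.02, size=250),
    index=pd.bdate_range("2023-01-02", periods=250),
    name="returns",
)

window = 90
roll = returns.rolling(window)

# Each statistic is recomputed over the trailing 90 observations
rolling_features = pd.DataFrame({
    "mean": roll.mean(),
    "std": roll.std(),
    "range": roll.max() - roll.min(),   # spread between best and worst day
})

# The first window - 1 rows are NaN because the window is not yet full
print(rolling_features.dropna().head())
```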

Application: Descriptive Statistic Analysis

Many traders compute descriptive statistics like mean, median, mode, skewness, kurtosis, and standard deviation to understand the central tendency, spread, and shape of the return distribution.

Step 1: Returns

Use this code to compute returns with pct_change() in wide format.

Code
returns_wide_df = stocks_df[['symbol', 'date', 'adjusted']] \
    .pivot(index = 'date', columns = 'symbol', values = 'adjusted') \
    .pct_change() \
    .reset_index() \
    [1:]

returns_wide_df
symbol date AAPL AMZN GOOG META NFLX NVDA
1 2013-01-03 -0.012622 0.004547 0.000581 -0.008214 0.049777 0.000786
2 2013-01-04 -0.027854 0.002592 0.019760 0.035650 -0.006315 0.032993
3 2013-01-07 -0.005883 0.035925 -0.004363 0.022949 0.033549 -0.028897
4 2013-01-08 0.002691 -0.007748 -0.001974 -0.012237 -0.020565 -0.021926
5 2013-01-09 -0.015629 -0.000113 0.006573 0.052650 -0.012865 -0.022418
... ... ... ... ... ... ... ...
2694 2023-09-15 -0.004154 -0.029920 -0.004964 -0.036603 -0.008864 -0.036879
2695 2023-09-18 0.016913 -0.002920 0.004772 0.007459 -0.006399 0.001503
2696 2023-09-19 0.006181 -0.016788 -0.000936 0.008329 0.004564 -0.010144
2697 2023-09-20 -0.019992 -0.017002 -0.030541 -0.017701 -0.024987 -0.029435
2698 2023-09-21 -0.008889 -0.044053 -0.023999 -0.013148 -0.005566 -0.028931

2698 rows × 7 columns
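
For reference, pct_change() computes the simple return r_t = P_t / P_{t-1} - 1. The sketch below checks this on three made-up prices and shows why the first row is dropped with [1:] above: there is no previous price to compare against.

```python
import pandas as pd

# Three made-up adjusted prices
prices = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"]),
)

manual = prices / prices.shift(1) - 1   # r_t = P_t / P_{t-1} - 1
builtin = prices.pct_change()           # same calculation done by pandas

# The first value is NaN in both: the first day has no prior price
print(pd.DataFrame({"manual": manual, "builtin": builtin}))
```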

Step 2: Descriptive Stats

Use this code to get standard statistics with the describe() method.

Code
returns_wide_df.describe()
symbol AAPL AMZN GOOG META NFLX NVDA
count 2698.000000 2698.000000 2698.000000 2698.000000 2698.000000 2698.000000
mean 0.001030 0.001068 0.000885 0.001170 0.001689 0.002229
std 0.018036 0.020621 0.017267 0.024291 0.029683 0.028320
min -0.128647 -0.140494 -0.111008 -0.263901 -0.351166 -0.187559
25% -0.007410 -0.008635 -0.006900 -0.009610 -0.012071 -0.010938
50% 0.000892 0.001050 0.000700 0.001051 0.000544 0.001918
75% 0.010324 0.011363 0.009053 0.012580 0.014678 0.015202
max 0.119808 0.141311 0.160524 0.296115 0.422235 0.298067
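
Note that describe() does not report skewness or kurtosis, which were mentioned above as measures of the shape of the return distribution. A minimal sketch with pandas' own skew() and kurt() methods follows; the two return series are made up for illustration.

```python
import pandas as pd

# Made-up returns: "AAA" is symmetric, "BBB" has one large loss (left tail)
toy_returns_df = pd.DataFrame({
    "AAA": [-0.02, -0.01, 0.00, 0.01, 0.02],
    "BBB": [-0.05, 0.00, 0.00, 0.005, 0.01],
})

shape_stats_df = pd.DataFrame({
    "skew": toy_returns_df.skew(),      # asymmetry: negative means a heavier left tail
    "kurtosis": toy_returns_df.kurt(),  # excess kurtosis: tail weight relative to a normal
})

print(shape_stats_df)
```

On the tutorial data, the same two calls could be applied to returns_wide_df after dropping the date column.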

Step 3: Correlation

Then compute the correlation matrix with corr().

Code
corr_table_df = returns_wide_df.drop('date', axis=1).corr()
corr_table_df
symbol AAPL AMZN GOOG META NFLX NVDA
symbol
AAPL 1.000000 0.497906 0.566452 0.479787 0.321694 0.526508
AMZN 0.497906 1.000000 0.628103 0.544481 0.475078 0.490234
GOOG 0.566452 0.628103 1.000000 0.595728 0.428470 0.531382
META 0.479787 0.544481 0.595728 1.000000 0.407417 0.450586
NFLX 0.321694 0.475078 0.428470 0.407417 1.000000 0.380153
NVDA 0.526508 0.490234 0.531382 0.450586 0.380153 1.000000

The problem is that the stock market is constantly changing. And these descriptive statistics aren’t representative of the most recent fluctuations. This is where pytimetk comes into play with rolling descriptive statistics.

Application: 90-Day Rolling Descriptive Statistics Analysis with tk.augment_rolling()

Let’s compute and visualize the 90-day rolling statistics.

  • See our Augmenting Guide for more details.
  • Use help(tk.augment_rolling) to review additional helpful documentation.

Step 1: Long Format Pt.1

Use this code to melt() the data into long format.

Code
returns_long_df = returns_wide_df \
    .melt(id_vars='date', value_name='returns') 

returns_long_df
date symbol returns
0 2013-01-03 AAPL -0.012622
1 2013-01-04 AAPL -0.027854
2 2013-01-07 AAPL -0.005883
3 2013-01-08 AAPL 0.002691
4 2013-01-09 AAPL -0.015629
... ... ... ...
16183 2023-09-15 NVDA -0.036879
16184 2023-09-18 NVDA 0.001503
16185 2023-09-19 NVDA -0.010144
16186 2023-09-20 NVDA -0.029435
16187 2023-09-21 NVDA -0.028931

16188 rows × 3 columns

Step 2: Augment Rolling Statistic

Let’s add multiple columns of rolling statistics.

Code
rolling_stats_df = returns_long_df \
    .groupby('symbol') \
    .augment_rolling(
        date_column = 'date',
        value_column = 'returns',
        window = [90],
        window_func = [
            'mean', 
            'std', 
            'min',
            ('q25', lambda x: np.quantile(x, 0.25)),
            'median',
            ('q75', lambda x: np.quantile(x, 0.75)),
            'max'
        ],
        threads = 1 # Change to -1 to use all threads
    ) \
    .dropna()

rolling_stats_df
date symbol returns returns_rolling_mean_win_90 returns_rolling_std_win_90 returns_rolling_min_win_90 returns_rolling_q25_win_90 returns_rolling_median_win_90 returns_rolling_q75_win_90 returns_rolling_max_win_90
89 2013-05-13 AAPL 0.003908 -0.001702 0.022233 -0.123558 -0.010533 -0.001776 0.012187 0.041509
90 2013-05-14 AAPL -0.023926 -0.001827 0.022327 -0.123558 -0.010533 -0.001776 0.012187 0.041509
91 2013-05-15 AAPL -0.033817 -0.001894 0.022414 -0.123558 -0.010533 -0.001776 0.012187 0.041509
92 2013-05-16 AAPL 0.013361 -0.001680 0.022467 -0.123558 -0.010533 -0.001360 0.013120 0.041509
93 2013-05-17 AAPL -0.003037 -0.001743 0.022462 -0.123558 -0.010533 -0.001776 0.013120 0.041509
... ... ... ... ... ... ... ... ... ... ...
16183 2023-09-15 NVDA -0.036879 0.005159 0.036070 -0.056767 -0.012587 -0.000457 0.018480 0.243696
16184 2023-09-18 NVDA 0.001503 0.005396 0.035974 -0.056767 -0.011117 0.000177 0.018480 0.243696
16185 2023-09-19 NVDA -0.010144 0.005162 0.036006 -0.056767 -0.011117 -0.000457 0.018480 0.243696
16186 2023-09-20 NVDA -0.029435 0.004953 0.036153 -0.056767 -0.012587 -0.000457 0.018480 0.243696
16187 2023-09-21 NVDA -0.028931 0.004724 0.036303 -0.056767 -0.013166 -0.000457 0.018480 0.243696

15654 rows × 10 columns

Step 3: Long Format Pt.2

Finally, we can .melt() each of the rolling statistics for a Long Format Analysis.

Code
rolling_stats_long_df = rolling_stats_df \
    .melt(
        id_vars = ["symbol", "date"],
        var_name = "statistic_type"
    )

rolling_stats_long_df
symbol date statistic_type value
0 AAPL 2013-05-13 returns 0.003908
1 AAPL 2013-05-14 returns -0.023926
2 AAPL 2013-05-15 returns -0.033817
3 AAPL 2013-05-16 returns 0.013361
4 AAPL 2013-05-17 returns -0.003037
... ... ... ... ...
125227 NVDA 2023-09-15 returns_rolling_max_win_90 0.243696
125228 NVDA 2023-09-18 returns_rolling_max_win_90 0.243696
125229 NVDA 2023-09-19 returns_rolling_max_win_90 0.243696
125230 NVDA 2023-09-20 returns_rolling_max_win_90 0.243696
125231 NVDA 2023-09-21 returns_rolling_max_win_90 0.243696

125232 rows × 4 columns

With the data formatted properly we can evaluate the 90-Day Rolling Statistics using .plot_timeseries().

Code
rolling_stats_long_df \
    .groupby(['symbol', 'statistic_type']) \
    .plot_timeseries(
        date_column = 'date',
        value_column = 'value',
        facet_ncol = 6,
        width = 1500,
        height = 1000,
        title = "90-Day Rolling Statistics"
    )
Code
rolling_stats_long_df \
    .groupby(['symbol', 'statistic_type']) \
    .plot_timeseries(
        date_column = 'date',
        value_column = 'value',
        facet_ncol = 6,
        facet_dir = 'v',
        width = 1500,
        height = 1000,
        title = "90-Day Rolling Statistics",
        engine = "plotnine"
    )

<Figure Size: (1500 x 1000)>

5 Rolling Correlation and Regressions with tk.augment_rolling_apply()

One final evaluation is to understand relationships to other stocks and the overall market index over time. For that we can use two techniques:

  1. Rolling Correlations
  2. Rolling Regressions

5.1 About: Rolling Correlation

Rolling correlation calculates the correlation between two time series over a rolling window of a specified size, moving one period at a time. In stock analysis, this is often used to assess:

  • Diversification: Helps in identifying how different stocks move in relation to each other, aiding in the creation of a diversified portfolio.

  • Market Dependency: Measures how a particular stock or sector is correlated with a broader market index.

  • Risk Management: Helps in identifying changes in correlation structures over time which is crucial for risk assessment and management.

For example, if the rolling correlation between two stocks starts increasing, it might suggest that they are being influenced by similar factors or market conditions.
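
The following sketch illustrates why a rolling correlation can tell a different story than a single full-period number. It uses plain pandas on synthetic data in which two made-up assets move together in the first half and in opposite directions in the second half; the series names and the 20-period window are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, window = 200, 20

# Synthetic returns: asset_b copies asset_a for 100 periods, then mirrors it
asset_a = pd.Series(rng.normal(0, 0.01, size=n))
asset_b = pd.concat([asset_a.iloc[:100], -asset_a.iloc[100:]])

full_corr = asset_a.corr(asset_b)                    # one number for the whole period
rolling_corr = asset_a.rolling(window).corr(asset_b) # one number per trailing window

print("full-period correlation:", round(full_corr, 3))
print("rolling correlation at t=99: ", round(rolling_corr.iloc[99], 3))
print("rolling correlation at t=199:", round(rolling_corr.iloc[199], 3))
```

The full-period figure averages the two regimes away, while the rolling figure exposes the change in the relationship.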

5.2 Application: Rolling Correlation

Let’s revisit the returns wide and long format. We can combine these two using the merge() method.

Step 1: Create the return_combinations_long_df

Perform data wrangling to get the pairwise combinations in long format:

  • We first .merge() to join the long returns with the wide returns by date.
  • We then .melt() to get the wide data into long format.
Code
return_combinations_long_df = returns_long_df \
    .merge(returns_wide_df, how='left', on = 'date') \
    .melt(
        id_vars = ['date', 'symbol', 'returns'],
        var_name = "comp",
        value_name = "returns_comp"
    )
return_combinations_long_df
date symbol returns comp returns_comp
0 2013-01-03 AAPL -0.012622 AAPL -0.012622
1 2013-01-04 AAPL -0.027854 AAPL -0.027854
2 2013-01-07 AAPL -0.005883 AAPL -0.005883
3 2013-01-08 AAPL 0.002691 AAPL 0.002691
4 2013-01-09 AAPL -0.015629 AAPL -0.015629
... ... ... ... ... ...
97123 2023-09-15 NVDA -0.036879 NVDA -0.036879
97124 2023-09-18 NVDA 0.001503 NVDA 0.001503
97125 2023-09-19 NVDA -0.010144 NVDA -0.010144
97126 2023-09-20 NVDA -0.029435 NVDA -0.029435
97127 2023-09-21 NVDA -0.028931 NVDA -0.028931

97128 rows × 5 columns

Step 2: Add Rolling Correlations with tk.augment_rolling_apply()

Next, let’s add rolling correlations.

  • We first .groupby() on the combination of our target assets “symbol” and our comparison asset “comp”.
  • Then we use a different function, tk.augment_rolling_apply().
tk.augment_rolling() vs tk.augment_rolling_apply()
  • For the vast majority of operations, tk.augment_rolling() will suffice. It’s used on a single column where there is a simple rolling transformation applied to only the value_column.
  • For more complex cases where other columns beyond a value_column are needed (e.g. rolling correlations, rolling regressions), the tk.augment_rolling_apply() comes to the rescue.
  • tk.augment_rolling_apply() exposes the group’s columns as a DataFrame to the window function, thus allowing for multi-column analysis.
tk.augment_rolling_apply() has no value_column

This is because the rolling apply passes a DataFrame containing all columns to the custom function. The custom function is then responsible for handling the columns internally. This is how you can select multiple columns to work with.
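
To illustrate the idea of passing a whole DataFrame window to a custom function, here is a minimal plain-pandas sketch. The helper rolling_apply_frame is hypothetical and is not pytimetk's implementation; it only mimics the concept for a single group.

```python
import numpy as np
import pandas as pd

def rolling_apply_frame(df: pd.DataFrame, window: int, func) -> pd.Series:
    """Call func on each trailing window of rows (hypothetical helper)."""
    out = [np.nan] * len(df)
    for end in range(window, len(df) + 1):
        # func receives all columns of the window, not just one value column
        out[end - 1] = func(df.iloc[end - window:end])
    return pd.Series(out, index=df.index)

# Synthetic two-column data (made up): returns_comp is a noisy copy of returns
rng = np.random.default_rng(1)
toy_df = pd.DataFrame({"returns": rng.normal(0, 0.01, size=60)})
toy_df["returns_comp"] = toy_df["returns"] + rng.normal(0, 0.005, size=60)

# The custom function picks the columns it needs from the window
toy_corr = rolling_apply_frame(
    toy_df, window=30, func=lambda w: w["returns"].corr(w["returns_comp"])
)
print(toy_corr.dropna().head())
```

Because the function sees every column in the window, the same pattern extends to regressions or any other multi-column statistic.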

Code
return_corr_df = return_combinations_long_df \
    .groupby(["symbol", "comp"]) \
    .augment_rolling_apply(
        date_column = "date",
        window = 90,
        window_func=[('corr', lambda x: x['returns'].corr(x['returns_comp']))],
        threads = 1, # Change to -1 to use all available cores
    )

return_corr_df
date symbol returns comp returns_comp rolling_corr_win_90
0 2013-01-03 AAPL -0.012622 AAPL -0.012622 NaN
1 2013-01-04 AAPL -0.027854 AAPL -0.027854 NaN
2 2013-01-07 AAPL -0.005883 AAPL -0.005883 NaN
3 2013-01-08 AAPL 0.002691 AAPL 0.002691 NaN
4 2013-01-09 AAPL -0.015629 AAPL -0.015629 NaN
... ... ... ... ... ... ...
97123 2023-09-15 NVDA -0.036879 NVDA -0.036879 1.0
97124 2023-09-18 NVDA 0.001503 NVDA 0.001503 1.0
97125 2023-09-19 NVDA -0.010144 NVDA -0.010144 1.0
97126 2023-09-20 NVDA -0.029435 NVDA -0.029435 1.0
97127 2023-09-21 NVDA -0.028931 NVDA -0.028931 1.0

97128 rows × 6 columns

Step 3: Visualize the Rolling Correlation

We can use tk.plot_timeseries() to visualize the 90-day rolling correlation. It’s interesting to see that stock combinations such as AAPL | AMZN returns have a high positive correlation of 0.80, but this relationship was much lower (0.25) before 2015.

  • The blue smoother can help us detect trends
  • The y_intercept is useful in this case to draw lines at -1, 0, and 1
Code
return_corr_df \
    .dropna() \
    .groupby(['symbol', 'comp']) \
    .plot_timeseries(
        date_column = "date",
        value_column = "rolling_corr_win_90",
        facet_ncol = 6,
        y_intercept = [-1,0,1],
        y_intercept_color = tk.palette_timetk()['steel_blue'],
        width = 1500,
        height = 1000,
        title = "90-Day Rolling Correlation",
        engine = "plotly"
    )
Code
return_corr_df \
    .dropna() \
    .groupby(['symbol', 'comp']) \
    .plot_timeseries(
        date_column = "date",
        value_column = "rolling_corr_win_90",
        facet_ncol = 6,
        y_intercept = [-1,0,1],
        y_intercept_color = tk.palette_timetk()['steel_blue'],
        width = 1500,
        height = 1000,
        title = "90-Day Rolling Correlation",
        engine = "plotnine"
    )

<Figure Size: (1500 x 1000)>

For comparison, we can examine the corr_table_df from the Descriptive Statistics Analysis:

  • Notice that the values tend not to match the most recent trends
  • For example, AAPL | AMZN is correlated at 0.50 over the entire time period, but more recently this correlation has dropped to 0.17 in the 90-Day Rolling Correlation chart.
Code
corr_table_df
symbol AAPL AMZN GOOG META NFLX NVDA
symbol
AAPL 1.000000 0.497906 0.566452 0.479787 0.321694 0.526508
AMZN 0.497906 1.000000 0.628103 0.544481 0.475078 0.490234
GOOG 0.566452 0.628103 1.000000 0.595728 0.428470 0.531382
META 0.479787 0.544481 0.595728 1.000000 0.407417 0.450586
NFLX 0.321694 0.475078 0.428470 0.407417 1.000000 0.380153
NVDA 0.526508 0.490234 0.531382 0.450586 0.380153 1.000000

5.3 About: Rolling Regression

Rolling regression involves running regression analyses over rolling windows of data points to assess the relationship between a dependent and one or more independent variables. In the context of stock analysis, it can be used to:

  • Beta Estimation: It can be used to estimate the beta of a stock (a measure of market risk) against a market index over different time periods. A higher beta indicates higher market-related risk.

  • Market Timing: It can be useful in identifying changing relationships between stocks and market indicators, helping traders to adjust their positions accordingly.

  • Hedge Ratio Determination: It helps in determining the appropriate hedge ratios for pairs trading or other hedging strategies.
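
For a single market factor, the regression slope (beta) equals Cov(stock, market) / Var(market). The sketch below uses that identity to compute a rolling beta in plain pandas on synthetic data and cross-checks the last window against an ordinary least-squares fit with np.polyfit. The true beta of 1.5 and all series are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, window = 200, 90

# Synthetic returns: the stock moves 1.5x the market plus noise
market = pd.Series(rng.normal(0, 0.01, size=n))
stock = 1.5 * market + pd.Series(rng.normal(0, 0.002, size=n))

# Rolling beta = rolling covariance / rolling variance of the market
rolling_beta = stock.rolling(window).cov(market) / market.rolling(window).var()

# Cross-check the final window with an explicit least-squares line fit
slope, intercept = np.polyfit(market.iloc[-window:], stock.iloc[-window:], 1)

print("rolling beta (last window):", round(rolling_beta.iloc[-1], 4))
print("polyfit slope (last window):", round(slope, 4))
```

This shortcut only covers the one-regressor case; a custom regression function is still needed when there are several independent variables.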

5.4 Application: 90-Day Rolling Regression

This Application Requires Scikit Learn

We need to make a regression function that returns the Slope and Intercept. Scikit Learn has an easy-to-use modeling interface. You may need to pip install scikit-learn to use this applied tutorial.

Step 1: Get Market Returns

For our purposes, we assume the market is the average returns of the 6 technology stocks.

  • We calculate an equal-weight portfolio as the “market returns”.
  • Then we merge the market returns into the returns long data.
Code
# Assume Market Returns = Equal Weight Portfolio
market_returns_df = returns_wide_df \
    .set_index("date") \
    .assign(returns_market = lambda df: df.sum(axis = 1) * (1 / df.shape[1])) \
    .reset_index() \
    [['date', 'returns_market']]

# Merge with returns long
returns_long_market_df = returns_long_df \
    .merge(market_returns_df, how='left', on='date')

returns_long_market_df
date symbol returns returns_market
0 2013-01-03 AAPL -0.012622 0.005809
1 2013-01-04 AAPL -0.027854 0.009471
2 2013-01-07 AAPL -0.005883 0.008880
3 2013-01-08 AAPL 0.002691 -0.010293
4 2013-01-09 AAPL -0.015629 0.001366
... ... ... ... ...
16183 2023-09-15 NVDA -0.036879 -0.020231
16184 2023-09-18 NVDA 0.001503 0.003555
16185 2023-09-19 NVDA -0.010144 -0.001466
16186 2023-09-20 NVDA -0.029435 -0.023276
16187 2023-09-21 NVDA -0.028931 -0.020764

16188 rows × 4 columns
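
An equal-weight portfolio return is simply the row-wise average of the individual returns. This small sketch, on made-up numbers, shows that the sum-times-1/n expression used above matches mean(axis=1) when no values are missing.

```python
import pandas as pd

# Made-up daily returns for three hypothetical symbols
toy_wide_df = pd.DataFrame({
    "AAA": [0.01, -0.02],
    "BBB": [0.03, 0.00],
    "CCC": [-0.01, 0.02],
})

n_assets = toy_wide_df.shape[1]
via_sum = toy_wide_df.sum(axis=1) * (1 / n_assets)  # each asset weighted 1/n
via_mean = toy_wide_df.mean(axis=1)                 # same result without missing data

print(pd.DataFrame({"via_sum": via_sum, "via_mean": via_mean}))
```

The two differ only when a row contains missing values: mean() averages over the non-missing assets, while the sum-based version still divides by the full asset count.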

Step 2: Run a Rolling Regression

Next, run the following code to perform a rolling regression:

  • Use a custom regression function that will return the slope and intercept as a pandas series.
  • Run the rolling regression with tk.augment_rolling_apply().
Code
def regression(df):
    
    # External functions must be imported inside the function body
    from sklearn.linear_model import LinearRegression

    model = LinearRegression()
    X = df[['returns_market']]  # Extract X values (independent variables)
    y = df['returns']  # Extract y values (dependent variable)
    model.fit(X, y)
    ret = pd.Series([model.intercept_, model.coef_[0]], index=['Intercept', 'Slope'])
    
    return ret # Return intercept and slope as a Series

return_regression_df = returns_long_market_df \
    .groupby('symbol') \
    .augment_rolling_apply(
        date_column = "date",
        window = 90,
        window_func = [('regression', regression)],
        threads = 1, # Change to -1 to use all available cores 
    ) \
    .dropna()

return_regression_df
date symbol returns returns_market rolling_regression_win_90
89 2013-05-13 AAPL 0.003908 0.007082 Intercept -0.001844 Slope 0.061629 dt...
90 2013-05-14 AAPL -0.023926 0.007583 Intercept -0.001959 Slope 0.056540 dt...
91 2013-05-15 AAPL -0.033817 0.005381 Intercept -0.002036 Slope 0.062330 dt...
92 2013-05-16 AAPL 0.013361 -0.009586 Intercept -0.001789 Slope 0.052348 dt...
93 2013-05-17 AAPL -0.003037 0.009005 Intercept -0.001871 Slope 0.055661 dt...
... ... ... ... ... ...
16183 2023-09-15 NVDA -0.036879 -0.020231 Intercept 0.000100 Slope 1.805479 dt...
16184 2023-09-18 NVDA 0.001503 0.003555 Intercept 0.000207 Slope 1.800813 dt...
16185 2023-09-19 NVDA -0.010144 -0.001466 Intercept 0.000301 Slope 1.817878 dt...
16186 2023-09-20 NVDA -0.029435 -0.023276 Intercept 0.000845 Slope 1.825818 dt...
16187 2023-09-21 NVDA -0.028931 -0.020764 Intercept 0.000901 Slope 1.818710 dt...

15654 rows × 5 columns

Step 3: Extract the Slope Coefficient (Beta)

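Each cell of rolling_regression_win_90 holds a pandas Series containing the intercept and slope. This is more of a hack than anything: we concatenate those Series into a DataFrame, realign its index, and join it back to extract the beta (slope) of the rolling regression.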

Code
intercept_slope_df = pd.concat(return_regression_df['rolling_regression_win_90'].to_list(), axis=1).T 

intercept_slope_df.index = return_regression_df.index

return_beta_df = pd.concat([return_regression_df, intercept_slope_df], axis=1)

return_beta_df
date symbol returns returns_market rolling_regression_win_90 Intercept Slope
89 2013-05-13 AAPL 0.003908 0.007082 Intercept -0.001844 Slope 0.061629 dt... -0.001844 0.061629
90 2013-05-14 AAPL -0.023926 0.007583 Intercept -0.001959 Slope 0.056540 dt... -0.001959 0.056540
91 2013-05-15 AAPL -0.033817 0.005381 Intercept -0.002036 Slope 0.062330 dt... -0.002036 0.062330
92 2013-05-16 AAPL 0.013361 -0.009586 Intercept -0.001789 Slope 0.052348 dt... -0.001789 0.052348
93 2013-05-17 AAPL -0.003037 0.009005 Intercept -0.001871 Slope 0.055661 dt... -0.001871 0.055661
... ... ... ... ... ... ... ...
16183 2023-09-15 NVDA -0.036879 -0.020231 Intercept 0.000100 Slope 1.805479 dt... 0.000100 1.805479
16184 2023-09-18 NVDA 0.001503 0.003555 Intercept 0.000207 Slope 1.800813 dt... 0.000207 1.800813
16185 2023-09-19 NVDA -0.010144 -0.001466 Intercept 0.000301 Slope 1.817878 dt... 0.000301 1.817878
16186 2023-09-20 NVDA -0.029435 -0.023276 Intercept 0.000845 Slope 1.825818 dt... 0.000845 1.825818
16187 2023-09-21 NVDA -0.028931 -0.020764 Intercept 0.000901 Slope 1.818710 dt... 0.000901 1.818710

15654 rows × 7 columns
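
The extraction step can be hard to picture, so here is a minimal sketch of the same idea on a tiny made-up column. The values and the index labels 89 and 90 are hypothetical; the point is only to show how a column whose cells are Series expands into regular columns.

```python
import numpy as np
import pandas as pd

# Build an object column whose cells are pandas Series (made-up values)
cells = np.empty(2, dtype=object)
cells[0] = pd.Series({"Intercept": -0.0018, "Slope": 0.06})
cells[1] = pd.Series({"Intercept": -0.0020, "Slope": 0.05})
regression_col = pd.Series(cells, index=[89, 90], name="rolling_regression_win_90")

# One row per cell, one column per Series label
expanded_df = pd.DataFrame(regression_col.tolist())

# Restore the original row labels so the result can be joined back by index
expanded_df.index = regression_col.index

print(expanded_df)
```

Realigning the index matters because the rows surviving .dropna() no longer start at 0, and a join on mismatched labels would silently produce missing values.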

Step 4: Visualize the Rolling Beta

Code
return_beta_df \
    .groupby('symbol') \
    .plot_timeseries(
        date_column = "date",
        value_column = "Slope",
        facet_ncol = 2,
        facet_scales = "free_x",
        y_intercept = [0, 3],
        y_intercept_color = tk.palette_timetk()['steel_blue'],
        width = 800,
        height = 600,
        title = "90-Day Rolling Regression",
        engine = "plotly",
    )
Code
return_beta_df \
    .groupby('symbol') \
    .plot_timeseries(
        date_column = "date",
        value_column = "Slope",
        facet_ncol = 2,
        facet_scales = "free_x",
        y_intercept = [0, 3],
        y_intercept_color = tk.palette_timetk()['steel_blue'],
        width = 800,
        height = 600,
        title = "90-Day Rolling Regression",
        engine = "plotnine",
    )

<Figure Size: (800 x 600)>

6 Conclusions

The pytimetk package offers a wide range of versatile time series functions, many of which can help improve Financial, Stock, Portfolio, and Investment Analysis in Python. We examined:

  • tk.plot_timeseries(): Visualizing financial data
  • tk.augment_rolling(): Moving averages
  • tk.augment_rolling_apply(): Rolling correlations and rolling regressions

7 More Coming Soon…

We are in the early stages of development, but the potential for pytimetk in Python is already clear. 🐍