augment_hurst_exponent

augment_hurst_exponent(
    data,
    date_column,
    close_column,
    window=100,
    reduce_memory=False,
    engine='auto',
)

Calculate the Hurst Exponent over a rolling window for a financial time series. It helps detect trending and mean-reverting behavior.
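A rolling computation evaluates a statistic on each trailing block of window observations, so the first window - 1 rows have no value. The plain-Python sketch below illustrates only that alignment. The helper rolling_apply is hypothetical and not part of pytimetk. The assumption that pytimetk uses a trailing (right-aligned) window with no partial windows is inferred from the leading missing values in the examples.

```python
from typing import Callable, List, Optional, Sequence


def rolling_apply(
    values: Sequence[float],
    window: int,
    func: Callable[[Sequence[float]], float],
) -> List[Optional[float]]:
    """Apply func to each trailing window of length `window` (hypothetical helper).

    Positions with fewer than `window` observations get None, mirroring the
    leading NaN/None values in the example outputs.
    """
    if window < 1:
        raise ValueError("window must be a positive integer")
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(func(values[i + 1 - window : i + 1]))
    return out


# A trivial statistic (window mean) makes the alignment easy to see
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
print(rolling_apply(prices, 3, lambda w: sum(w) / len(w)))
# [None, None, 11.0, 12.0, 13.0]
```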

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame or GroupBy (pandas or polars) | Input time-series data. Grouped inputs are processed per group before the exponent is appended. | required |
| date_column | str or ColumnSelector | Column name or selector identifying the column of dates or timestamps. | required |
| close_column | str, ColumnSelector, or list | Column(s) containing the closing prices used to calculate the Hurst Exponent. Must resolve to a single column. | required |
| window | Union[int, Tuple[int, int], List[int]] | Size of the rolling window. Accepts an int, a tuple (start, end), or a list of ints; one output column is added per window size. | 100 |
| reduce_memory | bool | If True, reduces memory usage before the calculation. | False |
| engine | {"auto", "pandas", "polars"} | Execution engine. "auto" infers the backend from the input data; pass "pandas" or "polars" to override it. | "auto" |
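The window parameter accepts three shapes. The sketch below shows one way such an argument could be expanded into a list of window sizes. The helper normalize_windows is hypothetical, and reading a (start, end) tuple as an inclusive range of sizes is an assumption; the minimum size of 2 is this sketch's own check, not a documented rule.

```python
from typing import List, Tuple, Union

# Mirrors the documented type: Union[int, Tuple[int, int], List[int]]
WindowSpec = Union[int, Tuple[int, int], List[int]]


def normalize_windows(window: WindowSpec) -> List[int]:
    """Expand a window argument into a list of window sizes (hypothetical helper).

    ASSUMPTION: a (start, end) tuple is treated as an inclusive range of sizes.
    """
    if isinstance(window, int):
        sizes = [window]
    elif isinstance(window, tuple):
        if len(window) != 2:
            raise ValueError("a tuple window must be (start, end)")
        start, end = window
        sizes = list(range(start, end + 1))  # inclusive range (assumed)
    elif isinstance(window, list):
        sizes = list(window)
    else:
        raise TypeError("window must be an int, a (start, end) tuple, or a list of ints")
    # Sketch-only check: an R/S estimate needs at least two observations
    if not sizes or any(size < 2 for size in sizes):
        raise ValueError("every window size must be at least 2")
    return sizes


print(normalize_windows(100))         # [100]
print(normalize_windows([100, 200]))  # [100, 200]
print(normalize_windows((20, 23)))    # [20, 21, 22, 23] under the inclusive-range assumption
```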

Returns

| Type | Description |
|---|---|
| DataFrame (pandas or polars, matching the input) | The input data with one added column per window size, named {close_column}_hurst_{window} (for example, close_hurst_100). |
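As the example outputs show (close_hurst_100, close_hurst_200), each window size yields one column named from the close column and the window. The helper below is hypothetical and simply reproduces that naming pattern, which can be handy when selecting the new columns downstream.

```python
from typing import List


def hurst_column_names(close_column: str, windows: List[int]) -> List[str]:
    """Build one output column name per window size: {close_column}_hurst_{window}."""
    return [f"{close_column}_hurst_{w}" for w in windows]


print(hurst_column_names("close", [100, 200]))  # ['close_hurst_100', 'close_hurst_200']
```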

Notes

The Hurst Exponent measures the long-term memory of a time series:

  • H < 0.5: Mean-reverting behavior
  • H ≈ 0.5: Random walk (no persistence)
  • H > 0.5: Trending or persistent behavior

The exponent is computed using a simplified rescaled-range (R/S) analysis over each rolling window.
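A common single-window simplification of R/S analysis takes the range R of cumulative deviations from the window mean, divides it by the window's standard deviation S, and sets H = log(R/S) / log(n), which assumes R/S ≈ C·n^H with C = 1. The sketch below implements that estimator in plain Python; it is an illustration of the technique, not necessarily the exact formula pytimetk uses. The describe_hurst helper and its 0.05 tolerance for "≈ 0.5" are arbitrary choices for this example.

```python
import math
import statistics
from typing import Optional, Sequence


def hurst_rs_simplified(window: Sequence[float]) -> Optional[float]:
    """Single-window rescaled-range estimate: H = log(R/S) / log(n).

    Assumes R/S ~ C * n**H with C = 1. Illustrative only; not necessarily
    the formula pytimetk uses.
    """
    n = len(window)
    if n < 2:
        return None
    mean = statistics.fmean(window)
    cumulative = 0.0
    highest, lowest = -math.inf, math.inf
    for x in window:
        cumulative += x - mean  # running sum of deviations from the window mean
        highest = max(highest, cumulative)
        lowest = min(lowest, cumulative)
    r = highest - lowest           # range R of the cumulative deviations
    s = statistics.pstdev(window)  # standard deviation S of the window
    if r == 0 or s == 0:
        return None  # flat window: R/S is undefined
    return math.log(r / s) / math.log(n)


def describe_hurst(h: float, tolerance: float = 0.05) -> str:
    """Map H onto the regimes listed in the Notes; the tolerance is arbitrary."""
    if abs(h - 0.5) <= tolerance:
        return "random walk"
    return "trending" if h > 0.5 else "mean-reverting"


trend = [float(v) for v in range(1, 11)]  # steadily rising series
zigzag = [1.0, -1.0] * 5                  # strictly alternating series
print(hurst_rs_simplified(trend))                   # about 0.64
print(describe_hurst(hurst_rs_simplified(zigzag)))  # mean-reverting
```

Because this estimator uses a single scale n, it is noisy and biased for short windows; fuller R/S analysis regresses log(R/S) on log(n) across several sub-window sizes.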

References:

  • https://en.wikipedia.org/wiki/Hurst_exponent

Examples:

import pandas as pd
import polars as pl
import pytimetk as tk

df = tk.load_dataset("stocks_daily", parse_dates=["date"])

df
symbol date open high low close volume adjusted
0 META 2013-01-02 27.440001 28.180000 27.420000 28.000000 69846400 28.000000
1 META 2013-01-03 27.879999 28.469999 27.590000 27.770000 63140600 27.770000
2 META 2013-01-04 28.010000 28.930000 27.830000 28.760000 72715400 28.760000
3 META 2013-01-07 28.690001 29.790001 28.650000 29.420000 83781800 29.420000
4 META 2013-01-08 29.510000 29.600000 28.860001 29.059999 45871300 29.059999
... ... ... ... ... ... ... ... ...
16189 GOOG 2023-09-15 138.800003 139.360001 137.179993 138.300003 48947600 138.300003
16190 GOOG 2023-09-18 137.630005 139.929993 137.630005 138.960007 16233600 138.960007
16191 GOOG 2023-09-19 138.250000 139.175003 137.500000 138.830002 15479100 138.830002
16192 GOOG 2023-09-20 138.830002 138.839996 134.520004 134.589996 21473500 134.589996
16193 GOOG 2023-09-21 132.389999 133.190002 131.089996 131.360001 22042700 131.360001

16194 rows × 8 columns

# Hurst exponent - single stock (pandas)
hurst_single = (
    df
    .query("symbol == 'AAPL'")
    .augment_hurst_exponent(
        date_column="date",
        close_column="close",
        window=[100, 200],
    )
)

hurst_single.glimpse()
<class 'pandas.core.frame.DataFrame'>: 2699 rows of 10 columns
symbol:           object            ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'AA ...
date:             datetime64[ns]    [Timestamp('2013-01-02 00:00:00'), T ...
open:             float64           [19.779285430908203, 19.567142486572 ...
high:             float64           [19.821428298950195, 19.631071090698 ...
low:              float64           [19.343929290771484, 19.321428298950 ...
close:            float64           [19.608213424682617, 19.360713958740 ...
volume:           int64             [560518000, 352965200, 594333600, 48 ...
adjusted:         float64           [16.791179656982422, 16.579240798950 ...
close_hurst_100:  float64           [nan, nan, nan, nan, nan, nan, nan,  ...
close_hurst_200:  float64           [nan, nan, nan, nan, nan, nan, nan,  ...

# Hurst exponent - grouped (pandas engine)
hurst_grouped = (
    df
    .groupby("symbol")
    .augment_hurst_exponent(
        date_column="date",
        close_column="close",
        window=100,
    )
)

hurst_grouped.glimpse()
<class 'pandas.core.frame.DataFrame'>: 16194 rows of 9 columns
symbol:           object            ['META', 'META', 'META', 'META', 'ME ...
date:             datetime64[ns]    [Timestamp('2013-01-02 00:00:00'), T ...
open:             float64           [27.440000534057617, 27.879999160766 ...
high:             float64           [28.18000030517578, 28.4699993133544 ...
low:              float64           [27.420000076293945, 27.590000152587 ...
close:            float64           [28.0, 27.770000457763672, 28.760000 ...
volume:           int64             [69846400, 63140600, 72715400, 83781 ...
adjusted:         float64           [28.0, 27.770000457763672, 28.760000 ...
close_hurst_100:  float64           [nan, nan, nan, nan, nan, nan, nan,  ...
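The grouped result keeps all 16194 input rows because, conceptually, each symbol's series gets its own rolling windows and the results are written back in the original row order, so a window never spans two symbols. The plain-Python sketch below shows that behavior with a hypothetical grouped_rolling helper; it assumes rows are already sorted by date within each symbol and says nothing about how pytimetk implements grouping internally.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Tuple[str, float]  # (symbol, close), assumed sorted by date within each symbol


def grouped_rolling(
    rows: Sequence[Row],
    window: int,
    func: Callable[[Sequence[float]], float],
) -> List[Optional[float]]:
    """Compute a trailing-window statistic separately for each symbol (hypothetical).

    Output has one value per input row, in the original order.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for i, (symbol, _) in enumerate(rows):
        positions[symbol].append(i)

    out: List[Optional[float]] = [None] * len(rows)
    for idx in positions.values():
        closes = [rows[i][1] for i in idx]
        for k, i in enumerate(idx):
            if k + 1 >= window:  # enough history within this symbol only
                out[i] = func(closes[k + 1 - window : k + 1])
    return out


rows = [("META", 1.0), ("META", 2.0), ("META", 3.0), ("GOOG", 10.0), ("GOOG", 20.0)]
print(grouped_rolling(rows, 2, lambda w: w[-1] - w[0]))
# [None, 1.0, 1.0, None, 10.0]
```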

# Hurst exponent - polars engine
pl_single = pl.from_pandas(df.query("symbol == 'AAPL'"))
hurst_polars = (
    pl_single
    .tk.augment_hurst_exponent(
        date_column="date",
        close_column="close",
        window=[100, 200],
    )
)

hurst_polars.glimpse()
Rows: 2699
Columns: 10
$ symbol                   <str> 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL'
$ date            <datetime[ns]> 2013-01-02 00:00:00, 2013-01-03 00:00:00, 2013-01-04 00:00:00, 2013-01-07 00:00:00, 2013-01-08 00:00:00, 2013-01-09 00:00:00, 2013-01-10 00:00:00, 2013-01-11 00:00:00, 2013-01-14 00:00:00, 2013-01-15 00:00:00
$ open                     <f64> 19.779285430908203, 19.567142486572266, 19.177499771118164, 18.64285659790039, 18.90035629272461, 18.66071319580078, 18.876785278320312, 18.60714340209961, 17.952856063842773, 17.796428680419922
$ high                     <f64> 19.821428298950195, 19.63107109069824, 19.236785888671875, 18.9035701751709, 18.996070861816406, 18.750356674194336, 18.882856369018555, 18.761428833007812, 18.125, 17.82107162475586
$ low                      <f64> 19.343929290771484, 19.321428298950195, 18.77964210510254, 18.399999618530273, 18.616071701049805, 18.428213119506836, 18.41142845153809, 18.53642845153809, 17.80392837524414, 17.26357078552246
$ close                    <f64> 19.608213424682617, 19.36071395874023, 18.821428298950195, 18.71071434020996, 18.761070251464844, 18.467857360839844, 18.696786880493164, 18.582143783569336, 17.91964340209961, 17.354286193847656
$ volume                   <i64> 560518000, 352965200, 594333600, 484156400, 458707200, 407604400, 601146000, 350506800, 734207600, 876772400
$ adjusted                 <f64> 16.791179656982422, 16.579240798950195, 16.1174373626709, 16.02262306213379, 16.065746307373047, 15.814659118652344, 16.010698318481445, 15.912524223327637, 15.345203399658203, 14.86106777191162
$ close_hurst_100          <f64> None, None, None, None, None, None, None, None, None, None
$ close_hurst_200          <f64> None, None, None, None, None, None, None, None, None, None

# Hurst exponent - polars grouped
pl_grouped = pl.from_pandas(df)
hurst_polars_grouped = (
    pl_grouped
    .group_by("symbol")
    .tk.augment_hurst_exponent(
        date_column="date",
        close_column="close",
        window=100,
    )
)

hurst_polars_grouped.glimpse()
Rows: 16194
Columns: 9
$ symbol                   <str> 'META', 'META', 'META', 'META', 'META', 'META', 'META', 'META', 'META', 'META'
$ date            <datetime[ns]> 2013-01-02 00:00:00, 2013-01-03 00:00:00, 2013-01-04 00:00:00, 2013-01-07 00:00:00, 2013-01-08 00:00:00, 2013-01-09 00:00:00, 2013-01-10 00:00:00, 2013-01-11 00:00:00, 2013-01-14 00:00:00, 2013-01-15 00:00:00
$ open                     <f64> 27.440000534057617, 27.8799991607666, 28.010000228881836, 28.690000534057617, 29.510000228881836, 29.670000076293945, 30.600000381469727, 31.280000686645508, 32.08000183105469, 30.63999938964844
$ high                     <f64> 28.18000030517578, 28.469999313354492, 28.93000030517578, 29.790000915527344, 29.600000381469727, 30.600000381469727, 31.450000762939453, 31.959999084472656, 32.209999084472656, 31.709999084472656
$ low                      <f64> 27.420000076293945, 27.59000015258789, 27.829999923706055, 28.649999618530273, 28.86000061035156, 29.489999771118164, 30.280000686645508, 31.100000381469727, 30.6200008392334, 29.8799991607666
$ close                    <f64> 28.0, 27.770000457763672, 28.760000228881836, 29.420000076293945, 29.059999465942383, 30.59000015258789, 31.299999237060547, 31.719999313354492, 30.950000762939453, 30.100000381469727
$ volume                   <i64> 69846400, 63140600, 72715400, 83781800, 45871300, 104787700, 95316400, 89598000, 98892800, 173242600
$ adjusted                 <f64> 28.0, 27.770000457763672, 28.760000228881836, 29.420000076293945, 29.059999465942383, 30.59000015258789, 31.299999237060547, 31.719999313354492, 30.950000762939453, 30.100000381469727
$ close_hurst_100          <f64> None, None, None, None, None, None, None, None, None, None

# Hurst exponent - column selectors (grouped pandas)
from pytimetk.utils.selection import contains

selector_df = (
    df
    .groupby("symbol")
    .augment_hurst_exponent(
        date_column=contains("dat"),
        close_column=contains("clos"),
        window=100,
    )
)

selector_df.glimpse()
<class 'pandas.core.frame.DataFrame'>: 16194 rows of 9 columns
symbol:           object            ['META', 'META', 'META', 'META', 'ME ...
date:             datetime64[ns]    [Timestamp('2013-01-02 00:00:00'), T ...
open:             float64           [27.440000534057617, 27.879999160766 ...
high:             float64           [28.18000030517578, 28.4699993133544 ...
low:              float64           [27.420000076293945, 27.590000152587 ...
close:            float64           [28.0, 27.770000457763672, 28.760000 ...
volume:           int64             [69846400, 63140600, 72715400, 83781 ...
adjusted:         float64           [28.0, 27.770000457763672, 28.760000 ...
close_hurst_100:  float64           [nan, nan, nan, nan, nan, nan, nan,  ...
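The selector example relies on contains("dat") and contains("clos") each picking out exactly one column, consistent with the rule that close_column must resolve to a single column. The sketch below imitates that with a hypothetical resolve_contains helper, assuming a case-sensitive substring match; pytimetk's actual selector semantics may differ.

```python
from typing import Sequence


def resolve_contains(columns: Sequence[str], substring: str) -> str:
    """Return the single column whose name contains `substring` (hypothetical).

    Raises if zero or several columns match, mirroring the single-column rule.
    """
    matches = [c for c in columns if substring in c]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one column containing {substring!r}, found {matches}"
        )
    return matches[0]


cols = ["symbol", "date", "open", "high", "low", "close", "volume", "adjusted"]
print(resolve_contains(cols, "clos"))  # close
print(resolve_contains(cols, "dat"))   # date
```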