
Standardization centers and scales numeric features so that no single feature dominates in algorithms that require data to be on the same scale.

Usage

standardize_vec(x, mean = NULL, sd = NULL, silent = FALSE)

standardize_inv_vec(x, mean, sd)

Arguments

x

A numeric vector.

mean

The mean used to apply or invert the standardization. In standardize_vec(), the default NULL means the mean is computed automatically from x.

sd

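The standard deviation used to apply or invert the standardization. In standardize_vec(), the default NULL means the standard deviation is computed automatically from x.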

silent

Whether or not to report the automated mean and sd parameters as a message.

Value

Returns a numeric vector with the standardization transformation applied.
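
The transformation and its inverse can be sketched in base R. This is a minimal illustration, assuming standardization uses the sample mean and standard deviation with missing values ignored. The helper names std_sketch() and std_inv_sketch() are hypothetical and are not part of the package.

```r
# Hypothetical helpers for illustration only (not the package implementation).
std_sketch <- function(x, mean = NULL, sd = NULL) {
    # Fall back to the sample statistics when no parameters are supplied
    m <- if (is.null(mean)) base::mean(x, na.rm = TRUE) else mean
    s <- if (is.null(sd)) stats::sd(x, na.rm = TRUE) else sd
    (x - m) / s
}

std_inv_sketch <- function(x, mean, sd) {
    # Undo the scaling first, then the centering
    x * sd + mean
}

x      <- c(10, 12, 14, 16, 18)
z      <- std_sketch(x)
x_back <- std_inv_sketch(z, mean = mean(x), sd = sd(x))

# Compare the round trip with the original values (up to floating-point tolerance)
all.equal(x, x_back)
```

The inverse only recovers the original values when it is given the same mean and sd that were used in the forward step, which is why those parameters are worth storing.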

Details

Standardization vs Normalization

  • Standardization refers to a transformation that rescales the values to have mean 0 and standard deviation 1

  • Normalization refers to a transformation that rescales the values to the min-max range (0, 1)
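
The difference between the two transformations can be sketched in base R. The formulas below are the usual textbook definitions (z-score and min-max scaling) and are an assumption about what the two terms mean here, not a copy of the package code.

```r
x <- c(2, 4, 6, 8, 10)

# Standardization (z-score): center on the mean, scale by the standard deviation
x_std <- (x - mean(x)) / sd(x)

# Min-max normalization: shift by the minimum, scale by the range
x_norm <- (x - min(x)) / (max(x) - min(x))

# Standardized values have mean 0 and standard deviation 1 (up to rounding error)
c(mean = mean(x_std), sd = sd(x_std))

# Normalized values run from 0 to 1
range(x_norm)
```

Standardized values are unbounded and can be negative, while min-max normalized values always stay between 0 and 1.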

Examples

library(dplyr)

d10_daily <- m4_daily %>% dplyr::filter(id == "D10")

# --- VECTOR ----

value_std <- standardize_vec(d10_daily$value)
#> Standardization Parameters
#> mean: 2261.60682492582
#> standard deviation: 175.603721730477
value     <- standardize_inv_vec(value_std,
                                 mean = 2261.60682492582,
                                 sd   = 175.603721730477)

# --- MUTATE ----

m4_daily %>%
    group_by(id) %>%
    mutate(value_std = standardize_vec(value))
#> Standardization Parameters
#> mean: 2261.60682492582
#> standard deviation: 175.603721730477
#> Standardization Parameters
#> mean: 9243.15525375268
#> standard deviation: 4663.16194403596
#> Standardization Parameters
#> mean: 8259.78634615385
#> standard deviation: 927.592527167825
#> Standardization Parameters
#> mean: 8287.72878932316
#> standard deviation: 2456.05840988041
#> # A tibble: 9,743 × 4
#> # Groups:   id [4]
#>    id    date       value value_std
#>    <fct> <date>     <dbl>     <dbl>
#>  1 D10   2014-07-03 2076.     -1.06
#>  2 D10   2014-07-04 2073.     -1.07
#>  3 D10   2014-07-05 2049.     -1.21
#>  4 D10   2014-07-06 2049.     -1.21
#>  5 D10   2014-07-07 2006.     -1.45
#>  6 D10   2014-07-08 2018.     -1.39
#>  7 D10   2014-07-09 2019.     -1.38
#>  8 D10   2014-07-10 2007.     -1.45
#>  9 D10   2014-07-11 2010      -1.43
#> 10 D10   2014-07-12 2002.     -1.48
#> # ℹ 9,733 more rows