parallel_apply

parallel_apply(
    data,
    func,
    show_progress=True,
    threads=None,
    desc='Processing...',
    **kwargs,
)

The parallel_apply function parallelizes the application of a function on grouped dataframes using concurrent.futures.

Parameters

Name	Type	Description	Default
data	pd.core.groupby.generic.DataFrameGroupBy	The `data` parameter is a Pandas DataFrameGroupBy object, which is the result of grouping a DataFrame by one or more columns. It represents the grouped data that you want to apply the function to.	required
func	Callable	The `func` parameter is the function that you want to apply to each group in the grouped dataframe. This function should take a single argument, which is a dataframe representing a group, and return a result. The result can be a scalar value, a pandas Series, or a pandas DataFrame.	required
show_progress	bool	A boolean parameter that determines whether to display progress using tqdm. If set to True, progress will be displayed. If set to False, progress will not be displayed.	`True`
threads	int	The `threads` parameter specifies the number of threads to use for parallel processing. If `threads` is set to `None`, it will use all available processors. If `threads` is set to `-1`, it will use all available processors as well.	`None`
**kwargs		The `**kwargs` parameter is a dictionary of keyword arguments that are passed to the `func` function.	`{}`

Returns

Name	Type	Description
	pd.DataFrame	The `parallel_apply` function returns a combined result after applying the specified function on all groups in the grouped dataframe. The result can be a pandas DataFrame or a pandas Series, depending on the function applied.

Examples:

# Example 1 - Single argument returns Series

import pytimetk as tk
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, 3, 4, 5, 6]
})

grouped = df.groupby('A')

result = grouped.apply(lambda df: df['B'].sum())
result

result = tk.parallel_apply(grouped, lambda df: df['B'].sum(), show_progress=True, threads=2)
result

A
bar    12
foo     9
dtype: int64

# Example 2 - Multiple arguments returns MultiIndex DataFrame

import pytimetk as tk
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'one', 'two', 'two', 'two', 'one', 'two'],
    'C': [1, 3, 5, 7, 9, 2, 4, 6]
})

def calculate(group):
    return pd.DataFrame({
        'sum': [group['C'].sum()],
        'mean': [group['C'].mean()]
    })

grouped = df.groupby(['A', 'B'])

result = grouped.apply(calculate)
result

result = tk.parallel_apply(grouped, calculate, show_progress=True)
result

			sum	mean
A	B
bar	one	0	5	5.000000
bar	two	0	9	4.500000
foo	one	0	8	2.666667
foo	two	0	15	7.500000

# Example 3 - Multiple arguments returns MultiIndex DataFrame

import pytimetk as tk
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'one', 'two', 'two', 'two', 'one', 'two'],
    'C': [1, 3, 5, 7, 9, 2, 4, 6]
})

def calculate(group):
    return group.head(2)

grouped = df.groupby(['A', 'B'])

result = grouped.apply(calculate)
result

result = tk.parallel_apply(grouped, calculate, show_progress=True)
result

			A	B	C
A	B
bar	one	2	bar	one	5
	two	3	bar	two	7
	two	5	bar	two	2
foo	one	0	foo	one	1
	one	1	foo	one	3
	two	4	foo	two	9
	two	7	foo	two	6

# Example 4 - Single Grouping Column Returns DataFrame

import pytimetk as tk
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': [1, 3, 5, 7, 9, 2, 4, 6]
})

def calculate(group):
    return pd.DataFrame({
        'sum': [group['B'].sum()],
        'mean': [group['B'].mean()]
    })

grouped = df.groupby(['A'])

result = grouped.apply(calculate)
result

result = tk.parallel_apply(grouped, calculate, show_progress=True)
result

		sum	mean
A
bar	0	14	4.666667
foo	0	23	4.600000