Python Polars: How to add a progress bar to apply loops

Question:

Is it possible to add a progress bar to a Polars apply loop with a custom function?

For example, how would I add a progress bar to the following toy example:

        import polars as pl

        df = pl.DataFrame(
            {
                "team": ["A", "A", "A", "B", "B", "C"],
                "conference": ["East", "East", "East", "West", "West", "East"],
                "points": [11, 8, 10, 6, 6, 5],
                "rebounds": [7, 7, 6, 9, 12, 8]
            }
        )

        df.groupby('team').apply(lambda x: x.select(pl.col('points').mean()))

Edit 1:

After help from @jqurious, I have the following ‘tools’ that can be reused with other functions; however, the progress bar does not print to the console correctly.

        # Progress is rich's progress bar (also available as pip._vendor.rich.progress).
        from rich.progress import Progress

        def pl_progress_applier(func, task_id, progress, **kwargs):
            # Advance the bar by one group, then run the user function.
            progress.update(task_id, advance=1, refresh=True)
            return func(**kwargs)

        def pl_groupby_progress_apply(data, group_by, func, drop_cols=None, **kwargs):
            drop_cols = drop_cols or []  # avoid a mutable default argument
            with Progress() as progress:
                num_groups = len(data.select(group_by).unique())
                task_id = progress.add_task("Applying", total=num_groups)
                return (
                    data
                        .groupby(group_by)
                        .apply(lambda x: pl_progress_applier(
                            x=x.drop(drop_cols), func=func, task_id=task_id, progress=progress, **kwargs)
                        )
                )

        # Using the function custom_func, we can return a table; however, the progress bar jumps straight to 100%

        def custom_func(x):
            return x.select(pl.col('points').mean())

        pl_groupby_progress_apply(
            data=df,
            group_by='team',
            func=custom_func
        )

Any ideas on how to get the progress bar to actually work?

Edit 2:

It seems the above functions do indeed work; however, if you’re using PyCharm (like me), the progress bar does not display correctly there. Enjoy, non-PyCharm users!

Asked By: Sharma

||

Answers:

You could use rich.progress, which also comes bundled with pip.

progress.update() can manually update a progress bar.

import polars as pl
from pip._vendor.rich.progress import Progress

def my_custom_function(group):
    # `progress` and `task_id` are the module-level names bound in the `with` block below.
    progress.update(task_id, advance=1)
    return group.select(pl.col('points').mean())

# `df` is the DataFrame from the question.
with Progress() as progress:
    num_groups = df.get_column("team").unique().len()
    task_id = progress.add_task("Applying", total=num_groups)

    df.groupby('team').apply(my_custom_function)

Although perhaps you should share what you’re actually doing as .groupby.apply() is going to be "slow" – there may be a better way.

Answered By: jqurious

The best solution I found is tqdm. We want a solution that:

  1. lets us stay in the Polars coding style;
  2. is general.

To do so, all we have to define is this function:

import polars as pl
from tqdm import tqdm

def w_pbar(pbar, func):
    def foo(*args, **kwargs):
        pbar.update(1)
        return func(*args, **kwargs)

    return foo

Now we can take your original code, create pbar, and add ‘w_pbar’ in the appropriate place:

df = pl.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "C"],
        "conference": ["East", "East", "East", "West", "West", "East"],
        "points": [11, 8, 10, 6, 6, 5],
        "rebounds": [7, 7, 6, 9, 12, 8]
    }
)
num_groups = df.get_column("team").unique().len()
with tqdm(total=num_groups) as pbar:
    res = df.groupby('team').apply(w_pbar(pbar, lambda x: x.select(pl.col('points').mean())))

You can create pbar (the tqdm object) with any settings you want, and add w_pbar to any use of ‘apply’.
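
To illustrate why the wrapper is general, here is a minimal sketch that runs without Polars or tqdm. `CountingBar` is a hypothetical stand-in that only mimics the `update(n)` method of a tqdm bar, and `with_progress` is a variant of `w_pbar` written for this example: it advances the bar after the wrapped call returns rather than before, so the count reflects completed work. Neither name comes from a library.

```python
import functools


class CountingBar:
    """Hypothetical stand-in for a progress bar; mimics only update(n)."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n


def with_progress(pbar, func):
    """Return a wrapper that calls func, then advances pbar by one step."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        pbar.update(1)  # only reached if func did not raise
        return result

    return wrapper


values = [11, 8, 10, 6, 6, 5]
bar = CountingBar(total=len(values))
add_one = with_progress(bar, lambda x: x + 1)

# Each call to the wrapped function advances the bar exactly once.
results = [add_one(v) for v in values]
print(results, bar.n)
```

Because the wrapper only relies on `pbar.update(1)`, any object with that method should work in place of `CountingBar`, including a tqdm bar.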

By the way, it also works for ‘apply’ without ‘groupby’:

pbar = tqdm(total=len(df), desc='adding 1 to points', colour='green')
df1 = df.with_columns(pl.col('points').apply(w_pbar(pbar, lambda x: x + 1)).alias('points+1'))
pbar.close()

Answered By: Evyatar Cohen