Ungroup pandas dataframe after bfill

Question:

I’m trying to write a function that will backfill columns in a dataframe that meet a condition. The backfill should only be done within groups. However, I am having a hard time getting the groupby object to ungroup. I have tried reset_index, as in the example below, but that raises an AttributeError.

Accessing the original df through result.obj doesn’t give the updated values, because the groupby bfill has no inplace option.

def upfill(df:DataFrameGroupBy)->DataFrameGroupBy:
    for column in df.obj.columns:
        if column.startswith("x"):
            df[column].bfill(axis="rows", inplace=True)
    return df 

Assigning the dataframe column in the function doesn’t work either, because a groupby object doesn’t support item assignment.

def upfill(df:DataFrameGroupBy)->DataFrameGroupBy:
    for column in df.obj.columns:
        if column.startswith("x"):
            df[column] = df[column].bfill()
    return df 

The test I’m trying to get to pass:


def test_upfill():
    df = DataFrame({
        "id":[1,2,3,4,5],
        "group":[1,2,2,3,3],
        "x_value": [4,4,None,None,5],
    })
    grouped_df = df.groupby("group")
    result = upfill(grouped_df)
    result.reset_index()
    assert result["x_value"].equals(Series([4,4,None,5,5]))


Asked By: jhylands


Answers:

You should use the ‘transform’ method on the grouped DataFrame, like this:

import pandas as pd

def test_upfill():
    df = pd.DataFrame({
        "id":[1,2,3,4,5],
        "group":[1,2,2,3,3],
        "x_value": [4,4,None,None,5],
    })
    result = df.groupby("group").transform(lambda x: x.bfill())
    assert result["x_value"].equals(pd.Series([4,4,None,5,5]))

test_upfill()

You can find more information about the transform method on GroupBy objects in the pandas documentation.
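As a possible shortcut, grouped objects also expose bfill directly, so the lambda is not strictly needed. The sketch below reuses the question's data. Selecting columns by the "x" prefix mirrors the question, and the names x_columns and filled are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": [1, 2, 2, 3, 3],
    "x_value": [4, 4, None, None, 5],
})

# Columns to fill, chosen by the "x" prefix used in the question.
x_columns = [c for c in df.columns if c.startswith("x")]

# bfill on the grouped selection returns an ordinary DataFrame aligned to
# df's index, so no separate "ungrouping" step is needed.
filled = df.groupby("group")[x_columns].bfill()

assert isinstance(filled, pd.DataFrame)
assert filled["x_value"].equals(pd.Series([4, 4, None, 5, 5]))
```

The result contains only the selected columns, so it can be assigned back to df[x_columns] if the other columns are needed alongside it.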

Based on the accepted answer, this is the full solution I arrived at, although I have read elsewhere that there are issues with using the obj attribute.

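from pandas import DataFrame, Series
from pandas.core.groupby import DataFrameGroupBy

def upfill(df: DataFrameGroupBy) -> DataFrameGroupBy:
    # Backfill every column whose name starts with "x", within each group.
    columns = [column for column in df.obj.columns if column.startswith("x")]
    # transform returns a result aligned to the original index, so it can be
    # written straight back onto the underlying (ungrouped) frame.
    df.obj[columns] = df[columns].transform(lambda x: x.bfill())
    return df

def test_upfill():
    df = DataFrame({
        "id":[1,2,3,4,5],
        "group":[1,2,2,3,3],
        "x_value": [4,4,None,None,5],
    })
    grouped_df = df.groupby("group")
    result = upfill(grouped_df)
    # upfill mutates the original frame through .obj, so check df itself.
    assert df["x_value"].equals(Series([4,4,None,5,5]))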
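One way to sidestep the obj attribute altogether might be to pass the plain DataFrame and the grouping key, and return a new frame. This is only a sketch under that assumption: the function name upfill_frame and the parameters by and prefix are hypothetical and not part of the original code.

```python
import pandas as pd

def upfill_frame(df: pd.DataFrame, by: str, prefix: str = "x") -> pd.DataFrame:
    """Return a copy of df with prefix-matching columns backfilled per group."""
    # Never try to fill the grouping column itself.
    columns = [c for c in df.columns if c.startswith(prefix) and c != by]
    result = df.copy()
    if columns:
        # The grouped bfill returns a DataFrame aligned to df's index.
        result[columns] = df.groupby(by)[columns].bfill()
    return result

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": [1, 2, 2, 3, 3],
    "x_value": [4, 4, None, None, 5],
})
result = upfill_frame(df, "group")

assert result["x_value"].equals(pd.Series([4, 4, None, 5, 5]))
# The input frame is left unchanged.
assert df["x_value"].isna().sum() == 2
```

Returning a copy keeps the caller's frame intact, at the cost of extra memory for large inputs.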

Answered By: jhylands