Mutate multiple pandas dataframes in place using a function

Question:

I would like to write a function that takes multiple dataframes that have the same structure, does specific transformations and saves the transformations inplace.

Dummy dataframes

import pandas as pd

df = pd.DataFrame({"Full name" : ["John Doe","Deep Smith","Julia Carter","Kate Newton","Sandy Thompson"],
                     "Monthly Sales" : [25,30,35,40,45]})

df2 = pd.DataFrame({"Full name" : ["Alicia Williams","Kriten John","Jessica Adams","Isaac Newton","Whitney Gordon"], 
                     "Monthly Sales" : [35,20,50,15,40]})

Transformative function

I don’t want to return the dataframe, but rather save those transformations in place.

def tidy_dfs(dfs):
    for df in dfs:
        # Drop first row
        df = df.iloc[1: , :]
        # Replace spaces in columns
        df.columns = [c.replace(' ', '_') for c in df]
        # change cols to lower
        df.columns = [c.lower() for c in df]
    return df

Assigning df, df2 = tidy_dfs([df, df2]) of course won’t work, because the return sits outside the loop and only the last dataframe is returned.

Results
What would be a way to call this function and save the transformation inplace?

tidy_dfs([df,df2])

Answers:

EDIT: If you pass a list of DataFrames, you can either return another list (out) or modify the existing list dfs. New DataFrames created inside the function cannot replace the outer variables in place, so you have to assign the result back, as in the last step.

Your function does not return a list of DataFrames, so you need to create an empty list and append each cleaned DataFrame:

def tidy_dfs(dfs):
    out = []
    for df in dfs:
        # Drop first row
        df = df.iloc[1: , :]
        # Replace spaces in columns
        df.columns = [c.replace(' ', '_') for c in df]
        # change cols to lower
        df.columns = [c.lower() for c in df]
        out.append(df)
    return out

df,df2 = tidy_dfs([df,df2])

For inplace operations:

def tidy_dfs(dfs):
    for df in dfs:
        # Drop first row
        df.drop(df.index[0], inplace=True)
        # Replace spaces in columns and lowercase
        df.rename(columns = lambda x: x.replace(' ', '_').lower(), inplace=True)

    return dfs

df, df2 = tidy_dfs([df,df2])
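
As a quick check of the in-place variant, here is a minimal sketch that calls the function without assigning anything back. The names tidy_dfs_inplace, sales_a and sales_b are made up for this illustration, and the frames are shortened versions of the dummy data from the question. Since drop and rename are both called with inplace=True, the caller's objects themselves should change:

```python
import pandas as pd


def tidy_dfs_inplace(dfs):
    """Hypothetical helper: mutate every DataFrame in dfs, return nothing."""
    for df in dfs:
        # Drop the first row of the object the caller also holds
        df.drop(df.index[0], inplace=True)
        # Replace spaces in column names and lowercase them, also in place
        df.rename(columns=lambda x: x.replace(" ", "_").lower(), inplace=True)


# Shortened dummy data (assumed for the example)
sales_a = pd.DataFrame({"Full name": ["John Doe", "Deep Smith", "Julia Carter"],
                        "Monthly Sales": [25, 30, 35]})
sales_b = pd.DataFrame({"Full name": ["Alicia Williams", "Kriten John", "Jessica Adams"],
                        "Monthly Sales": [35, 20, 50]})

# No assignment back: the call alone changes sales_a and sales_b
tidy_dfs_inplace([sales_a, sales_b])

print(list(sales_a.columns))
print(len(sales_a), len(sales_b))
```

With this variant the final reassignment (df, df2 = ...) is optional, because the list passed in holds references to the same objects as the outer variables.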
Answered By: jezrael

The problem is that you cannot rebind the outer variable to the new dataframe from inside the function. pandas also tends to avoid in-place modification, since it can be dangerous, and most operations return a new object so that the original dataframe is preserved.
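
The difference between rebinding a name and mutating an object can be sketched as follows. The functions rebind and mutate and the variable original are hypothetical names used only for this illustration:

```python
import pandas as pd


def rebind(frame):
    # Assigning to the parameter only rebinds the local name;
    # the caller's DataFrame is left untouched.
    frame = frame.iloc[1:, :]
    return frame


def mutate(frame):
    # inplace=True changes the very object the caller refers to.
    frame.drop(frame.index[0], inplace=True)


original = pd.DataFrame({"Full name": ["John Doe", "Deep Smith"],
                         "Monthly Sales": [25, 30]})

result = rebind(original)
rows_after_rebind = len(original)  # the outer frame keeps both rows

mutate(original)
rows_after_mutate = len(original)  # the outer frame has lost its first row

print(rows_after_rebind, rows_after_mutate, len(result))
```

This is why the loop in the question has no visible effect outside the function: df = df.iloc[1: , :] creates a new object and binds it to the loop variable only.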

It is possible to drop everything and then append the new values in place at the end of the loop. However, this is "ugly" (error-prone and cumbersome).

Answered By: tturbo