Modifying a DataFrame in a function is not reflected in the calling scope

Question:

If

(A) Mutable objects modified in functions are also mutated in the calling context

and

(B) pandas dataframes are mutable objects,

then in the following example, why is an empty dataframe not printed in the last output (Outside-After)?

import pandas as pd

def foo(df):
    df = df[0:0]  # clear the df
    print(df)

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print("\nOutside - Before:")
print(df)
print("\nInside function:")
foo(df)
print("\nOutside - After:")
print(df)

Output:

Outside - Before:
   0  1  2
0  1  2  3
1  4  5  6

Inside function:
Empty DataFrame
Columns: [0, 1, 2]
Index: []

Outside - After:
   0  1  2
0  1  2  3
1  4  5  6
Asked By: str31


Answers:

Your problem is not with the DataFrame itself but with the name df inside foo. The df inside foo is a local name (the function's parameter), separate from the df outside foo, even though both start out referring to the same object. Assigning a new object to the name inside the function doesn't affect the name outside the function. To illustrate, this code is functionally equivalent to yours:

import pandas as pd

def foo(some_df):
    some_df = some_df[0:0]  # clear the df
    print(some_df)

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print("\nOutside - Before:")
print(df)
print("\nInside function:")
foo(df)
print("\nOutside - After:")
print(df)

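When foo(df) is called, some_df is bound to the same object that df refers to. The line some_df = some_df[0:0] then creates a new, empty DataFrame and rebinds only the local name some_df to it; df still refers to the original, unchanged object. Hopefully this makes it clearer why df doesn't change.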
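To see the rebinding directly, here is a small sketch that compares object identities with Python's is operator. The extra caller_df parameter is a hypothetical addition, there only so the function has something to compare against; it is not part of the original code.

```python
import pandas as pd

def foo(some_df, caller_df):
    # caller_df is a hypothetical extra parameter, passed only for the identity check
    same_on_entry = some_df is caller_df  # True: two names, one object
    some_df = some_df[0:0]  # slicing builds a NEW DataFrame and rebinds the local name
    same_after = some_df is caller_df     # False: the local name now points elsewhere
    return same_on_entry, same_after

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print(foo(df, df))  # (True, False)
print(len(df))      # 2: the caller's DataFrame still has both rows
```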

To get the result you desire, you can do this:

import pandas as pd

def foo(df):
    df = df[0:0]  # clear the df
    print(df)
    return df

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
print("\nOutside - Before:")
print(df)
print("\nInside function:")
df = foo(df)
print("\nOutside - After:")
print(df)

As you can see, the df outside the function gets its new value by having the return value of foo assigned to it. Nothing is changed in place: the outer name df is simply rebound to the new, empty DataFrame that foo created and returned.
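One consequence worth spelling out: because this approach rebinds the name rather than mutating the object, any other name that still refers to the original DataFrame keeps seeing the old data. A minimal sketch, where alias is a hypothetical second reference added only for illustration:

```python
import pandas as pd

def foo(df):
    df = df[0:0]  # new, empty DataFrame bound to the local name
    return df

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
alias = df        # hypothetical second name for the original object
df = foo(df)      # rebinds the outer name to the returned object

print(len(df))      # 0: the outer name now refers to the empty DataFrame
print(len(alias))   # 2: the original object was never modified
print(df is alias)  # False
```

This is usually what you want, but it matters if the same DataFrame is also stored somewhere else (for example in a list or dict).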

Answered By: CryptoFool

If you assign to a parameter (or any other local name) inside a function, the new value is only visible inside that function:

def foo(v):
    v = 1

i = 0
foo(i)
# i is still 0

There is no way to rebind a name inside a function and have that change reflected outside it unless you declare the name global or nonlocal (which you typically shouldn't do).
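The integer example above might suggest this is about immutability, but the same rule holds for mutable objects such as lists, which is where premise (A) in the question needs refining: mutating the object is visible to the caller, rebinding the name is not. A short sketch with plain lists (rebind and mutate are hypothetical names chosen for illustration):

```python
def rebind(v):
    v = []      # rebinds the local name v; the caller's list is untouched

def mutate(v):
    v.clear()   # mutates the one list object that both names refer to

items = [1, 2, 3]
rebind(items)
after_rebind = list(items)  # [1, 2, 3]
mutate(items)
after_mutate = list(items)  # []
print(after_rebind, after_mutate)
```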

However, you can change the contents of the object in place, and that change is visible through every name that refers to it:

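import pandas as pd

def foo(df):
    df[:] = 0  # assigns into the existing object instead of rebinding the name

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
foo(df)
# df is now all zeros: the caller's object was modified in place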
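Applying the same idea to the original question: if the goal really is to empty the caller's DataFrame from inside the function, one option is DataFrame.drop with inplace=True, which updates the object itself (and returns None) instead of returning a new DataFrame. A minimal sketch, where clear_in_place is a hypothetical helper name:

```python
import pandas as pd

def clear_in_place(df):
    # Hypothetical helper. Drops every row label; with inplace=True the
    # DataFrame object passed in is updated rather than replaced.
    df.drop(index=df.index, inplace=True)

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
clear_in_place(df)
print(df)
# Empty DataFrame
# Columns: [0, 1, 2]
# Index: []
```

Because the object itself changes, every name referring to it sees the empty DataFrame. Returning the new DataFrame and reassigning it, as in the first answer, is usually considered the more idiomatic pandas style.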
Answered By: Simon Lundberg