DataFrame modified inside a function

Question:

I have run into a DataFrame being modified inside a function, something I have never observed before. Is there a way to make sure the original DataFrame is not modified?

import numpy as np
import pandas as pd

def test(df):
    df['tt'] = np.nan
    return df

dff = pd.DataFrame(data=[])

Now, when I print dff, the output is

Empty DataFrame
Columns: []
Index: []

If I pass dff to test() defined above, dff is modified. In other words,

df = test(dff)
print(dff)

now prints

Empty DataFrame
Columns: [tt]
Index: []

How do I make sure dff is not modified after being passed to test()?

Asked By: Alexis G

Answers:

def test(df):
    df = df.copy(deep=True)
    df['tt'] = np.nan
    return df

If you pass a dataframe into a function, modify it in place, and return it, the caller’s dataframe is modified too, because both names refer to the same object. To keep the old dataframe and also get one with your modifications, you need two dataframes: the one you pass in, which stays untouched, and a new one that carries the changes. So if you don’t want to change the original, your best bet is to make a copy of it. In my example I rebind the variable “df” inside the function to the copy. The copy method with the argument “deep=True” (which is also the default) copies the dataframe’s data and indices rather than just referencing them. You can read more here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html

Answered By: Skorpeo

As Skorpeo mentioned, since a dataframe can be modified in place, it can be modified inside a function. One way to avoid modifying the original is to make a new copy inside the function, as in Skorpeo’s answer.
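
To illustrate the difference between modifying in place and working on a new object, here is a small sketch. The function names mutate and rebind are hypothetical, and DataFrame.assign is used here only as one convenient way to get a new dataframe with the extra column; it is not part of the original question or answers.

```python
import numpy as np
import pandas as pd

def mutate(df):
    # Column assignment changes the very object the caller passed in.
    df['tt'] = np.nan
    return df

def rebind(df):
    # assign() returns a new DataFrame; rebinding the local name
    # does not touch the caller's object.
    df = df.assign(tt=np.nan)
    return df

dff = pd.DataFrame(data=[])
same = mutate(dff)
print(same is dff)           # the returned object is the caller's object
print('tt' in dff.columns)   # so the caller sees the new column

dff2 = pd.DataFrame(data=[])
new = rebind(dff2)
print(new is dff2)           # a different object this time
print('tt' in dff2.columns)  # the caller's dataframe is unchanged
```

The first pair of checks should report that the function returned the same object and that dff gained the column; the second pair should report a different object and an unchanged dff2.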

If you don’t want to change the function, passing a copy is also an option:

def test(df):
    df['tt'] = np.nan
    return df

df = test(dff.copy())            # <---- pass a copy of `dff`

Answered By: cottontail