New dataframe carrying changes to original in pandas/gspread script

Question:

I am writing a script to read data from Google Sheets using the gspread module.

First I read the spreadsheet and store the values in a variable called df. Afterwards, I create a variable called df2 from df to make some transformations (string to numeric) while keeping df (the original data) intact. However, the transformation made in df2 is carried over to df (the variable where I store the original data). It should not behave like that; the change should occur only in df2.

Does anyone know why this is happening?

Please see the code below:

import gspread
import pandas as pd

sa = gspread.service_account(filename = "keys.json") 
sheet = sa.open("chupacabra") 
worksheet = sheet.worksheet("vaca_loca")

df = pd.DataFrame(worksheet.get("B2:I101"))

df

[df loaded](https://i.stack.imgur.com/lV3GJ.png)

df2 = df

df2["Taxa"] = df2["Taxa"].str.replace(",",".")
df2["Taxa"] = df2["Taxa"].str.replace("%","")
df2["Taxa"] = pd.to_numeric(df2["Taxa"])
df2["Taxa"] = df2["Taxa"]/100

df2

[df2 after string transformation](https://i.stack.imgur.com/cFWOg.png)

df 

[df carrying the transformation changes made in df2](https://i.stack.imgur.com/KsSsa.png)

I was trying to apply the transformation only to df2, while df should remain intact.

Asked By: Dico Gomes


Answers:

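In your script, I think the reason for your issue is that df2 = df does not create a new DataFrame. It only makes df2 a second name for the same object, so every change made through df2 is also visible through df. If my understanding is correct, how about the following modification?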

From:

df2 = df

To:

df2 = df.copy()
  • With this modification, df2 is an independent copy of df (DataFrame.copy() makes a deep copy by default), so changes to df2 no longer affect df.
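
The difference between assignment and copy() can be checked with a small sketch that does not need gspread. The sample values below are hypothetical stand-ins for the sheet data, and the name alias is introduced only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the sheet data; the real values come from gspread.
df = pd.DataFrame({"Taxa": ["1,5%", "2,25%"]})

alias = df       # plain assignment: a second name for the same DataFrame
df2 = df.copy()  # independent DataFrame (deep copy by default)

# Same transformation as in the question, applied only to the copy.
df2["Taxa"] = df2["Taxa"].str.replace(",", ".")
df2["Taxa"] = df2["Taxa"].str.replace("%", "")
df2["Taxa"] = pd.to_numeric(df2["Taxa"]) / 100

print(alias is df)  # True: both names refer to one object
print(df2 is df)    # False: df2 is a separate object
print(df)           # still holds the original strings
print(df2)          # holds the numeric values
```

Because alias and df are the same object, any in-place change made through one name shows up through the other, which appears to be what happened in the question.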
Answered By: Tanaike