overwriting dataframes in pandas

Question:

I have a given dataframe

new_df :

ID summary text_len
1 xxx 45
2 aaa 34

I am manipulating it by adding one column per keyword, with the keywords taken from a different dataframe, like this:

keywords = df["keyword"].to_list()
for key in keywords:
    new_df[key] = new_df["summary"].str.lower().str.count(key)
new_df

From here I need two separate dataframes so that I can perform a few actions on each of them (add some columns, do some calculations, etc.).

I need one dataframe with the occurrence counts produced by the code above, and one binary dataframe.

WHAT I DID:

  1. assign dataframe for occurrences:
    df_freq = new_df (because it is already calculated and done)

  2. I created another dataframe, a binary one, on top of new_df:

    #select only numeric columns to change them to binary

    numeric_cols = new_df.select_dtypes("number", exclude='float64').columns.tolist()

    new_df_binary = new_df

    new_df_binary['text_length'] = new_df_binary['text_length'].astype(int)

    new_df_binary[numeric_cols] = (new_df_binary[numeric_cols] > 0).astype(int)

  3. Everything works fine and I can perform the math I need, but when I come back to df_freq, it is no longer the dataframe with occurrences. It looks like it changed along with the binary one.

I need separate tables so that I can perform separate math on each of them. Do you know how I can avoid this overwriting issue?

Asked By: Kas

||

Answers:

You may use pandas’ copy method with the deep argument set to True:

df_freq = new_df.copy(deep=True)

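A plain assignment such as df_freq = new_df does not create a new dataframe; it only binds a second name to the same object, which is why changes made through new_df_binary also showed up in df_freq. Setting deep=True (which is the default value) ensures that modifications to the data or indices of the copy do not affect the original dataframe. The same applies to new_df_binary = new_df in step 2, which should also be a copy.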
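As a minimal sketch of the difference between assigning and copying, the snippet below rebuilds a small version of the workflow from the question. The sample rows and the keyword list are made up for illustration and are not the asker's data. Unlike the question's select_dtypes selection, which would also pick up ID and text_length, it binarizes only the keyword columns; that is an assumption about the intended result.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's new_df and keywords.
new_df = pd.DataFrame({
    "ID": [1, 2],
    "summary": ["Red apple, green APPLE", "banana"],
    "text_length": [22, 6],
})
keywords = ["apple", "banana"]

# Count keyword occurrences, as in the question.
for key in keywords:
    new_df[key] = new_df["summary"].str.lower().str.count(key)

# Plain assignment: both names refer to the very same object.
alias = new_df
same_object = alias is new_df

# Independent copies: changing one does not touch the other or new_df.
df_freq = new_df.copy(deep=True)
new_df_binary = new_df.copy(deep=True)

# Assumption: only the keyword columns should become 0/1 flags.
new_df_binary[keywords] = (new_df_binary[keywords] > 0).astype(int)

freq_apple = df_freq["apple"].tolist()
binary_apple = new_df_binary["apple"].tolist()
print(same_object, freq_apple, binary_apple)
```

With both dataframes created through copy, df_freq keeps the raw counts while new_df_binary holds the flags, so each can get its own extra columns and calculations.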

Answered By: Sheldon