I’m trying to set the entire column of a dataframe to a specific value.
In : df
Out :
  issueid industry
0     001      xxx
1     002      xxx
2     003      xxx
3     004      xxx
4     005      xxx
From what I’ve seen,
loc is the best practice when replacing values in a dataframe (or isn’t it?):
In : df.loc[:,'industry'] = 'yyy'
However, I still received this much talked-about warning message:
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
If I do
In : df['industry'] = 'yyy'
I get the same warning message.
Any ideas? Working with Python 3.5.2 and pandas 0.18.1.
You can do :
df['industry'] = 'yyy'
Assuming your DataFrame is built from something like data below, consider whether the new values are strings or integers, because the two are treated differently. In this case, be explicit about which one you mean.
import pandas as pd

data = [('001','xxx'), ('002','xxx'), ('003','xxx'), ('004','xxx'), ('005','xxx')]
df = pd.DataFrame(data, columns=['issueid', 'industry'])
print("Old DataFrame")
print(df)

df.loc[:,'industry'] = str('yyy')
print("New DataFrame")
print(df)
Now, if you want to put numbers instead of letters, you can pass a list with one value per row (a single scalar such as 1 is also broadcast to every row):
list_of_ones = [1,1,1,1,1]
df.loc[:,'industry'] = list_of_ones
print(df)
Or if you are using Numpy
import numpy as np

n = len(df)
df.loc[:,'industry'] = np.ones(n)
print(df)
Python can do unexpected things when new objects are defined from existing ones. You stated in a comment above that your dataframe is defined along the lines of
df = df_all.loc[df_all['issueid']==specific_id,:]. In this case,
df is still tracked by pandas as a slice of the rows stored in the
df_all object: pandas cannot tell whether it is a view or an independent copy, and that ambiguity is what triggers the warning.
To avoid these issues altogether, I often have to remind myself to use the
copy module, which explicitly forces objects to be copied in memory so that methods called on the new objects are not applied to the source object. I had the same problem as you and avoided it this way.
In your case, this should get rid of the warning message:
from copy import deepcopy

df = deepcopy(df_all.loc[df_all['issueid']==specific_id,:])
df['industry'] = 'yyy'
EDIT: Also see David M.’s excellent comment below!
df = df_all.loc[df_all['issueid']==specific_id,:].copy()
df['industry'] = 'yyy'
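To make the copy approach concrete, here is a minimal, self-contained sketch. The contents of df_all and the value of specific_id are assumptions made up for illustration; the thread never shows them.

```python
import pandas as pd

# Hypothetical stand-ins for the objects named in the thread.
df_all = pd.DataFrame({
    "issueid": ["001", "002", "003", "004", "005"],
    "industry": ["xxx"] * 5,
})
specific_id = "003"

# .copy() makes df an independent DataFrame, so assigning to it is unambiguous.
df = df_all.loc[df_all["issueid"] == specific_id, :].copy()
df["industry"] = "yyy"

print(df)      # only the filtered row, now with 'yyy'
print(df_all)  # the source frame is left untouched
```

Because df is an explicit copy, the assignment changes only df, and df_all keeps its original values.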
This lets you add conditions on the rows and then change every cell of a specific column in the rows that match:
df.loc[(df['issueid'] == '001'), 'industry'] = str('yyy')
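As a rough illustration of conditional assignment, the sketch below rebuilds the frame from the question and overwrites the column only where a boolean mask is true. The name mask and the choice of ids are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "issueid": ["001", "002", "003", "004", "005"],
    "industry": ["xxx"] * 5,
})

# The boolean mask picks the rows; the column label picks the cells to overwrite.
mask = df["issueid"].isin(["001", "004"])
df.loc[mask, "industry"] = "yyy"

print(df)
```

Rows where the mask is False keep their original value, so a single .loc call handles both the row filter and the column assignment.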
You can also do it this way:
df = df.assign(industry='yyy')
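A short sketch of why assign sidesteps the problem: it returns a new DataFrame rather than writing into the existing one. The names df_all, subset and updated are hypothetical and chosen only for this example.

```python
import pandas as pd

df_all = pd.DataFrame({
    "issueid": ["001", "002", "003"],
    "industry": ["xxx"] * 3,
})

# A filtered frame like the one described in the thread.
subset = df_all[df_all["issueid"] == "002"]

# assign() builds and returns a new DataFrame; subset itself is not modified.
updated = subset.assign(industry="yyy")

print(updated)
print(subset)
```

Rebinding the result to the same name, as in the line above, gives the same effect as an in-place assignment without writing into the slice.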
df.loc[:,'industry'] = 'yyy'
This does the trick. Use .loc with ':' to select all rows. Hope it helps.
Seems to me that:
df1 = df[df['col1']==some_value] returns a DataFrame that pandas still tracks as a slice of the parent, so it cannot guarantee whether changes in
df1 are meant to be reflected in the parent
df. This leads to the warning.
df1 = df[df['col1']==some_value].copy() will create a new DataFrame, and changes in
df1 will not be reflected in the parent. The
copy method is recommended if you don’t want to make changes to your original DataFrame.
I had a similar issue before even with this approach
df.loc[:,'industry'] = 'yyy', but once I refreshed the notebook, it ran well.
You may want to try refreshing the cells after you have added
df.loc[:,'industry'] = 'yyy'.
If you just create a new, empty DataFrame, you cannot directly assign a value to a whole column. The column comes out empty because pandas has no way of knowing how many rows the DataFrame will have. You need to either define the size or have some existing columns.
df = pd.DataFrame()
df["A"] = 1
df["B"] = 2
df["C"] = 3
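A minimal sketch of the difference, assuming a frame of three rows (the names empty and sized and the row count are illustrative, not from the thread):

```python
import pandas as pd

# No index yet: the scalar has no rows to fill.
empty = pd.DataFrame()
empty["A"] = 1
print(len(empty))

# Defining the index first gives the scalar something to broadcast over.
sized = pd.DataFrame(index=range(3))
sized["A"] = 1
sized["B"] = 2
print(sized)
```

Giving the frame an index up front is one way to "define the size"; starting from existing columns works the same way.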
Use this instead:
df.iloc[:]['industry'] = 'yyy'
Remember: this only works with columns that already exist in the DataFrame.
This is for people for whom .loc didn’t work.
For anyone else coming to this answer who doesn’t want to use copy:
df['industry'] = df['industry'].apply(lambda x: '')