Unable to assign different values in each cell of a column in dataframe, containing 99,000 records
Question:
I want to change the values greater than 70 in column CT_feat7, but the change is only applied up to about row 59,000. After that, I have to run the iteration again with a different starting index.
Please explain why this happens. Is there a better way?
[Image: dataset before replacement]
After I run this code:
for index, j in enumerate(df['CT_feat7']):
    if j > 70:
        df.loc[index, 'CT_feat7'] = 11 + random.random()
values are changed only up to index 59180. To cover the rest, I run a second loop starting from that index:
i, j = 59180, 2
while i <= 99195:
    if df.loc[i, 'CT_feat7'] > 70:
        df.loc[i, 'CT_feat7'] = j
        j += 0.1
        if j > 12:
            j = 2
    i += 1
Answers:
I think it is because enumerate() is not the proper iterator to use with .loc. Try:
import random

# .items() yields (label, value) pairs, so index is a real index label
for index, j in df['CT_feat7'].items():
    if j > 70:
        df.loc[index, 'CT_feat7'] = 11 + random.random()
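enumerate() works on the first ~59,000 rows because that is (I suspect) how many rows are in df. This is because enumerate() iterates over the values j in the passed Series, and for each j the corresponding index is the position of j in the Series, ranging from 0 to one less than the length of the Series. However, when selecting with .loc, you must give the label (not the position) of the item(s) you want. The two coincide only while the index labels are the default 0, 1, 2, ...; if the labels have gaps (for example, after rows were filtered out), the positions stop short of the largest label, so the rows with higher labels are never touched. See this answer for more information.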
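To make the position/label difference concrete, and as one possible answer to "is there a better way?", here is a minimal sketch. The toy frame, its gapped index, and the names buggy, fixed, mask and rng are made up for illustration; only the column name CT_feat7 and the 11 + random value come from the question. It assumes the real df has gaps in its index labels, which I cannot verify from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 6 rows whose labels have gaps, as can happen
# after filtering or dropping rows without calling reset_index().
df = pd.DataFrame(
    {"CT_feat7": [80.0, 5.0, 90.0, 75.0, 3.0, 99.0]},
    index=[0, 1, 2, 10, 11, 12],
)

# enumerate() counts positions; .items() yields the actual labels.
positions = [pos for pos, _ in enumerate(df["CT_feat7"])]
labels = [label for label, _ in df["CT_feat7"].items()]
print(positions)
print(labels)

# The loop from the question, using positions as if they were labels.
# Rows labelled 10 and 12 are never reached, so they keep values > 70.
buggy = df.copy()
for pos, value in enumerate(buggy["CT_feat7"]):
    if value > 70:
        buggy.loc[pos, "CT_feat7"] = 11.5
print(buggy.loc[[10, 12], "CT_feat7"])

# Vectorized alternative: a boolean mask selects the rows by label,
# and one random value is drawn per selected row. No Python loop needed.
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
fixed = df.copy()
mask = fixed["CT_feat7"] > 70
fixed.loc[mask, "CT_feat7"] = 11 + rng.random(int(mask.sum()))
print(fixed)
```

The masked assignment avoids the problem entirely because it never converts between positions and labels, and on a frame of roughly 99,000 rows it should also be much faster than a row-by-row loop.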