Loop through the columns of a Dataframe to modify a slice of it
Question:
I have the following DataFrame, and I am trying to modify a slice of it by iterating through the columns with a for loop.
import pandas as pd

data = {'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
        'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25-25', '59-59'],
        'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165-171', '175-182'],
        'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85-90', '90-95'],
        'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19-21', '20-22'],
        'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12-15', '12-15'],
        'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
        'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']}
df = pd.DataFrame(data)
for col in df:
    if col == 'id':
        continue
    else:
        df.loc[df['employment']=='12-15', col] = df[col].str.split('-').str[0]
But something strange happens: after running the loop, not all the columns are modified (the 'country' and 'city' columns are left unchanged). I am expecting this:
#Expected
pd.DataFrame({'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25', '59'],
'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165', '175'],
'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85', '90'],
'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19', '20'],
'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12', '12'],
'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece', 'New York'],
'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens', 'Albany']})
But I am getting this instead:
pd.DataFrame({'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25', '59'],
'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165', '175'],
'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85', '90'],
'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19', '20'],
'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12', '12'],
'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']})
Answers:
If you want to know what is wrong with your code: the issue is the order of the loop. The condition df['employment']=='12-15' is re-evaluated on every iteration, and the loop updates the 'employment' column before it reaches the last two columns ('country' and 'city'). By the time the loop gets to those columns, there are no '12-15' values left in 'employment', because they have already been changed to '12', so the condition selects no rows. Changing the column order so that 'employment' is updated last solves the problem:
# Move 'employment' to the end so the column used in the condition is modified last
lst_cols = list(df.columns)
lst_cols.remove('employment')
lst_cols = lst_cols + ['employment']
for col in lst_cols:
    if col == 'id':
        continue
    else:
        df.loc[df['employment']=='12-15', col] = df[col].str.split('-').str[0]
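Reordering fixes the example, but it relies on knowing which column the condition reads. A minimal sketch of an order-independent variant (not part of the original answer) evaluates the boolean mask once, before the loop, and reuses it for every column. It assumes the question's data and a recent pandas version:

```python
import pandas as pd

# The question's data, unchanged
data = {'id': [12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
        'age': ['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25-25', '59-59'],
        'height': ['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165-171', '175-182'],
        'weight': ['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85-90', '90-95'],
        'education': ['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19-21', '20-22'],
        'employment': ['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12-15', '12-15'],
        'country': ['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
        'city': ['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']}
df = pd.DataFrame(data)

# Evaluate the condition once, before any column is modified.
# The comparison returns a new boolean Series, so later edits to
# 'employment' do not change which rows it selects.
mask = df['employment'] == '12-15'

for col in df.columns:
    if col == 'id':
        continue
    # Keep only the part before the first '-' in the selected rows
    df.loc[mask, col] = df.loc[mask, col].str.split('-').str[0]

print(df)
```

With the mask fixed up front, the loop can visit the columns in their original order and still produce the expected DataFrame from the question.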
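Alternatively, get the index of the DataFrame slice you are trying to modify, store it in a variable before the loop (so it does not change while the loop runs), and use iloc instead of loc. This also works when the slice is selected by conditions on multiple columns. Note that iloc expects integer positions, so passing the stored index labels to it only works here because the DataFrame has the default RangeIndex, where labels and positions coincide.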
index = df.loc[(df['employment']=='12-15') & (df['weight']=='90-95')].index
for ind, col in enumerate(df.columns):
    if ind == 0:
        continue
    else:
        df.iloc[index, ind] = df.iloc[index, ind].str.split('-').str[0]
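A loop-free sketch of the same idea, which is not from either answer: keep the combined mask, select all affected columns at once with loc, and transform each column with DataFrame.apply. Because loc works with labels, it does not depend on the index being the default RangeIndex; the string index 'a', 'b', 'c' below is a hypothetical example chosen to show that, and the data is a three-row subset of the question's:

```python
import pandas as pd

# Subset of the question's data with a hypothetical string index,
# so that labels and positions differ
df = pd.DataFrame(
    {
        'id': [588, 660, 732],
        'weight': ['70-75', '85-90', '90-95'],
        'employment': ['13-16', '12-15', '12-15'],
        'city': ['Madrid-SPA', 'Athens-GR', 'Albany-NY'],
    },
    index=['a', 'b', 'c'],
)

# Conditions on several columns, computed once
mask = (df['employment'] == '12-15') & (df['weight'] == '90-95')

# Every column except 'id' is transformed
cols = df.columns.drop('id')

# Take the selected block, keep the part before the first '-' in each column,
# and write it back; loc aligns the result on row and column labels
df.loc[mask, cols] = df.loc[mask, cols].apply(lambda s: s.str.split('-').str[0])

print(df)
```

Only row 'c' matches both conditions, so row 'b' keeps its original values even though its 'employment' is '12-15'.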