Loop through the columns of a Dataframe to modify a slice of it

Question

I have the following DataFrame and I am trying to modify a slice of it by iterating through the columns with a for loop.

import pandas as pd

data = {'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
       'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25-25', '59-59'],
       'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165-171', '175-182'],
       'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85-90', '90-95'],
       'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19-21', '20-22'],
       'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12-15', '12-15'],
       'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
       'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']}

df = pd.DataFrame(data)

for col in df:
    if col =='id':
        continue
    else:
        df.loc[df['employment']=='12-15',col] = df[col].str.split('-').str[0]

But I am seeing something strange: after running the loop, not all of the columns are affected. I am expecting this:

#Expected

pd.DataFrame({'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
       'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25', '59'],
       'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165', '175'],
       'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85', '90'],
       'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19', '20'],
       'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12', '12'],
       'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece', 'New York'],
       'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens', 'Albany']})

But I am getting this instead:

pd.DataFrame({'id':[12, 84, 156, 228, 300, 372, 444, 516, 588, 660, 732],
       'age':['18-18', '22-22', '35-35', '33-33', '45-45', '40-40', '55-55', '60-60', '47-47', '25', '59'],
       'height':['175-177', '165-167', '175-178', '165-168', '175-179', '165-169', '175-180', '165-170', '175-181', '165', '175'],
       'weight':['65-70', '65-70', '80-85', '75-80', '90-95', '100-105', '80-85', '70-75', '70-75', '85', '90'],
       'education':['10-12', '11-13', '12-14', '13-15', '14-16', '15-17', '16-18', '17-19', '18-20', '19', '20'],
       'employment':['1-4', '8-11', '8-11', '4-7', '5-8', '5-8', '9-12', '15-18', '13-16', '12', '12'],
       'country':['France-EU', 'Austria-EU', 'Netherland-EU', 'Italy-EU', 'Texas-US', 'California-US', 'Washington-US', 'Poland-EU', 'Spain-EU', 'Greece-EU', 'New York-US'],
       'city':['Paris-FR', 'Vienna-AUS', 'Amsterdam-NL', 'Rome-ITA', 'Austin-TX', 'LA-CAL', 'Olympia-WAS', 'Warsaw-PL', 'Madrid-SPA', 'Athens-GR', 'Albany-NY']})

Asked By: Brandon


Answer 1

If you want to know what is wrong with your code, read on.

The issue is in the loop. It updates the 'employment' column before it reaches the last two columns ('country' and 'city'). By the time their turn comes, the 'employment' column no longer contains any '12-15' values, because they have already been changed to '12', so the condition matches no rows. Changing the column order in your loop so that 'employment' is updated last solves the problem:

lst_cols = list(df.columns)
lst_cols.remove('employment')
lst_cols =  lst_cols + ['employment']

for col in lst_cols:
    if col =='id':
        continue
    else:
        df.loc[df['employment']=='12-15',col] = df[col].str.split('-').str[0]
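
To make the cause easier to see, here is a minimal sketch that is not part of the original answer. It uses a small hypothetical three-row frame (the values are made up for illustration) and performs by hand the two steps the original loop takes for 'employment' and then 'city', counting how many rows match the condition before and after.

```python
import pandas as pd

# Small hypothetical frame with the same kind of data as the question.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'employment': ['5-8', '12-15', '12-15'],
    'city': ['Austin-TX', 'Athens-GR', 'Albany-NY'],
})

# Before 'employment' is touched, two rows match the condition.
matches_before = int((df['employment'] == '12-15').sum())

# This is the step the loop performs when col == 'employment'.
df.loc[df['employment'] == '12-15', 'employment'] = df['employment'].str.split('-').str[0]

# The condition is re-evaluated for every later column and now matches nothing.
matches_after = int((df['employment'] == '12-15').sum())

# So the assignment for 'city' selects no rows and leaves the column unchanged.
df.loc[df['employment'] == '12-15', 'city'] = df['city'].str.split('-').str[0]

print(matches_before, matches_after)
print(df)
```

Because the mask is rebuilt from the current contents of 'employment' on every iteration, any column processed after 'employment' sees an empty selection.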

Answered By: Sachin Kohli

Answer 2

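Get the index of the rows you are trying to modify and assign it to a variable, so the selection won't change while or after the loop runs, then use iloc instead of loc. This also works when the slice is selected by conditions on multiple columns. Note that iloc is positional, so passing index labels to it works here only because the DataFrame has the default integer index (0, 1, 2, ...).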

index = df.loc[(df['employment']=='12-15') & (df['weight']=='90-95')].index

for ind,col in enumerate(df.columns):
    if ind==0:
        continue
    else:
        df.iloc[index,ind] = df.iloc[index,ind].str.split('-').str[0]
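
As a variation on the same idea (a sketch, not the answer's own code), the condition can be evaluated once as a boolean mask and used with loc, which also removes the need for a loop and does not depend on the index being the default integer range. The small frame and its string index below are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame with a non-default index, for illustration only.
df = pd.DataFrame({
    'id': [12, 84, 156],
    'weight': ['65-70', '85-90', '90-95'],
    'employment': ['1-4', '12-15', '12-15'],
    'city': ['Paris-FR', 'Athens-GR', 'Albany-NY'],
}, index=['a', 'b', 'c'])

# Evaluate the condition once, before anything is modified.
mask = df['employment'] == '12-15'

# Every column except 'id' should keep only the part before the first hyphen.
cols = df.columns.drop('id')

# Apply the split column by column to the selected rows and write it back.
df.loc[mask, cols] = df.loc[mask, cols].apply(lambda s: s.str.split('-').str[0])

print(df)
```

Since mask is computed before any assignment, updating 'employment' along with the other columns no longer changes which rows are selected.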

Answered By: Brandon
