for loop with multiple conditions
Question:
I am trying to create a new variable (column) whose values depend on the values of another variable, so I tried to write a for loop with multiple conditions.
Here is the code that I tried:
ABC_Levels = []
for i in range(len(df_ABC)):
    if df_ABC.loc[i,'ABC']>=5.7:
        ABC_Levels.append('High')
    elif df_ABC.loc[i,'ABC']<5.4:
        ABC_Levels.append('Low')
    else:
        ABC_Levels.append('Intermediate')
ABC_Levels
It returns the following error:
KeyError: 0
The above exception was the direct cause of the following exception:
I guess it is something related to the first row, because if I change the code to this:
ABC_Levels = []
for i in range(len(df_ABC['ABC'])):
    if df_ABC.loc[i]>=5.7:
        ABC_Levels.append('High')
    elif df_ABC.loc[i]<5.4:
        ABC_Levels.append('Low')
    else:
        ABC_Levels.append('Intermediate')
ABC_Levels
It returns the value of the first row.
I’ll appreciate any help
Answers:
Not a full answer, but personally I would use np.select instead of iterating over the DataFrame row by row. For example:
import numpy as np

series = df_ABC['ABC']
# Conditions are evaluated in order; the first match wins.
conds = [series >= 5.7, series < 5.4]
choices = ['High', 'Low']
np.select(conds, choices, default='Intermediate').tolist()
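To make the np.select idea easy to try, here is a minimal self-contained sketch. The sample values and the index starting at 10 are made up for illustration; only the column name `ABC` and the thresholds come from the question. Because np.select works on whole boolean Series, it never looks rows up by label, so it should behave the same whatever the index looks like.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the index deliberately does not start at 0.
df_ABC = pd.DataFrame({"ABC": [5.9, 5.4, 5.1, 5.7]}, index=[10, 11, 12, 13])

series = df_ABC["ABC"]
conds = [series >= 5.7, series < 5.4]
choices = ["High", "Low"]

# Rows matching no condition receive the default label.
levels = np.select(conds, choices, default="Intermediate").tolist()
print(levels)  # ['High', 'Intermediate', 'Low', 'High']
```

Note that 5.4 falls into 'Intermediate' here, because the second condition is a strict `< 5.4`.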
Update:
Modification that worked for the OP (note that it uses <= 5.4 rather than < 5.4, so a value of exactly 5.4 is labeled 'Low'):
conditions = [ (df_ABC['ABC'] >= 5.7), (df_ABC['ABC'] <= 5.4) ]
choices = ['High', 'Low']
df_ABC['ABC_expression'] = np.select(conditions, choices, default='Intermediate')
df_ABC
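As for the original KeyError: 0, a likely cause (an assumption, since the question does not show the index of df_ABC) is that the index has no label 0, for example after filtering rows or loading data with a custom index. `.loc` looks rows up by label, not by position, so `df_ABC.loc[0, 'ABC']` fails in that case. The sketch below reproduces this with made-up data and shows a loop-style alternative that iterates over the values directly; the helper name `level` is hypothetical.

```python
import pandas as pd

# Hypothetical data: index labels start at 1, so the label 0 does not exist.
df_ABC = pd.DataFrame({"ABC": [5.9, 5.4, 5.1]}, index=[1, 2, 3])

def level(value):
    """Map one ABC value to a level, using the thresholds from the question."""
    if value >= 5.7:
        return "High"
    elif value < 5.4:
        return "Low"
    return "Intermediate"

# .loc is label-based, so asking for label 0 raises KeyError here.
try:
    df_ABC.loc[0, "ABC"]
    raised = False
except KeyError:
    raised = True

# Iterating over the column values avoids index lookups entirely.
ABC_Levels = [level(v) for v in df_ABC["ABC"]]
print(raised, ABC_Levels)  # True ['High', 'Intermediate', 'Low']
```

If the positional loop is preferred, `df_ABC.iloc[i]` or calling `df_ABC.reset_index(drop=True)` first should also work, since both rely on position rather than label.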