Python pandas and numpy: assign numerical values to new variable based on multiple conditions for existing variables

Question:

This is trivial in Excel, why is this so hard in Python?

The goal is to compute the state variable based on several conditions, including the previous value of the state variable as well.
Value is the known integer, Min(3) and Max(3) are simply the minimum and maximum value of a 3 period rolling window shifted forward by one period. This is how far I got.

Index Value Max(3) Min(3) 
0     10    nan    nan    
1     20    nan    nan        
2     15    nan    nan         
3     25    20     10    
4     15    25     15     
5     10    25     15     
6     15    20     10         

What is the best way to calculate the state variable based on the following conditions:

  • a) If Value > Max(3) then 1
  • b) If Value < Min(3) then 4
  • c) If Value <= Max(3) & Value >= Min (3) & previous State = 1 or 2 then 2
  • d) If Value <= Max(3) & Value >= Min (3) & previous State = 4 or 3 then 3

Should look like this in final DataFrame:

Index Value Max(3) Min(3) State
0     10    nan    nan    nan
1     20    nan    nan    nan    
2     15    nan    nan    nan     
3     25    20     10     1
4     15    25     15     2
5     10    25     15     4
6     15    20     10     3    

I have mostly tried this using np.where() funtion but always run into problems once I approach c) and d) conditions.

Asked By: Lucas

||

Answers:

You can use this:

df.loc[df.Value.gt(df['Max(3)']), 'State'] = 1
df.loc[df.Value.lt(df['Min(3)']), 'State'] = 4
df.loc[df.Value.between(df['Min(3)'], df['Max(3)']) & (df.State.shift(1).isin((3, 4))), 'State'] = 3
df.loc[df.Value.between(df['Min(3)'], df['Max(3)']) & (df.State.shift(1).isin((1,2))), 'State'] = 2

output:
enter image description here

Explaination:
the first statement checks where df.Value is greater than df[‘Max(3)’] and creates a new column ‘State’ filled with NaNs, and only 1s on the location of the condition

the seconds line sets 4s where df.Value is smaller than df.[‘Min(3)’]

The last two statements check if df.Value is within Max(3) and Min(3) and compares the df.State last value (.shift). Note: you can also use .between here instead of .isin if the numbers are subsequential.

Answered By: n4321d

np.select can handle multiple conditions quite well with good readability

df['Value'] = df['Value'].astype(float)

conditions =
[
    df['Value']>df['Max(3)'],
    df['Value']<df['Min(3)'],
    (df['Value']<=df['Max(3)']) & (df['Value']>= df['Min(3)']) & (df['State'].shift().isin((1,2))),
    (df['Value']<=df['Max(3)']) & (df['Value']>= df['Min(3)']) & (df['State'].shift().isin((3,4)))
]

choicelist = [1, 4, 2, 3]

df['State'] = np.select(conditions, choicelist, np.nan)
print(df)
   Index  Value  Max(3)  Min(3)  State
0      0   10.0     NaN     NaN    NaN
1      1   20.0     NaN     NaN    NaN
2      2   15.0     NaN     NaN    NaN
3      3   25.0    20.0    10.0    1.0
4      4   15.0    25.0    15.0    2.0
5      5   10.0    25.0    15.0    4.0
6      6   15.0    20.0    10.0    3.0
Answered By: Rabinzel
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.