Get nearest low and high values across multiple dataframe columns
Question:
I have a dataframe similar to this:
import pandas as pd
id = [1001, 1002, 1003]
a = [156, 224, 67]
b = [131, 203, 61]
c = [97, 165, 54]
d = [68, 122, 50]
value = [71, 180, 66]
df = pd.DataFrame({'id':id, 'a':a, 'b':b, 'c':c, 'd':d, 'value':value})
id
a
b
c
d
value
1001
156
131
97
68
71
1002
224
203
165
122
180
1003
67
61
54
50
66
For each row, I would like to evaluate columns a-d and within them identify the next lowest and next highest values, as compared to value. So in this example, the expected result would look like:
id
a
b
c
d
value
nxt_low
nxt_high
1001
156
131
97
68
71
68
97
1002
224
203
165
122
180
165
203
1003
67
61
54
50
66
61
67
I have tried creating a single column with a numpy array from a-d and trying to do some operations that way, but I’m not applying it correctly and have been unable to get the desired result. Any help is greatly appreciated.
Answers:
you can get nearest low following code:
df.apply(lambda x: x[x < x[-1]].max(), axis=1)
output:
0 68
1 165
2 61
dtype: int64
get nearest low and high and make result to columns:
df[['nxt_low', 'nxt_high']] = df.apply(lambda x: [x[x < x[-1]].max(), x[x > x[-1]].min()], axis=1, result_type='expand')
df
:
id a b c d value nxt_low nxt_high
0 1001 156 131 97 68 71 68 97
1 1002 224 203 165 122 180 165 203
2 1003 67 61 54 50 66 61 67
If id
is nearest low or high, modify code a bit.
df[['nxt_low', 'nxt_high']] = df.iloc[:, 1:].apply(lambda x: [x[x < x[-1]].max(), x[x > x[-1]].min()], axis=1, result_type='expand')
Here is a way:
cols = ['a','b','c','d']
df2 = df[cols].sub(df['value'],axis=0)
df = (df.assign(nxt_low = df.where(df2.lt(0)).max(axis=1),
nxt_high = df.where(df2.gt(0)).min(axis=1)))
Output:
id a b c d value nxt_low nxt_high
0 1001 156 131 97 68 71 68.0 97.0
1 1002 224 203 165 122 180 165.0 203.0
2 1003 67 61 54 50 66 61.0 67.0
I have a dataframe similar to this:
import pandas as pd
id = [1001, 1002, 1003]
a = [156, 224, 67]
b = [131, 203, 61]
c = [97, 165, 54]
d = [68, 122, 50]
value = [71, 180, 66]
df = pd.DataFrame({'id':id, 'a':a, 'b':b, 'c':c, 'd':d, 'value':value})
id | a | b | c | d | value |
---|---|---|---|---|---|
1001 | 156 | 131 | 97 | 68 | 71 |
1002 | 224 | 203 | 165 | 122 | 180 |
1003 | 67 | 61 | 54 | 50 | 66 |
For each row, I would like to evaluate columns a-d and within them identify the next lowest and next highest values, as compared to value. So in this example, the expected result would look like:
id | a | b | c | d | value | nxt_low | nxt_high |
---|---|---|---|---|---|---|---|
1001 | 156 | 131 | 97 | 68 | 71 | 68 | 97 |
1002 | 224 | 203 | 165 | 122 | 180 | 165 | 203 |
1003 | 67 | 61 | 54 | 50 | 66 | 61 | 67 |
I have tried creating a single column with a numpy array from a-d and trying to do some operations that way, but I’m not applying it correctly and have been unable to get the desired result. Any help is greatly appreciated.
you can get nearest low following code:
df.apply(lambda x: x[x < x[-1]].max(), axis=1)
output:
0 68
1 165
2 61
dtype: int64
get nearest low and high and make result to columns:
df[['nxt_low', 'nxt_high']] = df.apply(lambda x: [x[x < x[-1]].max(), x[x > x[-1]].min()], axis=1, result_type='expand')
df
:
id a b c d value nxt_low nxt_high
0 1001 156 131 97 68 71 68 97
1 1002 224 203 165 122 180 165 203
2 1003 67 61 54 50 66 61 67
If id
is nearest low or high, modify code a bit.
df[['nxt_low', 'nxt_high']] = df.iloc[:, 1:].apply(lambda x: [x[x < x[-1]].max(), x[x > x[-1]].min()], axis=1, result_type='expand')
Here is a way:
cols = ['a','b','c','d']
df2 = df[cols].sub(df['value'],axis=0)
df = (df.assign(nxt_low = df.where(df2.lt(0)).max(axis=1),
nxt_high = df.where(df2.gt(0)).min(axis=1)))
Output:
id a b c d value nxt_low nxt_high
0 1001 156 131 97 68 71 68.0 97.0
1 1002 224 203 165 122 180 165.0 203.0
2 1003 67 61 54 50 66 61.0 67.0