Return the closest matched value to [reference] from [ABCD] columns
Question:
What is the cleanest way to return the closest matched value to [reference] from the [ABCD] columns?
Output is the closest value. e.g. for the first row, absolute delta is [19 40 45 95] so the closest value to return is -21.
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(-100, 300, size=(100, 4)), columns=list('ABCD'))  # generate random DataFrame
df2 = pd.DataFrame(np.random.randint(-100, 100, size=(100, 1)), columns=['reference'])
df = pd.concat([df1, df2], axis=1)
df['closest_value'] = "?"  # placeholder for the value to compute
df
Answers:
You can apply a lambda function to each row and pick, from the desired columns, the value with the smallest absolute difference from the reference column:
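cols = list('ABCD')  # candidate columns
# Select only the numeric columns so the "?" placeholder does not break the subtraction
df['closest_value'] = df[cols + ['reference']].apply(
    lambda x: x[cols].values[np.abs(x[cols].values - x['reference']).argmin()],
    axis=1,
)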
OUTPUT:
      A    B    C    D  reference  closest_value
0    -2  227  -88  268        -68            -88
1   185  182   18  279        -59             18
2   140   40  264   98         61             40
3     0   98  -32   81         47             81
4    -6   70   -6   -9        -53             -9
..  ...  ...  ...  ...        ...            ...
95  -29  -34  141  166        -76            -34
96   14   22  175  205         69             22
97  265   11  -25  284        -88            -25
98  283   31  -91  252         11             31
99    6  -59   84   95        -15              6
[100 rows x 6 columns]
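Because the question's data is random, the result is hard to verify by eye. As a minimal sketch, the same row-wise idea can be checked on a small hand-made frame; the values and the names `small`, `cols` and `closest_in_row` are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hand-made data (illustrative), so each result can be verified by hand.
small = pd.DataFrame({
    'A': [10, -5, 100],
    'B': [20, 0, 50],
    'C': [30, 5, 7],
    'D': [40, 9, -3],
    'reference': [26, 4, 0],
})
cols = list('ABCD')  # candidate columns to search

def closest_in_row(row):
    candidates = row[cols].to_numpy()
    # Position of the smallest absolute difference from the reference.
    pos = np.abs(candidates - row['reference']).argmin()
    return candidates[pos]

small['closest_value'] = small.apply(closest_in_row, axis=1)
print(small)
```

For the first row the absolute differences from 26 are 16, 6, 4 and 14, so the value returned is 30.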
Try this:
idx = df[list('ABCD')].sub(df['reference'], axis=0).abs().idxmin(axis=1)  # label of the closest column per row
df['closest_value'] = df.lookup(df.index, idx)  # DataFrame.lookup only exists in pandas < 2.0
display(df)
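Edit:
Since pandas.DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, you can replace this line:
df['closest_value'] = df.lookup(df.index, idx)
with these:
out = df.set_index(idx, append=True)  # index becomes (row label, name of the closest column)
out['closest_value'] = df.stack()  # aligned on that (row, column) index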
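Another way to do without `lookup` is to turn the column labels into integer positions and index the underlying array directly. The following is only a sketch of that pattern on a small hand-made frame, not necessarily what the answer's author had in mind; `demo`, `values` and `col_pos` are illustrative names:

```python
import numpy as np
import pandas as pd

# Hand-made data (illustrative values).
demo = pd.DataFrame({
    'A': [1, 36, -7],
    'B': [12, 40, 0],
    'C': [25, 33, 9],
    'D': [90, -2, 4],
    'reference': [20, 35, 5],
})
values = demo.drop(columns='reference')

# Label of the closest column for each row.
idx = values.sub(demo['reference'], axis=0).abs().idxmin(axis=1)

# Translate column labels into integer positions, then pick one cell per row.
col_pos = values.columns.get_indexer(idx.to_numpy())
demo['closest_value'] = values.to_numpy()[np.arange(len(demo)), col_pos]
print(demo)
```

Unlike the `set_index`/`stack` variant, this keeps the original row index on the frame.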
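The cleanest way is to convert to NumPy:
data = df[list('ABCD')].to_numpy()  # shape (n, 4)
reference = df[['reference']].to_numpy()  # shape (n, 1), broadcasts against data
indices = np.abs(data - reference).argmin(axis=1)  # column position of the closest value per row
df['closest_value'] = data[np.arange(len(data)), indices]  # pick one value per row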
Result:
      A    B    C    D  reference  closest_value
0   -60  254   80  -46         89             80
1     5   10   72  259         41             10
2   219   14  269  -70          0             14
3   171   36  132   45        -55             36
4     7  233  -65  231        -76            -65
..  ...  ...  ...  ...        ...            ...
95  229  213  -54  129         62            129
96   16  -26  -30   79         94             79
97  105  157   -3  148        -48             -3
98  -27   60  218  273         62             60
99  140  131  -49   28        -46            -49
[100 rows x 6 columns]
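One detail the NumPy version leaves implicit is tie-breaking: `argmin` returns the first minimum, so when two columns are equally close to the reference, the leftmost one wins. A small sketch with hand-made data (the values and the name `tied` are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# In both rows, columns A and B are equally far from the reference.
tied = pd.DataFrame({
    'A': [10, 3],
    'B': [20, -3],
    'C': [50, 8],
    'D': [60, 9],
    'reference': [15, 0],
})
data = tied[list('ABCD')].to_numpy()
reference = tied[['reference']].to_numpy()  # shape (n, 1) for broadcasting

# argmin picks the first (leftmost) column among equal distances.
indices = np.abs(data - reference).argmin(axis=1)
tied['closest_value'] = data[np.arange(len(data)), indices]
print(tied)
```

If a different rule is wanted, for example preferring the larger value on ties, the distance matrix would need an explicit secondary criterion before taking `argmin`.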