Return the closest matched value to [reference] from [ABCD] columns

Question:

What is the cleanest way to return the value from the [ABCD] columns that is closest to [reference]?

The output is the closest value. For example, if the absolute deltas for the first row are [19 40 45 95], the closest value to return is -21.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(-100, 300, size=(100, 4)), columns=list('ABCD'))  # random candidate columns
df2 = pd.DataFrame(np.random.randint(-100, 100, size=(100, 1)), columns=['reference'])
df = pd.concat([df1, df2], axis=1)
df['closest_value'] = "?"  # placeholder for the desired result
df


Asked By: JoshZ


Answers:

You can apply a lambda function to each row and pick, from the desired columns, the value with the smallest absolute difference from the reference column:

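cols = list('ABCD')  # candidate columns; leaves out 'reference' and the "?" placeholder column
df['closest_value'] = df[cols + ['reference']].apply(
    # index into the same candidate array that argmin was computed on
    lambda x: x[cols].values[np.abs(x[cols].values - x['reference']).argmin()],
    axis=1,
)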

OUTPUT:

      A    B    C    D  reference  closest_value
0    -2  227  -88  268        -68            -88
1   185  182   18  279        -59             18
2   140   40  264   98         61             40
3     0   98  -32   81         47             81
4    -6   70   -6   -9        -53             -9
..  ...  ...  ...  ...        ...            ...
95  -29  -34  141  166        -76            -34
96   14   22  175  205         69             22
97  265   11  -25  284        -88            -25
98  283   31  -91  252         11             31
99    6  -59   84   95        -15              6
[100 rows x 6 columns]
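
The lambda above is fairly dense, so here is a minimal sketch of the same row-wise idea with a named helper. The helper name `closest_in_row` and the two-row frame are made up for illustration; they are not from the question's data.

```python
import numpy as np
import pandas as pd

def closest_in_row(row, candidates, ref_col="reference"):
    """Return the value among `candidates` that is closest to row[ref_col]."""
    values = row[candidates].to_numpy()
    # Position of the smallest absolute difference; on a tie the first column wins.
    position = np.abs(values - row[ref_col]).argmin()
    return values[position]

# Hypothetical hand-made frame, small enough to check by eye.
small = pd.DataFrame({
    "A": [-21, 10],
    "B": [0, 50],
    "C": [5, -30],
    "D": [55, 90],
    "reference": [-40, 45],
})

# Extra keyword arguments to apply() are forwarded to the helper.
small["closest_value"] = small.apply(closest_in_row, axis=1, candidates=list("ABCD"))
print(small)
```

For the first row the absolute deltas are 19, 40, 45 and 95, so -21 is picked. Naming the candidate columns explicitly keeps the reference column (and any placeholder column) out of the comparison.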
Answered By: ThePyGuy

Try this:

# label of the closest column for each row
idx = df[list('ABCD')].sub(df['reference'], axis=0).abs().idxmin(axis=1)
df['closest_value'] = df.lookup(df.index, idx)  # DataFrame.lookup only exists in pandas < 2.0
df


Edit:

Since pandas.DataFrame.lookup was deprecated in pandas 1.2.0 and removed in 2.0.0, you can replace this line:

df.lookup(df.index, idx)

with these:

out = df.set_index(idx, append=True)
out['closest_value'] = df.stack()
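
Another possible replacement for the `lookup` call, sketched here under the assumption that a NumPy round trip is acceptable: turn the column labels from `idxmin` into integer positions with `Index.get_indexer`, then pick one cell per row. The small frame and the names `candidates` and `col_pos` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame standing in for the question's random data.
df = pd.DataFrame({
    "A": [-21, 10],
    "B": [0, 50],
    "C": [5, -30],
    "D": [55, 90],
    "reference": [-40, 45],
})
candidates = df[list("ABCD")]

# Label of the closest column per row, e.g. "A" or "B".
idx = candidates.sub(df["reference"], axis=0).abs().idxmin(axis=1)

# Translate column labels to integer positions, then select one cell per row.
col_pos = candidates.columns.get_indexer(idx)
df["closest_value"] = candidates.to_numpy()[np.arange(len(df)), col_pos]
print(df)
```

This writes the result straight back into `df` and keeps the original index, which avoids carrying around the MultiIndex created by `set_index(..., append=True)`.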
Answered By: L'Artiste

The cleanest way is to convert to NumPy:

data = df[list('ABCD')].to_numpy()
reference = df[['reference']].to_numpy()

indices = np.abs(data - reference).argmin(axis=1)
df['closest_value'] = data[np.arange(len(data)), indices]

Result:

      A    B    C    D  reference  closest_value
0   -60  254   80  -46         89             80
1     5   10   72  259         41             10
2   219   14  269  -70          0             14
3   171   36  132   45        -55             36
4     7  233  -65  231        -76            -65
..  ...  ...  ...  ...        ...            ...
95  229  213  -54  129         62            129
96   16  -26  -30   79         94             79
97  105  157   -3  148        -48             -3
98  -27   60  218  273         62             60
99  140  131  -49   28        -46            -49

[100 rows x 6 columns]
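
To see why the double brackets in `df[['reference']]` matter, here is a small sketch of the broadcasting step on hand-written arrays (the numbers are made up). The reference needs shape (n, 1) so it broadcasts across the four columns; a flat (n,) array would generally not broadcast against (n, 4).

```python
import numpy as np

data = np.array([[-21,  0,   5, 55],
                 [ 10, 50, -30, 90],
                 [  0, 20,  40, 60]])
reference = np.array([[-40], [45], [10]])  # shape (3, 1): one reference per row

# (3, 4) - (3, 1): each row's reference is subtracted from all four columns.
deltas = np.abs(data - reference)

# argmin returns the first minimum, so ties go to the leftmost column.
indices = deltas.argmin(axis=1)

# Pair each row number with its chosen column to pull out one value per row.
closest = data[np.arange(len(data)), indices]
print(closest)
```

The last row has a tie (0 and 20 are both 10 away from the reference), and the leftmost candidate is returned. If a different tie-breaking rule is needed, it has to be handled separately.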
Answered By: Vladimir Fokow