Search values in dataframe by condition on another column
Question:
For each multiple of 'trigger', I need the value in column A from the row whose value in column B is closest to that multiple.
For instance, in the dataframe below:
import pandas as pd

trigger = 100
info2 = {'A': [0]*100, 'B': [0.0]*100}  # B will hold floats, so start it as float
dfA = pd.DataFrame(info2)
for i in range(1, len(dfA)):
    dfA.loc[i, 'B'] = i*3.78
    dfA.loc[i, 'A'] = i*10
dfA
The closest value to trigger*1 is 98.28 (row 26),
the closest value to trigger*2 is 200.34 (row 53),
and the closest value to trigger*3 is 298.62 (row 79),
so the expected result is:
result = [260, 530, 790]
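As a quick arithmetic check of these numbers: in this toy frame B is exactly 3.78 × row, so the nearest row to each multiple of trigger is simply round(multiple / 3.78). This shortcut only works because of that even spacing; it is an illustration, not a general method.

```python
trigger = 100
step = 3.78  # spacing between consecutive B values in the example frame (B = 3.78 * row)

# Nearest row index to each multiple of trigger, valid only for evenly spaced B
rows = [round(k * trigger / step) for k in (1, 2, 3)]
values = [row * 10 for row in rows]  # A = 10 * row in the example frame
print(rows)    # [26, 53, 79]
print(values)  # [260, 530, 790]
```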
Answers:
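This could work:
import numpy as np

triggers = {'100': 100, '200': 200, '300': 300}
for k, v in triggers.items():
    # Distance of every B value from the current trigger (adds a helper column to dfA)
    dfA['delta_val'] = np.abs(dfA['B'] - v)
    # A from the row with the smallest distance (the first one on ties)
    triggers[k] = int(dfA[dfA.delta_val == dfA.delta_val.min()]['A'].values[0])
print(triggers)
# {'100': 260, '200': 530, '300': 790}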
Another approach:
import pandas as pd
import numpy as np

trigger = 100
info2 = {'A': [0]*100, 'B': [0.0]*100}
dfA = pd.DataFrame(info2)
for i in range(1, len(dfA)):
    dfA.loc[i, 'B'] = i*3.78
    dfA.loc[i, 'A'] = i*10

result = []
for t in np.arange(trigger, trigger*4, trigger):  # 100, 200, 300
    # Label of the row whose B is closest to t
    idx = (np.abs(dfA['B'] - t)).idxmin()
    result.append(int(dfA.loc[idx, 'A']))
print(result)
This prints [260, 530, 790], as expected.
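A loop-free sketch of the same idea, assuming the whole distance matrix (one entry per row per trigger) fits in memory: NumPy broadcasting computes every |B - t| at once, and argmin picks the closest row for each trigger. The frame is rebuilt here so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Same data as the question: row i has A = 10*i and B = 3.78*i
dfA = pd.DataFrame({'A': [i * 10 for i in range(100)],
                    'B': [i * 3.78 for i in range(100)]})
trigger = 100

targets = np.arange(trigger, trigger * 4, trigger)  # [100, 200, 300]
# Distance matrix of shape (rows, targets)
dist = np.abs(dfA['B'].to_numpy()[:, None] - targets[None, :])
nearest_rows = dist.argmin(axis=0)  # position of the closest B for each target
result = dfA['A'].to_numpy()[nearest_rows].tolist()
print(result)  # [260, 530, 790]
```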
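For a single target, the row closest to it across all columns can be found with (target is not defined in this answer):
# Absolute distance of every cell from the target
difference = abs(dfA - target)
# Row with the smallest total distance over all columns
min_index = difference.sum(axis=1).idxmin()
result = dfA.loc[min_index, :]
print(result)
Note that this handles one target at a time and, with a scalar target, also counts the distance in column A.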
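One way to apply this to the question, assuming target is meant to hold per-column values (an interpretation, not part of the answer), is to pass a Series with only a B entry: column A then aligns to NaN, which sum skips by default, so only B drives the search. The per-trigger loop is added for illustration.

```python
import pandas as pd

# Same data as the question: row i has A = 10*i and B = 3.78*i
dfA = pd.DataFrame({'A': [i * 10 for i in range(100)],
                    'B': [i * 3.78 for i in range(100)]})

result = []
for t in (100, 200, 300):
    # Assumed form of target: only column B, so A becomes NaN after alignment
    target = pd.Series({'B': t})
    difference = abs(dfA - target)
    min_index = difference.sum(axis=1).idxmin()  # NaN in A is skipped
    result.append(int(dfA.loc[min_index, 'A']))
print(result)  # [260, 530, 790]
```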
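Use merge_asof:
pd.merge_asof(pd.Series(np.arange(trigger, dfA['B'].max(), trigger), name='B'),
              dfA, on='B', direction='nearest')
NB. merge_asof requires both inputs to be sorted on the key, so dfA must first be sorted on B (in this example B already increases with the row index).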
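To show the sorting requirement, here is a sketch on a shuffled copy of the frame (the shuffle is only for illustration). merge_asof raises a ValueError when a key is not sorted, so the right-hand frame is sorted on B before merging; the trigger values are already ascending.

```python
import numpy as np
import pandas as pd

trigger = 100
dfA = pd.DataFrame({'A': [i * 10 for i in range(100)],
                    'B': [i * 3.78 for i in range(100)]})
# Shuffle the rows to mimic data that is not ordered by B
shuffled = dfA.sample(frac=1, random_state=0)

triggers = pd.DataFrame({'B': np.arange(trigger, shuffled['B'].max(), trigger)})
# Sort the right side on the key before the nearest-match merge
out = pd.merge_asof(triggers, shuffled.sort_values('B'), on='B', direction='nearest')
print(out['A'].tolist())  # [260, 530, 790]
```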
Output:
B A
0 100.0 260
1 200.0 530
2 300.0 790
If you also want the value of B:
pd.merge_asof(pd.Series(np.arange(trigger, dfA['B'].max(), trigger), name='trigger'),
              dfA, left_on='trigger', right_on='B', direction='nearest')
Output:
trigger A B
0 100.0 260 98.28
1 200.0 530 200.34
2 300.0 790 298.62
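For intuition about direction='nearest', here is a rough NumPy analogue using np.searchsorted on the sorted B values: find where each trigger would be inserted, then keep whichever neighbour is closer. This is a sketch for illustration, not how pandas implements merge_asof, and the tie rule (left neighbour wins) is a choice made in this sketch.

```python
import numpy as np

trigger = 100
B = np.arange(100) * 3.78  # sorted key column, as in dfA
A = np.arange(100) * 10
targets = np.arange(trigger, B.max(), trigger)  # [100., 200., 300.]

# Insertion points: B[pos - 1] < t <= B[pos]
pos = np.searchsorted(B, targets)
left = np.clip(pos - 1, 0, len(B) - 1)
right = np.clip(pos, 0, len(B) - 1)
# Keep whichever neighbour is closer (ties go to the left one)
nearest = np.where(np.abs(B[left] - targets) <= np.abs(B[right] - targets), left, right)
print(A[nearest].tolist())  # [260, 530, 790]
```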