How to use a dictionary with an np.where clause in pandas

Question:

I have the following dataframe

import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1,1,1,2,2,2],
                    'time': [1,2,3,1,2,3],
                    'col_id': ['ffp','ffp','ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1,2,3,4,5,6],
                    'col_b': [-1,-2,-3,-4,-5,-6],
                    'col_c': [10,20,30,40,50,60]})

   id  time col_id  col_a  col_b  col_c
0   1     1    ffp      1     -1     10
1   1     2    ffp      2     -2     20
2   1     3    ffp      3     -3     30
3   2     1    hie      4     -4     40
4   2     2    hie      5     -5     50
5   2     3    ttt      6     -6     60

I would like to create a new column, col, in foo that takes its value from col_a, col_b, or col_c, depending on the value of col_id.

I am doing the following:

foo['col'] = np.where(foo.col_id == "ffp", foo.col_a, 
                      np.where(foo.col_id == "hie",foo.col_b, foo.col_c))

which gives

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60

Since I have a lot of columns, I was wondering whether there is a cleaner way to do this, for example with a dictionary:

dict_cols_matching = {"ffp" : "col_a", "hie": "col_b", "ttt": "col_c"}

Any ideas?

Asked By: quant


Answers:

You can map the dictionary onto col_id to get the source column name for each row, then do an indexed lookup:

import numpy as np

idx, cols = pd.factorize(foo['col_id'].map(dict_cols_matching))

foo['col'] = foo.reindex(cols, axis=1).to_numpy()[np.arange(len(foo)), idx]

Output:

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
Answered By: mozway
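
To make the map-then-lookup idea above easier to follow, here is a minimal step-by-step sketch of the same technique. The intermediate names source_cols and values are introduced only for illustration. It assumes every col_id value is a key in dict_cols_matching; an unmapped id would become NaN, factorize would give it the code -1, and the lookup would then silently read the last column.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# Step 1: translate each col_id into the name of the column to read from.
source_cols = foo['col_id'].map(dict_cols_matching)

# Step 2: factorize returns one integer code per row and the distinct
# column names, in order of first appearance.
idx, cols = pd.factorize(source_cols)

# Step 3: build a 2-D array that holds only those columns, in code order.
values = foo.reindex(cols, axis=1).to_numpy()

# Step 4: for row i, pick the entry in column idx[i].
foo['col'] = values[np.arange(len(foo)), idx]
print(foo)
```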

You can use np.select, which matches a list of conditions to a list of choices:

foo['col'] = np.select([foo.col_id.eq("ffp"), foo.col_id.eq("hie"), foo.col_id.eq("ttt")],
                       [foo.col_a, foo.col_b, foo.col_c])

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
Answered By: RomanPerekhrest
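
Since the question asks for a dictionary-driven version, the condition and choice lists for np.select could also be built from dict_cols_matching instead of being written out by hand. This is only a sketch of that variant, not part of the answer above; the names conditions and choices are illustrative. Rows whose col_id is not a key in the dictionary fall back to np.select's default value of 0.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# One boolean mask per dictionary key, and the matching source column
# per dictionary value; both lists follow the same iteration order.
conditions = [foo['col_id'].eq(key) for key, _ in dict_cols_matching.items()]
choices = [foo[col] for _, col in dict_cols_matching.items()]

foo['col'] = np.select(conditions, choices)
print(foo)
```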

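You can use a lambda function to select the column based on col_id. Note that this method depends on the order of the columns: it assumes the sorted unique col_id values line up with col_a, col_b, and col_c, which start at position 3. Adjust the offset 3 if you change the column order.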

import pandas as pd
import numpy as np

foo = pd.DataFrame({'id': [1,1,1,2,2,2],
                    'time': [1,2,3,1,2,3],
                    'col_id': ['ffp','ffp','ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1,2,3,4,5,6],
                    'col_b': [-1,-2,-3,-4,-5,-6],
                    'col_c': [10,20,30,40,50,60]})

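# Sorted unique ids: ['ffp', 'hie', 'ttt']
idSet = np.unique(foo['col_id'].to_numpy()).tolist()
# Positional lookup: the value columns start at position 3
foo['col'] = foo.apply(lambda x: x.iloc[idSet.index(x.col_id) + 3], axis=1)
print(foo)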

Output:

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
Answered By: Slybot

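You can combine reset_index with a row-wise apply; reset_index exposes the row label as an "index" column, which is then used to look up the mapped column with foo.loc:

foo["col"] = foo[["col_id"]].reset_index().apply(lambda u: foo.loc[u["index"], dict_cols_matching[u["col_id"]]], axis=1)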
Answered By: P.Jo
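
As a possible simplification of the row-wise idea, the apply could also run on foo directly, so that each row looks up its own mapped column without reset_index. This is a sketch of an equivalent variant, not the code from the answer above, and it assumes every col_id value is a key in dict_cols_matching (otherwise the lambda raises a KeyError). Row-wise apply is typically much slower than the vectorized approaches on large frames.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# For each row, read the column whose name the dictionary assigns
# to that row's col_id.
foo['col'] = foo.apply(lambda row: row[dict_cols_matching[row['col_id']]], axis=1)
print(foo)
```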