Pandas – select column using other column value as column name

Question

I have a dataframe with a column, let’s call it "names", whose values are the names of other columns. I would like to add a new column that holds, for each row, the value taken from the column named in "names".

Example:

Input dataframe:
pd.DataFrame.from_dict({"a": [1, 2, 3,4], "b": [-1,-2,-3,-4], "names":['a','b','a','b']})

  a  |  b  | names |
 --- | --- | ----  |
  1  |  -1 | 'a'   |
  2  |  -2 | 'b'   |
  3  |  -3 | 'a'   |
  4  |  -4 | 'b'   |

Output dataframe:
pd.DataFrame.from_dict({"a": [1, 2, 3,4], "b": [-1,-2,-3,-4], "names":['a','b','a','b'], "new_col":[1,-2,3,-4]})

  a  |  b  | names | new_col | 
 --- | --- | ----  | ------  |
  1  |  -1 | 'a'   |    1    |
  2  |  -2 | 'b'   |   -2    |
  3  |  -3 | 'a'   |    3    |
  4  |  -4 | 'b'   |   -4    |

Asked By: ab3

Answer 1

You can use lookup:

df['new_col'] = df.lookup(df.index, df.names)
df
#   a    b  names   new_col
#0  1   -1      a   1
#1  2   -2      b   -2
#2  3   -3      a   3
#3  4   -4      b   -4

EDIT

lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0. Here’s the currently recommended solution:

import numpy as np

idx, cols = pd.factorize(df['names'])
df['new_col'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

Answered By: Psidom

Answer 2

Because DataFrame.lookup is deprecated as of Pandas 1.2.0, the following is what I came up with using DataFrame.melt:

df['new_col'] = df.melt(id_vars='names', value_vars=['a', 'b'], ignore_index=False).query('names == variable').loc[df.index, 'value']

Output:

>>> df
   a  b names  new_col
0  1 -1     a        1
1  2 -2     b       -2
2  3 -3     a        3
3  4 -4     b       -4

Can this be simplified? For correctness, the index must not be ignored.
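
One possible simplification, sketched here under the assumption that the index labels are unique: column assignment in pandas aligns on index labels, so the explicit `.loc[df.index, ...]` reordering can probably be dropped. The non-default index labels below are hypothetical and only there to show that alignment is by label, not by position.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [-1, -2, -3, -4], "names": ["a", "b", "a", "b"]},
    index=["w", "x", "y", "z"],  # hypothetical labels, not from the question
)

# Long format: one row per (original index label, column name) pair.
# ignore_index=False keeps the original labels so they can be aligned later.
long_df = df.melt(id_vars="names", value_vars=["a", "b"], ignore_index=False)

# Keep only the rows where the melted column name matches the requested name.
matched = long_df.query("names == variable")["value"]

# matched is ordered w, y, x, z; assignment realigns it to df's index labels.
df["new_col"] = matched
```

With `ignore_index=True` (the default), the melted rows would be relabelled 0..7 and this alignment would no longer be possible, which is why the index must be kept.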

Additional reference:

Looking up values by index/column labels (archive)

Answered By: Asclepius

Answer 3

Solution using pd.factorize (from a pandas issue):

idx, cols = pd.factorize(df['names'])
df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
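
As a minimal sketch of what those two lines do, here is the same approach broken into steps on the question's example frame. The intermediate name `values` is introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [-1, -2, -3, -4], "names": ["a", "b", "a", "b"]}
)

# cols holds the distinct names in order of first appearance;
# idx holds, for each row, the position of that row's name within cols.
idx, cols = pd.factorize(df["names"])

# Keep only the referenced columns, ordered to match cols,
# so that column position k corresponds to cols[k].
values = df.reindex(cols, axis=1).to_numpy()

# NumPy integer-array indexing: for row i, pick column idx[i].
df["new_col"] = values[np.arange(len(df)), idx]
```

Because only the referenced columns are converted to NumPy, the string column "names" does not force the array to object dtype.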

Answered By: ignoring_gravity

Answer 4

With the straightforward and easy solution (lookup) deprecated, another alternative to the pandas-based ones proposed here is to convert df into a numpy array and use numpy indexing:

df['new_col'] = df.values[df.index.get_indexer(df['names'].index), df.columns.get_indexer(df['names'])]

Let me explain what this does. df.values is a numpy array based on the DataFrame. As numpy arrays have to be indexed numerically, we need to use the get_indexer function to convert the pandas row and column index names to index numbers that can be used with numpy:

>>> df.index.get_indexer(df['names'].index)
array([0, 1, 2, 3], dtype=int64)

>>> df.columns.get_indexer(df['names'])
array([0, 1, 0, 1], dtype=int64)

(In this case, where the row index is already numerical, you could get away with simply using df.index as the first argument inside the bracket, but this does not work generally.)
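
To illustrate the point about non-numerical row labels, here is a small sketch of the same idea on a frame with string index labels (the labels are hypothetical). It uses `df.to_numpy()`, which is equivalent to `df.values` here, and adds a guard that is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [-1, -2, -3, -4], "names": ["a", "b", "a", "b"]},
    index=["w", "x", "y", "z"],  # hypothetical labels, not from the question
)

# Translate row labels and column names into integer positions.
rows = df.index.get_indexer(df["names"].index)
cols = df.columns.get_indexer(df["names"])

# get_indexer returns -1 for labels it cannot find; NumPy would read -1
# as "last column", so fail loudly instead of returning wrong values.
if (cols == -1).any():
    raise KeyError("some entries in 'names' are not column names")

df["new_col"] = df.to_numpy()[rows, cols]
```

Since the frame mixes integers and strings, the full array has object dtype, so "new_col" may come back as object rather than an integer column.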

Answered By: Erlend Magnus Viggen

Answer 5

Here’s a short solution using df.melt and df.merge:

df.merge(df.melt(var_name='names', ignore_index=False), on=[None, 'names'])

Outputs:

   key_0  a  b names value
0      0  1 -1     a     1
1      1  2 -2     b    -2
2      2  3 -3     a     3
3      3  4 -4     b    -4

There’s a redundant key_0 column which you need to drop with df.drop.
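
A possible variation that avoids the extra key column, offered as a sketch rather than as the answerer's method: make the index an ordinary column on both sides and merge on it explicitly. It assumes the index is unnamed (so `reset_index` produces a column called "index") and that the value columns are known to be "a" and "b".

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [-1, -2, -3, -4], "names": ["a", "b", "a", "b"]}
)

# Left side: original frame with its index exposed as a column named "index".
left = df.reset_index()

# Right side: value columns in long format, one row per (index, column name).
right = (
    df[["a", "b"]]
    .melt(var_name="names", value_name="new_col", ignore_index=False)
    .reset_index()
)

# Match each row on its original index label and its requested column name,
# then restore the index.
out = (
    left.merge(right, on=["index", "names"], how="left")
    .set_index("index")
    .rename_axis(None)
)
```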

Answered By: Leif Metcalf
