Pandas dataframe: replace all non-NaN values with the value of a specific column


I would like to transform a dataframe so that every value that is not NaN is replaced with the value of the 'id' column in the same row. For example, this input:


df = pd.DataFrame({'id': ['X', 'Y', 'Z'],
                   'A': [1, np.nan, 0],
                   'B': [0, 0, np.nan],
                   'C': [np.nan, 1, 1]})


# Desired result
df = pd.DataFrame({'id': ['X', 'Y', 'Z'],
                   'A': ['X', np.nan, 'Z'],
                   'B': ['X', 'Y', np.nan],
                   'C': [np.nan, 'Y', 'Z']})

Looping over row and column indices would probably be very slow on large dataframes, so I would prefer a solution that uses pandas functions.

Asked By: Bela9



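You can build a boolean mask of the non-NaN cells and multiply it by the 'id' column as strings (True keeps the string, False gives an empty string), then use where to restore the NaNs: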

m = df.notna()

out = m.mul(df['id'], axis=0).where(m)
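To see why the multiplication works, here is a minimal self-contained sketch using the sample frame from the question. It assumes 'id' is stored as object dtype (forced explicitly here), so the product is computed on plain Python objects: True * 'X' is 'X' and False * 'X' is the empty string. The intermediate name step is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; 'id' is forced to object dtype (an assumption
# made here so the string multiplication runs on plain Python objects).
df = pd.DataFrame({'id': pd.Series(['X', 'Y', 'Z'], dtype=object),
                   'A': [1, np.nan, 0],
                   'B': [0, 0, np.nan],
                   'C': [np.nan, 1, 1]})

m = df.notna()                   # True where a value is present
step = m.mul(df['id'], axis=0)   # row's id where True, '' where False
out = step.where(m)              # the '' cells become NaN again
```

Without the final where(m), the cells that were NaN would hold empty strings instead of NaN.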

Or with numpy.where:

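import numpy as np
import pandas as pd

m = df.notna()
# repeat the 'id' values across all columns, keep them where m is True, NaN elsewhere
out = pd.DataFrame(np.where(m, np.repeat(df['id'].to_numpy()[:,None],
                                         df.shape[1], axis=1), np.nan),
                   index=df.index, columns=df.columns)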
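Since np.where broadcasts its arguments, the explicit np.repeat should be optional. A shorter sketch on the same sample data, where a column vector of ids with shape (n, 1) is broadcast across all columns (the name ids and the explicit object dtype are choices made for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['X', 'Y', 'Z'],
                   'A': [1, np.nan, 0],
                   'B': [0, 0, np.nan],
                   'C': [np.nan, 1, 1]})

m = df.notna().to_numpy()                       # boolean array, shape (3, 4)
ids = df['id'].to_numpy(dtype=object)[:, None]  # column vector, shape (3, 1)

# np.where broadcasts ids against the mask; np.nan fills the masked-out cells
out = pd.DataFrame(np.where(m, ids, np.nan),
                   index=df.index, columns=df.columns)
```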

Another idea is to reindex the 'id' column to all columns and forward-fill across each row:

out = df[['id']].reindex(columns=df.columns).ffill(axis=1).where(df.notna())


Output:

  id    A    B    C
0  X    X    X  NaN
1  Y  NaN    Y    Y
2  Z    Z  NaN    Z
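
Note that the reindexing variant forward-fills from left to right, so it relies on 'id' being the leftmost column. A minimal sketch of a variant that does not depend on column order, assuming a frame where 'id' comes last (the reordered frame and the name ids are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical reordering of the sample data: 'id' is the last column
df = pd.DataFrame({'A': [1, np.nan, 0],
                   'B': [0, 0, np.nan],
                   'C': [np.nan, 1, 1],
                   'id': ['X', 'Y', 'Z']})

# Copy 'id' into every column, then blank out the cells that were NaN in df
ids = pd.DataFrame({col: df['id'] for col in df.columns})
out = ids.where(df.notna())
```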
Answered By: mozway