Pandas Fillna of Multiple Columns with Mode of Each Column


Working with census data, I want to replace NaNs in two columns (“workclass” and “native-country”) with the respective modes of those two columns. I can get the modes easily:

mode = df.filter(["workclass", "native-country"]).mode()

which returns a dataframe:

  workclass native-country
0   Private  United-States


However,

df.filter(["workclass", "native-country"]).fillna(mode)

does not replace the NaNs in each column with anything, let alone the mode corresponding to that column. Is there a smooth way to do this?

Asked By: Nick



You can do it like this:

df[["workclass", "native-country"]] = df[["workclass", "native-country"]].fillna(value=mode.iloc[0])
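As a hedged illustration of why the `.iloc[0]` matters: when `fillna` receives a DataFrame it aligns on both row index and column, so a one-row mode frame can only fill NaNs in row 0, while a Series indexed by column name applies to every row. The sample data below is made up for this sketch and is not the asker's census data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the census columns
df = pd.DataFrame({
    "workclass": ["Private", np.nan, "Private", "State-gov"],
    "native-country": ["United-States", "United-States", np.nan, "Canada"],
})
cols = ["workclass", "native-country"]
mode = df[cols].mode()  # one-row DataFrame with index 0

# DataFrame value: aligned on index AND columns, so rows 1 and 2 are not reached
filled_with_frame = df[cols].fillna(mode)

# Series value: its index holds column names, so each column is filled in every row
filled_with_series = df[cols].fillna(mode.iloc[0])

remaining_frame = int(filled_with_frame.isna().sum().sum())
remaining_series = int(filled_with_series.isna().sum().sum())
print(remaining_frame, remaining_series)
```

In this sketch the NaNs sit in rows 1 and 2, so the DataFrame version leaves both in place and the Series version fills them.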

For example,

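    import pandas as pd
    d = {
        'key3': [1,4,4,4,5],
        'key2': [6,6,4],
        'key1': [6,4,4],
    }
    # Building row-wise and transposing pads the shorter lists with NaN
    df = pd.DataFrame.from_dict(d, orient='index').transpose()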


Then df is

   key3  key2  key1
0     1     6     6
1     4     6     4
2     4     4     4
3     4   NaN   NaN
4     5   NaN   NaN

Then by doing:

l=df.filter(["key1", "key2"]).mode()
df[["key1", "key2"]]=df[["key1", "key2"]].fillna(value=l.iloc[0])

we get that df is

   key3  key2  key1
0     1     6     6
1     4     6     4
2     4     4     4
3     4     6     4
4     5     6     4
Answered By: Miriam Farber

If you want to impute missing values with the mode in some columns of a DataFrame df, you can call fillna with the Series obtained by selecting the first row of the modes by position with iloc:

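cols = ["workclass", "native-country"]
df[cols] = df[cols].fillna(df[cols].mode().iloc[0])

Your solution:

df[cols] = df.filter(cols).fillna(mode.iloc[0])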

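import numpy as np
import pandas as pd

df = pd.DataFrame({'workclass':['Private','Private',np.nan, 'another', np.nan],
                   'native-country':['United-States',np.nan,'Canada',np.nan,'United-States'],
                   'col':[2,3,7,8,9]})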

print (df)
   col native-country workclass
0    2  United-States   Private
1    3            NaN   Private
2    7         Canada       NaN
3    8            NaN   another
4    9  United-States       NaN

mode = df.filter(["workclass", "native-country"]).mode()
print (mode)
  workclass native-country
0   Private  United-States

cols = ["workclass", "native-country"]
df[cols] = df[cols].fillna(mode.iloc[0])
print (df)
   col native-country workclass
0    2  United-States   Private
1    3  United-States   Private
2    7         Canada   Private
3    8  United-States   another
4    9  United-States   Private
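One detail worth spelling out about the `iloc[0]` step: `mode()` returns every tied value, so the result can have more than one row, with shorter columns padded by NaN. A minimal sketch, using made-up columns `a` and `b` that are not part of the original sample:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has a tie between "x" and "y"
tie_df = pd.DataFrame({
    "a": ["x", "y", np.nan],
    "b": ["p", "p", "q"],
})

modes = tie_df.mode()   # two rows, because "a" has two modes
first = modes.iloc[0]   # Series indexed by column name, first mode of each column

filled = tie_df.fillna(first)
print(modes)
print(filled)
```

Taking the first row is a simple tie-breaking choice (the modes come back sorted); whether that is the right choice depends on the data.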
Answered By: jezrael

I think it’s cleanest to pass a dict as the ‘value’ parameter of fillna.


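Create a toy df from @miriam-farber’s response:

import pandas as pd
d = {
    'key3': [1,4,4,4,5],
    'key2': [6,6,4],
    'key1': [6,4,4],
}
# Building row-wise and transposing pads the shorter lists with NaN
d_df = pd.DataFrame.from_dict(d, orient='index').transpose()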


Create a dict:

mode_dict = d_df.loc[:,['key2','key1']].mode().to_dict('records')[0]

Use this dict in the fillna method:

d_df.fillna(mode_dict, inplace=True)
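A dict passed to fillna only touches the columns it names, so it can be applied to the whole frame without selecting columns first. The sketch below is a variation on the toy data (a NaN is added to key3 purely for illustration) and builds the dict with `iloc[0].to_dict()`, which should be equivalent to the `to_dict('records')[0]` form; it also returns a new frame rather than using inplace.

```python
import numpy as np
import pandas as pd

# Toy data with an extra NaN in key3 (an assumption added for this sketch)
d_df = pd.DataFrame({
    "key3": [1, 4, np.nan, 4, 5],
    "key2": [6, 6, 4, np.nan, np.nan],
    "key1": [6, 4, 4, np.nan, np.nan],
})

# Column name -> first mode, for the two columns we want to impute
mode_dict = d_df[["key2", "key1"]].mode().iloc[0].to_dict()

# Only key2 and key1 are filled; key3 is not in the dict and keeps its NaN
filled = d_df.fillna(mode_dict)
print(mode_dict)
print(filled)
```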
Answered By: Krishna

This code imputes the mean into the numeric columns and the mode into the object columns. It makes a list of the column types and fills each column's missing values according to its type.


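category_columns = df.select_dtypes(include=['object']).columns.tolist()
for column in df:
    if df[column].isnull().any():
        if column in category_columns:
            df[column] = df[column].fillna(df[column].mode()[0])  # mode for object columns
        else:
            df[column] = df[column].fillna(df[column].mean())  # mean for numeric columns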
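The same idea can be sketched as a small function that decides per column by dtype. The function name `impute_by_dtype` and the sample frame are hypothetical, and `is_numeric_dtype` is used here instead of an explicit column list; treat it as one possible shape rather than the answer's exact code.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def impute_by_dtype(frame):
    """Return a copy with numeric NaNs set to the column mean and
    non-numeric NaNs set to the column's first mode."""
    out = frame.copy()
    for column in out.columns:
        if not out[column].isna().any():
            continue  # nothing to fill in this column
        if is_numeric_dtype(out[column]):
            out[column] = out[column].fillna(out[column].mean())
        else:
            modes = out[column].mode()
            if not modes.empty:  # an all-NaN column has no mode
                out[column] = out[column].fillna(modes.iloc[0])
    return out


# Hypothetical sample data
sample = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "workclass": ["Private", np.nan, "Private"],
})
result = impute_by_dtype(sample)
print(result)
```

Working on a copy leaves the input frame unchanged, which makes it easier to compare before and after.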
Answered By: bhavesh singh

You can also use the SimpleImputer to solve this problem as follows:

import numpy as np
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='most_frequent', missing_values=np.nan)
df[["workclass", "native-country"]] = imputer.fit_transform(df[["workclass", "native-country"]])
Answered By: Hamzah