Fillna in multiple columns in place in Python Pandas

Question:

I have a pandas DataFrame of mixed types: some columns are strings and some are numbers. I would like to replace the NaN values in string columns with '.', and the NaN values in float columns with 0.

Consider this small fictitious example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name':['Jack','Sue',np.nan,'Bob','Alice','John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City':['Seattle','SF','LA','OC',np.nan,np.nan]})

Now, I can do it in 3 lines:

df['Name'].fillna('.',inplace=True)
df['City'].fillna('.',inplace=True)
df.fillna(0,inplace=True)

Since this is a small dataframe, 3 lines is probably OK. In my real example (which I cannot share here due to data confidentiality reasons), I have many more string columns and numeric columns, so I end up writing many lines just for fillna. Is there a concise way of doing this?

Asked By: ozzy

Answers:

You can either list the string columns by hand or glean them from df.dtypes. Once you have the list of string/object columns, you can call fillna on all those columns at once.

# str_cols = ['Name','City']
str_cols = df.columns[df.dtypes==object]
df[str_cols] = df[str_cols].fillna('.')
df = df.fillna(0)
Answered By: Bob Baxley

Define a function:

import numpy as np

def myfillna(series):
    # Compare dtypes with ==; use numpy directly rather than the pd.np alias
    if series.dtype == np.dtype(float):
        return series.fillna(0)
    elif series.dtype == np.dtype(object):
        return series.fillna('.')
    else:
        return series

You can add other elif branches if you want to fill a column of a different dtype in some other way. Now apply this function over all columns of the dataframe:

df = df.apply(myfillna)

Because the result is assigned back to df, this has the same effect as using inplace.
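As a rough sketch of the "add other branches" idea, the dtype checks can also be written with the helpers in pandas.api.types, which cover integer columns as well as floats. The function name fill_by_dtype and the sample frame are only illustrative, not part of the answer above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_object_dtype, is_string_dtype

def fill_by_dtype(series):
    """Return the column with missing values filled according to its dtype."""
    if is_numeric_dtype(series):
        return series.fillna(0)
    if is_object_dtype(series) or is_string_dtype(series):
        return series.fillna('.')
    # Any other dtype (datetimes, categoricals, ...) is returned untouched.
    return series

df = pd.DataFrame({
    'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# apply() calls fill_by_dtype once per column and reassembles a DataFrame.
filled = df.apply(fill_by_dtype)
```

The original df is left as it was; assign the result back to df if you want to overwrite it.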

Answered By: latorrefabian

You could use apply on your columns and check whether each one is numeric by looking at dtype.kind:

res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))

print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Jack
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John
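
For anyone unsure what 'biufc' means: each letter is a NumPy dtype kind code (b = boolean, i = signed integer, u = unsigned integer, f = float, c = complex). Here is a small sketch that shows which columns the check treats as numeric; the extra Count and Flag columns are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', np.nan],    # strings: kind is not one of 'biufc'
    'A': [1.0, np.nan],          # float   -> kind 'f'
    'Count': [1, 2],             # integer -> kind 'i'
    'Flag': [True, False],       # boolean -> kind 'b'
})

# Map each column name to whether its dtype kind counts as numeric.
is_numeric = {col: dtype.kind in 'biufc' for col, dtype in df.dtypes.items()}
print(is_numeric)
```

Note that boolean columns count as numeric under this check, so they would be filled with 0 as well.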
Answered By: Anton Protopopov

Came across this page while looking for an answer to this problem, but didn’t like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I’d contribute for anyone else that happens upon this.

If you have multiple columns, but only want to replace the NaN in a subset of them, you can use:

df.fillna({'Name':'.', 'City':'.'}, inplace=True)

This also allows you to specify different replacements for each column. And if you want to go ahead and fill all remaining NaN values, you can just throw another fillna on the end:

df.fillna({'Name':'.', 'City':'.'}, inplace=True).fillna(0, inplace=True)

Edit (22 Apr 2021)

Functionality has (presumably) changed since the original post, and you cannot chain two in-place fillna() operations: with inplace=True, fillna() returns None, so the second call raises an AttributeError. You can still chain, but you must assign the result of the chain to df instead of modifying in place, like so:

df = df.fillna({'Name':'.', 'City':'.'}).fillna(0)
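
With many columns, the dict does not have to be typed by hand. A possible sketch, assuming "numeric columns get 0, everything else gets '.'" is the rule you want, builds it from the dtypes and passes it to a single fillna call. The name fill_values is just illustrative.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# One replacement value per column, chosen from the column's dtype.
fill_values = {
    col: 0 if is_numeric_dtype(df[col]) else '.'
    for col in df.columns
}

# A single fillna call handles every column; assign the result back.
df = df.fillna(fill_values)
```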
Answered By: Rob Bulmahn

There is a simpler way that can be done in one line:

df.fillna({'Name':0,'City':0},inplace=True)

Not an awesome improvement, but if you multiply it by 100, writing only the column names + ':0' is way faster than copying and pasting everything 100 times.

Answered By: Vinicius Raphael

A much easier way is: df.replace(np.nan, "NA") (with numpy imported as np).
If you want a different replacement, use: df.replace("pattern", "replacement")

Answered By: A. chahid

If you want to fill the NaN values in a list of columns ("lst") with the same value ("v"):

def nan_to_zero(df, lst, v):
    d = {x:v for x in lst}
    df.fillna(d, inplace=True)
    return df
Answered By: Tom

If you don't want to specify individual per-column replacement values, you might be tempted to do it this way:

df[['Name', 'City']].fillna('.',inplace=True)

Be aware that df[['Name', 'City']] returns a copy, so the in-place call fills that copy and leaves df itself unchanged (it can also trigger a SettingWithCopyWarning). Assigning the result back works reliably, and avoids inplace altogether:

columns = ['Name', 'City']
df[columns] = df[columns].fillna('.')

The SettingWithCopyWarning is designed to warn you that you may be modifying a copy rather than the original dataframe. Assigning to df[columns] directly avoids that, so no extra .copy() is needed.

If you don’t like that syntax, you can see this question to see other ways of dealing with this: How to deal with SettingWithCopyWarning in Pandas
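
Here is a quick check of the difference between calling fillna in place on a column subset and assigning the result back. The tiny frame and the variable names are made up for the demonstration, and warnings are silenced only to keep the output clean.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', np.nan], 'City': [np.nan, 'LA']})

# In-place fill on a column subset: the subset is a copy, so df keeps its NaNs.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[['Name', 'City']].fillna('.', inplace=True)
missing_after_inplace = int(df.isna().sum().sum())

# Assigning the filled subset back to df actually updates it.
columns = ['Name', 'City']
df[columns] = df[columns].fillna('.')
missing_after_assign = int(df.isna().sum().sum())

print(missing_after_inplace, missing_after_assign)
```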

Answered By: Devyzr

The most concise and readable way to accomplish this, especially with many columns, is to combine df.select_dtypes with df.columns.

df.select_dtypes returns a new df containing only the columns that match the dtype you need.

df.columns returns an Index of the column names in your df.

Full code:

float_column_names = df.select_dtypes(float).columns
df[float_column_names] = df[float_column_names].fillna(0)

string_column_names = df.select_dtypes(object).columns
df[string_column_names] = df[string_column_names].fillna('.')
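
A possible variation, in case your numeric columns include integers or your string columns are not stored as object: select_dtypes also accepts include='number' and exclude='number'. This is only a sketch, and it assumes every non-numeric column should get '.', which will not suit datetime or categorical columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# 'number' matches every numeric dtype (ints and floats alike).
numeric_column_names = df.select_dtypes(include='number').columns
# Assumption: everything that is not numeric should be filled with '.'.
other_column_names = df.select_dtypes(exclude='number').columns

df[numeric_column_names] = df[numeric_column_names].fillna(0)
df[other_column_names] = df[other_column_names].fillna('.')
```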
Answered By: Berel Levy