Pandas Replace NaN with blank/empty string
Question:
I have a Pandas Dataframe as shown below:
   1    2       3
0  a  NaN    read
1  b    l  unread
2  c  NaN    read
I want to replace the NaN values with an empty string so that it looks like this:
   1   2       3
0  a  ""    read
1  b   l  unread
2  c  ""    read
Answers:
import numpy as np
df1 = df.replace(np.nan, '', regex=True)
This might help. It replaces every NaN with an empty string; regex=True is not required for this, so df.replace(np.nan, '') gives the same result.
df = df.fillna('')
This will fill NA values (e.g. NaN) with ''.
Using inplace is possible but should be avoided, as it will be deprecated:
df.fillna('', inplace=True)
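As a minimal sketch of the two approaches above, the frame from the question can be rebuilt and filled as follows. The integer column labels 1, 2, 3 and the variable names filled and replaced are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (integer column labels are an assumption)
df = pd.DataFrame({1: ['a', 'b', 'c'],
                   2: [np.nan, 'l', np.nan],
                   3: ['read', 'unread', 'read']})

# fillna returns a new frame; the original df is left unchanged
filled = df.fillna('')

# replace without regex=True behaves the same way for NaN
replaced = df.replace(np.nan, '')

print(filled)
```

Both calls return a new DataFrame, so the result has to be assigned to a name to be kept.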
To fill only a single column:
df.column1 = df.column1.fillna('')
One can use df['column1'] instead of df.column1.
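A short sketch of the single-column case, assuming a made-up frame with columns named column1 and column2. It also shows the dictionary form of fillna, which maps column names to fill values and leaves the other columns alone.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one string column and one numeric column, both with NaN
df = pd.DataFrame({'column1': [np.nan, 'l', np.nan],
                   'column2': [np.nan, 2.0, 3.0]})

# Fill only column1 by assigning the filled Series back
by_assignment = df.copy()
by_assignment['column1'] = by_assignment['column1'].fillna('')

# Equivalent: pass a dict of {column name: fill value} to DataFrame.fillna
by_dict = df.fillna({'column1': ''})

print(by_dict)
```

In both variants the NaN in column2 is kept, which is usually what you want for numeric data.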
If you are reading the dataframe from a file (say CSV or Excel), then use:
df = pd.read_csv(path, na_filter=False)
df = pd.read_excel(path, na_filter=False)
This makes pandas read empty fields as empty strings '' instead of NaN.
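To see the effect of na_filter=False without a file on disk, here is a small sketch that reads an in-memory CSV. The CSV text and the column names A, B, C are invented for the example.

```python
import io
import pandas as pd

# Hypothetical CSV content with two empty fields in column B
csv_text = "A,B,C\na,,read\nb,l,unread\nc,,read\n"

# Default behaviour: empty fields are parsed as NaN
default = pd.read_csv(io.StringIO(csv_text))

# With na_filter=False no missing-value detection is done at all
no_filter = pd.read_csv(io.StringIO(csv_text), na_filter=False)

print(default['B'].isna().sum())
print(no_filter['B'].tolist())
```

Note that na_filter=False switches off missing-value detection for every column, so numeric columns with blanks will be read as strings.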
If you already have the dataframe, use either of:
df = df.replace(np.nan, '', regex=True)
df = df.fillna('')
If you are converting a DataFrame to JSON, NaN can cause an error, so the best solution in this use case is to replace NaN with None. Here is how:
df1 = df.where((pd.notnull(df)), None)
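A sketch of the JSON case, under the assumption that the data is serialized with the standard json module. The astype(object) step is an addition not in the answer above: in recent pandas versions a float column would otherwise turn None back into NaN.

```python
import json
import numpy as np
import pandas as pd

# Hypothetical frame with a float column containing NaN
df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': [np.nan, 1.0, np.nan]})

# Cast to object first so that None survives in the numeric column
cleaned = df.astype(object).where(pd.notnull(df), None)

# None is serialized as JSON null
records = cleaned.to_dict(orient='records')
payload = json.dumps(records)

print(payload)
```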
If you only want the data to render nicely when printed, use a formatter. The formatters argument of df.to_string(...) lets you define custom string formatting without needlessly modifying your DataFrame or wasting memory:
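import numpy as np
import pandas as pd
df = pd.DataFrame({
    'A': ['a', 'b', 'c'],
    'B': [np.nan, 1, np.nan],
    'C': ['read', 'unread', 'read']})
# Render NaN in column B as an empty string, other values as whole numbers
print(df.to_string(
    formatters={'B': lambda x: '' if pd.isnull(x) else '{:.0f}'.format(x)}))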
To get:
   A  B       C
0  a       read
1  b  1  unread
2  c       read
Using keep_default_na=False should help you:
df = pd.read_csv(filename, keep_default_na=False)
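A brief sketch of what keep_default_na=False does, again using invented in-memory CSV text. With no na_values given, neither empty fields nor literal strings such as NA are converted to NaN.

```python
import io
import pandas as pd

# Hypothetical CSV: one empty field and one literal "NA" in column B
csv_text = "A,B,C\na,,read\nb,NA,unread\n"

# Default parsing treats both the empty field and "NA" as missing
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False keeps both as plain strings
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(kept['B'].tolist())
```

Unlike na_filter=False, this option can still be combined with an explicit na_values list if some markers should be treated as missing.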
I tried this with one column of string values containing NaN.
To replace NaN with an empty string:
df.columnname = df.columnname.replace(np.nan, '')
To replace NaN with some other value:
df.columnname = df.columnname.replace(np.nan, 'value')
I also tried df.iloc, but it needs the positional index of the column, so you have to look at the table again. The method above saves that step.
Try this, with inplace=True:
import numpy as np
df.replace(np.nan, '', inplace=True)