Convert Pandas series containing string to boolean
Question:
I have a DataFrame named df:
   Order Number       Status
1          1668  Undelivered
2         19771  Undelivered
3     100032108  Undelivered
4          2229    Delivered
5         00056  Undelivered
I would like to convert the Status column to boolean (True when Status is Delivered and False when Status is Undelivered), but if Status is neither 'Undelivered' nor 'Delivered', it should be treated as NaN or something similar.
I would like to use a dict:
d = {
'Delivered': True,
'Undelivered': False
}
so that I can easily add other strings that should be treated as either True or False.
Answers:
You've got everything you need. You'll be happy to discover replace:
df.replace(d)
You can just use map:
In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered',
'SomethingElse']})
In [8]: df
Out[8]:
Status
0 Delivered
1 Delivered
2 Undelivered
3 SomethingElse
In [9]: d = {'Delivered': True, 'Undelivered': False}
In [10]: df['Status'].map(d)
Out[10]:
0 True
1 True
2 False
3 NaN
Name: Status, dtype: object
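The mapped result above has dtype object because it mixes booleans with NaN. If a genuinely boolean column with a missing-value marker is wanted, one option is to convert to the nullable "boolean" dtype. The following is a minimal sketch under that assumption, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'Status': ['Delivered', 'Delivered', 'Undelivered',
                              'SomethingElse']})
d = {'Delivered': True, 'Undelivered': False}

# map() leaves NaN where the status is not a key of d (dtype: object)
mapped = df['Status'].map(d)

# Convert to the nullable boolean dtype; the NaN becomes <NA>
status_bool = mapped.astype('boolean')
print(status_bool)
```

Adding another recognized string only requires adding a key to d; the rest of the code stays the same.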
An example of the replace method that replaces values only in the specified column C2 and returns the result as a DataFrame:
import pandas as pd
df = pd.DataFrame({'C1':['X', 'Y', 'X', 'Y'], 'C2':['Y', 'Y', 'X', 'X']})
C1 C2
0 X Y
1 Y Y
2 X X
3 Y X
df.replace({'C2': {'X': True, 'Y': False}})
C1 C2
0 X False
1 Y False
2 X True
3 Y True
Expanding on the previous answers:
Map method explained:
- Pandas looks up each row's value in the dictionary d and replaces any matching key with the corresponding value from d.
- Values that have no key in d are set to NaN. This can be corrected with fillna().
- It does not work across multiple columns at once, since dictionary-based map is defined on pd.Series and handles one column at a time.
- Documentation: pd.Series.map
d = {'Delivered': True, 'Undelivered': False}
df["Status"].map(d)
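The points above can be sketched as follows: since map handles one column at a time, it can be applied column by column, and unmatched values can then be given an explicit default. The PreviousStatus column and the choice of False as the default are hypothetical, introduced only for illustration:

```python
import pandas as pd

d = {'Delivered': True, 'Undelivered': False}

# 'PreviousStatus' is a made-up second column for this sketch
df = pd.DataFrame({
    'Status': ['Delivered', 'Undelivered', 'SomethingElse'],
    'PreviousStatus': ['Undelivered', 'Lost', 'Delivered'],
})

# Apply Series.map to each selected column in turn
mapped = df[['Status', 'PreviousStatus']].apply(lambda col: col.map(d))

# Unmatched values are NaN after map; here they are assumed to mean False
filled = mapped.astype('boolean').fillna(False)
print(filled)
```

Whether an unrecognized status should default to False, True, or stay missing depends on the data, so the fillna step is optional.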
Replace method explained:
- Pandas looks up each row's value in the dictionary d and attempts to replace any matching key with the corresponding value from d.
- Values that have no key in d are retained unchanged.
- Works with single and multiple columns (pd.Series or pd.DataFrame objects).
- Documentation: pd.DataFrame.replace
d = {'Delivered': True, 'Undelivered': False}
df["Status"].replace(d)
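Because replace keeps unmatched values, the result can still contain the original strings. If NaN is wanted for anything outside d, as in the question, one possible approach is to mask the result with isin. This is a sketch under that assumption; the object dtype is set explicitly so the column can hold both strings and booleans:

```python
import pandas as pd

d = {'Delivered': True, 'Undelivered': False}
status = pd.Series(['Delivered', 'Undelivered', 'SomethingElse'],
                   dtype=object, name='Status')

# replace() swaps known keys and leaves 'SomethingElse' as it is
replaced = status.replace(d)

# Keep the replaced value only where the original was a key of d;
# every other position becomes NaN
masked = replaced.where(status.isin(list(d)))
print(masked)
```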
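Overall, the replace method is more flexible and allows finer control over how data is mapped and how missing or NaN values are handled. For the behaviour asked for in the question (NaN for unrecognized statuses), map is the more direct fit.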