Convert Pandas series containing string to boolean

Question:

I have a DataFrame named df:

  Order Number       Status
1         1668  Undelivered
2        19771  Undelivered
3    100032108  Undelivered
4         2229    Delivered
5        00056  Undelivered

I would like to convert the Status column to boolean (True when Status is Delivered and False when Status is Undelivered),
but if Status is neither ‘Undelivered’ nor ‘Delivered’, it should be considered NaN (Not a Number) or something similar.

I would like to use a dict:

d = {
  'Delivered': True,
  'Undelivered': False
}

so that I can easily add other strings that should be treated as either True or False.

Asked By: working4coins


Answers:

You’ve got everything you need. You’ll be happy to discover replace:

df.replace(d)
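A minimal sketch of what df.replace(d) does on data shaped like the question's. It assumes Order Number is stored as strings (suggested by "00056"), and the extra 'Returned' row is a hypothetical status that is not a key in d. It shows that replace leaves unmatched values as they are rather than turning them into NaN.

```python
import pandas as pd

# Sample data modelled on the question; "Returned" is a hypothetical
# status that is deliberately missing from d.
df = pd.DataFrame({
    "Order Number": ["1668", "19771", "100032108", "2229", "00056", "4242"],
    "Status": ["Undelivered", "Undelivered", "Undelivered",
               "Delivered", "Undelivered", "Returned"],
})
d = {"Delivered": True, "Undelivered": False}

# replace() scans every column of the DataFrame; any value that is not
# a key in d (the whole "Order Number" column and "Returned") is kept.
result = df.replace(d)
print(result)
```

If unknown statuses must become NaN, as the question asks, replace alone does not do that.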
Answered By: Dan Allan

You can just use map:

In [7]: df = pd.DataFrame({'Status':['Delivered', 'Delivered', 'Undelivered',
                                     'SomethingElse']})

In [8]: df
Out[8]:
          Status
0      Delivered
1      Delivered
2    Undelivered
3  SomethingElse

In [9]: d = {'Delivered': True, 'Undelivered': False}

In [10]: df['Status'].map(d)
Out[10]:
0     True
1     True
2    False
3      NaN
Name: Status, dtype: object
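The mapped Series above has dtype object because it mixes booleans with NaN. As a sketch, assuming pandas 1.0 or later, it can be converted to pandas' nullable "boolean" dtype, which keeps True/False and represents the unmatched value as <NA>:

```python
import pandas as pd

df = pd.DataFrame({"Status": ["Delivered", "Delivered", "Undelivered", "SomethingElse"]})
d = {"Delivered": True, "Undelivered": False}

# map() yields an object Series of True/True/False/NaN; astype("boolean")
# turns it into a nullable boolean Series where NaN becomes <NA>.
status = df["Status"].map(d).astype("boolean")
print(status)
print(status.dtype)  # boolean
```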
Answered By: joris

An example of using the replace method to replace values only in the specified column C2 and get the result as a DataFrame.

import pandas as pd
df = pd.DataFrame({'C1':['X', 'Y', 'X', 'Y'], 'C2':['Y', 'Y', 'X', 'X']})

  C1 C2
0  X  Y
1  Y  Y
2  X  X
3  Y  X

df.replace({'C2': {'X': True, 'Y': False}})

  C1     C2
0  X  False
1  Y  False
2  X   True
3  Y   True
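For comparison, a sketch of a different technique that also returns a DataFrame with only C2 changed: map that single column and assign it back with DataFrame.assign. This is a swap-in, not part of the answer above; unlike replace, any C2 value missing from the dict would become NaN.

```python
import pandas as pd

df = pd.DataFrame({"C1": ["X", "Y", "X", "Y"], "C2": ["Y", "Y", "X", "X"]})

# Map only column C2 and return a new DataFrame; C1 is left untouched.
out = df.assign(C2=df["C2"].map({"X": True, "Y": False}))
print(out)
```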
Answered By: Kappa Leonis

Expanding on the previous answers:

Map method explained:

  • Pandas looks up each row’s value in the dictionary d and replaces every value found as a key with the corresponding value from d.
  • Values without keys in d are set to NaN. These can then be filled with fillna().
  • Does not work on multiple columns at once, since map is a pd.Series method and operates on one column at a time.
  • Documentation: pd.Series.map
d = {'Delivered': True, 'Undelivered': False}
df["Status"].map(d)
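A minimal sketch of the fillna() step mentioned in the list above. Treating unknown statuses as False is an assumption made for illustration; any fill value could be used. Depending on the pandas version, fillna on an object Series may emit a FutureWarning about downcasting.

```python
import pandas as pd

df = pd.DataFrame({"Status": ["Delivered", "Undelivered", "SomethingElse"]})
d = {"Delivered": True, "Undelivered": False}

mapped = df["Status"].map(d)    # True, False, NaN
# Assumption: statuses missing from d should count as False.
filled = mapped.fillna(False)   # True, False, False
print(filled)
```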

Replace method explained:

  • Pandas looks up each row’s value in the dictionary d and attempts to replace every value found as a key with the corresponding value from d.
  • Values without keys in d are retained.
  • Works with single and multiple columns (pd.Series or pd.DataFrame objects).
  • Documentation: pd.DataFrame.replace
d = {'Delivered': True, 'Undelivered': False}
df["Status"].replace(d)
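To illustrate the multiple-column point, a sketch of DataFrame.replace with a nested dict that applies a different mapping to each listed column. The "Paid" column and its yes/no values are hypothetical, added only for this example.

```python
import pandas as pd

# Hypothetical frame with two string columns to convert.
df = pd.DataFrame({
    "Status": ["Delivered", "Undelivered", "SomethingElse"],
    "Paid": ["yes", "no", "yes"],
})
d = {"Delivered": True, "Undelivered": False}

# Nested dict: {column: {old_value: new_value}}. Values without a key,
# such as "SomethingElse", are left unchanged.
result = df.replace({"Status": d, "Paid": {"yes": True, "no": False}})
print(result)
```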

Overall, the replace method is more flexible: it works on both Series and DataFrames, accepts per-column nested dicts, and gives finer control over how values are mapped and how missing or NaN values are handled.

Answered By: Yaakov Bressler