Creating a dummy variable based on a string pattern in python

Question:

I have the following dataset. None is meant to mark a missing value, although in the sample below it is stored as the string "None". The column's dtype is object (from df.dtypes).

import pandas as pd
import numpy as np

df = pd.DataFrame(columns=['triparty'])
df["triparty"] = ["AB65", "None", "GDW322", "DASED", "None"]

I want to create a dummy that takes the value 0 when triparty is None and 1 otherwise. I tried several variations of

df["triparty"]=[0 if df["triparty"] == np.NaN else 1 for x in df["triparty"]]

df["triparty"]=[0 if df["triparty"] == "None" else 1 for x in df["triparty"]]

but neither works. I get the error message ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I solve the problem?

Asked By: Lisa


Answers:

You can do it with np.where, which returns 0 where the condition is True and 1 everywhere else:

df["dummy"] = np.where(df["triparty"] == "None", 0, 1)
print(df)

Or build a boolean column and cast it to int:

df["dummy"] = (df["triparty"] != "None").astype(int)
# or
df["dummy"] = (~(df["triparty"] == "None")).astype(int)
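For comparison, the list comprehensions in the question fail because they compare the whole column df["triparty"] on every iteration instead of the loop variable x. That comparison produces a boolean Series, and Python's if cannot reduce a Series to a single True/False, hence the ValueError. A corrected comprehension might look like the sketch below; it writes to a new column (the column name "dummy" is just the one used above) so triparty is not overwritten. It is slower than the vectorized versions on large frames. Note also that x == np.nan is always False, because NaN never compares equal to anything, itself included.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"triparty": ["AB65", "None", "GDW322", "DASED", "None"]})

# Compare each element x (a plain str), not the whole Series, so `if` gets a single bool
df["dummy"] = [0 if x == "None" else 1 for x in df["triparty"]]
print(df["dummy"].tolist())  # [1, 0, 1, 1, 0]

# Why the np.NaN attempt could never match: NaN is not equal to itself
print(np.nan == np.nan)  # False
```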

Output

  triparty  dummy
0     AB65      1
1     None      0
2   GDW322      1
3    DASED      1
4     None      0
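If the column instead holds genuine missing values (Python None or NaN) rather than the string "None", comparing against "None" matches nothing. Assuming that is the case, a minimal sketch uses Series.notna(), which is False for both None and NaN, and casts the result to int:

```python
import numpy as np
import pandas as pd

# Assumption: missing entries are real None/NaN values, not the string "None"
df = pd.DataFrame({"triparty": ["AB65", None, "GDW322", "DASED", np.nan]})

# A string comparison finds no matches here
print((df["triparty"] == "None").any())  # False

# notna() is True for present values and False for None/NaN: 1 = present, 0 = missing
df["dummy"] = df["triparty"].notna().astype(int)
print(df["dummy"].tolist())  # [1, 0, 1, 1, 0]
```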
Answered By: Guy