Dataframe new columns to tell if the row contains column's header text
Question:
2 columns dataframe as the first screenshot. I want to add new columns (by the contents in the Note column from the original dataframe) to tell if the Note column contains the new column’s header text.
Example as the second screenshot.
Some lines work for a few columns. When there are a lot of new columns, it’s not efficient.
What is a good way to do so?
import pandas as pd
from io import StringIO
csvfile = StringIO(
'''NametNote
MiketBright, Kind
LilytFriendly
KatetConsiderate, energetic
JohntReliable, friendly
AletBright''')
df = pd.read_csv(csvfile, sep = 't', engine='python')
col_list = df['Note'].tolist()
n_list = []
for c in col_list:
for _ in c.split(','):
n_list.append(_)
df = df.assign(**dict.fromkeys(n_list, ''))
df["Bright"][df['Note'].str.contains("Bright")] = "Yes"
Answers:
You can try .str.get_dummies
then replace 1
with Yes
df = df.join(df['Note'].str.get_dummies(', ').replace({1: 'Yes', 0: ''}))
print(df)
Name Note Bright Considerate Friendly Kind Reliable energetic friendly
0 Mike Bright, Kind Yes Yes
1 Lily Friendly Yes
2 Kate Considerate, energetic Yes Yes
3 John Reliable, friendly Yes Yes
4 Ale Bright Yes
2 columns dataframe as the first screenshot. I want to add new columns (by the contents in the Note column from the original dataframe) to tell if the Note column contains the new column’s header text.
Example as the second screenshot.
Some lines work for a few columns. When there are a lot of new columns, it’s not efficient.
What is a good way to do so?
import pandas as pd
from io import StringIO
csvfile = StringIO(
'''NametNote
MiketBright, Kind
LilytFriendly
KatetConsiderate, energetic
JohntReliable, friendly
AletBright''')
df = pd.read_csv(csvfile, sep = 't', engine='python')
col_list = df['Note'].tolist()
n_list = []
for c in col_list:
for _ in c.split(','):
n_list.append(_)
df = df.assign(**dict.fromkeys(n_list, ''))
df["Bright"][df['Note'].str.contains("Bright")] = "Yes"
You can try .str.get_dummies
then replace 1
with Yes
df = df.join(df['Note'].str.get_dummies(', ').replace({1: 'Yes', 0: ''}))
print(df)
Name Note Bright Considerate Friendly Kind Reliable energetic friendly
0 Mike Bright, Kind Yes Yes
1 Lily Friendly Yes
2 Kate Considerate, energetic Yes Yes
3 John Reliable, friendly Yes Yes
4 Ale Bright Yes