Pandas new column from counts of column contents
Question:
I have a simple data frame and want to add a column that tells how many teams each project has, based on a dictionary that maps member names to teams.
The approach I came up with seems to work, but it doesn't look very smart.
Is there a better way to do this?
import pandas as pd
from io import StringIO
dict_name = {
    "William": "A",
    "James": "C",
    "Ava": "A",
    "Elijah": "A",
    "Mason": "B",
    "Ethan": "B",
    "Noah": "B",
    "Benjamin": "B",
    "Lucas": "B",
    "Oliver": "B",
    "Olivia": "C",
    "Emma": "C",
}
# Tab-separated sample data; each "\t" escape marks a column break
csvfile = StringIO(
"""
Project ID\tMembers
A58\tNoah, Oliver
A34\tWilliam, Elijah, James, Benjamin
A157\tLucas, Mason, Ethan, Olivia
A49\tEmma, Ava""")
df = pd.read_csv(csvfile, sep='\t', engine='python')
final_count_list = []
final_which_list = []
for names in df.Members.to_list():
    team_list = []
    for each in names.split(', '):
        team_list.append(dict_name[each])
    final_count_list.append(len(list(set(team_list))))
    final_which_list.append(list(set(team_list)))
df['How many teams?'] = final_count_list
df['Which teams?'] = final_which_list
print(df)
Answers:
Approach 1: (faster)
c = ['Which teams?', 'How many teams?']
# The walrus operator (:=) requires Python 3.8+; 'Which teams?' holds a set per row
df[c] = df['Members'].map(
    lambda x: (z := {dict_name[y] for y in x.split(', ')}, len(z))
).tolist()
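If the one-line lambda is hard to read, the same per-row idea can be written with a named function. This is only a sketch: the helper name `teams_for` is made up for illustration, the data is a trimmed copy of the question's sample, and the teams are sorted into a list (rather than kept as a set) so the output order is predictable. Like Approach 1, it raises a `KeyError` if a member is missing from `dict_name`.

```python
import pandas as pd

# Trimmed copy of the question's data, so the example runs on its own
dict_name = {"William": "A", "James": "C", "Elijah": "A",
             "Benjamin": "B", "Noah": "B", "Oliver": "B"}
df = pd.DataFrame({
    "Project ID": ["A58", "A34"],
    "Members": ["Noah, Oliver", "William, Elijah, James, Benjamin"],
})

def teams_for(members):
    """Hypothetical helper: return (sorted list of teams, number of teams)."""
    teams = sorted({dict_name[name] for name in members.split(", ")})
    return teams, len(teams)

# Compute both values once per row, then split them into two columns
results = [teams_for(m) for m in df["Members"]]
df["Which teams?"] = [teams for teams, _ in results]
df["How many teams?"] = [count for _, count in results]
print(df)
```

The set comprehension does the deduplication; everything else is just unpacking the pair into the two columns.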
Approach 2: (looks better)
c = ['How many teams?', 'Which teams?']
df[c] = (
    df['Members']
    .str.split(', ')
    .explode()
    .map(dict_name)
    .groupby(level=0)
    .agg(['nunique', 'unique'])
)
Result
   Project ID                           Members  How many teams?  Which teams?
0         A58                      Noah, Oliver                1           [B]
1         A34  William, Elijah, James, Benjamin                3     [A, C, B]
2        A157       Lucas, Mason, Ethan, Olivia                2        [B, C]
3         A49                         Emma, Ava                2        [C, A]
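To see why the chained version in Approach 2 works, it may help to unpack it one step at a time. The sketch below uses a two-row subset of the question's data, and the intermediate variable names are chosen here for illustration. The key point is that `explode()` repeats the original row index for every name, which is what lets `groupby(level=0)` put the names back together per project.

```python
import pandas as pd

# Two-row subset of the question's data, so the example runs on its own
dict_name = {"Noah": "B", "Oliver": "B", "Emma": "C", "Ava": "A"}
df = pd.DataFrame({"Members": ["Noah, Oliver", "Emma, Ava"]})

names = df["Members"].str.split(", ")  # one list of names per row
exploded = names.explode()             # one name per row, original index repeated
teams = exploded.map(dict_name)        # replace each name with its team
per_project = teams.groupby(level=0)   # regroup by the original row index

# nunique counts the distinct teams, unique collects them in order of appearance
summary = per_project.agg(["nunique", "unique"])

print(exploded.index.tolist())         # [0, 0, 1, 1]
print(summary["nunique"].tolist())     # [1, 2]
```

Unlike the dictionary lookup in Approach 1, `.map(dict_name)` turns names that are missing from the dictionary into `NaN` instead of raising an error.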