Pandas new column from counts of column contents

Question:

I have a simple data frame and want to add a column that tells how many Teams each Project has, according to a name dictionary.


The way I came up with seems to work OK, but it doesn't look very smart.

What is a better way to do this?

import pandas as pd
from io import StringIO

dict_name = {
    "William": "A",
    "James": "C",
    "Ava": "A",
    "Elijah": "A",
    "Mason": "B",
    "Ethan": "B",
    "Noah": "B",
    "Benjamin": "B",
    "Lucas": "B",
    "Oliver": "B",
    "Olivia": "C",
    "Emma": "C"}

csvfile = StringIO(
"""
Project ID\tMembers
A58\tNoah, Oliver
A34\tWilliam, Elijah, James, Benjamin
A157\tLucas, Mason, Ethan, Olivia
A49\tEmma, Ava""")

# The columns are tab-separated, so the separator is '\t' (not the letter 't')
df = pd.read_csv(csvfile, sep='\t')

final_count_list = []
final_which_list = []

for names in df.Members.to_list():
    team_list = []
    for each in names.split(', '):
        team_list.append(dict_name[each])

    final_count_list.append(len(list(set(team_list))))
    final_which_list.append(list(set(team_list)))

df['How many teams?'] = final_count_list
df['Which teams?'] = final_which_list

print(df)


Asked By: Mark K


Answers:

Approach 1: (faster; the := operator needs Python 3.8+)

c = ['Which teams?', 'How many teams?']
df[c] = df['Members'].map(lambda x: (z:={dict_name[y] for y in x.split(', ')}, len(z))).tolist()
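The one-liner above packs the lookup and the count into a single lambda. As a rough sketch of the same idea in a more explicit form, the helper below (teams_for is a hypothetical name, not part of the original answer) builds the set of teams per row and then derives the count from it. The data frame is rebuilt inline so the snippet runs on its own, and the teams are sorted only to make the output order predictable.

```python
import pandas as pd

dict_name = {
    "William": "A", "James": "C", "Ava": "A", "Elijah": "A",
    "Mason": "B", "Ethan": "B", "Noah": "B", "Benjamin": "B",
    "Lucas": "B", "Oliver": "B", "Olivia": "C", "Emma": "C",
}

# Same data as in the question, built directly instead of parsed from text
df = pd.DataFrame({
    "Project ID": ["A58", "A34", "A157", "A49"],
    "Members": [
        "Noah, Oliver",
        "William, Elijah, James, Benjamin",
        "Lucas, Mason, Ethan, Olivia",
        "Emma, Ava",
    ],
})


def teams_for(members):
    """Return the sorted distinct teams for a ', '-separated string of names."""
    # A set comprehension removes duplicate teams; a KeyError is raised
    # if a name is missing from dict_name.
    return sorted({dict_name[name] for name in members.split(", ")})


df["Which teams?"] = df["Members"].map(teams_for)
df["How many teams?"] = df["Which teams?"].map(len)

print(df)
```

This makes two passes over the column instead of one, so it may be slightly slower than the lambda version, but it also runs on Python versions older than 3.8.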

Approach 2: (looks better)

c = ['How many teams?', 'Which teams?']
df[c] = (
    df['Members']
    .str.split(', ')
    .explode()
    .map(dict_name)
    .groupby(level=0)
    .agg(['nunique', 'unique'])
)
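To see why the chain above works, it can help to look at each intermediate step. The sketch below uses only the first two rows of the question's data and a trimmed dictionary. The variable names (split, exploded, teams, summary) are illustrative and not from the original answer.

```python
import pandas as pd

dict_name = {
    "William": "A", "James": "C", "Elijah": "A",
    "Noah": "B", "Benjamin": "B", "Oliver": "B",
}

members = pd.Series(
    ["Noah, Oliver", "William, Elijah, James, Benjamin"],
    name="Members",
)

# Step 1: each cell becomes a list of names
split = members.str.split(", ")

# Step 2: one row per name; the original index label is repeated,
# which is what lets us regroup by project afterwards
exploded = split.explode()

# Step 3: look up each name's team (names missing from the dict become NaN)
teams = exploded.map(dict_name)

# Step 4: group the rows that share an index label and aggregate per project
summary = teams.groupby(level=0).agg(["nunique", "unique"])

print(exploded)
print(summary)
```

Because the grouped result keeps the original index labels, assigning it back to the data frame lines up each count with the right project.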

Result (from Approach 2)

  Project ID                           Members  How many teams? Which teams?
0        A58                      Noah, Oliver                1          [B]
1        A34  William, Elijah, James, Benjamin                3    [A, C, B]
2       A157       Lucas, Mason, Ethan, Olivia                2       [B, C]
3        A49                         Emma, Ava                2       [C, A]
Answered By: Shubham Sharma