List of tuples to DataFrame with a column for elements and a column for tuple length

Question:

I have a list of tuples of different lengths, where each tuple can be thought of as encoding a team of people, such as:

data = [('Alice',),
        ('Bob', 'Betty'),
        ('Charlie', 'Cindy', 'Cramer')]

From this, I would like to create a DataFrame with a column of team member names, and a column with the size of the team they were on:

   name     teamsize
0  Alice    1
1  Bob      2
2  Betty    2
3  Charlie  3
4  Cindy    3
5  Cramer   3

I have tried my hand at some double for loops, but I couldn’t get things to work out, and I have the impression that it is not a very good way to go about it. Any tips would be appreciated.

Asked By: Rasmus

||

Answers:

Use a list comprehension and the DataFrame constructor:

import pandas as pd

out = pd.DataFrame([[name, len(l)] for l in data for name in l],
                   columns=['name', 'teamsize'])

Output:

      name  teamsize
0    Alice         1
1      Bob         2
2    Betty         2
3  Charlie         3
4    Cindy         3
5   Cramer         3

For fun, here is a pure pandas solution (but likely less efficient!):

out = (pd.DataFrame({'name': data})
         .assign(teamsize=lambda d: d['name'].str.len())
         .explode('name', ignore_index=True)
      )
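
To see why the chained version works, it can help to look at the intermediate frames one at a time. The sketch below assumes the `data` list from the question and a pandas version where `explode` accepts `ignore_index` (1.1 or later); the names `step1` and `step2` are only illustrative and not part of the answer above.

```python
import pandas as pd

data = [('Alice',),
        ('Bob', 'Betty'),
        ('Charlie', 'Cindy', 'Cramer')]

# One row per team; the 'name' column holds the whole tuple.
step1 = pd.DataFrame({'name': data})

# .str.len() also works element-wise on tuples, giving each team's size.
step2 = step1.assign(teamsize=step1['name'].str.len())

# explode() gives every tuple element its own row and repeats the
# other columns (here: teamsize) for each element.
out = step2.explode('name', ignore_index=True)
print(out)
```

The size has to be computed before `explode`, because afterwards each row holds a single name rather than the full tuple.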
Answered By: mozway

You can use:

name = []
teamsize = []
for i in data:
    for n in i:
        name.append(n)
        teamsize.append(len(i))

df = pd.DataFrame(list(zip(name, teamsize)),
                  columns=['name', 'teamsize'])

Another Pandas solution:

df = (pd.DataFrame(data).T.melt(value_name='name').dropna()
        .assign(teamsize=lambda x: x.groupby(x.pop('variable')).transform('count')))
print(df)

# Output
      name  teamsize
0    Alice         1
3      Bob         2
4    Betty         2
6  Charlie         3
7    Cindy         3
8   Cramer         3
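
The gaps in the index above (0, 3, 4, 6, 7, 8) come from the padded rows that `dropna` removes. As a rough sketch of the same idea split into steps, assuming the `data` list from the question, the version below also renumbers the rows; the names `wide`, `melted` and `sizes` are illustrative only.

```python
import pandas as pd

data = [('Alice',),
        ('Bob', 'Betty'),
        ('Charlie', 'Cindy', 'Cramer')]

# Short tuples are padded with missing values; after .T each column is one team.
wide = pd.DataFrame(data).T

# melt() stacks the columns; 'variable' holds the team (column) label.
# dropna() then removes the padding rows.
melted = wide.melt(value_name='name').dropna()

# Count the names per team and broadcast the count back to each row.
sizes = melted.groupby('variable')['name'].transform('count')

# Attach the sizes, drop the helper column, and renumber the rows from 0.
df = (melted.assign(teamsize=sizes)
            .drop(columns='variable')
            .reset_index(drop=True))
print(df)
```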
Answered By: Corralien