Stack and explode columns in pandas

Question:

I have a dataframe on which I want to apply explode and stack together: explode the ‘Attendees’ column and assign each value to the correct course. For example, in session1 the number of attendees for Course 1 ‘intro to’ was 24, but for Course 2 ‘Computer skill’ it was 46. In addition, I want all the course names in one column.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Session':['session1', 'session2','session3'],
                   'Course 1':['intro to','advanced','Cv'],
                   'Course 2':['Computer skill',np.nan,'Write cover letter'],
                   'Attendees':['24 & 46','23','30']})

If I apply the explode function to ‘Attendees’ I get the result

Course_df = df.assign(Attendees=df['Attendees'].str.split(' & ')).explode('Attendees')

    Session  Course 1        Course 2 Attendees
0  session1  intro to  Computer skill        24
0  session1  intro to  Computer skill        46
1  session2  advanced             NaN        23

and when I apply the stack function

Course_df = (Course_df.set_index(['Session','Attendees']).stack().reset_index().rename({0:'Courses'}, axis = 1))

This is the result I get

  Session     level_1             Courses      Attendees
0  session1  Course 1            intro to        24
1  session1  Course 2      Computer skill        46
2  session2  Course 1            advanced        23
3  session3  Course 1                  Cv        30

Whereas the result I want is

   Session     level_1             Courses      Attendees
0  session1  Course 1            intro to        24
1  session1  Course 2      Computer skill        46
2  session2  Course 1            advanced        23
3  session3  Course 1                  Cv        30
4  session3  Course 2   Write cover letter        30
Asked By: hyeri


Answers:

Melt the dataframe first

melted_df = pd.melt(df, id_vars=['Session', 'Attendees'], value_vars=['Course 1', 'Course 2'])

and then explode

result = melted_df.assign(Attendees=melted_df['Attendees'].str.split(' & ')).explode('Attendees')
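
Note that melting and then exploding pairs every course in a session with every attendee count, so session1 ends up with four rows instead of two. If the goal is the table shown in the question, one possible sketch is below. It rests on an assumption that is not stated in the question: when a session lists several counts, the k-th count belongs to the k-th course column, and a single count is shared by all courses of that session. The names course_cols, melted, counts and pos are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Session': ['session1', 'session2', 'session3'],
                   'Course 1': ['intro to', 'advanced', 'Cv'],
                   'Course 2': ['Computer skill', np.nan, 'Write cover letter'],
                   'Attendees': ['24 & 46', '23', '30']})

course_cols = ['Course 1', 'Course 2']

# Long format: one row per (session, course column); drop courses that do not exist.
melted = (df.melt(id_vars=['Session', 'Attendees'], value_vars=course_cols,
                  var_name='level_1', value_name='Courses')
            .dropna(subset=['Courses'])
            .sort_values(['Session', 'level_1'], kind='stable')
            .reset_index(drop=True))

# Assumption: the k-th count belongs to the k-th course column;
# a lone count is reused for every course of that session.
counts = melted['Attendees'].str.split(' & ')
pos = melted['level_1'].map({c: i for i, c in enumerate(course_cols)})
melted['Attendees'] = [c[min(i, len(c) - 1)] for c, i in zip(counts, pos)]

result = melted[['Session', 'level_1', 'Courses', 'Attendees']]
print(result)
```

The attendee counts stay strings here; add .astype(int) on the ‘Attendees’ column if numbers are needed.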
Answered By: kevin wu

As far as I understand your problem, a general solution is to iterate over either the attendee counts or the courses.
Here I loop over the attendee counts.
In effect, I do the explode step manually and set every course except the one the count belongs to to pd.NA.

With df = Course_df:

df = df.assign(Attendees = df["Attendees"].str.split(" & "))

dfn = df.iloc[:0]  # Create empty dataframe with same columns as df

for didx, d in df.iterrows():

    # Explode manually
    for ci, attend_count in enumerate(d.Attendees):
        dfr = d.to_frame().T
        dfr.Attendees = attend_count

        # Set other courses than "Course <ci+1>" to NaN
        other_courses = [x for x in d.index if x.startswith("Course ") and x != f'Course {ci + 1}']
        # other_courses = d.index.to_series().filter(regex = f'Course [^{ci+1}]').index  # Alternative
        for c in other_courses:
            dfr[c] = pd.NA

        dfn = pd.concat([dfn, dfr])

dfn.set_index(["Session", "Attendees"]).stack().reset_index().rename({0: "Courses"}, axis = 1)

This returns:

    Session Attendees   level_2         Courses
0  session1        24  Course 1        intro to
1  session1        46  Course 2  Computer skill
2  session2        23  Course 1        advanced
Answered By: Night Train