Reformat Excel data-frame

Question

I managed to get my Python script working to scrape data from a website using Playwright.
The scraped data is in a format that we cannot use at the moment. Here is an example of the initial extract:

Name	Group 1	Group 2	Group 3	Group 4	Group 5
Joe Black	A			U
Joe Blue		A	A
Joe Green	U				A
Joe Red		A		U

An A in the table above means the user is an admin of that group. I need to get the data above into a table that has the groups in the 1st column and, in the 2nd column, the names of the users who are admins of that group. So basically I need to get it to this:

Groups	Admins
Group 1	Joe Blue,Joe Red
Group 2	Joe Red
Group 3	Joe Blue
Group 4	Joe Blue
Group 5	Joe Green

I am trying to use Pandas but I am completely lost on how to get the format correct. I just need some advice or a reference to a similar problem I can work from.

Asked By: Dinerz

Answer 1

You can reshape with melt, keep only the cells marked A, then groupby.agg:

out = (df.melt('Name', var_name='Group')
         .loc[lambda d: d['value'].eq('A')]  # admins only; drops 'U' and empty cells
         .groupby('Group')['Name'].agg(', '.join).reset_index(name='Admins')
       )

Variant with a stack:

(df.set_index('Name').rename_axis(index='Admins', columns='Group')
   .stack().loc[lambda s: s.eq('A')].reset_index()  # keep admins only
   .groupby('Group', as_index=False)['Admins'].agg(', '.join)
)

Output:

     Group             Admins
0  Group 1          Joe Black
1  Group 2  Joe Blue, Joe Red
2  Group 3           Joe Blue
3  Group 5          Joe Green
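
As a self-contained sketch of the melt approach, the snippet below rebuilds the question's table by hand (the exact cell placement is an assumption read off the tab-separated extract) and also keeps groups that have no admin. The `reindex` step is an addition for illustration, not part of the answer above; such groups show up with a missing value in Admins.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table (cell placement is assumed).
df = pd.DataFrame({
    "Name": ["Joe Black", "Joe Blue", "Joe Green", "Joe Red"],
    "Group 1": ["A", np.nan, "U", np.nan],
    "Group 2": [np.nan, "A", np.nan, "A"],
    "Group 3": [np.nan, "A", np.nan, np.nan],
    "Group 4": ["U", np.nan, np.nan, "U"],
    "Group 5": [np.nan, np.nan, "A", np.nan],
})

# Wide -> long: one row per (Name, Group) pair, cell content in "value".
long_df = df.melt("Name", var_name="Group")

# Keep only admin cells, then join the names per group.
admins = long_df[long_df["value"].eq("A")]
out = (
    admins.groupby("Group")["Name"]
    .agg(", ".join)
    .reindex(df.columns.drop("Name"))  # re-add groups without any admin
    .rename_axis("Group")
    .reset_index(name="Admins")
)
print(out)
```

Dropping the `reindex` line gives back only the groups that have at least one admin.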

Answered By: mozway

Answer 2

If you set Name as the index and unstack, you get a Series with a MultiIndex of (group, name). You can then use a groupby on the group level and join the names corresponding to "A"-values:

import numpy as np

def getAdmins(x):
    sel = x[x == "A"]
    # level 1 of the MultiIndex holds the names
    return ",".join(sel.index.get_level_values(1)) if not sel.empty else np.nan

df_new = df.set_index("Name").unstack().groupby(level=0).agg(getAdmins)
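
To see what the unstacked Series looks like, here is a small sketch on a two-row sample frame (the sample data is made up for illustration). It assumes Name is moved into the index first, so that the second index level carries the names rather than row numbers.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; values are assumptions, not the asker's real data.
df = pd.DataFrame({
    "Name": ["Joe Black", "Joe Blue"],
    "Group 1": ["A", np.nan],
    "Group 2": [np.nan, "A"],
})

# Level 0 of the resulting index is the column label, level 1 is the Name.
s = df.set_index("Name").unstack()
print(s)

# Each cell is addressed by a (group, name) pair.
print(s.loc[("Group 1", "Joe Black")])
```

Grouping on `level=0` therefore hands each group's cells to the aggregation function, with the names still available from level 1.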

Answered By: P.Jo

Answer 3

Should you need to be robust against empty strings/NAs:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Joe Red', 'Joe Blue', 'Joe Green'],
    'Group 1': ['A', pd.NA, ''],
    'Group 2': ['', 'A', 'A'],
    'Group 3': ['', np.nan, 'A'],
})

df_t = df.set_index('Name').T.replace({
    'A': True,
    'U': False,
    '': False,
    pd.NA: False,
    np.nan: False,
})

df_t.apply(
    lambda x: df_t.columns[x].str.cat(sep=','), axis=1
).reset_index(name='Admins').rename(columns={'index': 'Groups'})

Output:

    Groups              Admins
0  Group 1             Joe Red
1  Group 2  Joe Blue,Joe Green
2  Group 3           Joe Green
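
A possible alternative, sketched here on the same sample frame, is to build the boolean mask with an equality test instead of listing every non-admin value in replace. This is a swapped-in technique, not the method above, and it assumes admin cells contain exactly the string "A"; anything else (empty strings, "U", NaN, pd.NA) compares as False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Joe Red", "Joe Blue", "Joe Green"],
    "Group 1": ["A", pd.NA, ""],
    "Group 2": ["", "A", "A"],
    "Group 3": ["", np.nan, "A"],
})

# True only where the cell is exactly "A"; missing values compare as False.
mask = df.set_index("Name").eq("A")

# For each group column, join the names whose mask entry is True.
admins = mask.apply(lambda col: ",".join(col[col].index))

out = admins.rename_axis("Groups").reset_index(name="Admins")
print(out)
```

The equality test avoids having to enumerate every placeholder the scraper might emit, at the cost of treating any unexpected marker as "not an admin".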

Answered By: sharmu1
