Reformat Excel data-frame

Question:

I managed to get my Python script working to scrape data from a website using Playwright.
The scraped data is in a format we cannot use at the moment. Here is an example of the initial extract:

Name Group 1 Group 2 Group 3 Group 4 Group 5
Joe Black A U
Joe Blue A A
Joe Green U A
Joe Red A U

The A in the table above means the user is an admin of that group. I need to get the data above into a table that has the groups in the first column and, in the second column, the names of each group's admins. So basically I need to get it to this:

Groups Admins
Group 1 Joe Blue,Joe Red
Group 2 Joe Red
Group 3 Joe Blue
Group 4 Joe Blue
Group 5 Joe Green

I am trying to use pandas but am completely lost on how to get the format right. I just need some advice or a reference to a similar problem I can work from.

Asked By: Dinerz


Answers:

You can reshape with melt, then dropna and groupby.agg:

out = (df.melt('Name', var_name='Group').dropna(subset='value')
         .groupby('Group')['Name'].agg(', '.join).reset_index(name='Admins')
       )

Variant with stack:

(df.set_index('Name').rename_axis(index='Admins', columns='Group')
   .stack().reset_index()
   .groupby('Group', as_index=False)['Admins'].agg(', '.join)
)

Output:

     Group             Admins
0  Group 1          Joe Black
1  Group 2  Joe Blue, Joe Red
2  Group 3           Joe Blue
3  Group 5          Joe Green
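
Note that dropna only removes missing cells, so any cell holding another marker (such as the U values in the question's extract) would still be counted. A minimal sketch of the same melt approach that keeps only the cells equal to 'A' could look like this. The sample frame is hypothetical, and reading 'U' as "non-admin" is an assumption, since the question does not define it:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'A' = admin, 'U' = assumed non-admin, NaN = empty cell
df = pd.DataFrame({
    'Name': ['Joe Black', 'Joe Blue', 'Joe Green', 'Joe Red'],
    'Group 1': ['A', np.nan, 'U', np.nan],
    'Group 2': ['U', 'A', np.nan, 'A'],
    'Group 3': [np.nan, 'A', np.nan, 'U'],
})

# Wide -> long: one row per (Name, Group) pair, cell content in 'value'
long = df.melt('Name', var_name='Group')

# Keep only admin cells, then join the names within each group
out = (long[long['value'].eq('A')]
         .groupby('Group')['Name'].agg(', '.join)
         .reset_index(name='Admins'))
print(out)
```

A group with no admin at all simply does not appear in the result with this filter.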
Answered By: mozway

If you unstack it, then you get a Series with a MultiIndex. You can then use a groupby and join the names corresponding to "A"-values:

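import numpy as np

def getAdmins(x):
    # x holds one group's cells, indexed by (group, name); keep the "A" entries
    sel = x[x == "A"]
    # join the names (index level 1), or return NaN if the group has no admin
    return ",".join(sel.index.get_level_values(1)) if not sel.empty else np.nan

# assumes 'Name' is the index of df; if it is a column, call df.set_index('Name') first
df_new = df.unstack().groupby(level=0).agg(getAdmins)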
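
To make the shape of the unstacked data concrete, here is a small self-contained sketch of the same idea that filters the MultiIndex Series first and then groups. The sample frame, the 'Group' axis name and the final reindex step are illustrative assumptions, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'Group 4' has no admin on purpose
df = pd.DataFrame({
    'Name': ['Joe Black', 'Joe Blue', 'Joe Green', 'Joe Red'],
    'Group 1': ['A', np.nan, 'U', np.nan],
    'Group 2': ['U', 'A', np.nan, 'A'],
    'Group 4': ['U', np.nan, np.nan, np.nan],
})

wide = df.set_index('Name').rename_axis(columns='Group')
s = wide.unstack()          # Series with a MultiIndex of (Group, Name)

admins = s[s.eq('A')]       # keep only the admin cells
out = (admins.reset_index()
             .groupby('Group')['Name'].agg(','.join)
             .reindex(wide.columns))   # groups without admins show up as NaN
print(out)
```

The reindex at the end restores groups that lost all their rows in the filter, which matches the NaN behaviour of the function-based version.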
Answered By: P.Jo

Should you need to be robust against empty strings/NAs:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Joe Red', 'Joe Blue', 'Joe Green'],
    'Group 1': ['A', pd.NA, ''],
    'Group 2': ['', 'A', 'A'],
    'Group 3': ['', np.nan, 'A'],
})

df_t = df.set_index('Name').T.replace({
    'A': True,
    'U': False,
    '': False,
    pd.NA: False,
    np.nan: False,
})

df_t.apply(
    lambda x: df_t.columns[x].str.cat(sep=','), axis=1
).reset_index(name='Admins').rename(columns={'index': 'Groups'})

Output:

    Groups              Admins
0  Group 1             Joe Red
1  Group 2  Joe Blue,Joe Green
2  Group 3           Joe Green
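
If the only thing that matters is whether a cell is exactly 'A', the replacement mapping can arguably be swapped for a single equality test, which treats every other value (empty string, 'U', missing) as False. This is a sketch of that alternative rather than the answer's own method, with hypothetical sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing '', None, NaN and 'U' as non-admin cells
df = pd.DataFrame({
    'Name': ['Joe Red', 'Joe Blue', 'Joe Green'],
    'Group 1': ['A', None, ''],
    'Group 2': ['', 'A', 'A'],
    'Group 3': ['U', np.nan, 'A'],
})

# Rows = groups, columns = names, True only where the cell equals 'A'
mask = df.set_index('Name').T.eq('A')

# For each group, join the names whose mask entry is True
out = (mask.apply(lambda row: ','.join(row.index[row]), axis=1)
           .rename_axis('Groups')
           .reset_index(name='Admins'))
print(out)
```

With this variant a group without any admin yields an empty string in the Admins column.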
Answered By: sharmu1