Groupby multiple columns in pandas dataframe

Question:

I have a dataframe looks like:

page    reference       ids                 -         subject           word
1       apple           ['aaaa', 'bbbbb', 'cccc']       name            app
1       apple           ['bndv', 'asasa', 'swdsd']      fruit           is
1       apple           ['bsnm', 'dfsd', 'dgdf']        fruit           text
1       bat             ['asas', 'ddfgd', 'ff']         thing           sport
1       cat             ['sds', 'dffd', 'gdg']          fruit           color
1       bat             ['sds', 'fsss', 'ssfd']         thing           was
1       bat             ['fsf', 'sff', 'fss']           place           that
2       dog             ['fffds', 'gd', 'sdg']          name            mud
2       egg             ['dfff', 'sdf', 'vcv']          place           gun
2       dog             ['dsfd', 'fds', 'gfdg']         thing           kit
2       egg             ['ddd', 'fg', 'dfg']            place           hut

I want to groupby reference column and subject column. The output should look like this:

output:
page    reference   ids                                                subject          word
1       apple   [['bndv', 'asasa', 'swdsd'],['bsnm', 'dfsd', 'dgdf']]   fruit           [[is], [text]]
1       apple   ['aaaa', 'bbbbb', 'cccc']                               name            [app]
1       bat     [['asas', 'ddfgd', 'ff'], [['sds', 'fsss', 'ssfd']]     thing           [[sport], [was]]
1       bat     ['fsf', 'sff', 'fss']                                   place           [that]
1       cat     ['sds', 'dffd', 'gdg']                                  fruit           [color]
2       dog     ['fffds', 'gd', 'sdg']                                  name            [mud]
2       dog     ['dsfd', 'fds', 'gfdg']                                 thing           [kit]
2       egg     [['dfff', 'sdf', 'vcv'], ['ddd', 'fg', 'dfg']]          place           [[gun], [hut]]
Asked By: Anonymous

||

Answers:

First to group and aggregate necessary fields:

res = df.groupby(["reference", "subject"]).agg({"page": min, "ids": list, "word": lambda l: [[ll] for ll in l]}).reset_index

  reference subject  page                                         ids              word
0     apple   fruit     1  [[bndv, asasa, swdsd], [bsnm, dfsd, dgdf]]    [[is], [text]]
1     apple    name     1                       [[aaaa, bbbbb, cccc]]           [[app]]
2       bat   place     1                           [[fsf, sff, fss]]          [[that]]
3       bat   thing     1      [[asas, ddfgd, ff], [sds, fsss, ssfd]]  [[sport], [was]]
4       cat   fruit     1                          [[sds, dffd, gdg]]         [[color]]
5       dog    name     2                          [[fffds, gd, sdg]]           [[mud]]
6       dog   thing     2                         [[dsfd, fds, gfdg]]           [[kit]]
7       egg   place     2          [[dfff, sdf, vcv], [ddd, fg, dfg]]    [[gun], [hut]]

Note that this also wraps each word value in a list, just like what you want in your desired output. I’m also just assuming to take the minimum page value in each group since you didn’t mention the rule for this variable. You can update the min value in agg function to whatever you think is appropriate.

Then you can cleanup the lists if length is 1:

res["word"] = res["word"].apply(lambda l: l[0] if len(l) == 1 else l)
res["ids"] = res["ids"].apply(lambda l: l[0] if len(l) == 1 else l)

  reference subject  page                                         ids              word
0     apple   fruit     1  [[bndv, asasa, swdsd], [bsnm, dfsd, dgdf]]    [[is], [text]]
1     apple    name     1                         [aaaa, bbbbb, cccc]             [app]
2       bat   place     1                             [fsf, sff, fss]            [that]
3       bat   thing     1      [[asas, ddfgd, ff], [sds, fsss, ssfd]]  [[sport], [was]]
4       cat   fruit     1                            [sds, dffd, gdg]           [color]
5       dog    name     2                            [fffds, gd, sdg]             [mud]
6       dog   thing     2                           [dsfd, fds, gfdg]             [kit]
7       egg   place     2          [[dfff, sdf, vcv], [ddd, fg, dfg]]    [[gun], [hut]]
Answered By: TYZ
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.