Group by and concatenate unique values with a separator in a pandas DataFrame

Question:

I have the following pandas DataFrame.

    org_id  org_name    location_id             loc_status  city            country
0   100023310   advance GmbH    LOC-100052061   ACTIVE      Planegg         Germany
1   100023310   advance GmbH    LOC-100032442   ACTIVE      Planegg         Germany
2   100023310   advance GmbH    LOC-100042003   INACTIVE    Planegg         Germany
3   100004261   Beacon Limited  LOC-100005615   ACTIVE      Tunbridge Wells United Kingdom
4   100004261   Beacon Limited  LOC-100000912   ACTIVE      Crowborough     United Kingdom

I would like to group the rows by the columns org_id and org_name and, for each of the other columns, concatenate the unique values with the separator '|'.

I am using the following lines of code.

gr_columns = [x for x in df.columns if x not in ['location_id', 'loc_status','city', 'country']]
df.groupby(gr_columns).agg(lambda col: '|'.join(col))

However, some of the columns (city and country) are missing from the final DataFrame. I am getting the following output.

    org_id  org_name    location_id             loc_status
1   100023310   advance GmbH    LOC-100052061|LOC-100032442|LOC-100042003   ACTIVE|INACTIVE     
2   100004261   Beacon Limited  LOC-100005615   ACTIVE     

I also get the following warning.


FutureWarning: Dropping invalid columns in DataFrameGroupBy.agg is deprecated. In a future version, a TypeError will be raised. Before calling .agg, select only columns which should be valid for the function.
  df.groupby(gr_columns).agg(lambda col: ','.join(col))

The expected output is:

    org_id  org_name    location_id             loc_status  city            country
1   100023310   advance GmbH    LOC-100052061|LOC-100032442|LOC-100042003   ACTIVE|INACTIVE     Planegg         Germany
2   100004261   Beacon Limited  LOC-100005615   ACTIVE      Tunbridge Wells|Crowborough United Kingdom

Any help is highly appreciated.

Asked By: rshar


Answers:

Update

In fact, it seems you want to join the unique values of every column:

join_unique = lambda x: '|'.join(x.unique())
out = df.groupby(['org_id', 'org_name'], as_index=False).agg(join_unique)
print(out)

# Output with pd.set_option('display.max_columns', None)
      org_id        org_name                                location_id  
0  100004261  Beacon Limited                LOC-100005615|LOC-100000912   
1  100023310    advance GmbH  LOC-100052061|LOC-100032442|LOC-100042003   

        loc_status                         city         country  
0           ACTIVE  Tunbridge Wells|Crowborough  United Kingdom  
1  ACTIVE|INACTIVE                      Planegg         Germany  
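
To try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same aggregation. The `sort=False` argument is an addition that is not part of the snippet above: it keeps the groups in order of first appearance, which matches the row order of the expected output in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "org_id": [100023310, 100023310, 100023310, 100004261, 100004261],
    "org_name": ["advance GmbH"] * 3 + ["Beacon Limited"] * 2,
    "location_id": ["LOC-100052061", "LOC-100032442", "LOC-100042003",
                    "LOC-100005615", "LOC-100000912"],
    "loc_status": ["ACTIVE", "ACTIVE", "INACTIVE", "ACTIVE", "ACTIVE"],
    "city": ["Planegg", "Planegg", "Planegg", "Tunbridge Wells", "Crowborough"],
    "country": ["Germany"] * 3 + ["United Kingdom"] * 2,
})


def join_unique(values):
    """Join the distinct values of one group, in order of first appearance."""
    return "|".join(values.unique())


# sort=False (an addition here) keeps groups in their original order.
out = df.groupby(["org_id", "org_name"], as_index=False, sort=False).agg(join_unique)
print(out)
```

Without `sort=False` the result is the same apart from the row order, since groups are sorted by key by default.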

Old answer

You can use groupby followed by agg with a per-column mapping:

>>> (df.groupby(['org_id', 'org_name'], as_index=False)
       .agg({'location_id': '|'.join, 'city': 'first', 'country': 'first'}))

      org_id        org_name                                location_id             city         country
0  100004261  Beacon Limited                LOC-100005615|LOC-100000912  Tunbridge Wells  United Kingdom
1  100023310    advance GmbH  LOC-100052061|LOC-100032442|LOC-100042003          Planegg         Germany
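
The per-column mapping also makes it possible to mix strategies, for example a unique join for one column and `'first'` for others. The sketch below is only an illustration under that assumption: the choice of which column gets which function is made up here, and `join_unique` is a helper defined in the block rather than a pandas function.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "org_id": [100023310, 100023310, 100023310, 100004261, 100004261],
    "org_name": ["advance GmbH"] * 3 + ["Beacon Limited"] * 2,
    "location_id": ["LOC-100052061", "LOC-100032442", "LOC-100042003",
                    "LOC-100005615", "LOC-100000912"],
    "loc_status": ["ACTIVE", "ACTIVE", "INACTIVE", "ACTIVE", "ACTIVE"],
    "city": ["Planegg", "Planegg", "Planegg", "Tunbridge Wells", "Crowborough"],
    "country": ["Germany"] * 3 + ["United Kingdom"] * 2,
})


def join_unique(values):
    """Helper for this example: join distinct values with '|'."""
    return "|".join(values.unique())


mixed = (
    df.groupby(["org_id", "org_name"], as_index=False, sort=False)
      .agg({
          "location_id": "|".join,    # keep every location id
          "loc_status": join_unique,  # collapse repeated statuses
          "city": "first",            # keep only the first city per group
          "country": "first",
      })
)
print(mixed)
```

Note that `'first'` discards the other cities in a group, so this variant does not produce the `Tunbridge Wells|Crowborough` value asked for in the question.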
Answered By: Corralien

I think you are looking for:

df.groupby(['org_id', 'org_name'], as_index=False).agg(lambda x: '|'.join(x.unique()))

Output:

      org_id        org_name                                location_id  
0  100004261  Beacon Limited                LOC-100005615|LOC-100000912   
1  100023310    advance GmbH  LOC-100052061|LOC-100032442|LOC-100042003   

        loc_status                         city         country  
0           ACTIVE  Tunbridge Wells|Crowborough  United Kingdom  
1  ACTIVE|INACTIVE                      Planegg         Germany  
Answered By: SomeDude

Here’s a way to do what your question asks:

print( df.groupby(['org_id','org_name']).agg(lambda col: '|'.join(col.drop_duplicates())).reset_index() )

Output:

      org_id        org_name                                location_id       loc_status                         city         country
0  100004261  Beacon Limited                LOC-100005615|LOC-100000912           ACTIVE  Tunbridge Wells|Crowborough  United Kingdom
1  100023310    advance GmbH  LOC-100052061|LOC-100032442|LOC-100042003  ACTIVE|INACTIVE                      Planegg         Germany
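
The sample data alone should not trigger the warning quoted in the question. One plausible cause, which the question does not confirm, is that city and country contain missing or non-string values in the full data: `'|'.join` raises a TypeError on a float NaN, and older pandas versions reacted by dropping the column with that FutureWarning. Under that assumption, a more defensive variant might look like the sketch below. The frame, the `employees` column, and the helper name `join_unique_safe` are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing city and one numeric column.
df = pd.DataFrame({
    "org_id": [1, 1, 2, 2],
    "org_name": ["A", "A", "B", "B"],
    "city": ["Planegg", np.nan, "Tunbridge Wells", "Crowborough"],
    "employees": [10, 10, 5, 7],
})


def join_unique_safe(values, sep="|"):
    """Drop missing values, cast the rest to str, then join the distinct ones."""
    cleaned = values.dropna().astype(str)
    return sep.join(cleaned.drop_duplicates())


safe = (
    df.groupby(["org_id", "org_name"], as_index=False, sort=False)
      .agg(join_unique_safe)
)
print(safe)
```

A group whose values are all missing ends up as an empty string with this helper, which may or may not be what you want.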
Answered By: constantstranger