Concatenate string in groupby with conditions


I have the following dataframe:

id  v1  v2
1   a   b
1   a   d
2   c   e
2   d   e
2   f   g

which can be created with the command:

import pandas as pd
df = pd.DataFrame({'id':[1,1,2,2,2],'v1':['a','a','c','d','f'],'v2':['b','d','e','e','g']})

I need to concatenate v1 and v2 with commas within each id group, with one condition:

  • If a value appears more than once in the group and it is the only distinct value in the group, do not concatenate it; keep a single copy.
So the output should look like:
id  v1  v2
1   a   b,d
2   c,d,f   e,e,g

In this case, when id=2, v2="e,e,g" and both "e"s are kept, because "e" is not the only value in this group. However, when id=1, v1 is just "a", because "a" is the only value in this group.

I have done the concatenation part, but I am not sure how to implement the condition. Here is my code so far:

df.groupby(['id'])[['v1', 'v2']].agg(lambda x: ', '.join(x)).reset_index()
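For reference, a minimal reproduction of the code above (assuming only that pandas is installed) shows where it falls short: every value is joined, so the id=1 row gets "a, a" in v1 instead of "a".

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                   'v1': ['a', 'a', 'c', 'd', 'f'],
                   'v2': ['b', 'd', 'e', 'e', 'g']})

# Current approach: join every value in each group, duplicates included.
out = df.groupby(['id'])[['v1', 'v2']].agg(lambda x: ', '.join(x)).reset_index()
print(out)
# id=1 -> v1 'a, a', v2 'b, d'
# id=2 -> v1 'c, d, f', v2 'e, e, g'
```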
Asked By: Larry



You can add a conditional expression (if ... else) inside your lambda function.

print(df.groupby(['id'])[['v1', 'v2']].agg(lambda x: ', '.join(set(x)) if len(set(x)) == 1 else ', '.join(x)).reset_index())

If set(x) has only one element, joining it simply returns that single value; otherwise all the values in the group, duplicates included, are joined.

The output of this will be:

   id       v1       v2
0   1        a     b, d
1   2  c, d, f  e, e, g
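A variant of the same idea, sketched under the assumption that the exact "," separator from the question is wanted: `Series.nunique()` counts the distinct values in each group, and `iloc[0]` returns the lone value when there is only one. The helper name `join_unless_single` is hypothetical, chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                   'v1': ['a', 'a', 'c', 'd', 'f'],
                   'v2': ['b', 'd', 'e', 'e', 'g']})


def join_unless_single(values: pd.Series, sep: str = ',') -> str:
    """Hypothetical helper: return the lone value if the group holds only
    one distinct value, otherwise join all values (duplicates kept)."""
    if values.nunique() == 1:
        return values.iloc[0]
    return sep.join(values)


# agg calls the helper once per column (v1, v2) for each id group.
out = df.groupby('id')[['v1', 'v2']].agg(join_unless_single).reset_index()
print(out)
# id=1 -> v1 'a', v2 'b,d'
# id=2 -> v1 'c,d,f', v2 'e,e,g'
```

Using a named function instead of the inline lambda avoids building set(x) twice per group and makes the separator easy to change.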
Answered By: Joe Ferndz

Please try:

Answered By: wwnde