Concatenate string in groupby with conditions
Question:
I have the following dataframe:
id  v1  v2
1   a   b
1   a   d
2   c   e
2   d   e
2   f   g
which can be created with the command:
import pandas as pd
df = pd.DataFrame({'id':[1,1,2,2,2],'v1':['a','a','c','d','f'],'v2':['b','d','e','e','g']})
I need to concatenate v1 and v2 with commas within each id group, with one condition:
- If a group contains only one distinct value (repeated or not), keep that value once instead of concatenating the repeats.
So the output should look like:
id  v1     v2
1   a      b,d
2   c,d,f  e,e,g
In this case, when id=2, v2="e,e,g": both "e" values stay because "e" is not the only value in that group. However, when id=1, v1="a", because "a" is the only value in that group.
I have done the concatenation part, but I am not sure how to implement the condition. Here is my code so far:
df.groupby(['id'])[['v1', 'v2']].agg(lambda x: ', '.join(x)).reset_index()
Answers:
You can add a conditional expression (an inline if/else) inside your lambda function:
print (df.groupby(['id'])[['v1', 'v2']].agg(lambda x: ', '.join(set(x)) if len(set(x))==1 else ', '.join(x)).reset_index())
If set(x) has only one element, the group holds a single distinct value, so that value is returned on its own; otherwise all the values are joined, duplicates included.
The output of this will be:
   id       v1       v2
0   1        a     b, d
1   2  c, d, f  e, e, g
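The same rule may be easier to read as a named function. The following is only a sketch: join_unless_single is a hypothetical helper name, not something from the answer above, and it uses "," as the separator to match the output requested in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2],
    'v1': ['a', 'a', 'c', 'd', 'f'],
    'v2': ['b', 'd', 'e', 'e', 'g'],
})

def join_unless_single(values, sep=','):
    # Hypothetical helper: `values` is the Series for one column of one group.
    # If the group holds a single distinct value, return it once.
    if values.nunique() == 1:
        return values.iloc[0]
    # Otherwise keep every value, duplicates included.
    return sep.join(values)

result = df.groupby('id')[['v1', 'v2']].agg(join_unless_single).reset_index()
print(result)
```

Note that nunique() ignores missing values, so this sketch assumes the columns contain only strings.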
Please try:
g=df.groupby('id').agg(list)
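This answer stops after collecting each group's values into lists. One possible way to continue from there, offered as an assumption rather than as the answerer's intended next step, is to map a small function over every list. join_list is a hypothetical name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2],
    'v1': ['a', 'a', 'c', 'd', 'f'],
    'v2': ['b', 'd', 'e', 'e', 'g'],
})

# Step from the answer: one list of values per id and column.
g = df.groupby('id').agg(list)

def join_list(values, sep=','):
    # Hypothetical helper: collapse a list with one distinct value,
    # otherwise join everything, duplicates included.
    return values[0] if len(set(values)) == 1 else sep.join(values)

# Assumed continuation: apply the helper to every cell, column by column.
out = g.apply(lambda col: col.map(join_list)).reset_index()
print(out)
```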