pandas groupby join data with"|" but exclude null,nan,emptystring,whitespace
Question:
here is the sample data for my question
pid
date
task
language
8522
17-09-2022
listen
english
7851
17-09-2022
read
hindi
7546
17-09-2022
telugu
3584
write
8522
18-09-2022
read
7851
19-09-2022
hindi
8522
speak
hindi
7546
19-09-2022
speak
english
3584
write
tamil
8522
25-09-2022
i tried to code and got the output
data=df.groupby('pid')[['date','task','language']].transform(lambda x: '|'.join(x))
and i got this output
and i want the output to be with no empty/null values to be added so my desired final output would look like this but i have not idea how to filter or remove the values and get the output
any answer or suggestion is helpful
Answers:
You can use pandas.Series.dropna
in the lambda expression before joining.
Here is what works for me:
out = (
df
.groupby('pid')[['date','task','language']]
.agg(lambda x: '|'.join(x
.replace(r'^s*$', np.nan, regex=True)
.dropna()
.astype(str)
)
)
)
print(out)
date task language
pid
3584 write|write tamil
7546 17-09-2022|19-09-2022 speak telugu|english
7851 17-09-2022|19-09-2022 read hindi|hindi
8522 17-09-2022|18-09-2022|25-09-2022 listen|read|speak english|hindi
here is the sample data for my question
pid | date | task | language |
---|---|---|---|
8522 | 17-09-2022 | listen | english |
7851 | 17-09-2022 | read | hindi |
7546 | 17-09-2022 | telugu | |
3584 | write | ||
8522 | 18-09-2022 | read | |
7851 | 19-09-2022 | hindi | |
8522 | speak | hindi | |
7546 | 19-09-2022 | speak | english |
3584 | write | tamil | |
8522 | 25-09-2022 |
i tried to code and got the output
data=df.groupby('pid')[['date','task','language']].transform(lambda x: '|'.join(x))
and i got this output
and i want the output to be with no empty/null values to be added so my desired final output would look like this but i have not idea how to filter or remove the values and get the output
any answer or suggestion is helpful
You can use pandas.Series.dropna
in the lambda expression before joining.
Here is what works for me:
out = (
df
.groupby('pid')[['date','task','language']]
.agg(lambda x: '|'.join(x
.replace(r'^s*$', np.nan, regex=True)
.dropna()
.astype(str)
)
)
)
print(out)
date task language
pid
3584 write|write tamil
7546 17-09-2022|19-09-2022 speak telugu|english
7851 17-09-2022|19-09-2022 read hindi|hindi
8522 17-09-2022|18-09-2022|25-09-2022 listen|read|speak english|hindi