pandas groupby join data with"|" but exclude null,nan,emptystring,whitespace

Question

here is the sample data for my question

pid	date	task	language
8522	17-09-2022	listen	english
7851	17-09-2022	read	hindi
7546	17-09-2022		telugu
3584		write
8522	18-09-2022	read
7851	19-09-2022		hindi
8522		speak	hindi
7546	19-09-2022	speak	english
3584		write	tamil
8522	25-09-2022

i tried to code and got the output

data=df.groupby('pid')[['date','task','language']].transform(lambda x: '|'.join(x))

and i got this output

image1:-

and i want the output to be with no empty/null values to be added so my desired final output would look like this but i have not idea how to filter or remove the values and get the output

image2:-

any answer or suggestion is helpful

Asked By: Siva Rama krishna Kumar

||

Source

Answer 1

You can use pandas.Series.dropna in the lambda expression before joining.

Here is what works for me:

out = (
    df
    .groupby('pid')[['date','task','language']]
    .agg(lambda x: '|'.join(x
                            .replace(r'^s*$', np.nan, regex=True)
                            .dropna()
                            .astype(str)
                           )
        )
)
print(out)

                                  date               task        language
pid                                                                      
3584                                          write|write           tamil
7546             17-09-2022|19-09-2022              speak  telugu|english
7851             17-09-2022|19-09-2022               read     hindi|hindi
8522  17-09-2022|18-09-2022|25-09-2022  listen|read|speak   english|hindi

Answered By: Rabinzel

pandas groupby join data with"|" but exclude null,nan,emptystring,whitespace

Question:

Answers: