Pandas dataframe groupby and aggregate with conditions


Hi, is there a way to group my dataframe by specific columns and include empty values as well, but only when all of a group's values in that column are empty?

I have a dataframe that looks like this:
[Image: dataframe]

I am trying to group the dataframe by Name and Subject, and my expected output looks like this:
[Image: expected output]

So, if a person takes more than one subject but one of them is empty, drop that row so it won't be included when the other rows are aggregated. If a person takes only one subject and it is empty, don't drop the row.
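The rule above can be sketched directly: drop an empty Subject only when the same person has at least one non-empty Subject. This is a minimal sketch, not the method used in the answer below; the sample data is hypothetical (only the column names ID, Name, Subject and Year come from this thread), and it relies on the fact that count skips NaN while size does not.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2],
    "Name": ["Ann", "Ann", "Ann", "Bob"],
    "Subject": ["Math", np.nan, "English", np.nan],
    "Year": [1, 2, 3, 0],
})

# Number of non-empty subjects per person (count() skips NaN)
n_subjects = df.groupby(["ID", "Name"])["Subject"].transform("count")

# Drop an empty Subject only if that person also has a non-empty one
drop = df["Subject"].isna() & n_subjects.gt(0)

# Collect each person's remaining values into lists
out = df.loc[~drop].groupby(["ID", "Name"], as_index=False).agg(list)
print(out)
```

Here Ann keeps [Math, English] with years [1, 3], while Bob, whose only subject is empty, keeps his single row. A person with several empty rows would keep all of them with this sketch, so it does not by itself cover the "take the first row" requirement.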

Original dataframe:
[Image: dataframe]

The outcome will still be the same: if all of a person's subjects are empty, it takes the first row's value.
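Keeping only the first row when every subject of a person is empty can be done with drop_duplicates, because pandas treats NaN values as equal when looking for duplicates. A minimal sketch; the person and the Year values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical person whose subjects are all empty, across three rows
df = pd.DataFrame({
    "ID": [4, 4, 4],
    "Name": ["Dee", "Dee", "Dee"],
    "Subject": [np.nan, np.nan, np.nan],
    "Year": [0, 2, 5],
})

# NaN subjects count as duplicates of each other, so only the first row survives
first_only = df.drop_duplicates(subset=["ID", "Name", "Subject"], keep="first")
print(first_only)
```

Only the row with Year 0 is left, which matches "takes the first row value".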

[Updated] Another new dataframe:

[Image: dataframe]

The outcome will have the same number of subjects, but there will be 3 years:
[Image: expected output]

Asked By: cyntha



Here is a proposal using GroupBy.agg:

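# Keep one row per (ID, Name, Subject); repeated empty subjects collapse to the first one
df = df.drop_duplicates(subset=["ID", "Name", "Subject"])

# Flag empty subjects that belong to a person who still has more than one row
m = (df.groupby(["ID", "Name"])["Subject"].transform("size").gt(1)
     & df["Subject"].isnull())

# Drop the flagged rows, then collect each person's remaining values into lists
out = df.loc[~m].groupby(["ID", "Name"], as_index=False).agg(list)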

Output:


   ID Name          Subject    Year
0   1   CC  [Math, English]  [1, 3]
1   2   DD        [Physics]     [2]
2   3   EE      [Chemistry]     [1]
3   4   FF            [nan]     [0]
4   5   GG            [nan]     [0]
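For an end-to-end run of the approach above, here is a self-contained version with a small hypothetical input. The values are invented to exercise each case (an empty subject among real ones, a person with only repeated empty subjects, and a person with a single subject); they are not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical input; column names follow the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "Name": ["Ann", "Ann", "Ann", "Bob", "Bob", "Cat"],
    "Subject": ["Math", np.nan, "English", np.nan, np.nan, "Physics"],
    "Year": [1, 2, 3, 0, 4, 2],
})

# Collapse duplicate (ID, Name, Subject) rows; Bob's two empty rows become one
df = df.drop_duplicates(subset=["ID", "Name", "Subject"])

# Empty subject AND the person still has more than one row -> drop it
m = (df.groupby(["ID", "Name"])["Subject"].transform("size").gt(1)
     & df["Subject"].isnull())

out = df.loc[~m].groupby(["ID", "Name"], as_index=False).agg(list)
print(out)
```

Ann ends up with [Math, English] and [1, 3], Bob with [nan] and [0] (his first row), and Cat with [Physics] and [2]. Note that drop_duplicates also removes repeated non-empty subjects, so a person listed twice for Math keeps a single Math entry with the Year of its first row.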
Answered By: Timeless