Function to highlight unique observations within groups

Question:

I would like help creating a function that iterates through a dataframe by group (color, where the colors are not specified in advance) and checks which ids do not match the majority of the ids within each color group. Whatever id accounts for more than half of the populated ids for a color is treated as correct. The real dataset will most likely have 10-50 rows per color, and there could be multiple out-of-place ids. It would be great if the output could include the string note 'Flag for later research'; if it is easier, a simple 0/1 output would do, and I can write the corresponding text functionality myself. I am having trouble figuring out where to start: a groupby with nunique, a loop, or something that combines the two.

  • As you can see below, the comment I have left in the 'Note' column (Flag for later research) marks the id that differs from the others within its color group.

Sample of data:

color    id    commitment    Note (where I need help)
blue     1     10
blue     1     5
blue     1     15
blue     2     10            Flag for later research
blue     1     9
green    3     10
green    3     11
green    2     12            Flag for later research
green    3     15
Asked By: John


Answers:

This code:

df['Note'] = ~df.duplicated(['color','id'], keep=False)

gives you:

   color  id  commitment   Note
0   blue   1          10  False
1   blue   1           5  False
2   blue   1          15  False
3   blue   2          10   True
4   blue   1           9  False
5  green   3          10  False
6  green   3          11  False
7  green   2          12   True
8  green   3          15  False
Answered By: Quang Hoang
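
Note that the one-liner above flags a row only when its (color, id) pair occurs exactly once, so an out-of-place id that appears two or more times within a color would not be flagged. The question asks for a comparison against the majority id, so here is a minimal sketch of that variant. It is not part of the original answer; it assumes every id is populated, and the column name `flag` and the helper variables `group_size`, `id_count` and `is_minority` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "color": ["blue"] * 5 + ["green"] * 4,
    "id": [1, 1, 1, 2, 1, 3, 3, 2, 3],
    "commitment": [10, 5, 15, 10, 9, 10, 11, 12, 15],
})

# Number of populated ids in each color group, broadcast back to every row.
group_size = df.groupby("color")["id"].transform("count")

# Number of rows sharing the same (color, id) pair, broadcast back to every row.
id_count = df.groupby(["color", "id"])["commitment"].transform("size")

# A row is out of place when its id covers half or less of its color group.
is_minority = id_count <= group_size / 2

# Hypothetical output columns: a 0/1 flag and the text note from the question.
df["flag"] = is_minority.astype(int)
df["Note"] = np.where(is_minority, "Flag for later research", "")

print(df)
```

On the sample data this flags the same two rows as the answer above (index 3 and index 7). One design consequence to be aware of: if no id in a color holds a strict majority, every row of that color is flagged, which may or may not be what you want.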