Remove entire groups of rows based on a condition on their rows
Question:
I have a dataframe with two columns, 'Group' and 'Sample_Number'.
Each group has sample number 11 exactly once, followed by sample numbers in the range 21 to 29 (for example 21, 22, 23, 24, 25, 26, 27, 28, 29) and then by sample numbers in the range 31 to 39 (for example 31, 32, 33, 34, 35, 36, 37, 38, 39). Hence each group should have one sample number 11, at least one sample number in the range 21 to 29, and at least one sample number in the range 31 to 39.
I want my code to go through each group and:
- Check whether the group contains sample number 11.
- Check whether there is at least one sample number in the range 21 to 29.
- Check whether there is at least one sample number in the range 31 to 39.
If any of these three conditions is not met, the code should remove the entire group from the dataframe.
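The three checks above can be sketched as a small predicate on one group's sample numbers. This is only an illustration: the function name `is_valid_group` is hypothetical, and it assumes that "one 11" means exactly one occurrence of 11 in the group.

```python
def is_valid_group(sample_numbers):
    """Return True if one group's sample numbers pass all three checks."""
    numbers = list(sample_numbers)
    # Assumption: "one 11" means exactly one occurrence of sample number 11.
    has_single_11 = numbers.count(11) == 1
    # At least one sample number in the inclusive range 21..29.
    has_20s = any(21 <= n <= 29 for n in numbers)
    # At least one sample number in the inclusive range 31..39.
    has_30s = any(31 <= n <= 39 for n in numbers)
    return has_single_11 and has_20s and has_30s
```

A group is kept only when the predicate returns True; any failing check means the whole group is dropped.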
Below is the dataframe in table format:
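Group | Sample_Number |
---|---|
Z007 | 11 |
Z007 | 21 |
Z007 | 22 |
Z007 | 23 |
Z007 | 31 |
Z007 | 32 |
Z008 | 11 |
Z008 | 31 |
Z008 | 32 |
Z008 | 33 |
Z009 | 11 |
Z009 | 21 |
Z009 | 22 |
Z009 | 23 |
Z010 | 21 |
Z010 | 22 |
Z010 | 23 |
Z010 | 24 |
Z010 | 31 |
Z010 | 32 |
Z010 | 33 |
Z010 | 34 |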
import pandas as pd
df = pd.DataFrame([['Z007', 11], ['Z007', 21], ['Z007', 22], ['Z007', 23], ['Z007', 31], ['Z007', 32], ['Z008', 11], ['Z008', 31], ['Z008', 32], ['Z008', 33], ['Z009', 11], ['Z009', 21], ['Z009', 22], ['Z009', 23], ['Z010', 21], ['Z010', 22], ['Z010', 23], ['Z010', 24], ['Z010', 31], ['Z010', 32], ['Z010', 33], ['Z010', 34]], columns=['Group', 'Sample_Number'])
The code should remove group 'Z008' because it has no sample number in the range 21 to 29. It should remove group 'Z009' because it has no sample number in the range 31 to 39. It should also remove group 'Z010' because it has no sample number 11.
Expected answer is below:
Group | Sample_Number |
---|---|
Z007 | 11 |
Z007 | 21 |
Z007 | 22 |
Z007 | 23 |
Z007 | 31 |
Z007 | 32 |
I could do it only for sample number 11, but I am struggling to do the same for the sample numbers in the ranges 21 to 29 and 31 to 39. Below is my code for sample number 11:
invalid_group_no = [i for i in df['Group'].unique() if
df[df['Group']== i]["Sample_Number"].to_list().count(11)!=1]
Can anyone please help me with the other sample numbers? Please feel free to implement your own ways. Any help is appreciated.
Answers:
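Try this. It builds the set of groups that satisfy each condition, intersects the three sets, and keeps only the rows whose group is in the intersection: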
groups = set(df['Group'][df['Sample_Number'] == 11]) & set(df['Group'][df['Sample_Number'].isin(range(21,30))]) & set(df['Group'][df['Sample_Number'].isin(range(31,40))])
df = df[df['Group'].isin(groups)]
Group Sample_Number
0 Z007 11
1 Z007 21
2 Z007 22
3 Z007 23
4 Z007 31
5 Z007 32
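As an alternative to the set intersection, the same filtering can be sketched with `groupby(...).filter(...)`, which keeps or drops whole groups according to a predicate. This is a minimal sketch rather than the answerer's method: the helper name `has_required_samples` and the `samples` dictionary are hypothetical, and the helper assumes the stricter "exactly one 11" rule from the question, whereas the set-based answer only requires at least one 11.

```python
import pandas as pd

# Rebuild the example data from the question (hypothetical helper dict).
samples = {
    "Z007": [11, 21, 22, 23, 31, 32],
    "Z008": [11, 31, 32, 33],
    "Z009": [11, 21, 22, 23],
    "Z010": [21, 22, 23, 24, 31, 32, 33, 34],
}
df = pd.DataFrame(
    [(group, number) for group, numbers in samples.items() for number in numbers],
    columns=["Group", "Sample_Number"],
)


def has_required_samples(numbers: pd.Series) -> bool:
    """Check one group's sample numbers against the three conditions."""
    return bool(
        (numbers == 11).sum() == 1         # assumption: exactly one 11
        and numbers.between(21, 29).any()  # at least one in 21..29 (inclusive)
        and numbers.between(31, 39).any()  # at least one in 31..39 (inclusive)
    )


# filter() keeps every row of the groups for which the predicate is True.
result = df.groupby("Group").filter(
    lambda g: has_required_samples(g["Sample_Number"])
)
print(result)
```

With the example data only the rows of group Z007 remain. The predicate form may be easier to extend if more conditions are added later, at the cost of calling a Python function once per group.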