For each ID element in column 1, check if it has 2 defined rows in column 2 in any order. Write the boolean result on column 3. Python
Question:
I have the following dataframe:
import pandas as pd
df = pd.DataFrame(data={'ID': [1,1,1,2,2,2,2,3,3,3,4,4,4],
                        'value': ["a","b","NA","a","a","NA","NA","a","NA","b","NA","b","NA"]})
| | ID | value |
|---:|-----:|:--------|
| 0 | 1 | a |
| 1 | 1 | b |
| 2 | 1 | NA |
| 3 | 2 | a |
| 4 | 2 | a |
| 5 | 2 | NA |
| 6 | 2 | NA |
| 7 | 3 | a |
| 8 | 3 | NA |
| 9 | 3 | b |
| 10 | 4 | NA |
| 11 | 4 | b |
| 12 | 4 | NA |
For each ID, I want to check whether both "a" and "b" appear in the "value" column (in any order), and write the outcome to a "result" column, as shown in the table below. In this example only IDs 1 and 3 have both "a" and "b" in the "value" column, so all of their rows get "yes" in the "result" column:
df = pd.DataFrame(data={'ID': [1,1,1,2,2,2,2,3,3,3,4,4,4],
                        'value': ["a","b","NA","a","a","NA","NA","a","NA","b","NA","b","NA"],
                        'result': ["yes","yes","yes","no","no","no","no","yes","yes","yes","no","no","no"]})
| | ID | value | result |
|---:|-----:|:--------|:---------|
| 0 | 1 | a | yes |
| 1 | 1 | b | yes |
| 2 | 1 | NA | yes |
| 3 | 2 | a | no |
| 4 | 2 | a | no |
| 5 | 2 | NA | no |
| 6 | 2 | NA | no |
| 7 | 3 | a | yes |
| 8 | 3 | NA | yes |
| 9 | 3 | b | yes |
| 10 | 4 | NA | no |
| 11 | 4 | b | no |
| 12 | 4 | NA | no |
Any suggestions? Thank you very much in advance.
Answers:
One solution is to group by ID and use transform, which broadcasts the per-group answer back to every row:
df["result"] = df.groupby("ID")["value"].transform(
    lambda x: "yes" if 'a' in x.values and 'b' in x.values else "no")
ID value result
0 1 a yes
1 1 b yes
2 1 NA yes
3 2 a no
4 2 a no
5 2 NA no
6 2 NA no
7 3 a yes
8 3 NA yes
9 3 b yes
10 4 NA no
11 4 b no
12 4 NA no
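If the list of required values grows, the chained `in` checks above get unwieldy. A minimal sketch of the same idea using a set-subset test follows; the names `required` and `has_all_required` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": ["a", "b", "NA", "a", "a", "NA", "NA", "a", "NA", "b", "NA", "b", "NA"],
})

# Hypothetical name: the values every ID must contain.
required = {"a", "b"}

def has_all_required(values: pd.Series) -> str:
    # "yes" only if every required value occurs somewhere in this group.
    return "yes" if required.issubset(set(values)) else "no"

# transform broadcasts the single per-group string to every row of the group.
df["result"] = df.groupby("ID")["value"].transform(has_all_required)
print(df)
```

Adding a third required value would then only mean changing the `required` set.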
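First replace the string 'NA' with a real NaN, then use transform with nunique, which ignores NaN. Note that this only checks that each ID has exactly two distinct non-missing values, so it relies on "a" and "b" being the only values that can occur:
import numpy as np
df['value'] = df['value'].replace('NA', np.nan)
df['new'] = df.groupby('ID')['value'].transform('nunique') == 2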
df
Out[135]:
ID value new
0 1 a True
1 1 b True
2 1 None True
3 2 a False
4 2 a False
5 2 None False
6 2 None False
7 3 a True
8 3 None True
9 3 b True
10 4 None False
11 4 b False
12 4 None False
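To see where the nunique count and an explicit membership check can disagree, here is a small sketch on made-up data (the frame `demo` and ID 5 with values "c" and "d" are assumptions for illustration, not from the question).

```python
import numpy as np
import pandas as pd

# Made-up data: ID 5 has two distinct values, but neither is "a" nor "b".
demo = pd.DataFrame({
    "ID": [1, 1, 1, 5, 5],
    "value": ["a", "b", np.nan, "c", "d"],
})

# Counting distinct non-missing values per ID.
by_count = demo.groupby("ID")["value"].transform("nunique") == 2

# Checking explicitly that both "a" and "b" occur within each ID.
has_a = demo["value"].eq("a").groupby(demo["ID"]).transform("any")
has_b = demo["value"].eq("b").groupby(demo["ID"]).transform("any")
by_membership = has_a & has_b

print(pd.DataFrame({"ID": demo["ID"], "by_count": by_count, "by_membership": by_membership}))
```

Here `by_count` is True for ID 5 as well, while `by_membership` is True only for ID 1. On the data in the question both give the same answer, because "a" and "b" are the only non-missing values.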
Try this. It also works when the df contains values other than just a and b.
l = ['a','b']
df['result'] = df['ID'].map(df.groupby(['ID','value']).size().loc[(slice(None),l)].unstack().gt(0).all(axis=1))
or, to get "yes"/"no" strings:
df['result'] = df['ID'].map(df.groupby('ID')['value'].agg(set).ge({'a','b'}).map({True:'yes',False:'no'}))
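Another way to get the same result, sketched here with plain `isin` and `nunique` instead of MultiIndex slicing, is to keep only the rows holding a required value and count how many distinct ones each ID has. The names `required`, `hits` and `n_found` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": ["a", "b", "NA", "a", "a", "NA", "NA", "a", "NA", "b", "NA", "b", "NA"],
})

# Hypothetical name: the values every ID must contain.
required = {"a", "b"}

# Keep only rows whose value is one of the required ones.
hits = df[df["value"].isin(required)]

# Number of distinct required values found per ID.
n_found = hits.groupby("ID")["value"].nunique()

# IDs with no required value at all are absent from n_found, hence fillna(0).
flags = df["ID"].map(n_found).fillna(0).eq(len(required))
df["result"] = flags.map({True: "yes", False: "no"})
print(df)
```

Because the rows are filtered first, other values in the column (such as the "NA" strings) do not affect the count.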