For each ID element in column 1, check if it has 2 defined rows in column 2 in any order. Write the boolean result on column 3. Python

Question

So I have the following dataframe

df = pd.DataFrame(data={'ID': [1,1,1,2,2,2,2,3,3,3,4,4,4],
                        'value': ["a","b","NA","a","a","NA","NA","a","NA","b","NA","b","NA"]})

|    |   ID | value   |
|---:|-----:|:--------|
|  0 |    1 | a       |
|  1 |    1 | b       |
|  2 |    1 | NA      |
|  3 |    2 | a       |
|  4 |    2 | a       |
|  5 |    2 | NA      |
|  6 |    2 | NA      |
|  7 |    3 | a       |
|  8 |    3 | NA      |
|  9 |    3 | b       |
| 10 |    4 | NA      |
| 11 |    4 | b       |
| 12 |    4 | NA      |

I want to check whether, for each ID, the "value" column contains both "a" and "b", and write the result to a "result" column, as shown in the table below. In this example only IDs 1 and 3 have both "a" and "b" in the "value" column, so their rows get "yes" in the "result" column.

df = pd.DataFrame(data={'ID': [1,1,1,2,2,2,2,3,3,3,4,4,4],
                        'value': ["a","b","NA","a","a","NA","NA","a","NA","b","NA","b","NA"],
                        'result': ["yes","yes","yes","no","no","no","no","yes","yes","yes","no","no","no"]})

|    |   ID | value   | result   |
|---:|-----:|:--------|:---------|
|  0 |    1 | a       | yes      |
|  1 |    1 | b       | yes      |
|  2 |    1 | NA      | yes      |
|  3 |    2 | a       | no       |
|  4 |    2 | a       | no       |
|  5 |    2 | NA      | no       |
|  6 |    2 | NA      | no       |
|  7 |    3 | a       | yes      |
|  8 |    3 | NA      | yes      |
|  9 |    3 | b       | yes      |
| 10 |    4 | NA      | no       |
| 11 |    4 | b       | no       |
| 12 |    4 | NA      | no       |

Any suggestion? Thank you very much in advance

Asked By: Víctor


Answer 1

One solution can be this:

df["result"] = df.groupby("ID")["value"].transform(
               lambda x: "yes" if 'a' in x.values and 'b' in x.values else "no")

    ID value result
0    1     a    yes
1    1     b    yes
2    1    NA    yes
3    2     a     no
4    2     a     no
5    2    NA     no
6    2    NA     no
7    3     a    yes
8    3    NA    yes
9    3     b    yes
10   4    NA     no
11   4     b     no
12   4    NA     no
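
If the lambda becomes slow on a large frame, the same check can probably be written without a Python-level function. This is only a sketch of an alternative, not part of the answer above: it builds one boolean mask per required value and asks, per ID, whether any row matches. The names `has_a` and `has_b` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": ["a", "b", "NA", "a", "a", "NA", "NA",
              "a", "NA", "b", "NA", "b", "NA"],
})

# For every row: does the row's ID group contain at least one "a"?
has_a = df["value"].eq("a").groupby(df["ID"]).transform("any")
# Same question for "b".
has_b = df["value"].eq("b").groupby(df["ID"]).transform("any")

# A group qualifies only if both values are present somewhere in it.
df["result"] = np.where(has_a & has_b, "yes", "no")
print(df)
```

Dropping the `np.where` step and keeping `has_a & has_b` gives a plain boolean column instead of "yes"/"no".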

Answered By: Pablo C

Answer 2

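First replace the 'NA' strings with NaN, then transform with nunique. Note that this only counts distinct non-null values per ID, so it assumes the column holds nothing other than a, b and NA.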

import numpy as np
df.value = df.value.replace('NA',np.nan)
df['new'] = df.groupby('ID')['value'].transform('nunique')==2
df
Out[135]: 
    ID value    new
0    1     a   True
1    1     b   True
2    1  None   True
3    2     a  False
4    2     a  False
5    2  None  False
6    2  None  False
7    3     a   True
8    3  None   True
9    3     b   True
10   4  None  False
11   4     b  False
12   4  None  False
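
One thing to watch with the nunique approach: it counts any two distinct values, not specifically a and b. The sketch below uses a small made-up frame (`df_extra`, with a hypothetical extra value "c") to show the difference, and one possible way to tighten the check by masking out everything that is not a required value before counting.

```python
import pandas as pd

required = ["a", "b"]

# Made-up data: ID 2 has "a" and "c", which is two distinct values
# but not the pair we are looking for.
df_extra = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "value": ["a", "b", "a", "c", "a", None],
})

# Plain nunique: True for ID 2 as well, because "a" and "c" are two values.
naive = df_extra.groupby("ID")["value"].transform("nunique") == 2

# Keep only the required values (everything else becomes NaN), then count.
masked = df_extra["value"].where(df_extra["value"].isin(required))
strict = masked.groupby(df_extra["ID"]).transform("nunique") == len(required)

print(df_extra.assign(naive=naive, strict=strict))
```

With the data in the question both versions agree, since the only non-null values there are a and b.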

Answered By: BENY

Answer 3

Try this. It also works when the column contains values other than a and b.

l = ['a','b']
df['result'] = df['ID'].map(df.groupby(['ID','value']).size().loc[(slice(None),l)].unstack().gt(0).all(axis=1))

or

df['result'] = df['ID'].map(df.groupby('ID')['value'].agg(set).ge({'a','b'}).map({True:'yes',False:'no'}))
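
The set-based one-liner is dense, so here is a rough step-by-step sketch of the same idea. It swaps the `.ge(...)` comparison for an explicit `issubset` call, which is my own substitution for readability; the intermediate names `values_per_id` and `has_all` are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": ["a", "b", "NA", "a", "a", "NA", "NA",
              "a", "NA", "b", "NA", "b", "NA"],
})

required = {"a", "b"}

# Step 1: one set of distinct values per ID, e.g. ID 1 -> {"a", "b", "NA"}.
values_per_id = df.groupby("ID")["value"].agg(set)

# Step 2: per ID, is every required value inside that set?
has_all = values_per_id.map(required.issubset)

# Step 3: broadcast the per-ID answer back to the rows and label it.
df["result"] = df["ID"].map(has_all).map({True: "yes", False: "no"})
print(df)
```

Extra values in a group (like "NA" here) do not matter, because the check only asks whether the required set is contained in the group's values.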

Answered By: rhug123
