For each ID element in column 1, check if it has 2 defined rows in column 2 in any order. Write the boolean result on column 3. Python

Question:

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame(data={'ID': [1,1,1,2,2,2,2,3,3,3,4,4,4],
                        'value': ["a","b","NA","a","a","NA","NA","a","NA","b","NA","b","NA"]})

|    |   ID | value   |
|---:|-----:|:--------|
|  0 |    1 | a       |
|  1 |    1 | b       |
|  2 |    1 | NA      |
|  3 |    2 | a       |
|  4 |    2 | a       |
|  5 |    2 | NA      |
|  6 |    2 | NA      |
|  7 |    3 | a       |
|  8 |    3 | NA      |
|  9 |    3 | b       |
| 10 |    4 | NA      |
| 11 |    4 | b       |
| 12 |    4 | NA      |

For each element in the "ID" column, I want to check whether both of the values "a" and "b" appear in the "value" column (in any order), and write the result to a "result" column, as shown in the table below. In this example only IDs 1 and 3 have both "a" and "b" in the "value" column, so only their rows get "yes" in the "result" column; every other row gets "no".

df = pd.DataFrame(data={'ID': [1,1,1,2,2,2,2,3,3,3,4,4,4],
                        'value': ["a","b","NA","a","a","NA","NA","a","NA","b","NA","b","NA"],
                        'result': ["yes","yes","yes","no","no","no","no","yes","yes","yes","no","no","no"]})

|    |   ID | value   | result   |
|---:|-----:|:--------|:---------|
|  0 |    1 | a       | yes      |
|  1 |    1 | b       | yes      |
|  2 |    1 | NA      | yes      |
|  3 |    2 | a       | no       |
|  4 |    2 | a       | no       |
|  5 |    2 | NA      | no       |
|  6 |    2 | NA      | no       |
|  7 |    3 | a       | yes      |
|  8 |    3 | NA      | yes      |
|  9 |    3 | b       | yes      |
| 10 |    4 | NA      | no       |
| 11 |    4 | b       | no       |
| 12 |    4 | NA      | no       |

Any suggestions? Thank you very much in advance.

Asked By: Víctor


Answers:

One solution is to group by "ID" and check whether each group contains both values:

df["result"] = df.groupby("ID")["value"].transform(
               lambda x: "yes" if 'a' in x.values and 'b' in x.values else "no")

    ID value result
0    1     a    yes
1    1     b    yes
2    1    NA    yes
3    2     a     no
4    2     a     no
5    2    NA     no
6    2    NA     no
7    3     a    yes
8    3    NA    yes
9    3     b    yes
10   4    NA     no
11   4     b     no
12   4    NA     no
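
A minimal self-contained sketch of this approach, generalised to any collection of required values. The helper name `flag_ids_with_all` and its parameters are hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "value": ["a", "b", "NA", "a", "a", "NA", "NA", "a", "NA", "b", "NA", "b", "NA"],
})

def flag_ids_with_all(frame, required, id_col="ID", value_col="value"):
    """Return a 'yes'/'no' Series aligned with the rows of `frame`.

    Hypothetical helper: a row gets 'yes' when its ID group contains
    every value listed in `required`, in any order.
    """
    required = set(required)
    # transform broadcasts the scalar returned for each group to all of its rows
    return frame.groupby(id_col)[value_col].transform(
        lambda group: "yes" if required.issubset(group) else "no"
    )

df["result"] = flag_ids_with_all(df, ["a", "b"])
print(df)
```

Because the check is a subset test, extra values in a group (such as "NA") do not affect the outcome.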

Answered By: Pablo C

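First replace the "NA" strings with NaN, then use transform with nunique. This counts the distinct non-missing values per ID, so it assumes "a" and "b" are the only non-missing values that can occur: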

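import numpy as np

# Turn the "NA" placeholder strings into real missing values so nunique ignores them
df.value = df.value.replace('NA',np.nan)
# An ID qualifies when it has exactly two distinct non-missing values
df['new'] = df.groupby('ID')['value'].transform('nunique')==2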
df
Out[135]: 
    ID value    new
0    1     a   True
1    1     b   True
2    1  None   True
3    2     a  False
4    2     a  False
5    2  None  False
6    2  None  False
7    3     a   True
8    3  None   True
9    3     b   True
10   4  None  False
11   4     b  False
12   4  None  False
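
If the column can hold other values, `nunique() == 2` would also flag an ID that has, say, "a" and "c". A possible variant, sketched here under the assumption that only "a" and "b" should count, masks everything else before counting. The extra ID 5 rows are hypothetical and added only to show the difference:

```python
import pandas as pd

df = pd.DataFrame({
    # ID 5 is a hypothetical extra group holding "a" and "c" (no "b")
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5],
    "value": ["a", "b", "NA", "a", "a", "NA", "NA", "a", "NA", "b",
              "NA", "b", "NA", "a", "c"],
})

required = ["a", "b"]

# Keep only the required values; everything else becomes NaN,
# which nunique ignores by default
masked = df["value"].where(df["value"].isin(required))

# An ID qualifies when all required values are present among its rows
df["new"] = masked.groupby(df["ID"]).transform("nunique") == len(required)
print(df)
```

With this masking step there is no need to replace "NA" separately, since any value outside `required` is treated as missing.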

Answered By: BENY

Try this. It also works when the DataFrame contains values other than just "a" and "b".

l = ['a','b']
df['result'] = df['ID'].map(df.groupby(['ID','value']).size().loc[(slice(None),l)].unstack().gt(0).all(axis=1))

or

df['result'] = df['ID'].map(df.groupby('ID')['value'].agg(set).ge({'a','b'}).map({True:'yes',False:'no'}))
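
The set-based idea can be sketched step by step as follows. This version uses `set.issubset` for the membership test instead of comparing the Series against a set, and the ID 5 rows are hypothetical, added to show a group that contains neither required value:

```python
import pandas as pd

df = pd.DataFrame({
    # ID 5 is a hypothetical extra group with neither "a" nor "b"
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5],
    "value": ["a", "b", "NA", "a", "a", "NA", "NA", "a", "NA", "b",
              "NA", "b", "NA", "NA", "c"],
})

required = {"a", "b"}

# One set of observed values per ID
values_per_id = df.groupby("ID")["value"].agg(set)

# True where the ID's set contains every required value
has_all = values_per_id.map(required.issubset)

# Broadcast the per-ID flag back to the rows and label it
df["result"] = df["ID"].map(has_all).map({True: "yes", False: "no"})
print(df)
```

Since every ID appears in `values_per_id`, each row receives a label even when its group has none of the required values.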

Answered By: rhug123