Python. Dataframes. Select rows and apply condition on the selection

Question:

python newbie here. I have written the code that solves the issue. However, there should be a much better way of doing it.

I have two Series that come from the same table but due to some earlier process I get as separate sets. (They could be joined into a single dataframe again since the entries belong to the same record)

Ser1                   Ser2                                  
| id |               | section |
| ---|               |-------- |
| 1  |               |    A    |
| 2  |               |    B    |
| 2  |               |    C    |
| 3  |               |    D    |


df2
| id | section |
| ---|---------|
| 1  |    A    |
| 2  |    B    |
| 2  |    Z    |
| 2  |    Y    |
| 4  |    X    |
      

First, I would like to find those entries in Ser1, which match the same id in df2. Then, check if the values in the ser2 can NOT be found in the section column of df2
My expected results:

| id | section |  result |
| ---|-------- |---------|
| 1  |    A    |  False  |    # Both id(1) and section(A) are also in df2
| 2  |    B    |  False  |    # Both id(2) and section(B) are also in df2
| 2  |    C    |  True   |    # id(2) is in df2 but section(C) is not
| 3  |    D    |  False  |    # id(3) is not in df2, in that case the result should also be False


My code:

    for k, v in Ser2.items():
        rslt_df = df2[df2['id'] == Ser[k]]
        if rslt_df.empty:
           print(False)
        if(v not in rslt_df['section'].tolist()):
           print(True)
        else:
           print(False)

I know the code is not very good. But after reading about merging and comprehension lists I am getting confused what the best way would be to improve it.

Asked By: Nebiros

||

Answers:

You can concat the series and compute the "result" with boolean arithmetic (XOR):

out = (
 pd.concat([ser1, ser2], axis=1)
   .assign(result=ser1.isin(df2['id'])!=ser2.isin(df2['section']))
)

Output:

   id section  result
0   1       A   False
1   2       B   False
2   2       C    True
3   3       D   False

Intermediates:

m1 = ser1.isin(df2['id'])
m2 = ser2.isin(df2['section'])

      m1     m2  m1!=m2
0   True   True   False
1   True   True   False
2   True  False    True
3  False  False   False
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.