Check if pandas row contains exact quantity of strings

Question:

I have a df1 32611 x 17:

        0    1    2    3    4    5   ...   11    12    13    14    15    16
0      BSO  PRV  BSI  TUR  WSP  ACP  ...  HLR   HEX   HEX  None  None  None
1      BSO  PRV  BSI  TUR  WSP  ACP  ...  HLF   HLR   HEX   HEX   HEX  None
2      BSO  PRV  BSI  HLF  HLR  TUR  ...  HEX   RSO   RSI   HEX   HEX   HEX
3      BSO  PRV  BSI  HLF  HLR  TUR  ...  RSO   RSI   HEX   HEX   HEX  None
4      BSO  PRV  BSI  HLF  TUR  WSP  ...  RSO   RSI   HLR   HEX   HEX   HEX
    ...  ...  ...  ...  ...  ...  ...  ...   ...   ...   ...   ...   ...
32607  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None
32608  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None
32609  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None
32610  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None
32611  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None

I have another df2 6 x 17:

    0    1    2    3    4    5    6    7    8   9   10  11  12  13  14  15  16
1  ACP  HEX  HEX  HEX  HEX  TUR  NaN  NaN  NaN NaN NaN NaN NaN NaN NaN NaN NaN
2  ACP  HEX  HEX  HEX  HEX  HEX  HEX  TUR  NaN NaN NaN NaN NaN NaN NaN NaN NaN
3  ACP  HEX  HEX  HEX  HEX  HEX  HEX  TUR  TUR NaN NaN NaN NaN NaN NaN NaN NaN
4  ACP  HEX  HEX  HEX  HEX  TUR  TUR  NaN  NaN NaN NaN NaN NaN NaN NaN NaN NaN
5  ACP  HEX  HEX  TUR  NaN  NaN  NaN  NaN  NaN NaN NaN NaN NaN NaN NaN NaN NaN
6  ACP  HEX  HEX  TUR  TUR  NaN  NaN  NaN  NaN NaN NaN NaN NaN NaN NaN NaN NaN

I specifically care about df2’s value counts for each row. What I am trying to accomplish is:

Does df1.loc[i] contain df2.loc[j].value_counts()?

For example, df2.loc[1].value_counts() is:

HEX    4
ACP    1
TUR    1
Name: 1, dtype: int64

I want to iterate through each row of df1 and check whether it contains 4 HEX, 1 ACP, and 1 TUR. If it does, assign it a number (in a separate list, this part doesn’t matter); if not, pass.

Asked By: Tony Sirico

||

Answers:

Per the conversation in the comments, here is one way to compare on a row-by-row basis (not sure how performant this will be if operating on many records):

import pandas as pd

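def contains_value_counts(row1: pd.Series, row2: pd.Series) -> bool:
    """Check if `row1` contains the value counts of `row2`."""
    # value_counts() drops None/NaN by default, so padding cells are ignored.
    vc1 = row1.value_counts()
    vc2 = row2.value_counts()
    # Keep only the labels that also occur in row2, then require identical counts.
    return vc1.filter(vc2.index).equals(vc2)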

df1 = pd.DataFrame(...)
df2 = pd.DataFrame(...)

idx1 = 0
idx2 = 0
equal = contains_value_counts(df1.iloc[idx1], df2.iloc[idx2])
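
To check every row of df1 against every row of df2, as the question asks, one possible sketch is to count each label per df1 row once and compare those counts with each df2 row's value counts. The small frames below are made-up stand-ins for the real data, and the names `required`, `counts1`, `matches`, and `first_match` are illustrative only. It assumes "contains" means the counts must match exactly, as in the title.

```python
import pandas as pd

# Toy stand-ins for df1 and df2 (assumed data, far smaller than the real frames).
df1 = pd.DataFrame([
    ["ACP", "HEX", "HEX", "HEX", "HEX", "TUR", "BSO"],
    ["ACP", "HEX", "HEX", "TUR", "TUR", None, None],
    ["ACP", "HEX", "HEX", "HEX", "TUR", "BSO", None],
])
df2 = pd.DataFrame(
    [
        ["ACP", "HEX", "HEX", "HEX", "HEX", "TUR"],
        ["ACP", "HEX", "HEX", "TUR", "TUR", None],
    ],
    index=[1, 2],
)

# Required counts per df2 row; value_counts() skips None/NaN by default.
required = {j: row.value_counts() for j, row in df2.iterrows()}

# Every label that appears in any df2 row.
labels = sorted(set().union(*(vc.index for vc in required.values())))

# One column per label: how often that label occurs in each df1 row.
counts1 = pd.DataFrame({lab: df1.eq(lab).sum(axis=1) for lab in labels})

# matches.at[i, j] is True when df1 row i has exactly the counts of df2 row j.
matches = pd.DataFrame({
    j: counts1[vc.index].eq(vc).all(axis=1) for j, vc in required.items()
})

# Label of the first matching df2 row for each df1 row, or None if none match.
first_match = [
    next((j for j in matches.columns if matches.at[i, j]), None)
    for i in matches.index
]
```

Swapping `.eq(vc)` for `.ge(vc)` would instead accept rows that hold at least the required number of each label. Other labels in the df1 row (such as BSO here) are ignored either way.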
Answered By: acurtis166