Using pytest with dataframes to test specific columns

Question:

I am writing pytest tests that use pandas DataFrames, and I am trying to write the code as generally as I can. (I could always check element by element, but I am trying to avoid that.)

So I have an input DataFrame that contains an ID column, like this:

ID,othervalue, othervalue2
00001,  4,   3
00001,  3,   3
00001,  2,   0
00003,  5,   2
00003,  2,   1
00003,  2,   9
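To reproduce this input inside a test, one option is to read it from a string. This is a minimal sketch that assumes the IDs should stay strings so the leading zeros survive. By default `read_csv` would parse `00001` as the integer `1`. `skipinitialspace=True` drops the blanks after each comma, including the one before `othervalue2` in the header.

```python
import io

import pandas as pd

CSV_TEXT = """ID,othervalue, othervalue2
00001,  4,   3
00001,  3,   3
00001,  2,   0
00003,  5,   2
00003,  2,   1
00003,  2,   9
"""

# dtype={"ID": str} keeps the leading zeros of the IDs;
# skipinitialspace strips the spaces after each comma (header and rows)
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"ID": str}, skipinitialspace=True)
```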

and I do

def test_df_against_angle(df, angle):
    result = do_some_calculation(df, angle)
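In pytest, `df` and `angle` would normally be supplied by a fixture and a parametrized argument. The sketch below shows that wiring only. `make_input_df`, the angle values, and the body of `do_some_calculation` are hypothetical placeholders to replace with the real code. The stub ignores `angle` and labels ID `00001` as `plus` and every other ID as `minus`.

```python
import pandas as pd
import pytest


def make_input_df():
    # Hypothetical input with the same columns as the example above
    return pd.DataFrame({
        "ID": ["00001"] * 3 + ["00003"] * 3,
        "othervalue": [4, 3, 2, 5, 2, 2],
        "othervalue2": [3, 3, 0, 2, 1, 9],
    })


def do_some_calculation(df, angle):
    # Placeholder for the real calculation: ignores `angle` and derives
    # "decision" from the ID alone, just so the test has something to check
    result = df.copy()
    result["decision"] = result["ID"].eq("00001").map({True: "plus", False: "minus"})
    return result


@pytest.fixture(name="df")
def df_fixture():
    # pytest injects this as the `df` argument of the test below
    return make_input_df()


@pytest.mark.parametrize("angle", [0, 45, 90])
def test_df_against_angle(df, angle):
    result = do_some_calculation(df, angle)
    # Condition 1: each ID carries exactly one distinct decision
    assert result.groupby("ID")["decision"].nunique().eq(1).all()
```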

Now, result is also a DataFrame that contains an ID column, and it also contains a decision column that can take values like "plus" or "minus" (or "pass"/"fail", or something similar). Something like:

ID, someresult,  decision, someotherresult
00001,   4,       plus,       3
00001,   2,       plus,       2
00002,   2,       minus,       2
00002,   1,       minus,       5
00002,   0,       minus,       9

I want to add an assertion (or several) that checks the following. (Not all at once; I mean separate assertions, since I have not yet decided which would be better.)

  1. All decision values corresponding to an ID are the same
  2. The decision values corresponding to an ID are different from those of the other IDs
  3. The decision of ID 00001 is plus and that of 00002 is minus
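Each of the three conditions can be phrased as a group-wise check. This minimal sketch builds the sample result from the question and tests each condition with a small helper. The helper names and the `expected` mapping are assumptions for illustration, and the IDs are kept as strings. Condition 2 is expressed the other way round: every decision value should occur under exactly one ID.

```python
import pandas as pd

result = pd.DataFrame({
    "ID": ["00001", "00001", "00002", "00002", "00002"],
    "someresult": [4, 2, 2, 1, 0],
    "decision": ["plus", "plus", "minus", "minus", "minus"],
    "someotherresult": [3, 2, 2, 5, 9],
})


def one_decision_per_id(df):
    # Condition 1: every ID has exactly one distinct decision value
    return bool(df.groupby("ID")["decision"].nunique().eq(1).all())


def decisions_not_shared(df):
    # Condition 2: every decision value occurs under exactly one ID,
    # so no two IDs share a decision
    return bool(df.groupby("decision")["ID"].nunique().eq(1).all())


def decisions_match(df, expected):
    # Condition 3: each listed ID has rows, and all of them carry the
    # expected decision (a missing ID fails instead of passing vacuously)
    for id_, decision in expected.items():
        values = df.loc[df["ID"].eq(id_), "decision"]
        if values.empty or not values.eq(decision).all():
            return False
    return True


assert one_decision_per_id(result)
assert decisions_not_shared(result)
assert decisions_match(result, {"00001": "plus", "00002": "minus"})
```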

I know that pandas has assertions for comparing DataFrames for equality, but how can I handle this situation?
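The pandas equality assertions can still be used if the result is first reduced to one row per ID. This is a minimal sketch that assumes the expected ID-to-decision mapping is known in the test. `drop_duplicates` keeps one row per distinct `(ID, decision)` pair. An ID with two different decisions therefore yields an extra row, and `pd.testing.assert_series_equal` fails. This covers conditions 1 and 3 in a single assertion.

```python
import pandas as pd

result = pd.DataFrame({
    "ID": ["00001", "00001", "00002", "00002", "00002"],
    "decision": ["plus", "plus", "minus", "minus", "minus"],
})

# One row per distinct (ID, decision) pair, indexed and sorted by ID
actual = (
    result.drop_duplicates(["ID", "decision"])
    .set_index("ID")["decision"]
    .sort_index()
)

# Expected decision for each ID (assumed to be known in the test)
expected = pd.Series({"00001": "plus", "00002": "minus"}, name="decision")
expected.index.name = "ID"

# Raises AssertionError on any difference in values, index, or length
pd.testing.assert_series_equal(actual, expected)
```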

Asked By: KansaiRobot


Answers:

IIUC, use the following for the tests (df here is the result DataFrame returned by do_some_calculation):

#test 1: exactly one unique decision value per ID
assert df.groupby('ID')['decision'].nunique().eq(1).all()

#test 2: no decision value of an ID appears under any other ID
assert not df.groupby('ID')['decision'].apply(lambda s: df.loc[df['ID'].ne(s.name), 'decision'].isin(s).any()).any()

#tests 2 and 3 using only the first 2 unique IDs (in order of appearance)
uniq = df['ID'].unique()
s1 = df.loc[df['ID'].eq(uniq[0]), 'decision']
s2 = df.loc[df['ID'].eq(uniq[1]), 'decision']

#test 2: no value of s1 occurs in s2
assert not s1.isin(s2).any()

#test 3: the first ID is all plus and the second ID is all minus
assert s1.eq('plus').all() and s2.eq('minus').all()
Answered By: jezrael
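A quick way to confirm that such assertions can actually fail is to run them on a deliberately broken frame. This minimal sketch uses a hypothetical output in which IDs 00001 and 00003 both end up as `plus`. The "one decision per ID" check still passes, which shows it cannot detect the overlap on its own. The group-wise "no shared decision" check does flag the overlap.

```python
import pandas as pd

# Hypothetical broken output: 00001 and 00003 share the decision "plus"
bad = pd.DataFrame({
    "ID": ["00001", "00001", "00002", "00003"],
    "decision": ["plus", "plus", "minus", "plus"],
})

# Passes: each ID still has a single decision value
one_per_id = bad.groupby("ID")["decision"].nunique().eq(1).all()

# True if any ID's decisions also appear under some other ID
# (s.name is the group key, i.e. the ID, inside apply)
shared = bad.groupby("ID")["decision"].apply(
    lambda s: bad.loc[bad["ID"].ne(s.name), "decision"].isin(s).any()
).any()

print(one_per_id, shared)  # True True
```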