Using pytest with dataframes to test specific columns
Question:
I am writing pytest tests that use pandas dataframes, and I am trying to keep the code as general as I can. (I could always check element by element, but I am trying to avoid that.)
I have an input dataframe that contains an ID column, like this:
ID, othervalue, othervalue2
00001, 4, 3
00001, 3, 3
00001, 2, 0
00003, 5, 2
00003, 2, 1
00003, 2, 9
and I do:
def test_df_against_angle(df, angle):
    result = do_some_calculation(df, angle)
Now, result is also a dataframe that contains an ID column, and it also contains a decision column that can take a value like "plus" or "minus" (or "pass", "fail", or something like that). Something like:
ID, someresult, decision, someotherresult
00001, 4, plus, 3
00001, 2, plus, 2
00002, 2, minus, 2
00002, 1, minus, 5
00002, 0, minus, 9
I want to add an assertion (or several) that checks the following (not all at once; I mean separate assertions, since I have not yet decided which would be better):
- All decision values corresponding to an ID are the same
- The decision values corresponding to one ID are different from those of the other ID
- The decision of ID 00001 is plus and the one of 00002 is minus
I know that pandas has some assertion functions for comparing whole dataframes for equality, but how can I handle this situation?
Answers:
If I understand correctly, you can use the following for all the tests (here df stands for the result dataframe):
# First test: each ID must have exactly one unique decision value
assert df.groupby('ID')['decision'].nunique().eq(1).all()
# Second test: for each ID, check whether the decisions of all other IDs appear among its own; this must not be true for every ID
assert not df.groupby('ID').apply(lambda x: df.loc[df['ID'].ne(x.name),'decision'].isin(x['decision']).all()).all()
# Second and third tests, using only the first two unique IDs
uniq = df['ID'].unique()
s1 = df.loc[df['ID'].eq(uniq[0]), 'decision']
s2 = df.loc[df['ID'].eq(uniq[1]), 'decision']
# Second test: not every decision of the first ID also appears for the second ID
assert not s1.isin(s2).all()
# Third test: all values are plus for the first ID and minus for the second
assert s1.eq('plus').all() and s2.eq('minus').all()
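As a self-contained illustration of the three checks, here is a minimal sketch that first collapses the result to one decision per ID and then asserts on that. The helper name decision_per_id and the hand-built result dataframe are assumptions made for this example (the real result would come from do_some_calculation); the pandas calls themselves (groupby, nunique, first, Series.is_unique, pandas.testing.assert_series_equal) are standard.

```python
import pandas as pd
import pandas.testing as pdt


def decision_per_id(result: pd.DataFrame) -> pd.Series:
    """Return one decision per ID; fail if any ID has mixed decisions.

    Hypothetical helper, not part of pandas or pytest.
    Note: nunique() ignores missing values by default.
    """
    n_unique = result.groupby("ID")["decision"].nunique()
    mixed = n_unique[n_unique.ne(1)]
    # Check 1: every ID has exactly one distinct decision.
    assert mixed.empty, f"IDs without a single decision: {list(mixed.index)}"
    return result.groupby("ID")["decision"].first()


# Hand-built stand-in for the output of do_some_calculation(df, angle).
result = pd.DataFrame(
    {
        "ID": ["00001", "00001", "00002", "00002", "00002"],
        "someresult": [4, 2, 2, 1, 0],
        "decision": ["plus", "plus", "minus", "minus", "minus"],
        "someotherresult": [3, 2, 2, 5, 9],
    }
)

per_id = decision_per_id(result)

# Check 2: no two IDs share the same decision.
assert per_id.is_unique

# Check 3: each ID has the expected decision.
expected = pd.Series({"00001": "plus", "00002": "minus"}, name="decision")
expected.index.name = "ID"
pdt.assert_series_equal(per_id, expected)
```

Inside a pytest test, the three checks could go in separate test functions that share result through a fixture, so a failure report points at the specific property that broke. Check 2 only makes sense when there are at most as many IDs as possible decision values.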