Confirming equality of two pandas dataframes?
Question:
How can I assert that the following two dataframes, df1 and df2, are equal?
import pandas as pd
df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])
The output of df1.equals(df2) is False.
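The reason is that DataFrame.equals() requires corresponding columns to have the same dtype, and here df1 holds integers while df2 holds floats. A small sketch that shows the dtype mismatch, and shows that the values agree once the dtypes do:

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])

# Column 0 is an integer dtype in df1 and a float dtype in df2
print(df1[0].dtype, df2[0].dtype)

# equals() compares dtypes as well as values, so this is False
print(df1.equals(df2))

# After casting df1 to float the values line up and equals() is True
print(df1.astype(float).equals(df2))
```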
As of now, I know two ways:
print((df1 == df2).all()[0])
or
df1 = df1.astype(float)
print(df1.equals(df2))
It seems a little bit messy. Is there a better way to do this comparison?
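One caveat with the elementwise (df1 == df2) approach, shown as a sketch with two made-up frames a and b: NaN never compares equal to itself, so matching NaNs make the check fail, whereas DataFrame.equals() treats NaNs in the same location as equal. The sketch also uses .all().all() rather than .all()[0], so that every column is checked and not only column 0:

```python
import numpy as np
import pandas as pd

# Two identical frames that contain a missing value
a = pd.DataFrame([1.0, np.nan, 3.0])
b = pd.DataFrame([1.0, np.nan, 3.0])

# Elementwise ==: NaN != NaN, so the reduction over all cells is False
print((a == b).all().all())  # False

# equals(): NaNs in the same position count as equal
print(a.equals(b))  # True
```

Note also that == raises a ValueError when the two frames do not have identical index and column labels, so it is not a safe general-purpose equality test.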
Answers:
Using @Divakar's elegant idea, NumPy's allclose() does the main work for the numeric columns:
In [128]: df1
Out[128]:
   0    s  n
0  1  aaa  1
1  2  aaa  2
2  3  aaa  3
In [129]: df2
Out[129]:
     0    s    n
0  1.0  aaa  1.0
1  2.0  aaa  2.0
2  3.0  aaa  3.0
In [130]: (np.allclose(df1.select_dtypes(exclude=[object]), df2.select_dtypes(exclude=[object]))
.....: &
.....: df1.select_dtypes(include=[object]).equals(df2.select_dtypes(include=[object]))
.....: )
Out[130]: True
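select_dtypes() splits the frame into the string (object) columns, which are compared exactly with equals(), and all the other (numeric) columns, which are compared with allclose(), so int and float columns holding the same numbers count as equal.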
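A self-contained sketch of the same idea, wrapped in a function. The name frames_close is hypothetical. Two assumptions differ from the session above: it splits on select_dtypes(include="number") / exclude="number" instead of object, so that string columns stored with a non-object string dtype still land on the exact-comparison side, and it checks the labels first, because np.allclose would otherwise broadcast or fail on frames of different shape:

```python
import numpy as np
import pandas as pd


def frames_close(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: numeric columns must match within np.allclose
    tolerance (ints vs floats allowed); all other columns must match exactly."""
    # Require the same row and column labels, in the same order
    if not (left.columns.equals(right.columns) and left.index.equals(right.index)):
        return False

    left_num = left.select_dtypes(include="number")
    right_num = right.select_dtypes(include="number")
    # A column that is numeric in one frame but not in the other is a mismatch
    if not left_num.columns.equals(right_num.columns):
        return False

    # Numeric part: cast both sides to float and compare with a tolerance
    numbers_ok = np.allclose(
        left_num.to_numpy(dtype=float),
        right_num.to_numpy(dtype=float),
        equal_nan=True,
    )

    # Everything else (strings etc.): exact comparison, dtypes included
    rest_ok = left.select_dtypes(exclude="number").equals(
        right.select_dtypes(exclude="number")
    )
    return bool(numbers_ok and rest_ok)


# The frames from the session above
df1 = pd.DataFrame({0: [1, 2, 3], "s": ["aaa"] * 3, "n": [1, 2, 3]})
df2 = pd.DataFrame({0: [1.0, 2.0, 3.0], "s": ["aaa"] * 3, "n": [1.0, 2.0, 3.0]})

print(frames_close(df1, df2))  # True
print(df1.equals(df2))         # False, because the dtypes differ
```

The tolerance comes from np.allclose's defaults (rtol=1e-05, atol=1e-08); pass your own values if that is too loose or too strict for your data.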
You can use assert_frame_equal and tell it not to check the dtypes of the columns:
# Pre v. 0.20.3
# from pandas.util.testing import assert_frame_equal
from pandas.testing import assert_frame_equal
assert_frame_equal(df1, df2, check_dtype=False)
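Keep in mind that assert_frame_equal returns None when the frames match and raises AssertionError when they do not, which suits unit tests. If you need a boolean instead, a minimal sketch is to catch the exception; the wrapper name frames_equal_ignoring_dtype is hypothetical:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])


def frames_equal_ignoring_dtype(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical wrapper: turn assert_frame_equal's pass/raise into a bool."""
    try:
        assert_frame_equal(left, right, check_dtype=False)
    except AssertionError:
        return False
    return True


# With the default check_dtype=True, the int/float mismatch raises
try:
    assert_frame_equal(df1, df2)
except AssertionError:
    print("default check fails on the dtype difference")

print(frames_equal_ignoring_dtype(df1, df2))                      # True
print(frames_equal_ignoring_dtype(df1, pd.DataFrame([1, 2, 4])))  # False
```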