Pandas Difference in DataFrames

Question:

I am trying to figure out how to display the differences between 2 Pandas DataFrames. I am nearly there but can’t seem to figure out how to display additional data for rows that contain a difference.

Here is what I have so far:

Compare DataFrame A to DataFrame B:

DataFrame A:

Date  ID_1 ID_2 Value
1-Jan   1   1   5
2-Jan   1   2   6
3-Jan   1   3   4
4-Jan   1   4   2
5-Jan   1   5   8

DataFrame B:

Date  ID_1 ID_2 Value
1-Jan   1   1   5
2-Jan   1   2   6
3-Jan   1   3   4
4-Jan   1   4   2
5-Jan   1   5   55

Current Output:

Date    Column  From To
5-Jan   Value   8    55

Desired Output:

Date    ID_1 ID_2 From  To
5-Jan   1    5     8    55

Current Code:

#stack column(s) where dataframes are not equal
ne_stacked = (df1 != df2).stack()

#create new dataframe from ne_stacked
changed = ne_stacked[ne_stacked]

#change column names
changed.index.names = ['Date', 'Column']

#create array where dataframes are not equal
diff_loc = np.where(df1 != df2)

#create 'from' column
changed_from = df1.values[diff_loc]

#create 'to' column
changed_to = df2.values[diff_loc]

#create a summary dataframe
final = pd.DataFrame({'From': changed_from, 'To': changed_to}, index=changed.index)

print final
Asked By: bsheehy

||

Answers:

Try this:

dfm = df1.merge(df2, on=['Date', 'ID_1', 'ID_2']).rename(columns={'Value_x':'From', 'Value_y':'To'})
print dfm[dfm.From != dfm.To]

    Date  ID_1  ID_2  From  To
4  5-Jan     1     5     8  55
Answered By: Primer

use merge:

In [29]:

print df_a
    Date  ID_1  ID_2  Value
0  1-Jan     1     1      5
1  2-Jan     1     2      6
2  3-Jan     1     3      4
3  4-Jan     1     4      2
4  5-Jan     1     5      8
In [30]:

print df_b
    Date  ID_1  ID_2  Value
0  1-Jan     1     1      5
1  2-Jan     1     2      6
2  3-Jan     1     3      4
3  4-Jan     1     4      2
4  5-Jan     1     5     55
In [31]:

df_c = pd.merge(df_a, df_b,
                how='outer',
                on=['Date', 'ID_1', 'ID_2'])
df_c.columns = ['Date', 'ID_1', 'ID_2', 'From', 'To']
df_c = df_c[df_c.From!=df_c.To]
print df_c
    Date  ID_1  ID_2  From  To
4  5-Jan     1     5     8  55
Answered By: CT Zhu
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.