Highlight Changes in the same Column in new Dataframe

Question:

I have a dataframe as such –

| ID   | Status | Data          | Column |
|:----:|:------:|:-------------:|:-------|
| 1    | A      | Old           |111     |
| 2    | B      | Old           |222     |
| 3    | C      | Old           |333     |
| 2    | C      | New           |222     |
| 3    | D      | New           |333     |
| 1    | E      | New           |111     |
| 4    | E      | New           |444     |
| 4    | E      | Old           |444     |

I want the output to highlight changes based on ID and Data like this , this should be regardless of IDs and Data columns position in row as given in sample above.

So, the result should look like this –

| ID   | Change Status | Data          | Final Status| Column | 
|:----:|:-------------:|:-------------:|:-----------:|:------:|
| 1    | A    -> E     | New           | E           | 111    | 
| 2    | B    -> C     | New           | C           | 222    |
| 3    | C    -> D     | New           | D           | 333    |
| 4    | No Change     | New           | E           | 444    |
Asked By: DJKarma

||

Answers:

IIUC you can split the data by grouping them and then merge them together.

(idx1, grp1), (idx2, grp2) = df.groupby('Data')

out = grp2[['ID', 'Status']].merge(grp1, on='ID', suffixes=['_Change', '_Final'])
print(out)
   ID Status_Change Status_Final Data  Column
0   1             A            E  New     111
1   2             B            C  New     222
2   3             C            D  New     333
3   4             E            E  New     444
Answered By: Rabinzel

I would pivot this dataframe to get the changes:

df = df.pivot(index=['ID', 'Column'], columns='Data', values='Status').reset_index()

    Data  ID  Column New Old
0      1     111   E   A
1      2     222   C   B
2      3     333   D   C
3      4     444   E   E

You can then add in the other bits to adjust this into the desired format.

To get the 'Change Status' column, you can use numpy’s select, which will assign a different value if the old and new columns match or not, as below:

values = [df['Old'] + ' -> ' + df['New'], 'No Change']
conditions = [df['New'] != df['Old'], df['New'] == df['Old']]
df['Change Status'] = np.select(conditions, values)

    Data  ID  Column New Old Change Status
0      1     111   E   A        A -> E
1      2     222   C   B        B -> C
2      3     333   D   C        C -> D
3      4     444   E   E     No Change

Then the 'Data' column is just populated with ‘New’, so this can be done using:

df['Data'] = 'New'

    Data  ID  Column New Old Change Status Data
0      1     111   E   A        A -> E  New
1      2     222   C   B        B -> C  New
2      3     333   D   C        C -> D  New
3      4     444   E   E     No Change  New

The 'Final Status' column is just the 'New' column, so you can rename this:

df = df.rename(columns={'New':'Final Status'})

    Data  ID  Column Final Status Old Change Status Data
0      1     111            E   A        A -> E  New
1      2     222            C   B        B -> C  New
2      3     333            D   C        C -> D  New
3      4     444            E   E     No Change  New

Now you can just select the columns you want, in the order above:

df = df[['ID', 'Change Status', 'Data', 'Final Status', 'Column']]

    
    Data  ID Change Status Data Final Status  Column
0      1        A -> E  New            E     111
1      2        B -> C  New            C     222
2      3        C -> D  New            D     333
3      4     No Change  New            E     444

(If you want ID to be a column not an index, you can just use .reset_index())

Answered By: Emi OB
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.