pandas: remove duplicated rows based on the same column values

Question:

I have a df like this:

    Date    Model   High     Low    Final
1   9132022 model6  4.36000  2.39   3.10
2   9132022 model4  10.92000 2.87   8.32
3   9132022 model6  4.36000  2.39   3.73
4   9132022 model6  4.36000  2.39   3.10
5   9132022 model6  4.36000  2.39   2.47

6   9142022 model6  41.3600 21.39   31.10
7   9142022 model4  110.920 21.87   81.32
8   9142022 model6  41.3600 21.39   31.73
9   9142022 model6  41.3600 21.39   31.10
10  9142022 model6  41.3600 21.39   21.47

If the Date and Model are the same, just keep the first record. The output should be:

        Date    Model   High     Low    Final
    1   9132022 model6  4.36000  2.39   3.10
    2   9132022 model4  10.92000 2.87   8.32
 

    3   9142022 model6  41.3600 21.39   31.10
    4   9142022 model4  110.920 21.87   81.32
   
Asked By: William


Answers:

If the name of the variable for the DataFrame is df then:

df.groupby(['Date', 'Model']).head(1)
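
A minimal sketch of what this returns, assuming a shortened version of the question's data (the three-column frame and the name first_rows are illustrative, not from the question). Unlike an aggregation, head(1) keeps the original row order and index labels:

```python
import pandas as pd

# Shortened, illustrative version of the question's data
df = pd.DataFrame({
    "Date": [9132022, 9132022, 9132022, 9142022, 9142022],
    "Model": ["model6", "model4", "model6", "model6", "model6"],
    "Final": [3.10, 8.32, 3.73, 31.10, 31.73],
})

# Keep the first row of every (Date, Model) group;
# rows stay in their original order with their original index labels
first_rows = df.groupby(["Date", "Model"]).head(1)
print(first_rows)
```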
Answered By: Alex

Okay so first we need to recreate OP’s dataframe:

import pandas as pd

df = pd.DataFrame({"Date": [9132022, 9132022, 9132022, 9132022, 9132022, 9142022, 9142022, 9142022, 9142022, 9142022],
                   "Model": ["model6", "model4", "model6", "model6", "model6", "model6", "model4", "model6", "model6", "model6"],
                   "High": [4.36000,10.92000,4.36000,4.36000,4.36000,41.3600,110.920,41.3600,41.3600,41.3600],
                   "Low": [2.39,2.87,2.39,2.39,2.39,21.39,21.87,21.39,21.39,21.39],
                   "Final":[3.10,8.32,3.73,3.10,2.47,31.10,81.32,31.73,31.10,21.47]
                   })

Then group by the Date and Model columns and return the first occurrence in each group with the first aggregate function (note that first takes the first non-null value in each column):

df.groupby(["Date","Model"],as_index=False).first()

outputs:

       Date     Model   High    Low     Final
0   9132022     model4  10.92   2.87    8.32
1   9132022     model6  4.36    2.39    3.10
2   9142022     model4  110.92  21.87   81.32
3   9142022     model6  41.36   21.39   31.10

This replaces the original index and sorts the rows by the group keys. If you want to keep the original index, you can run df = df.reset_index() before the grouping, which keeps it as an extra index column.
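
As a rough sketch of the reset_index idea, assuming a shortened version of the data (the names kept and restored are illustrative): the old index travels through the grouping as a regular column named index and can be put back afterwards.

```python
import pandas as pd

# Shortened, illustrative version of the question's data
df = pd.DataFrame({
    "Date": [9132022, 9132022, 9132022, 9142022, 9142022],
    "Model": ["model6", "model4", "model6", "model6", "model6"],
    "Final": [3.10, 8.32, 3.73, 31.10, 31.73],
})

# reset_index() turns the original index into a column called "index",
# so first() carries it along with the other columns
kept = df.reset_index().groupby(["Date", "Model"], as_index=False).first()

# Put the original labels back as the index and restore the original row order
restored = kept.set_index("index").sort_index().rename_axis(None)
print(restored)
```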

For future reference please consider providing the original dataframe (in code) so that people that want to look into it can recreate it easily, without having to manually copy & paste values.

If this solved your problem please mark the answer as solution. 🙂

Answered By: Nikos Maniaths

Here is another way to do it:

df.drop_duplicates(subset=['Date','Model'])
       Date     Model   High    Low     Final
0   9132022     model6  4.36    2.39    3.10
1   9132022     model4  10.92   2.87    8.32
5   9142022     model6  41.36   21.39   31.10
6   9142022     model4  110.92  21.87   81.32
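
If consecutive numbering like in the question's expected output is wanted, one possible follow-up is to reset the index after dropping duplicates. This is a sketch on a shortened, illustrative version of the data; keep="first" is the default and is spelled out only for clarity, and the name deduped is an assumption:

```python
import pandas as pd

# Shortened, illustrative version of the question's data
df = pd.DataFrame({
    "Date": [9132022, 9132022, 9132022, 9142022, 9142022],
    "Model": ["model6", "model4", "model6", "model6", "model6"],
    "Final": [3.10, 8.32, 3.73, 31.10, 31.73],
})

# Keep the first row for each (Date, Model) pair, then renumber the rows
deduped = (
    df.drop_duplicates(subset=["Date", "Model"], keep="first")
      .reset_index(drop=True)
)
print(deduped)
```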
Answered By: Naveed