Fastest Way to Drop Duplicated Index in a Pandas DataFrame

Question:

If I want to drop rows with duplicated index values in a dataframe, the following doesn’t work for obvious reasons:

myDF.drop_duplicates(cols=index)

and

myDF.drop_duplicates(cols='index') 

looks for a column named ‘index’

If I want to drop an index I have to do:

myDF['index'] = myDF.index
myDF = myDF.drop_duplicates(cols='index')
myDF = myDF.set_index(myDF['index'])
myDF = myDF.drop('index', axis=1)

Is there a more efficient way?
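
(As an aside, the same round trip can be written more compactly; this is only a minimal sketch, assuming a recent pandas where drop_duplicates takes subset rather than cols, and an unnamed index so that reset_index produces a column called 'index':)

import pandas as pd

df = pd.DataFrame({'val': [1, 2, 3]}, index=['A', 'B', 'A'])

# Move the index into a column, drop duplicate rows on that column, restore the index.
deduped = (df.reset_index()                    # unnamed index becomes a column named 'index'
             .drop_duplicates(subset='index')  # keep the first row for each index value
             .set_index('index'))
deduped.index.name = None                      # drop the leftover 'index' name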

Asked By: RukTech


Answers:

You can use numpy.unique to obtain the positions of the first occurrence of each index value and use iloc to select those rows:

>>> df
        val
A  0.021372
B  1.229482
D -1.571025
D -0.110083
C  0.547076
B -0.824754
A -1.378705
B -0.234095
C -1.559653
B -0.531421

[10 rows x 1 columns]

>>> import numpy as np
>>> idx = np.unique(df.index, return_index=True)[1]
>>> df.iloc[idx]
        val
A  0.021372
B  1.229482
C  0.547076
D -1.571025

[4 rows x 1 columns]
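
One caveat: numpy.unique returns the first-occurrence positions ordered by the sorted labels, which is why the result above comes back sorted by index. A minimal sketch of keeping the original row order instead, if that matters:

>>> import numpy as np
>>> idx = np.unique(df.index, return_index=True)[1]
>>> df.iloc[np.sort(idx)]   # sort the positions to preserve the original row order
        val
A  0.021372
B  1.229482
D -1.571025
C  0.547076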
Answered By: behzad.nouri

Simply: DF.groupby(DF.index).first()
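
A minimal sketch of that approach on made-up data:

import pandas as pd

df = pd.DataFrame({'val': [1, 2, 3, 4]}, index=['A', 'B', 'A', 'C'])

# Group rows by their index label and keep the first row of each group.
deduped = df.groupby(df.index).first()

One thing to be aware of: GroupBy.first returns the first non-null value per column within each group, so with missing data the result can combine values from different rows that share a label.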

Answered By: CT Zhu

The ‘duplicated’ method works on Index objects as well as on DataFrames and Series. Just select the rows whose index value is not marked as a duplicate:

df[~df.index.duplicated()]
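
For completeness, Index.duplicated also takes a keep argument; a small sketch on made-up data:

import pandas as pd

df = pd.DataFrame({'val': [1, 2, 3, 4]}, index=['A', 'B', 'A', 'B'])

df[~df.index.duplicated(keep='first')]  # keep the first row for each label (the default)
df[~df.index.duplicated(keep='last')]   # keep the last row for each label instead
df[~df.index.duplicated(keep=False)]    # keep only labels that appear exactly once
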
Answered By: danielstn