Drop rows in a pandas DataFrame up to a certain value

Question

I’m currently working with a pandas data frame, with approximately 80000 rows, like the following one:

artist	date
Drake	2014-10-12
Kendrick Lamar	2014-10-12
Ed Sheeran	2014-10-12
Maroon 5	2014-10-12
Rihanna	2014-10-19
Foo Fighters	2014-10-19
Bad Bunny	2014-10-19
Eminem	2014-10-19
Drake	2014-10-26
Eminem	2014-10-26
Taylor Swift	2014-10-26
Kendrick Lamar	2014-10-26
Rihanna	2014-11-02
Ed Sheeran	2014-11-02
Kanye West	2014-11-02
Lime Cordiale	2014-11-02

I only want to keep the rows that have a date greater than or equal to 2014-10-26. The result should look like the following table:

artist	date
Drake	2014-10-26
Eminem	2014-10-26
Taylor Swift	2014-10-26
Kendrick Lamar	2014-10-26
Rihanna	2014-11-02
Ed Sheeran	2014-11-02
Kanye West	2014-11-02
Lime Cordiale	2014-11-02

I tried using the pandas .drop() method, as in the following lines:

    dataset = pd.read_csv("charts.csv")
    dataset = pd.DataFrame(dataset)
    dataset = dataset.drop(dataset.loc[dataset['date'] <= "2014-10-19", :])

but after executing I get the following error:

KeyError: "['track_id', 'name', 'country', 'date', 'position', 'streams', 'artists', 'artist_genres', 'duration', 'explicit'] not found in axis"

Asked By: KurtosisCobain

Answer 1

I'm not sure which error you got; you should include the error log in the question.

In any case, you can drop rows by index: filter the data to get the index labels of the unwanted rows, then pass those labels to drop:

indexx = dataset[ dataset['date'] <= "2014-10-19"  ].index
dataset.drop(indexx , inplace=True)
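
As a self-contained sketch of the same idea, here is a small frame built from a few of the rows in the question (the name old_rows is only illustrative). It assumes the dates are ISO-formatted strings (YYYY-MM-DD), which sort correctly when compared as text:

```python
import pandas as pd

# A few sample rows taken from the question
dataset = pd.DataFrame({
    "artist": ["Drake", "Rihanna", "Eminem", "Kanye West"],
    "date": ["2014-10-12", "2014-10-19", "2014-10-26", "2014-11-02"],
})

# Index labels of the rows to remove; ISO date strings compare in date order
old_rows = dataset[dataset["date"] <= "2014-10-19"].index

# drop() removes rows by index label and returns a new DataFrame
dataset = dataset.drop(old_rows)
print(dataset)
```

Note that this relies on the index labels being unique: drop removes every row that carries one of the given labels.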

Answered By: Farid

Answer 2

You could use:

last_date_to_drop = pd.to_datetime("2014-10-19")
dataset["date"] = pd.to_datetime(dataset["date"])
dataset = dataset.loc[dataset["date"].gt(last_date_to_drop)].copy()

You don’t need to sort or drop anything. Just subset the DataFrame and copy it, as above.

Also, drop does not do what you think it does. It doesn’t drop rows by their values; it drops rows or columns by their index or column labels.
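
To illustrate why the original call fails, here is a minimal sketch on a small sample frame (the names df, raised and kept are only illustrative). Iterating over a DataFrame yields its column labels, so passing a DataFrame to drop makes pandas look for those column names among the row labels, which is why the KeyError in the question lists column names:

```python
import pandas as pd

df = pd.DataFrame({
    "artist": ["Drake", "Rihanna", "Eminem", "Kanye West"],
    "date": ["2014-10-12", "2014-10-19", "2014-10-26", "2014-11-02"],
})

# Iterating over a DataFrame gives its column labels, not its rows
selected = df.loc[df["date"] <= "2014-10-19", :]
print(list(selected))

# drop() treats those column names as row labels and cannot find them
try:
    df.drop(selected)
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Filtering with a boolean mask keeps the wanted rows directly
dates = pd.to_datetime(df["date"])
kept = df[dates >= pd.Timestamp("2014-10-26")].copy()
print(kept)
```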

Answered By: SomeDude

Answer 3

import pandas as pd

df = pd.DataFrame({'artist':['Drake', 'Kendrick Lamar', 'Kendrick Lamar', 'Drake'],
                   'date':['2014-10-12', '2014-10-12', '2014-10-26', '2014-10-26']})

# Be cautious : sort first
df = (df.sort_values(by='date', key=lambda t: pd.to_datetime(t, format='%Y-%m-%d')) 
        .drop_duplicates(subset=['artist'], keep='last'))

print(df)
#            artist        date
# 2  Kendrick Lamar  2014-10-26
# 3           Drake  2014-10-26

Answered By: Laurent B.
