How do I remove rows based on multiple conditions in Python / Pandas dataframe?
Question:
I have a table which looks something like this:
Identified | Software | Version | Date |
---|---|---|---|
0 | Microsoft Office | 2 | 2022-05-25 |
0 | Microsoft Office | 1 | 2022-03-21 |
0 | Adobe Photoshop | 2 | 2022-04-20 |
1 | Adobe Photoshop | 1 | 2021-04-04 |
The ‘Identified’ column is a column I have created using this code:
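import pandas as pd
# Parse 'Date' as datetime on load; comparing raw strings with a timestamp raises a TypeError
df = pd.read_csv('version-data.csv', encoding='utf8', parse_dates=['Date'])
# Cutoff: today at midnight minus one year
olderdata = pd.Timestamp.today().normalize() - pd.DateOffset(years=1)
# 1 if the row's date is on or before the cutoff, otherwise 0
df['Identified'] = (df['Date'] <= olderdata).astype(int)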
This marks every row whose date is more than one year old. Now I want to create a new dataframe that contains all rows for every software package that has at least one identified row, and drops the packages that have none. Here is the output I am looking for:
Identified | Software | Version | Date |
---|---|---|---|
0 | Adobe Photoshop | 2 | 2022-04-20 |
1 | Adobe Photoshop | 1 | 2021-04-04 |
How do I achieve this?
Answers:
You can use groupby.filter:
out = df.groupby('Software').filter(lambda x: (x.Identified==1).any())
print(out)
   Identified         Software  Version        Date
2           0  Adobe Photoshop        2  2022-04-20
3           1  Adobe Photoshop        1  2021-04-04
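To try this without the CSV file, here is a minimal sketch that rebuilds the sample table from the question by hand (the literal values are copied from the table above, and the names `out_filter`, `mask`, and `out_mask` are only illustrative). It also shows one possible alternative, a boolean mask built with `transform("any")`, which should select the same rows and may be faster on large frames because it avoids calling a Python function once per group.

```python
import pandas as pd

# Sample data copied from the question's table (assumption: 'Identified' is already computed).
df = pd.DataFrame({
    "Identified": [0, 0, 0, 1],
    "Software": ["Microsoft Office", "Microsoft Office",
                 "Adobe Photoshop", "Adobe Photoshop"],
    "Version": [2, 1, 2, 1],
    "Date": ["2022-05-25", "2022-03-21", "2022-04-20", "2021-04-04"],
})

# groupby.filter calls the function once per 'Software' group and keeps
# every row of the groups for which it returns True.
out_filter = df.groupby("Software").filter(lambda g: (g["Identified"] == 1).any())

# Alternative: broadcast "does this group contain an identified row?"
# back onto each row, then use the result as a boolean mask.
mask = df["Identified"].eq(1).groupby(df["Software"]).transform("any")
out_mask = df[mask]

print(out_filter)
print(out_filter.equals(out_mask))
```

Both approaches keep the original index, so the two Adobe Photoshop rows come back with labels 2 and 3; call `reset_index(drop=True)` on the result if you want a fresh index.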