Extract all entries from a pandas df where the values are the same across all years
Question:
I have a dataframe that looks like this (with many more other countries, this is a sample):
df_dict = {'country': ['Japan','Japan','Japan','Japan','Japan','Japan','Japan', 'Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
'year': [2016, 2017,2018,2019,2020,2021,2022,2016, 2017,2018,2019,2020,2021,2022],
'value': [320, 416, 172, 652, 390, 570, 803, 100, 100, 100, 100, 100, 100,100]}
df = pd.DataFrame(df_dict)
I want to extract all the entries where the value
is the same across all years. Sometimes it could be 100
, sometimes it could be another value, but the example here is with 100
.
I’m not really sure how to go about this
The output should look like this.
df_dict2 = {'country': ['Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
'year': [2016, 2017,2018,2019,2020,2021,2022],
'value': [100, 100, 100, 100, 100, 100,100]}
df2 = pd.DataFrame(df_dict2)
Answers:
If you want to know the countries that have the same value across all years use groupby.nunique
:
s = df.groupby('country')['value'].nunique()
out = list(s[s.eq(1)].index)
Output: ['Greece']
If you also want the value, go for a groupby.agg
with boolean indexing through loc
:
(df.groupby('country')['value'].agg(['nunique', 'first'])
.loc[lambda d: d.pop('nunique').eq(1), 'first']
)
Output:
country
Greece 100
Name: first, dtype: int64
edit: filtering the original DataFrame:
s = df.groupby('country')['value'].nunique()
df[df['country'].isin(s[s.eq(1)].index)]
Or directly:
df[df.groupby('country')['value'].transform('nunique').eq(1)]
Output:
country year value
7 Greece 2016 100
8 Greece 2017 100
9 Greece 2018 100
10 Greece 2019 100
11 Greece 2020 100
12 Greece 2021 100
13 Greece 2022 100
I have a dataframe that looks like this (with many more other countries, this is a sample):
df_dict = {'country': ['Japan','Japan','Japan','Japan','Japan','Japan','Japan', 'Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
'year': [2016, 2017,2018,2019,2020,2021,2022,2016, 2017,2018,2019,2020,2021,2022],
'value': [320, 416, 172, 652, 390, 570, 803, 100, 100, 100, 100, 100, 100,100]}
df = pd.DataFrame(df_dict)
I want to extract all the entries where the value
is the same across all years. Sometimes it could be 100
, sometimes it could be another value, but the example here is with 100
.
I’m not really sure how to go about this
The output should look like this.
df_dict2 = {'country': ['Greece','Greece','Greece','Greece','Greece','Greece','Greece'],
'year': [2016, 2017,2018,2019,2020,2021,2022],
'value': [100, 100, 100, 100, 100, 100,100]}
df2 = pd.DataFrame(df_dict2)
If you want to know the countries that have the same value across all years use groupby.nunique
:
s = df.groupby('country')['value'].nunique()
out = list(s[s.eq(1)].index)
Output: ['Greece']
If you also want the value, go for a groupby.agg
with boolean indexing through loc
:
(df.groupby('country')['value'].agg(['nunique', 'first'])
.loc[lambda d: d.pop('nunique').eq(1), 'first']
)
Output:
country
Greece 100
Name: first, dtype: int64
edit: filtering the original DataFrame:
s = df.groupby('country')['value'].nunique()
df[df['country'].isin(s[s.eq(1)].index)]
Or directly:
df[df.groupby('country')['value'].transform('nunique').eq(1)]
Output:
country year value
7 Greece 2016 100
8 Greece 2017 100
9 Greece 2018 100
10 Greece 2019 100
11 Greece 2020 100
12 Greece 2021 100
13 Greece 2022 100