how can I count the occurrences > than a value for each year of a data frame
Question:
I have a data frame with the values of precipitations day per day.
I would like to do a sort of resample, so instead of day per day the data is collected year per year and every year has a column that contains the number of times it rained more than a certain value.
Date
Precipitation
2000-01-01
1
2000-01-03
6
2000-01-03
5
2001-01-01
3
2001-01-02
1
2001-01-03
0
2002-01-01
10
2002-01-02
8
2002-01-03
12
what I want is to count every year how many times Precipitation > 2
Date
Count
2000
2
2001
1
2002
3
I tried using resample()
but with no results
Answers:
@Tatthew you can do this with GroupBy.apply:
import pandas as pd
df = pd.DataFrame({'Date': ['2000-01-01', '2000-01-03',
'2000-01-03', '2001-01-01',
'2001-01-02', '2001-01-03',
'2002-01-01', '2002-01-02',
'2002-01-03'],
'Precipitation': [1, 6, 5, 3, 1, 0,
10, 8, 12]})
df = df.astype({'Date': datetime64})
df.groupby(df.Date.dt.year).apply(lambda df: df.Precipitation[df.Precipitation > 2].count())
You can use this bit of code:
# convert "Precipitation" and "date" values to proper types
df['Precipitation'] = df['Precipitation'].astype(int)
df["date"] = pd.to_datetime(df["date"])
# find rows that have "Precipitation" > 2
df['Count']= df.apply(lambda x: x["Precipitation"] > 2, axis=1)
# group df by year and drop the "Precipitation" column
df.groupby(df['date'].dt.year).sum().drop(columns=['Precipitation'])
@Tatthew you can do this with query
and Groupby.size too.
import pandas as pd
df = pd.DataFrame({'Date': ['2000-01-01', '2000-01-03',
'2000-01-03', '2001-01-01',
'2001-01-02', '2001-01-03',
'2002-01-01', '2002-01-02',
'2002-01-03'],
'Precipitation': [1, 6, 5, 3, 1, 0,
10, 8, 12]})
df = df.astype({'Date': datetime64})
above_threshold = df.query('Precipitation > 2')
above_threshold.groupby(above_threshold.Date.dt.year).size()
I have a data frame with the values of precipitations day per day.
I would like to do a sort of resample, so instead of day per day the data is collected year per year and every year has a column that contains the number of times it rained more than a certain value.
Date | Precipitation |
---|---|
2000-01-01 | 1 |
2000-01-03 | 6 |
2000-01-03 | 5 |
2001-01-01 | 3 |
2001-01-02 | 1 |
2001-01-03 | 0 |
2002-01-01 | 10 |
2002-01-02 | 8 |
2002-01-03 | 12 |
what I want is to count every year how many times Precipitation > 2
Date | Count |
---|---|
2000 | 2 |
2001 | 1 |
2002 | 3 |
I tried using resample()
but with no results
@Tatthew you can do this with GroupBy.apply:
import pandas as pd
df = pd.DataFrame({'Date': ['2000-01-01', '2000-01-03',
'2000-01-03', '2001-01-01',
'2001-01-02', '2001-01-03',
'2002-01-01', '2002-01-02',
'2002-01-03'],
'Precipitation': [1, 6, 5, 3, 1, 0,
10, 8, 12]})
df = df.astype({'Date': datetime64})
df.groupby(df.Date.dt.year).apply(lambda df: df.Precipitation[df.Precipitation > 2].count())
You can use this bit of code:
# convert "Precipitation" and "date" values to proper types
df['Precipitation'] = df['Precipitation'].astype(int)
df["date"] = pd.to_datetime(df["date"])
# find rows that have "Precipitation" > 2
df['Count']= df.apply(lambda x: x["Precipitation"] > 2, axis=1)
# group df by year and drop the "Precipitation" column
df.groupby(df['date'].dt.year).sum().drop(columns=['Precipitation'])
@Tatthew you can do this with query
and Groupby.size too.
import pandas as pd
df = pd.DataFrame({'Date': ['2000-01-01', '2000-01-03',
'2000-01-03', '2001-01-01',
'2001-01-02', '2001-01-03',
'2002-01-01', '2002-01-02',
'2002-01-03'],
'Precipitation': [1, 6, 5, 3, 1, 0,
10, 8, 12]})
df = df.astype({'Date': datetime64})
above_threshold = df.query('Precipitation > 2')
above_threshold.groupby(above_threshold.Date.dt.year).size()