How can I count the occurrences greater than a value for each year of a data frame?

Question:

I have a data frame with daily precipitation values.
I would like to do a sort of resample, so that instead of one row per day the data is aggregated per year, with a column that contains the number of times it rained more than a certain value in that year.

Date Precipitation
2000-01-01 1
2000-01-03 6
2000-01-03 5
2001-01-01 3
2001-01-02 1
2001-01-03 0
2002-01-01 10
2002-01-02 8
2002-01-03 12

What I want is to count, for every year, how many times Precipitation > 2:

Date Count
2000 2
2001 1
2002 3

I tried using resample(), but with no results.

Asked By: Tatthew

Answers:

@Tatthew you can do this with GroupBy.apply:

import pandas as pd
df = pd.DataFrame({'Date': ['2000-01-01', '2000-01-03',
                            '2000-01-03', '2001-01-01',
                            '2001-01-02', '2001-01-03',
                            '2002-01-01', '2002-01-02',
                            '2002-01-03'],
                   'Precipitation': [1, 6, 5, 3, 1, 0,
                                     10, 8, 12]})
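# parse the date strings; the dtype must be given as a string, a bare datetime64 name is undefined
df = df.astype({'Date': 'datetime64[ns]'})
# for each year, keep the values above 2 and count them
df.groupby(df.Date.dt.year).apply(lambda g: g.Precipitation[g.Precipitation > 2].count())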
Answered By: Mahesh Vashishtha

You can use this bit of code:

# convert "Precipitation" and "Date" values to proper types
df['Precipitation'] = df['Precipitation'].astype(int)
df['Date'] = pd.to_datetime(df['Date'])

# flag rows that have "Precipitation" > 2 (vectorized, no row-wise apply needed)
df['Count'] = df['Precipitation'] > 2

# group by year and sum the flags; selecting only "Count" avoids summing the other columns
df.groupby(df['Date'].dt.year)[['Count']].sum()
Answered By: amirhossein nazary
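
Since the question mentions resample(), here is a minimal sketch of that route, assuming the column names from the question. The names threshold, wet and counts are only illustrative. The idea is to turn the comparison into a boolean Series indexed by date and let resample sum it per year, because summing booleans counts the True values.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2000-01-01', '2000-01-03', '2000-01-03',
             '2001-01-01', '2001-01-02', '2001-01-03',
             '2002-01-01', '2002-01-02', '2002-01-03'],
    'Precipitation': [1, 6, 5, 3, 1, 0, 10, 8, 12],
})
df['Date'] = pd.to_datetime(df['Date'])

threshold = 2  # illustrative name for the cutoff used in the question

# Boolean Series indexed by date: True where it rained more than the threshold
wet = df.set_index('Date')['Precipitation'] > threshold

# 'YS' bins by calendar year (labelled by year start); summing booleans counts the True values
yearly = wet.resample('YS').sum()

# Replace the year-start timestamps with plain year numbers
yearly.index = yearly.index.year
counts = yearly.rename('Count')
print(counts)
```

resample needs a datetime index, which is why the Date column is moved into the index first. A likely reason for getting "no results" with resample() is calling it while Date is still a regular column of strings.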

@Tatthew you can do this with query and GroupBy.size too:

import pandas as pd
df = pd.DataFrame({'Date': ['2000-01-01', '2000-01-03',
                            '2000-01-03', '2001-01-01',
                            '2001-01-02', '2001-01-03',
                            '2002-01-01', '2002-01-02',
                            '2002-01-03'],
                   'Precipitation': [1, 6, 5, 3, 1, 0,
                                     10, 8, 12]})
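# parse the date strings; the dtype must be given as a string, a bare datetime64 name is undefined
df = df.astype({'Date': 'datetime64[ns]'})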
above_threshold = df.query('Precipitation > 2')
above_threshold.groupby(above_threshold.Date.dt.year).size()
Answered By: Mahesh Vashishtha
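
One thing to keep in mind with the filter-then-size approach: a year with no rows above the threshold disappears from the result instead of showing up with a count of 0. The sketch below is one possible way around that, assuming the column names from the question. The helper count_days_above and its parameters are hypothetical names, not part of pandas. It groups the boolean mask itself, so every year present in the data gets a row.

```python
import pandas as pd

def count_days_above(df, threshold, date_col='Date', value_col='Precipitation'):
    """Count rows per calendar year whose value exceeds the threshold.

    Hypothetical helper: years without any value above the threshold
    are kept in the result with a count of 0.
    """
    years = pd.to_datetime(df[date_col]).dt.year
    above = df[value_col] > threshold
    # Summing the boolean mask per year counts the True values
    return above.groupby(years).sum().rename('Count')

df = pd.DataFrame({
    'Date': ['2000-01-01', '2000-01-03', '2000-01-03',
             '2001-01-01', '2001-01-02', '2001-01-03',
             '2002-01-01', '2002-01-02', '2002-01-03'],
    'Precipitation': [1, 6, 5, 3, 1, 0, 10, 8, 12],
})

counts_2 = count_days_above(df, 2)
counts_5 = count_days_above(df, 5)  # 2001 has no value above 5 and stays in the result as 0
print(counts_2)
print(counts_5)
```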