Write a 'for' loop to compute a formula over each day of the year in a Pandas dataframe
Question:
I am new to Python ‘for’ loops, and I’m trying to compute a formula for each day of the year in a dataframe. The formula I am using is as follows: gdd = (((row_min + row_max) / 2) - 7). To explain further, I need to find the maximum and minimum temperature for each day, add them together, divide the sum by two, and then subtract 7 from that quotient.
Here is the data:
import pandas as pd
df = {'Date': ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02','2021-01-03','2021-01-03','2021-01-03'],
'Time': ['12:00:00 AM', '1:00:00 AM', '2:00:00 AM','12:00:00 AM', '1:00:00 AM', '2:00:00 AM','12:00:00 AM', '1:00:00 AM', '2:00:00 AM'],
'TEMP': ['3', '1', '12','4', '8', '7','9', '12', '8']}
df = pd.DataFrame(df)
Converting the Date column into a datetime format:
# Convert to datetime format
df['Date']=pd.to_datetime(df['Date'])
# Add column for day of year
df['dayofyear'] = df['Date'].dt.dayofyear
df
The output shows that a day of the year has been correctly assigned.
Here is the loop that I am trying:
for day in df['dayofyear']:
    temp = df['TEMP']
    row_min = temp.min()
    row_max = temp.max()
    gdd = (((row_min + row_max) / 2) - 7)
However, the following error is produced:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [83], in <cell line: 15>()
17 row_min = temp.min()
18 row_max = temp.max()
---> 19 gdd = (((row_min + row_max) / 2) - 7)
TypeError: unsupported operand type(s) for /: 'str' and 'int'
How can I correctly write this loop?
Answers:
Here is one way to do it without a for loop:
df['Date'] = pd.to_datetime(df['Date'])
# Add column for day of year
df['dayofyear'] = df['Date'].dt.dayofyear
# Convert TEMP from strings to integers so min/max and the arithmetic are numeric
df['TEMP'] = df['TEMP'].astype(int)
# Per-day (max + min) / 2 - 7, broadcast back to every row of that day
df['gdd'] = df.groupby(['dayofyear'])['TEMP'].transform(
    lambda x: ((x.max() + x.min()) / 2) - 7)
df
        Date         Time  TEMP  dayofyear  gdd
0 2021-01-01  12:00:00 AM     3          1 -0.5
1 2021-01-01   1:00:00 AM     1          1 -0.5
2 2021-01-01   2:00:00 AM    12          1 -0.5
3 2021-01-02  12:00:00 AM     4          2 -1.0
4 2021-01-02   1:00:00 AM     8          2 -1.0
5 2021-01-02   2:00:00 AM     7          2 -1.0
6 2021-01-03  12:00:00 AM     9          3  3.0
7 2021-01-03   1:00:00 AM    12          3  3.0
8 2021-01-03   2:00:00 AM     8          3  3.0
Alternatively, if you don’t want to convert the TEMP column to int, here is one way to do it:
df['Date'] = pd.to_datetime(df['Date'])
# Add column for day of year
df['dayofyear'] = df['Date'].dt.dayofyear
# Convert the series to int prior to taking the max() or min()
df['gdd'] = df.groupby(['dayofyear'])['TEMP'].transform(
    lambda x: ((x.astype(int).max() + x.astype(int).min()) / 2) - 7)
df
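The conversion has to happen before max() or min() because string values compare lexicographically, and `+` on strings concatenates. The sketch below illustrates this with the day-1 readings from the question; the variable names are made up for the illustration and are not part of either answer.

```python
import pandas as pd

# Day-1 readings exactly as stored in the question: strings, not numbers
temps = pd.Series(['3', '1', '12'])

# String comparison is character by character, so '3' sorts after '12'
str_max = temps.max()
str_min = temps.min()

# '+' on two strings concatenates instead of adding
concat = str_min + str_max

# Dividing that string by an int reproduces the TypeError from the question
try:
    concat / 2
except TypeError as err:
    error_message = str(err)

# After conversion, max/min are the numeric extremes
numeric = pd.to_numeric(temps)
num_max = numeric.max()
num_min = numeric.min()
```

So even if the division had not failed, the string-based maximum would have been the wrong reading.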
You want:
# ensure data is numeric
df['TEMP'] = pd.to_numeric(df['TEMP'])
# group, get min/max and compute the transformation in a vectorized way
g = df.groupby('dayofyear')['TEMP']
df['gdd'] = g.transform('max').add(g.transform('min')).div(2).sub(7)
output:
        Date         Time  TEMP  dayofyear  gdd
0 2021-01-01  12:00:00 AM     3          1 -0.5
1 2021-01-01   1:00:00 AM     1          1 -0.5
2 2021-01-01   2:00:00 AM    12          1 -0.5
3 2021-01-02  12:00:00 AM     4          2 -1.0
4 2021-01-02   1:00:00 AM     8          2 -1.0
5 2021-01-02   2:00:00 AM     7          2 -1.0
6 2021-01-03  12:00:00 AM     9          3  3.0
7 2021-01-03   1:00:00 AM    12          3  3.0
8 2021-01-03   2:00:00 AM     8          3  3.0
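If an explicit for loop is still wanted, as the question asks, one possible sketch is to iterate over the groups rather than over the rows. It assumes the same sample data as the question (the Time column is left out for brevity), and the dictionary name `gdd_by_day` is invented for this example. The original loop recomputed min and max over the whole TEMP column on every pass, whereas here each pass sees only one day's readings.

```python
import pandas as pd

# Sample data from the question (Time column omitted for brevity)
df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-01',
             '2021-01-02', '2021-01-02', '2021-01-02',
             '2021-01-03', '2021-01-03', '2021-01-03'],
    'TEMP': ['3', '1', '12', '4', '8', '7', '9', '12', '8'],
})

df['Date'] = pd.to_datetime(df['Date'])
df['dayofyear'] = df['Date'].dt.dayofyear
# Convert the string readings to numbers before any min/max or arithmetic
df['TEMP'] = pd.to_numeric(df['TEMP'])

# Iterating over a groupby yields (group key, sub-Series) pairs,
# so each pass handles the readings of a single day
gdd_by_day = {}
for day, temps in df.groupby('dayofyear')['TEMP']:
    gdd_by_day[int(day)] = (temps.min() + temps.max()) / 2 - 7

# Write the per-day result back onto every row of that day
df['gdd'] = df['dayofyear'].map(gdd_by_day)
```

This gives the same gdd values as the vectorized answers above; on large dataframes the groupby/transform versions will generally be faster than a Python-level loop.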