Write a 'for' loop to compute a formula over each day of the year in a Pandas dataframe

Question:

I am new to Python ‘for’ loops, and I’m trying to compute a formula for each day of the year in a dataframe. The formula is: gdd = (((row_min + row_max) / 2) - 7). In other words, for each day I need to find the minimum and maximum temperature, average the two (add them and divide by two), and then subtract 7 from the result.

Here is the data:

import pandas as pd

df = {'Date': ['2021-01-01', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02','2021-01-03','2021-01-03','2021-01-03'],
     'Time': ['12:00:00 AM', '1:00:00 AM', '2:00:00 AM','12:00:00 AM', '1:00:00 AM', '2:00:00 AM','12:00:00 AM', '1:00:00 AM', '2:00:00 AM'],
     'TEMP': ['3', '1', '12','4', '8', '7','9', '12', '8']}

df = pd.DataFrame(df)

Converting the Date column into a datetime format:

# Convert to datetime format
df['Date']=pd.to_datetime(df['Date'])

# Add column for day of year
df['dayofyear'] = df['Date'].dt.dayofyear
df

The output shows that a day of the year has been correctly assigned.

Here is the loop that I am trying:

for day in df['dayofyear']:
    temp = df['TEMP']
    row_min = temp.min()
    row_max = temp.max()
    gdd = (((row_min + row_max) / 2) - 7)

However, the following error is produced:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [83], in <cell line: 15>()
     17 row_min = temp.min()
     18 row_max = temp.max()
---> 19 gdd = (((row_min + row_max) / 2) - 7)

TypeError: unsupported operand type(s) for /: 'str' and 'int'

How can I correctly write this loop?

Asked By: s_o_c_account

Answers:

Here is one way to do it without a for loop:

df['Date']=pd.to_datetime(df['Date'])

# Add column for day of year
df['dayofyear'] = df['Date'].dt.dayofyear

df['TEMP'] =df['TEMP'].astype(int)


# per day: average of max and min, minus 7 (the formula from the question)
df['gdd'] = df.groupby('dayofyear')['TEMP'].transform(
    lambda x: (x.max() + x.min()) / 2 - 7)
df


        Date         Time  TEMP  dayofyear  gdd
0 2021-01-01  12:00:00 AM     3          1 -0.5
1 2021-01-01   1:00:00 AM     1          1 -0.5
2 2021-01-01   2:00:00 AM    12          1 -0.5
3 2021-01-02  12:00:00 AM     4          2 -1.0
4 2021-01-02   1:00:00 AM     8          2 -1.0
5 2021-01-02   2:00:00 AM     7          2 -1.0
6 2021-01-03  12:00:00 AM     9          3  3.0
7 2021-01-03   1:00:00 AM    12          3  3.0
8 2021-01-03   2:00:00 AM     8          3  3.0

Alternatively, if you don’t want to convert the TEMP column to int permanently, you can convert it inside the lambda:

df['Date']=pd.to_datetime(df['Date'])

# Add column for day of year
df['dayofyear'] = df['Date'].dt.dayofyear


# convert each group to int before taking max() and min(), then apply the formula
df['gdd'] = df.groupby('dayofyear')['TEMP'].transform(
    lambda x: (x.astype(int).max() + x.astype(int).min()) / 2 - 7)
df
Answered By: Naveed
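
As background on why the conversion matters: the TypeError in the question comes from TEMP holding strings, so min() and max() compare text and + concatenates rather than adds. A minimal sketch of that pitfall, using the day-1 values from the question (the variable names are illustrative and not from the original post):

```python
import pandas as pd

# Day-1 temperatures exactly as stored in the question: strings, not numbers
temps = pd.Series(['3', '1', '12'])

# String comparison is lexicographic, so '3' sorts after '12'
str_min, str_max = temps.min(), temps.max()
combined = str_min + str_max  # concatenation, not addition

# After conversion, min/max and + behave numerically
nums = pd.to_numeric(temps)
gdd = (nums.min() + nums.max()) / 2 - 7

print(str_min, str_max, combined, gdd)
```

Dividing the concatenated string by 2 is what raises "unsupported operand type(s) for /: 'str' and 'int'".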

You want:

# ensure data is numeric
df['TEMP'] = pd.to_numeric(df['TEMP'])

# group, get min/max and compute the transformation in a vectorized way
g = df.groupby('dayofyear')['TEMP']
df['gdd'] = g.transform('max').add(g.transform('min')).div(2).sub(7)

output:

        Date         Time  TEMP  dayofyear  gdd
0 2021-01-01  12:00:00 AM     3          1 -0.5
1 2021-01-01   1:00:00 AM     1          1 -0.5
2 2021-01-01   2:00:00 AM    12          1 -0.5
3 2021-01-02  12:00:00 AM     4          2 -1.0
4 2021-01-02   1:00:00 AM     8          2 -1.0
5 2021-01-02   2:00:00 AM     7          2 -1.0
6 2021-01-03  12:00:00 AM     9          3  3.0
7 2021-01-03   1:00:00 AM    12          3  3.0
8 2021-01-03   2:00:00 AM     8          3  3.0
Answered By: mozway
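
Since the question asks specifically for a loop, here is a minimal sketch of an explicit 'for' version, assuming TEMP is converted to numbers first. The original loop fails for two reasons: TEMP is a string column, and df['TEMP'] inside the loop is the whole column rather than one day's readings. Iterating over the groupby object addresses the second point. The name gdd_by_day is hypothetical, and the Time column is omitted because the formula does not use it.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-01-01'] * 3 + ['2021-01-02'] * 3 + ['2021-01-03'] * 3,
    'TEMP': ['3', '1', '12', '4', '8', '7', '9', '12', '8'],
})
df['dayofyear'] = pd.to_datetime(df['Date']).dt.dayofyear
df['TEMP'] = pd.to_numeric(df['TEMP'])  # strings -> numbers

# gdd_by_day (hypothetical name) maps each day of year to its gdd value
gdd_by_day = {}
for day, temps in df.groupby('dayofyear')['TEMP']:
    # temps holds only the readings for this one day
    gdd_by_day[int(day)] = float((temps.min() + temps.max()) / 2 - 7)

# Optionally write the per-day value back onto every row
df['gdd'] = df['dayofyear'].map(gdd_by_day)
print(gdd_by_day)
```

For the sample data this yields -0.5, -1.0 and 3.0 for days 1, 2 and 3. On a full year of hourly data the vectorized groupby versions above are likely to be faster, but the loop makes each step visible.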