Add columns based on the Date column to the dataframe

Question:

I have a dataframe that has columns like these:

Date          temp_data        holiday              

01.01.2000    10000              0                
02.01.2000    0                  1                
03.01.2000    0                  1                
04.01.2000    0                  1
05.01.2000    0                  1
06.01.2000    23000              0
..
..
..
30.01.2000    200                0                
31.01.2000     0                 1                
01.02.2000     0                 1                 
02.02.2000    2500               0                

holiday = 0 when there is data present – indicates a working day

holiday = 1 when there is no data present – indicates a non-working day

I am trying to derive two new columns, pre_long_holiday and post_long_holiday.

The dataframe should look like this:

 Date          temp_data      holiday   pre_long_hol   post_long_hol 

01.01.2000    10000              0                1            0
02.01.2000    0                  1                0            0
03.01.2000    0                  1                0            0
04.01.2000    0                  1                0            0
05.01.2000    0                  1                0            0
06.01.2000    23000              0                0            1
07.01.2000    2000               0                1            0
08.01.2000    0                  1                0            0
09.01.2000    0                  1                0            0
10.01.2000    0                  1                0            0
11.01.2000    1000               0                0            1
..
..
..
30.01.2000    200                0                0            0          
31.01.2000     0                 1                0            0
01.02.2000     0                 1                0            0
02.02.2000    2500               0                0            0

Long_holiday = a run of 3 or more consecutive holidays.
The pre and post columns are 1 on the working day immediately before and immediately after such a holiday period, respectively.

Can anyone help me with this?

The data I have is a continuous time series.

Asked By: bella_pa

||

Answers:

If you need to set a single 1 on the working day before and after each long holiday, use Series.rolling with sum and test the shifted values:

N = 3
m = df['holiday'].eq(0)
s = df['holiday'].rolling(N).sum()
df['pre_long_hol'] =  (s.shift(-N).ge(N) & m).astype(int)
df['post_long_hol'] = (s.shift().ge(N) & m).astype(int)

print (df)
          Date  temp_data  holiday  pre_long_hol  post_long_hol
0   01.01.2000      10000        0             1              0
1   02.01.2000          0        1             0              0
2   03.01.2000          0        1             0              0
3   04.01.2000          0        1             0              0
4   05.01.2000          0        1             0              0
5   06.01.2000      23000        0             0              1
6   07.01.2000       2000        0             1              0
7   08.01.2000          0        1             0              0
8   09.01.2000          0        1             0              0
9   10.01.2000          0        1             0              0
10  11.01.2000       1000        0             0              1
11  30.01.2000        200        0             0              0
12  31.01.2000          0        1             0              0
13  01.02.2000          0        1             0              0
14  02.02.2000       2500        0             0              0
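
To make the shift logic easier to follow, here is a self-contained sketch of the same idea. The holiday values are copied from the rows shown in the question (dates left out for brevity), and the name is_working is only illustrative:

```python
import pandas as pd

# holiday flags taken from the sample rows in the question
holiday = [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
df = pd.DataFrame({"holiday": holiday})

N = 3  # minimum run length that counts as a long holiday
is_working = df["holiday"].eq(0)

# s[i] = number of holidays in rows i-N+1 .. i (NaN for the first N-1 rows)
s = df["holiday"].rolling(N).sum()

# s.shift(-N)[i] is the window covering rows i+1 .. i+N, i.e. the N rows after row i
df["pre_long_hol"] = (s.shift(-N).ge(N) & is_working).astype(int)

# s.shift(1)[i] is the window covering rows i-N .. i-1, i.e. the N rows before row i
df["post_long_hol"] = (s.shift(1).ge(N) & is_working).astype(int)

print(df)
```

The NaN values that shift and rolling introduce at the edges compare as False in ge, so rows too close to the start or end of the frame are never flagged. With this sample, rows 0 and 6 get pre_long_hol = 1 and rows 5 and 10 get post_long_hol = 1.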

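EDIT: To add the lengths of the consecutive runs of 0 and 1, create a helper Series by comparing each value with the shifted values and taking the cumulative sum, then use Series.map with Series.value_counts to get the length of each run, and finally set the rows of the other type to 0 with Series.mask: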

s = df['holiday'].ne(df['holiday'].shift()).cumsum()
count = s.map(s.value_counts())

df['non-working day'] = count.mask(df['holiday'].eq(0), 0)
df['working day'] = count.mask(df['holiday'].eq(1), 0)

print (df)
          Date  temp_data  holiday  pre_long_hol  post_long_hol  
0   01.01.2000      10000        0             1              0   
1   02.01.2000          0        1             0              0   
2   03.01.2000          0        1             0              0   
3   04.01.2000          0        1             0              0   
4   05.01.2000          0        1             0              0   
5   06.01.2000      23000        0             0              1   
6   07.01.2000       2000        0             1              0   
7   08.01.2000          0        1             0              0   
8   09.01.2000          0        1             0              0   
9   10.01.2000          0        1             0              0   
10  11.01.2000       1000        0             0              1   
11  30.01.2000        200        0             0              0   
12  31.01.2000          0        1             0              0   
13  01.02.2000          0        1             0              0   
14  02.02.2000       2500        0             0              0   

    non-working day  working day  
0                 0            1  
1                 4            0  
2                 4            0  
3                 4            0  
4                 4            0  
5                 0            2  
6                 0            2  
7                 3            0  
8                 3            0  
9                 3            0  
10                0            2  
11                0            2  
12                2            0  
13                2            0  
14                0            1  
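
The run-length helper can also produce the pre/post flags directly. This is only a sketch of one possible variation, not part of the solution above; the names run_id, run_len and long_hol are made up for illustration, and the holiday values are copied from the question's sample rows:

```python
import pandas as pd

holiday = [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
df = pd.DataFrame({"holiday": holiday})

N = 3  # minimum run length that counts as a long holiday

# Each run of equal consecutive values gets its own id: 1, 2, 3, ...
run_id = df["holiday"].ne(df["holiday"].shift()).cumsum()
# Length of the run that each row belongs to
run_len = run_id.map(run_id.value_counts())

# True for rows inside a holiday run of at least N days
long_hol = df["holiday"].eq(1) & run_len.ge(N)
is_working = df["holiday"].eq(0)

# A working day is "pre" if the next row is inside a long holiday,
# and "post" if the previous row is. fill_value=False keeps the dtype boolean.
df["pre_long_hol"] = (long_hol.shift(-1, fill_value=False) & is_working).astype(int)
df["post_long_hol"] = (long_hol.shift(1, fill_value=False) & is_working).astype(int)

print(df)
```

For this sample the flags match the rolling-sum version: rows 0 and 6 are pre, rows 5 and 10 are post.
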
Answered By: jezrael

There probably is a more efficient solution but here is what I came up with:

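# Assumes df has the default 0..n-1 integer index, so df.at[i, ...] addresses row i.
# temp_data is the data column from the question's table.
df['pre_holiday'] = 0

# Flag a working day with data that is followed by three holidays in a row.
for i in range(len(df) - 3):
    if df['holiday'].iloc[i+1:i+4].eq(1).all() and df['temp_data'].iloc[i] != 0:
        df.at[i, 'pre_holiday'] = 1

df['post_holiday'] = 0

# Flag a working day with data that is preceded by three holidays in a row.
# Start at row 3 so that the first row with three predecessors is checked too.
for i in range(3, len(df)):
    if df['holiday'].iloc[i-3:i].eq(1).all() and df['temp_data'].iloc[i] != 0:
        df.at[i, 'post_holiday'] = 1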
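
The same three-rows-before / three-rows-after check can probably be written without a Python loop by comparing shifted copies of the holiday column. This is a rough sketch under the assumption that the data column is temp_data, as in the question's table; the sample values are copied from there:

```python
import pandas as pd

df = pd.DataFrame({
    "temp_data": [10000, 0, 0, 0, 0, 23000, 2000, 0, 0, 0, 1000, 200, 0, 0, 2500],
    "holiday":   [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0],
})

h = df["holiday"]
has_data = df["temp_data"].ne(0)

# shift(-k) looks k rows ahead, shift(k) looks k rows back;
# rows shifted in from outside the frame are NaN and compare as False
next_three = h.shift(-1).eq(1) & h.shift(-2).eq(1) & h.shift(-3).eq(1)
prev_three = h.shift(1).eq(1) & h.shift(2).eq(1) & h.shift(3).eq(1)

df["pre_holiday"] = (next_three & has_data).astype(int)
df["post_holiday"] = (prev_three & has_data).astype(int)

print(df)
```

Because each comparison works on whole columns at once, this avoids the per-row iloc lookups, which tend to be slow on long time series.
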
Answered By: elnoidelfarro