Add column based on Date column to the dataframe
Question:
I have a dataframe that has columns like these:
Date temp_data holiday
01.01.2000 10000 0
02.01.2000 0 1
03.01.2000 0 1
04.01.2000 0 1
05.01.2000 0 1
06.01.2000 23000 0
..
..
..
30.01.2000 200 0
31.01.2000 0 1
01.02.2000 0 1
02.02.2000 2500 0
holiday = 0 when there is data present (indicates a working day)
holiday = 1 when there is no data present (indicates a non-working day)
I am trying to extract two new columns, pre_long_holiday and post_long_holiday. The dataframe should look like this:
Date temp_data holiday pre_long_hol post_long_hol
01.01.2000 10000 0 1 0
02.01.2000 0 1 0 0
03.01.2000 0 1 0 0
04.01.2000 0 1 0 0
05.01.2000 0 1 0 0
06.01.2000 23000 0 0 1
07.01.2000 2000 0 1 0
08.01.2000 0 1 0 0
09.01.2000 0 1 0 0
10.01.2000 0 1 0 0
11.01.2000 1000 0 0 1
..
..
..
30.01.2000 200 0 0 0
31.01.2000 0 1 0 0
01.02.2000 0 1 0 0
02.02.2000 2500 0 0 0
Long_holiday = 3 or more consecutive holidays
The pre and post columns have a 1 on the working day immediately before and immediately after the long holiday period, respectively.
Can anyone help me with this?
The data I have is a continuous time series.
Answers:
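If you need to set only a single 1 before and after each long holiday, use Series.rolling with sum and test the shifted values:
N = 3
# True on working days
m = df['holiday'].eq(0)
# number of holidays in each window of N consecutive rows
s = df['holiday'].rolling(N).sum()
# working day whose next N rows are all holidays
df['pre_long_hol'] = (s.shift(-N).ge(N) & m).astype(int)
# working day whose previous N rows are all holidays
df['post_long_hol'] = (s.shift().ge(N) & m).astype(int)
print(df)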
Date temp_data holiday pre_long_hol post_long_hol
0 01.01.2000 10000 0 1 0
1 02.01.2000 0 1 0 0
2 03.01.2000 0 1 0 0
3 04.01.2000 0 1 0 0
4 05.01.2000 0 1 0 0
5 06.01.2000 23000 0 0 1
6 07.01.2000 2000 0 1 0
7 08.01.2000 0 1 0 0
8 09.01.2000 0 1 0 0
9 10.01.2000 0 1 0 0
10 11.01.2000 1000 0 0 1
11 30.01.2000 200 0 0 0
12 31.01.2000 0 1 0 0
13 01.02.2000 0 1 0 0
14 02.02.2000 2500 0 0 0
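To try the approach above end to end, here is a minimal sketch that rebuilds the sample rows from the question by hand (the elided ".." rows are simply left out) and wraps the rolling logic in a helper. The function name flag_long_holidays and its parameter n are hypothetical, and deriving holiday from temp_data == 0 is an assumption based on the question's description.

```python
import pandas as pd

# Sample rows copied from the question; the elided rows are omitted.
df = pd.DataFrame({
    "Date": ["01.01.2000", "02.01.2000", "03.01.2000", "04.01.2000",
             "05.01.2000", "06.01.2000", "07.01.2000", "08.01.2000",
             "09.01.2000", "10.01.2000", "11.01.2000", "30.01.2000",
             "31.01.2000", "01.02.2000", "02.02.2000"],
    "temp_data": [10000, 0, 0, 0, 0, 23000, 2000, 0, 0, 0,
                  1000, 200, 0, 0, 2500],
})
# Assumption: a row is a holiday exactly when no data is present.
df["holiday"] = df["temp_data"].eq(0).astype(int)


def flag_long_holidays(frame, n=3):
    """Return a copy with pre/post flags around runs of >= n holidays."""
    out = frame.copy()
    working = out["holiday"].eq(0)
    # Holiday count in each window of n consecutive rows (NaN for the first n-1).
    window_sum = out["holiday"].rolling(n).sum()
    # shift(-n): window covering the n rows after the current one.
    out["pre_long_hol"] = (window_sum.shift(-n).ge(n) & working).astype(int)
    # shift(1): window covering the n rows before the current one.
    out["post_long_hol"] = (window_sum.shift().ge(n) & working).astype(int)
    return out


result = flag_long_holidays(df, n=3)
print(result)
```

Comparisons against the NaN values at the edges of the rolling window evaluate to False, so the first and last rows are never flagged by accident.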
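EDIT: To add the lengths of the consecutive runs of 0 and 1, build a helper Series of run ids by comparing each value with the shifted one and taking the cumulative sum, map it to run lengths with Series.map and Series.value_counts, and finally set the values that do not apply to 0 with Series.mask:
# run id: increases by 1 each time the holiday value changes
s = df['holiday'].ne(df['holiday'].shift()).cumsum()
# length of the run each row belongs to
count = s.map(s.value_counts())
# keep the run length only on holidays / only on working days
df['non-working day'] = count.mask(df['holiday'].eq(0), 0)
df['working day'] = count.mask(df['holiday'].eq(1), 0)
print(df)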
Date temp_data holiday pre_long_hol post_long_hol
0 01.01.2000 10000 0 1 0
1 02.01.2000 0 1 0 0
2 03.01.2000 0 1 0 0
3 04.01.2000 0 1 0 0
4 05.01.2000 0 1 0 0
5 06.01.2000 23000 0 0 1
6 07.01.2000 2000 0 1 0
7 08.01.2000 0 1 0 0
8 09.01.2000 0 1 0 0
9 10.01.2000 0 1 0 0
10 11.01.2000 1000 0 0 1
11 30.01.2000 200 0 0 0
12 31.01.2000 0 1 0 0
13 01.02.2000 0 1 0 0
14 02.02.2000 2500 0 0 0
non-working day working day
0 0 1
1 4 0
2 4 0
3 4 0
4 4 0
5 0 2
6 0 2
7 3 0
8 3 0
9 3 0
10 0 2
11 0 2
12 2 0
13 2 0
14 0 1
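The intermediate helper Series are easier to follow on a tiny input. The sketch below uses a made-up six-row holiday column (not data from the question) and keeps each intermediate step under its own illustrative name.

```python
import pandas as pd

# Hypothetical toy input: one working day, three holidays, two working days.
holiday = pd.Series([0, 1, 1, 1, 0, 0])

# True wherever the value differs from the previous row (the first row
# compares against NaN and is therefore True).
changed = holiday.ne(holiday.shift())

# Cumulative sum of the change flags gives one id per consecutive run.
run_id = changed.cumsum()

# Map each run id to the number of rows carrying that id.
run_length = run_id.map(run_id.value_counts())

# Blank out the lengths on rows of the other kind.
non_working = run_length.mask(holiday.eq(0), 0)
working = run_length.mask(holiday.eq(1), 0)

print(pd.DataFrame({
    "holiday": holiday,
    "run_id": run_id,
    "run_length": run_length,
    "non-working day": non_working,
    "working day": working,
}))
```

Because run_id is unique per run, value_counts on it returns exactly the run lengths, which is why a plain map is enough here.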
There probably is a more efficient solution but here is what I came up with:
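df['pre_holiday'] = 0
# rows closer than 3 to the end cannot precede a 3-day holiday
limit = len(df) - 3
for i in range(len(df)):
    if i < limit:
        # working day (temp_data != 0) followed by three holidays
        if df.iloc[i+1].holiday == 1 and df.iloc[i+2].holiday == 1 and df.iloc[i+3].holiday == 1 and df.iloc[i].temp_data != 0:
            # df.at is label based, so this assumes the default 0..n-1 index
            df.at[i, 'pre_holiday'] = 1
df['post_holiday'] = 0
limit = 3
for i in range(len(df)):
    # i >= limit so that row 3 (preceded by rows 0 to 2) is also checked
    if i >= limit:
        # working day (temp_data != 0) preceded by three holidays
        if df.iloc[i-1].holiday == 1 and df.iloc[i-2].holiday == 1 and df.iloc[i-3].holiday == 1 and df.iloc[i].temp_data != 0:
            df.at[i, 'post_holiday'] = 1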
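The same look-ahead and look-behind checks can probably be written without an explicit loop by shifting the holiday column. This is only a sketch: it treats holiday == 0 as the working-day test (an assumption; the loop tests a data column instead) and uses the holiday values from the question's sample rows.

```python
import pandas as pd

# holiday values of the sample rows shown in the question
df = pd.DataFrame({"holiday": [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]})

h = df["holiday"]
working = h.eq(0)  # assumption: working day means holiday == 0

# shift(-k) looks k rows ahead; rows past the end become NaN, and
# NaN == 1 is False, so edge rows are never flagged.
next_three = h.shift(-1).eq(1) & h.shift(-2).eq(1) & h.shift(-3).eq(1)

# shift(k) looks k rows back.
prev_three = h.shift(1).eq(1) & h.shift(2).eq(1) & h.shift(3).eq(1)

df["pre_holiday"] = (working & next_three).astype(int)
df["post_holiday"] = (working & prev_three).astype(int)
print(df)
```

Unlike the label-based df.at assignment, this version does not depend on the frame having a default integer index.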