Based on Date Create Duplicate values

Question:

I have a data frame like the one below.

Event_ID    ticket  Revenue Expences    expect  Signed_Date start_Date      end_Date
G-00001     671     6720    793         50      June 2021   2021-06-13      2021-08-13
G-00002     6       56      18          100     May 2021    2021-06-13      2021-07-13
G-00003     5       78      38          100     May 2021    2021-06-14      2021-09-14
G-00004     23      34      23          NaN     June 2021   2021-06-13      2021-09-13
G-00005     4       89      43          40      June 2021   2021-06-14      2021-09-14
G-00006     60      73      20          60      April 2021  2021-06-15      2021-09-15
G-00007     60      345     110         60      June 2021   2021-06-15      2021-09-15
G-00008     89      890     NaN         NaN     June 2021   2021-06-13      2021-09-13
G-00009     0       0       0           50      May 2021    2021-06-16      2021-09-16
G-00010     6       45      16          60      June 2021   2021-06-13      2021-09-13
G-00011     3       39      23          30      June 2021   2021-06-13      2021-09-13
G-00012     2       34      72          20      June 2021   2021-06-13      2021-09-13
G-00013     4       89      48          40      June 2021   2021-06-16      2021-09-16
G-00014     32      127     35          10      April 2021  2021-05-23      2021-08-23
G-00015     3       84      28          120     April 2021  2021-05-13      2021-08-13
G-00016     1       100     25          140     March 2021  2021-03-26      2021-08-26
G-00017     23      525     39          90      May 2021    2021-05-13      2021-10-13

I want to create a duplicate Event_ID based on the "Signed_Date" field. If the "Signed_Date" field is not in the same month and year as "start_Date" I want to duplicate Event_ID and remove Expenses from the original Event_ID field and put it into the new duplicate Event_ID line. Also, I want to create the Event_date field based on the same above concept it "Signed_Date" field is not in the same month and year in "start_Date", Event_date should be the first day of the "Signed_Date" month else "start_Date"

for example, if we think the first two indexes in the above DF

Event_ID    ticket  Revenue Expences    expect  Signed_Date start_Date      end_Date
G-00001     671     6720    793         50      June 2021   2021-06-13      2021-08-13
G-00002     6       56      18          100     May 2021    2021-06-13      2021-07-13

The above should look like below

Event_ID    ticket  Revenue Expences    expect  Signed_Date start_Date      end_Date     Event_Date
G-00001     671     6720    793         50      June 2021   2021-06-13      2021-08-13   2021-06-13
G-00002     6       56      0           100     May 2021    2021-06-13      2021-07-13   2021-06-13
G-00002     0       0       18          0       May 2021    2021-06-13      2021-07-13   2021-05-01

I’m trying to do something like that

df['StartDateMonth'] = df['start_Date'].dt.month
df['SignedDateMonth'] = df['Signed_Date'].dt.month
#if those are not equal then I'm going to make a copy of that line
index_to_copy = 0
number_of_extra_copies = 1
pd.concat([df,
           pd.DataFrame(np.repeat(df.iloc[[index_to_copy]].values,
                                  number_of_extra_copies,
                                  axis=0),
                        columns=df.columns)]).sort_values(by='index').drop(columns='index').reset_index(drop=True)

But doesn’t work

prfered way

Event_ID    ticket  Revenue Expences    expect  Signed_Date start_Date      end_Date     Event_Date
G-00001     223.6   2240    264.33      16.66   June 2021   2021-06-13      2021-08-13   2021-06-13
G-00001     223.6   2240    264.33      16.66   June 2021   2021-06-13      2021-08-13   2021-07-01
G-00001     223.6   2240    264.33      16.66   June 2021   2021-06-13      2021-08-13   2021-08-01
G-00002     3       28      0           50      May 2021    2021-06-13      2021-07-13   2021-06-13
G-00002     3       28      0           50      May 2021    2021-06-13      2021-07-13   2021-07-01
G-00002     0       0       18          0       May 2021    2021-06-13      2021-07-13   2021-05-01

Thanks in advance

Asked By: rra

||

Answers:

Create a boolean mask to identify rows based on a month period. After that you can split your 2 dataframes and update each one separately:

# Prepare some data
df['Event_Date'] = df['start_Date']
signed_date = pd.to_datetime(df['Signed_Date'])

# Boolean mask based on period
m = signed_date.dt.to_period('M') != df['start_Date'].dt.to_period('M')

# Create the duplicate dataframe
df1 = df[m].assign(ticket=0, Revenue=0, expect=0, Event_Date=signed_date)

# Reset Expences values
df.loc[m, 'Expences'] = 0

# Merge the 2 dataframes
out = pd.concat([df, df1]).sort_index(ignore_index=True)

Output:

>>> out
   Event_ID  ticket  Revenue  Expences  expect Signed_Date start_Date   end_Date Event_Date
0   G-00001     671     6720     793.0    50.0   June 2021 2021-06-13 2021-06-13 2021-06-13
1   G-00002       6       56       0.0   100.0    May 2021 2021-06-13 2021-06-13 2021-06-13
2   G-00002       0        0      18.0     0.0    May 2021 2021-06-13 2021-06-13 2021-05-01
3   G-00003       5       78       0.0   100.0    May 2021 2021-06-14 2021-06-14 2021-06-14
4   G-00003       0        0      38.0     0.0    May 2021 2021-06-14 2021-06-14 2021-05-01
5   G-00004      23       34      23.0     NaN   June 2021 2021-06-13 2021-06-13 2021-06-13
6   G-00005       4       89      43.0    40.0   June 2021 2021-06-14 2021-06-14 2021-06-14
7   G-00006      60       73       0.0    60.0  April 2021 2021-06-15 2021-06-15 2021-06-15
8   G-00006       0        0      20.0     0.0  April 2021 2021-06-15 2021-06-15 2021-04-01
9   G-00007      60      345     110.0    60.0   June 2021 2021-06-15 2021-06-15 2021-06-15
10  G-00008      89      890       NaN     NaN   June 2021 2021-06-13 2021-06-13 2021-06-13
11  G-00009       0        0       0.0    50.0    May 2021 2021-06-16 2021-06-16 2021-06-16
12  G-00009       0        0       0.0     0.0    May 2021 2021-06-16 2021-06-16 2021-05-01
13  G-00010       6       45      16.0    60.0   June 2021 2021-06-13 2021-06-13 2021-06-13
14  G-00011       3       39      23.0    30.0   June 2021 2021-06-13 2021-06-13 2021-06-13
15  G-00012       2       34      72.0    20.0   June 2021 2021-06-13 2021-06-13 2021-06-13
16  G-00013       4       89      48.0    40.0   June 2021 2021-06-16 2021-06-16 2021-06-16
17  G-00014      32      127       0.0    10.0  April 2021 2021-05-23 2021-05-23 2021-05-23
18  G-00014       0        0      35.0     0.0  April 2021 2021-05-23 2021-05-23 2021-04-01
19  G-00015       3       84       0.0   120.0  April 2021 2021-05-13 2021-05-13 2021-05-13
20  G-00015       0        0      28.0     0.0  April 2021 2021-05-13 2021-05-13 2021-04-01
21  G-00016       1      100      25.0   140.0  March 2021 2021-03-26 2021-03-26 2021-03-26
22  G-00017      23      525      39.0    90.0    May 2021 2021-05-13 2021-05-13 2021-05-13
Answered By: Corralien
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.