Replicating rows of a Pandas dataframe based on a column condition

Question

I have a dataframe looking like this:

Weekday Day_in_Month Starting_hour Ending_hour Power
3       1            1             3           35
3       1            3             7           15
4       2            22            2           5
.
.
.

I want to duplicate every row until the Starting_hour matches the Ending_hour.

All values of the row should stay the same, except that Starting_hour should increase by 1 in each new row.

The final dataframe should look like the following:

Weekday Day_in_Month Starting_hour Ending_hour Power
3       1            1             3           35
3       1            2             3           35
3       1            3             3           35
3       1            3             7           15
3       1            4             7           15
3       1            5             7           15
3       1            6             7           15
3       1            7             7           15
4       2            22            2           5
4       2            23            2           5
4       2            24            2           5
4       2            1             2           5
4       2            2             2           5

I appreciate any ideas on it, thanks!

Asked By: OAP

Answer 1

Repeat each row with Index.repeat and DataFrame.loc, using Ending_hour - Starting_hour + 1 as the number of repeats, then add a per-row counter from GroupBy.cumcount to Starting_hour:

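# Repeat each row (Ending_hour - Starting_hour + 1) times
df1 = df.loc[df.index.repeat(df['Ending_hour'].sub(df['Starting_hour']).add(1))].copy()
# Add 0, 1, 2, ... to Starting_hour within each group of repeated rows
df1['Starting_hour'] += df1.groupby(level=0).cumcount()
df1 = df1.reset_index(drop=True)
print(df1)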
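
For reference, here is a minimal self-contained sketch of the same steps. The sample frame is an assumption: it contains only the two rows from the question where Ending_hour is not smaller than Starting_hour, because a negative repeat count would raise an error in this first version.

```python
import pandas as pd

# Assumed sample data: only rows with Ending_hour >= Starting_hour.
df = pd.DataFrame({
    "Weekday": [3, 3],
    "Day_in_Month": [1, 1],
    "Starting_hour": [1, 3],
    "Ending_hour": [3, 7],
    "Power": [35, 15],
})

# Number of copies per row: the span of hours, counting both ends.
repeats = df["Ending_hour"] - df["Starting_hour"] + 1

# Repeat each index label, then select those labels to duplicate the rows.
df1 = df.loc[df.index.repeat(repeats)].copy()

# cumcount() numbers the copies of each original row 0, 1, 2, ...
df1["Starting_hour"] += df1.groupby(level=0).cumcount()
df1 = df1.reset_index(drop=True)
print(df1)
```

The repeated rows keep their original index label, which is why groupby(level=0) can tell the copies of one source row apart before the index is reset.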

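EDIT: If Starting_hour can be greater than Ending_hour (the interval crosses midnight), add 24 to Ending_hour for those rows before computing the number of repeats. In the last step, subtract 1 from the incremented starting hours so they count from 0, take the result modulo 24, and then add 1 again to map back to the range 1 to 24: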

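# Rows where the interval crosses midnight
m = df['Starting_hour'].gt(df['Ending_hour'])
# Shift Ending_hour by 24 for those rows so the difference is positive
e = df['Ending_hour'].mask(m, df['Ending_hour'].add(24))
df1 = df.loc[df.index.repeat(e.sub(df['Starting_hour']).add(1))].copy()
# Add the counter, then wrap the hours into the range 1..24
df1['Starting_hour'] = (df1['Starting_hour'].add(df1.groupby(level=0).cumcount())
                           .sub(1).mod(24).add(1))
df1 = df1.reset_index(drop=True)
print(df1)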
    Weekday  Day_in_Month  Starting_hour  Ending_hour  Power
0         3             1              1            3     35
1         3             1              2            3     35
2         3             1              3            3     35
3         3             1              3            7     15
4         3             1              4            7     15
5         3             1              5            7     15
6         3             1              6            7     15
7         3             1              7            7     15
8         4             2             22            2      5
9         4             2             23            2      5
10        4             2             24            2      5
11        4             2              1            2      5
12        4             2              2            2      5
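
The midnight-wrapping version can also be packaged as a reusable function. The sketch below is one possible arrangement, not part of the original answer: the function name expand_hours and the sample frame are hypothetical, and it assumes hours are integers in the range 1 to 24.

```python
import pandas as pd


def expand_hours(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per hour from Starting_hour to Ending_hour."""
    start = df["Starting_hour"]
    # Where the interval crosses midnight, treat Ending_hour as 24 hours later.
    end = df["Ending_hour"].mask(start.gt(df["Ending_hour"]),
                                 df["Ending_hour"].add(24))
    # Repeat each row once per hour in the interval, counting both ends.
    out = df.loc[df.index.repeat(end.sub(start).add(1))].copy()
    # Position of each copy within its source row: 0, 1, 2, ...
    offset = out.groupby(level=0).cumcount()
    # Shift to 0..23, wrap with modulo 24, shift back to 1..24.
    out["Starting_hour"] = out["Starting_hour"].add(offset).sub(1).mod(24).add(1)
    return out.reset_index(drop=True)


# Sample data taken from the question.
df = pd.DataFrame({
    "Weekday": [3, 3, 4],
    "Day_in_Month": [1, 1, 2],
    "Starting_hour": [1, 3, 22],
    "Ending_hour": [3, 7, 2],
    "Power": [35, 15, 5],
})
result = expand_hours(df)
print(result)
```

Subtracting 1 before the modulo matters because the hours here run from 1 to 24 rather than 0 to 23; without that shift, hour 24 would wrap to 0.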

Answered By: jezrael
