How to create multiple rows from a single row?

Question

I’m trying to write a program that expands each row of a table into multiple rows, and adds columns, based on the value in one of its columns.

Here’s a look at my data:

import pandas as pd

data = pd.read_excel("test data.xlsx")

Id	#ofweeks	Manhours	StartDate	EndDate	Startingyear	StartingWeek
aaa	2	10	1/15/2023	1/29/2023	2023	3
bbb	3	12	2/12/2023	3/05/2023	2023	7

The table needs to be expanded so that every row is repeated once per week, i.e. #ofweeks times.
Columns also need to be added for the labor hours per week and for a running count of the weeks within each Id.

The result should look like this:

Id	#ofweeks	Manhours	StartDate	EndDate	Startingyear	StartingWeek	WeekCount	Year	Week#
aaa	2	10	1/15/2023	1/29/2023	2023	3	1	2023	3
aaa	2	10	1/15/2023	1/29/2023	2023	3	2	2023	4
bbb	3	12	2/12/2023	3/05/2023	2023	7	1	2023	7
bbb	3	12	2/12/2023	3/05/2023	2023	7	2	2023	8
bbb	3	12	2/12/2023	3/05/2023	2023	7	3	2023	9

I’ve been able to get the table into the format I need. However, there’s one more issue.

I’ve added the following columns:

# Add column for number of week for each expanded job record row
df['Week Count'] = df.groupby(['Id']).cumcount() + 1 

# Add column for year for each job record row
from math import floor
df['Year'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
                   df['Starting Year'] + floor((df['Starting Week period'] + df['Week Count'])/52),
                   df['Starting Year'])

# Add column for the number of week for the calendar year for each job record row
df['Week #'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
                   (df['Starting Week period'] + df['Week Count']-53),
                    df['Starting Week period'] + df['Week Count']-1)

# Add leading 0 to the Week # Column
df['Week #'] = df['Week #'].astype(str).str.pad(2, side = 'left', fillchar = '0')

# Add a column Period which concatenates the Year and Week # columns
df['Period'] = df['Year'].astype(str) + "-" + df['Week #'].astype(str)

However, this gives me the following error:

TypeError                                 Traceback (most recent call last)
Cell In[6], line 7
      4 # Add column for year for each job record row
      5 from math import floor
      6 df['Year'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
----> 7                        df['Starting Year'] + floor((df['Starting Week period'] + df['Week Count'])/52), 
      8                        df['Starting Year'])
     10 # Add column for the number of week for the calendar year for each job record row
     11 df['Week #'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
     12                        (df['Starting Week period'] + df['Week Count']-53),
     13                         df['Starting Week period'] + df['Week Count']-1)

File /opt/anaconda3/lib/python3.9/site-packages/pandas/core/series.py:191, in _coerce_method.<locals>.wrapper(self)
    189 if len(self) == 1:
    190     return converter(self.iloc[0])
--> 191 raise TypeError(f"cannot convert the series to {converter}")

TypeError: cannot convert the series to <class 'float'>

Asked By: user21126867


Answer 1

Once you have expanded the dataframe, I would add the columns like this:

df['WeekCount'] = df.groupby('Id')['Id'].cumcount() + 1
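
The expansion step itself is not shown here. One common way to do it, sketched under the assumption that the week count lives in a column named #ofweeks as in the sample table, is to repeat each index label with Index.repeat and then number the copies. The name expanded is only illustrative.

```python
import pandas as pd

# Two sample rows with the column names from the question's table
df = pd.DataFrame({
    'Id': ['aaa', 'bbb'],
    '#ofweeks': [2, 3],
    'Manhours': [10, 12],
    'Startingyear': [2023, 2023],
    'StartingWeek': [3, 7],
})

# Repeat every index label '#ofweeks' times, then select those labels
expanded = df.loc[df.index.repeat(df['#ofweeks'])].reset_index(drop=True)

# Running week number within each Id: 1, 2, ... up to #ofweeks
expanded['WeekCount'] = expanded.groupby('Id').cumcount() + 1
print(expanded)
```

reset_index(drop=True) gives the repeated rows a fresh 0..n-1 index, so later column assignments do not have to deal with duplicate index labels.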

To account for activities that go past a calendar year you can try:

import numpy as np

# Weeks past 52 wrap around to week 1 of the next year
df['Week#'] = np.where((df['StartingWeek'] + df['WeekCount']-1) > 52,
                       (df['StartingWeek'] + df['WeekCount']-53),
                        df['StartingWeek'] + df['WeekCount']-1)
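
As a quick check of the rollover logic, here is a small sketch with a hypothetical job (the Id ccc is not from the question's data) that starts in week 51 and runs for four weeks. It assumes every year has exactly 52 weeks, as the expression above does; years with 53 ISO weeks would need separate handling.

```python
import numpy as np
import pandas as pd

# Hypothetical, already-expanded rows for one job starting in week 51
df = pd.DataFrame({'Id': ['ccc'] * 4, 'StartingWeek': [51] * 4})
df['WeekCount'] = df.groupby('Id').cumcount() + 1

# Week of the year before wrapping: 51, 52, 53, 54
offset = df['StartingWeek'] + df['WeekCount'] - 1

# Anything past week 52 wraps into the next year
df['Week#'] = np.where(offset > 52, offset - 52, offset)
print(df['Week#'].tolist())  # [51, 52, 1, 2]
```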

To add a "current year" (based on Edit) you can try:

# math.floor only accepts a single number, so use element-wise floor division (//) on the Series
df['CurrentYear'] = np.where((df['StartingWeek'] + df['WeekCount']-1) > 52,
                       df['Startingyear'] + (df['StartingWeek'] + df['WeekCount']) // 52,
                       df['Startingyear'])
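
On the TypeError from the question: math.floor expects a single number, so handing it a whole Series fails with "cannot convert the series to <class 'float'>". A minimal sketch of one way around it, using element-wise floor division (//) and then building the zero-padded Period string the question asks for. The sample job (Id ccc, starting in week 51 of 2023) is hypothetical, and the code assumes integer columns and a single year rollover, like the expressions above.

```python
import numpy as np
import pandas as pd

# Hypothetical, already-expanded rows for a job that crosses the year boundary
df = pd.DataFrame({
    'Id': ['ccc'] * 4,
    'Startingyear': [2023] * 4,
    'StartingWeek': [51] * 4,
})
df['WeekCount'] = df.groupby('Id').cumcount() + 1

total = df['StartingWeek'] + df['WeekCount']
rolled = (total - 1) > 52  # True once the job passes week 52

# '//' works element-wise on a Series, unlike math.floor
df['Year'] = np.where(rolled, df['Startingyear'] + total // 52, df['Startingyear'])
df['Week#'] = np.where(rolled, total - 53, total - 1)

# Zero-pad the week number and join it with the year, e.g. 2024-01
df['Period'] = df['Year'].astype(str) + '-' + df['Week#'].astype(str).str.zfill(2)
print(df[['Id', 'WeekCount', 'Year', 'Week#', 'Period']])
```

np.floor(series) would also work element-wise, but it returns floats, so the result would need an extra astype(int) before being turned into a string.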

Answered By: Celius Stingher
