How to create multiple rows from a single row?
Question:
I'm trying to write a program that expands each row into multiple rows, and adds new columns, based on the value in one column.
Here's a look at my data:
```python
import pandas as pd
data = pd.read_excel("test data.xlsx")
```
Id | #ofweeks | Manhours | StartDate | EndDate | Startingyear | StartingWeek |
---|---|---|---|---|---|---|
aaa | 2 | 10 | 1/15/2023 | 1/29/2023 | 2023 | 3 |
bbb | 3 | 12 | 2/12/2023 | 3/05/2023 | 2023 | 7 |
Each row needs to be repeated once per week, i.e. `#ofweeks` times.
Columns also need to be added for the labor hours per week and for a running count of the weeks within each Id.
The result should look like this:
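Id | #ofweeks | Manhours | StartDate | EndDate | Startingyear | StartingWeek | WeekCount | Year | Week# |
---|---|---|---|---|---|---|---|---|---|
aaa | 2 | 10 | 1/15/2023 | 1/29/2023 | 2023 | 3 | 1 | 2023 | 3 |
aaa | 2 | 10 | 1/15/2023 | 1/29/2023 | 2023 | 3 | 2 | 2023 | 4 |
bbb | 3 | 12 | 2/12/2023 | 3/05/2023 | 2023 | 7 | 1 | 2023 | 7 |
bbb | 3 | 12 | 2/12/2023 | 3/05/2023 | 2023 | 7 | 2 | 2023 | 8 |
bbb | 3 | 12 | 2/12/2023 | 3/05/2023 | 2023 | 7 | 3 | 2023 | 9 |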
I've been able to get the table into the format I need. However, there's one more issue.
I've added the following columns:
```python
# Add column for number of week for each expanded job record row
df['Week Count'] = df.groupby(['Id']).cumcount() + 1

# Add column for year for each job record row
from math import floor
df['Year'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
                      df['Starting Year'] + floor((df['Starting Week period'] + df['Week Count'])/52),
                      df['Starting Year'])

# Add column for the number of week for the calendar year for each job record row
df['Week #'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
                        (df['Starting Week period'] + df['Week Count']-53),
                        df['Starting Week period'] + df['Week Count']-1)

# Add leading 0 to the Week # Column
df['Week #'] = df['Week #'].astype(str).str.pad(2, side = 'left', fillchar = '0')

# Add a column Period which concatenates the Year and Week # columns
df['Period'] = df['Year'].astype(str) + "-" + df['Week #'].astype(str)
```
However, this gives me the following error:
```
TypeError                                 Traceback (most recent call last)
Cell In[6], line 7
      4 # Add column for year for each job record row
      5 from math import floor
      6 df['Year'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
----> 7     df['Starting Year'] + floor((df['Starting Week period'] + df['Week Count'])/52),
      8     df['Starting Year'])
     10 # Add column for the number of week for the calendar year for each job record row
     11 df['Week #'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
     12     (df['Starting Week period'] + df['Week Count']-53),
     13     df['Starting Week period'] + df['Week Count']-1)

File /opt/anaconda3/lib/python3.9/site-packages/pandas/core/series.py:191, in _coerce_method.<locals>.wrapper(self)
    189 if len(self) == 1:
    190     return converter(self.iloc[0])
--> 191 raise TypeError(f"cannot convert the series to {converter}")

TypeError: cannot convert the series to <class 'float'>
```
Answers:
Once you have expanded the dataframe, I would add the week-count column like this:
```python
df['WeekCount'] = df.groupby('Id')['Id'].cumcount() + 1
```
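For completeness, here is one way to do the expansion itself. It is a minimal sketch that types the question's sample data in directly instead of reading `test data.xlsx`, repeats each row `#ofweeks` times with `Index.repeat`, and then numbers the copies. It assumes each `Id` appears on only one input row; if an Id can occur on several rows, counting within the original row index (e.g. `groupby(level=0)` before `reset_index`) would be safer.

```python
import pandas as pd

# Sample data typed in from the question (instead of reading "test data.xlsx")
data = pd.DataFrame({
    'Id': ['aaa', 'bbb'],
    '#ofweeks': [2, 3],
    'Manhours': [10, 12],
    'StartDate': ['1/15/2023', '2/12/2023'],
    'EndDate': ['1/29/2023', '3/05/2023'],
    'Startingyear': [2023, 2023],
    'StartingWeek': [3, 7],
})

# Repeat each row '#ofweeks' times, then renumber the index 0..n-1
df = data.loc[data.index.repeat(data['#ofweeks'])].reset_index(drop=True)

# Running 1-based week counter within each Id
df['WeekCount'] = df.groupby('Id').cumcount() + 1

print(df[['Id', '#ofweeks', 'StartingWeek', 'WeekCount']])
```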
To account for activities that run past the end of a calendar year (assuming 52 weeks per year), you can try:
```python
import numpy as np

df['Week#'] = np.where((df['StartingWeek'] + df['WeekCount'] - 1) > 52,
                       (df['StartingWeek'] + df['WeekCount'] - 53),
                       df['StartingWeek'] + df['WeekCount'] - 1)
```
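As a quick check of the wrap-around, here is the same expression applied to a hypothetical job (`ccc`, not part of the question's data) that starts in week 51 and runs for four weeks:

```python
import numpy as np
import pandas as pd

# Hypothetical job that starts in week 51 and runs for 4 weeks
df = pd.DataFrame({'Id': ['ccc'] * 4, 'Startingyear': [2023] * 4,
                   'StartingWeek': [51] * 4, 'WeekCount': [1, 2, 3, 4]})

# Same expression as above: weeks past 52 wrap back to 1, 2, ...
df['Week#'] = np.where((df['StartingWeek'] + df['WeekCount'] - 1) > 52,
                       (df['StartingWeek'] + df['WeekCount'] - 53),
                       df['StartingWeek'] + df['WeekCount'] - 1)

print(df['Week#'].tolist())  # [51, 52, 1, 2]
```

Because 52 is subtracted only once, this handles at most one year boundary, and it does not account for ISO years that have a week 53.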
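To add a "current year" (based on the edit), note that `math.floor()` expects a single number. When it is given a whole Series, pandas tries to convert the Series to one `float` and raises the `TypeError: cannot convert the series to <class 'float'>` shown in the question. Floor division (`//`) works element-wise on a Series, so use it instead:
```python
import numpy as np

# Absolute week number, counted from week 1 of the starting year
abs_week = df['StartingWeek'] + df['WeekCount'] - 1

# Vectorized floor division instead of math.floor(), which only accepts scalars
df['CurrentYear'] = np.where(abs_week > 52,
                             df['Startingyear'] + (abs_week - 1) // 52,
                             df['Startingyear'])
```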
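If a job can cross more than one year boundary, the `np.where` expressions would need more branches. A minimal alternative sketch, still assuming 52 weeks per year, derives both the year and the week from one zero-based offset using `//` and `%`, then builds the zero-padded `Period` label from the question. The `ccc` rows are hypothetical, added only to show a rollover from 2023 into 2024.

```python
import pandas as pd

WEEKS_PER_YEAR = 52  # assumption carried over from the answer; ISO week 53 is ignored

# Expanded rows for the question's sample data, plus a hypothetical job 'ccc'
# that starts in week 52 so the year rollover is visible
df = pd.DataFrame({
    'Id':           ['aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc'],
    'Startingyear': [2023] * 7,
    'StartingWeek': [3, 3, 7, 7, 7, 52, 52],
    'WeekCount':    [1, 2, 1, 2, 3, 1, 2],
})

# Zero-based week offset counted from week 1 of the starting year
offset = df['StartingWeek'] + df['WeekCount'] - 2

# Floor division and modulo work element-wise on a Series, unlike math.floor()
df['CurrentYear'] = df['Startingyear'] + offset // WEEKS_PER_YEAR
df['Week#'] = offset % WEEKS_PER_YEAR + 1

# Zero-pad the week number and build a "YYYY-WW" period label
df['Period'] = df['CurrentYear'].astype(str) + '-' + df['Week#'].astype(str).str.zfill(2)

print(df[['Id', 'CurrentYear', 'Week#', 'Period']])
```

For non-negative week numbers, `.str.zfill(2)` gives the same result as the question's `.str.pad(2, side='left', fillchar='0')`.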