Need specific sets of columns to be converted into a row and the rest of columns to repeat values
Question:
I have data in the following format
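ID | SCHOOL | Name1 | Name1 Subject1 | Name1 Grade1 | Name1 Subject2 | Name1 Grade2 | Name2 | Name2 Subject1 | Name2 Grade1 | Name2 Subject2 | Name2 Grade2 |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | S1 | Mr. ABC | Math | 6 | Science | 7 | Mr. XYZ | Social | 8 | EVS | 9 |
2 | S2 | Mr. PQR | Math | 10 | Science | 11 | Mr. KLM | Social | 8 | EVS | 9 |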
Can I transform it into the following format using Python?
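ID | SCHOOL | Name | Subject | Grade |
---|---|---|---|---|
1 | S1 | Mr. ABC | Math | 6 |
1 | S1 | Mr. ABC | Science | 7 |
1 | S1 | Mr. XYZ | Social | 8 |
1 | S1 | Mr. XYZ | EVS | 9 |
2 | S2 | Mr. PQR | Math | 10 |
2 | S2 | Mr. PQR | Science | 11 |
2 | S2 | Mr. KLM | Social | 8 |
2 | S2 | Mr. KLM | EVS | 9 |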
Answers:
There might be a nicer solution, but this also works:
import pandas as pd

# Take each (Name, Subject, Grade) group together with the ID columns
df_1 = df[['ID', 'SCHOOL', 'Name1', 'Name1 Subject1', 'Name1 Grade1']]
df_2 = df[['ID', 'SCHOOL', 'Name1', 'Name1 Subject2', 'Name1 Grade2']]
df_3 = df[['ID', 'SCHOOL', 'Name2', 'Name2 Subject1', 'Name2 Grade1']]
df_4 = df[['ID', 'SCHOOL', 'Name2', 'Name2 Subject2', 'Name2 Grade2']]
df_list = [df_1, df_2, df_3, df_4]

# Give every piece the same column names so the pieces stack on top of each other
for i in df_list:
    i.columns = ['ID', 'SCHOOL', 'Name', 'Subject', 'Grade']

final = pd.concat(df_list)
print(final)
'''
   ID SCHOOL     Name  Subject  Grade
0   1     S1  Mr. ABC     Math      6
1   2     S2  Mr. PQR     Math     10
0   1     S1  Mr. ABC  Science      7
1   2     S2  Mr. PQR  Science     11
0   1     S1  Mr. XYZ   Social      8
1   2     S2  Mr. KLM   Social      8
0   1     S1  Mr. XYZ      EVS      9
1   2     S2  Mr. KLM      EVS      9
'''
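The same idea can be written as a loop so the four selections are not typed out by hand. This is only a sketch: the sample frame is rebuilt from the table in the question, the helper name stack_groups is made up for illustration, and it assumes the columns always follow the "NameN SubjectM" / "NameN GradeM" naming shown there. A stable sort on ID at the end puts the rows in the order the question asks for.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    'ID': [1, 2],
    'SCHOOL': ['S1', 'S2'],
    'Name1': ['Mr. ABC', 'Mr. PQR'],
    'Name1 Subject1': ['Math', 'Math'],
    'Name1 Grade1': [6, 10],
    'Name1 Subject2': ['Science', 'Science'],
    'Name1 Grade2': [7, 11],
    'Name2': ['Mr. XYZ', 'Mr. KLM'],
    'Name2 Subject1': ['Social', 'Social'],
    'Name2 Grade1': [8, 8],
    'Name2 Subject2': ['EVS', 'EVS'],
    'Name2 Grade2': [9, 9],
})


def stack_groups(frame, names=('Name1', 'Name2'), slots=(1, 2)):
    """Hypothetical helper: stack every (name, subject, grade) group as rows."""
    pieces = []
    for name in names:
        for k in slots:
            cols = ['ID', 'SCHOOL', name, f'{name} Subject{k}', f'{name} Grade{k}']
            piece = frame[cols].copy()
            piece.columns = ['ID', 'SCHOOL', 'Name', 'Subject', 'Grade']
            pieces.append(piece)
    stacked = pd.concat(pieces, ignore_index=True)
    # mergesort is stable, so the Name1/Name2 and Subject1/Subject2 order
    # is preserved within each ID
    return stacked.sort_values('ID', kind='mergesort').reset_index(drop=True)


final = stack_groups(df)
print(final)
```

Passing different names or slots would cover more teacher or subject columns, as long as they keep the same naming scheme.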
One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import numpy as np
import pandas as pd
import janitor

(df
 .pivot_longer(
     index=['ID', 'SCHOOL', 'Name1', 'Name2'],
     names_to=('name', '.value'),
     names_pattern=r"(.+)\s(.+)\d",
     sort_by_appearance=True)
 # keep the teacher that matches the column group each row came from
 .assign(Name=lambda df: np.where(df.name.eq('Name1'), df.Name1, df.Name2))
 .loc[:, ['ID', 'SCHOOL', 'Name', 'Subject', 'Grade']]
)

   ID SCHOOL     Name  Subject  Grade
0   1     S1  Mr. ABC     Math      6
1   1     S1  Mr. ABC  Science      7
2   1     S1  Mr. XYZ   Social      8
3   1     S1  Mr. XYZ      EVS      9
4   2     S2  Mr. PQR     Math     10
5   2     S2  Mr. PQR  Science     11
6   2     S2  Mr. KLM   Social      8
7   2     S2  Mr. KLM      EVS      9
The names_pattern argument is a regular expression whose capture groups (the parts in parentheses) are matched against the names of the columns being reshaped. Each group is paired, in order, with an entry in names_to. Here there are two groups and two entries: the first group (Name1 or Name2) goes into the name column, while the second group is paired with .value, so its matches (Subject and Grade) stay as column headers.
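To see what the two groups capture, the pattern can be tried on the column names with Python's re module. This is only an illustration of the regex itself, not of how pyjanitor applies it internally; the variable names are made up for the example.

```python
import re

# Pattern from the answer: (prefix) whitespace (label) trailing digit
pattern = re.compile(r"(.+)\s(.+)\d")

columns = ['Name1 Subject1', 'Name1 Grade1', 'Name1 Subject2', 'Name1 Grade2',
           'Name2 Subject1', 'Name2 Grade1', 'Name2 Subject2', 'Name2 Grade2']

# Group 1 feeds the 'name' column; group 2 (paired with '.value') becomes a header
split = {col: pattern.fullmatch(col).groups() for col in columns}
for col, (name, value) in split.items():
    print(f"{col!r} -> name={name!r}, .value={value!r}")
```

Every column yields either Subject or Grade as its second group, which is why the reshaped frame ends up with exactly those two value columns.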
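After reshaping, both Name1 and Name2 are still present on every row, so the np.where call picks the teacher that belongs to the row's column group: Name1 where name equals 'Name1', otherwise Name2. The final .loc keeps only the columns wanted in the output.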
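A small standalone sketch of that selection step, using plain NumPy arrays in place of the reshaped columns (the array names and values are made up to mirror the first four rows for ID 1):

```python
import numpy as np

# Which column group each reshaped row came from
name = np.array(['Name1', 'Name1', 'Name2', 'Name2'])
# Both teacher columns are repeated on every row after the reshape
name1 = np.array(['Mr. ABC'] * 4)
name2 = np.array(['Mr. XYZ'] * 4)

# Take Name1's value where the row came from a Name1 column, otherwise Name2's
picked = np.where(name == 'Name1', name1, name2)
print(picked.tolist())
```

With more than two teacher columns, a two-way np.where would no longer be enough and a different selection (for example np.select) would be needed.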