Need specific sets of columns to be converted into a row and the rest of columns to repeat values

Question:

I have data in the following format

ID SCHOOL Name1 Name1 Subject1 Name1 Grade1 Name1 Subject2 Name1 Grade2 Name2 Name2 Subject1 Name2 Grade1 Name2 Subject2 Name2 Grade2
1 S1 Mr. ABC Math 6 Science 7 Mr. XYZ Social 8 EVS 9
2 S2 Mr. PQR Math 10 Science 11 Mr. KLM Social 8 EVS 9

Can I transform it in the following format using Python

ID SCHOOL Name Subject Grade
1 S1 Mr. ABC Math 6
1 S1 Mr. ABC Science 7
1 S1 Mr. XYZ Social 8
1 S1 Mr. XYZ EVS 9
2 S2 Mr. PQR Math 10
2 S2 Mr. PQR Science 11
2 S2 Mr. KLM Social 8
2 S2 Mr. KLM EVS 9
Asked By: Harshit Jain

||

Answers:

there might be a nicer solution but this also works:

df_1=df[['ID', 'SCHOOL','Name1', 'Name1 Subject1',
       'Name1 Grade1']]
df_2=df[['ID', 'SCHOOL','Name1', 'Name1 Subject2',
       'Name1 Grade2']]

df_3=df[['ID', 'SCHOOL','Name2', 'Name2 Subject1',
       'Name2 Grade1']]

df_4=df[['ID', 'SCHOOL','Name2', 'Name2 Subject2',
       'Name2 Grade2']]

df_list=[df_1,df_2,df_3,df_4]
for i in df_list:
    i.columns=['ID','SCHOOL','Name','Subject','Grade']
final=pd.concat(df_list)
print(final)
'''
    ID  SCHOOL  Name    Subject Grade
0   1   S1      Mr. ABC Math    6
1   2   S2      Mr. PQR Math    10
0   1   S1      Mr. ABC Science 7
1   2   S2      Mr. PQR Science 11
0   1   S1      Mr. XYZ Social  8
1   2   S2      Mr. KLM Social  8
0   1   S1      Mr. XYZ EVS     9
1   2   S2      Mr. KLM EVS     9

'''



Answered By: Clegane

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor


(df
.pivot_longer(
    index = ['ID', 'SCHOOL', 'Name1', 'Name2'], 
    names_to = ('name', '.value'), 
    names_pattern=r"(.+)s(.+)d", 
    sort_by_appearance=True)
.assign(Name = lambda df: np.where(df.name.eq('Name1'), df.Name1, df.Name2))
.loc[:, ['ID', 'SCHOOL', 'Name', 'Subject', 'Grade']]
)

   ID SCHOOL     Name  Subject  Grade
0   1     S1  Mr. ABC     Math      6
1   1     S1  Mr. ABC  Science      7
2   1     S1  Mr. XYZ   Social      8
3   1     S1  Mr. XYZ      EVS      9
4   2     S2  Mr. PQR     Math     10
5   2     S2  Mr. PQR  Science     11
6   2     S2  Mr. KLM   Social      8
7   2     S2  Mr. KLM      EVS      9

The names_pattern is a regex to capture groups within the specified columns. Groups are specified in the parenthesis; any group that aligns with .value stays as column header. here we have two groups in the names_pattern and two values in names_to, the second group is paired with .value and stays as a column header.

After reshaping, the where clause helps to ensure correct match and final output

Answered By: sammywemmy