pandas.DataFrame: How to merge rows with a common column value in the same pandas.DataFrame
Question:
I have a pandas.DataFrame that looks like that:
index
projectid
question
answer
0
1
‘q1’
‘str1’
1
1
‘q2’
‘str2’
2
1
‘q3’
‘str3’
3
2
‘q1’
‘str4’
4
2
‘q3’
‘str6’
And I would like to format it like that:
index
projectid
question1
answer1
question2
answer2
question3
answer3
0
1
‘q1’
‘str1’
‘q2’
‘str2’
‘q3’
‘str3’
1
2
‘q1’
‘str4’
None
None
‘q3’
‘str6’
Not every project has the same number of question but questions are shared for each project. So when a specific question isn’t in a project, I would like cells to be filled up with None values.
I didn’t found any way to do it with join or concat, but I don’t know how to properly use it.
I would like to improve my pandas skills so my question is:
Is there any way to do it with pandas treatment or doing it manually by treating my DataFrames with iterrows is the only way ?
Thank you !
Answers:
You can use cumcount
before pivoting to get your suffixes:
df['idx'] = df.groupby('projectid').cumcount() + 1
df = df.pivot(index='projectid',columns='idx')[['question','answer']]
df.columns = [''.join(map(str, col)) for col in df.columns]
print(df)
Output::
question1 question2 question3 answer1 answer2 answer3
projectid
1 'q1' 'q2' 'q3' 'str1' 'str2' 'str3'
2 'q1' 'q3' NaN 'str4' 'str6' NaN
I have a pandas.DataFrame that looks like that:
index | projectid | question | answer |
---|---|---|---|
0 | 1 | ‘q1’ | ‘str1’ |
1 | 1 | ‘q2’ | ‘str2’ |
2 | 1 | ‘q3’ | ‘str3’ |
3 | 2 | ‘q1’ | ‘str4’ |
4 | 2 | ‘q3’ | ‘str6’ |
And I would like to format it like that:
index | projectid | question1 | answer1 | question2 | answer2 | question3 | answer3 |
---|---|---|---|---|---|---|---|
0 | 1 | ‘q1’ | ‘str1’ | ‘q2’ | ‘str2’ | ‘q3’ | ‘str3’ |
1 | 2 | ‘q1’ | ‘str4’ | None | None | ‘q3’ | ‘str6’ |
Not every project has the same number of question but questions are shared for each project. So when a specific question isn’t in a project, I would like cells to be filled up with None values.
I didn’t found any way to do it with join or concat, but I don’t know how to properly use it.
I would like to improve my pandas skills so my question is:
Is there any way to do it with pandas treatment or doing it manually by treating my DataFrames with iterrows is the only way ?
Thank you !
You can use cumcount
before pivoting to get your suffixes:
df['idx'] = df.groupby('projectid').cumcount() + 1
df = df.pivot(index='projectid',columns='idx')[['question','answer']]
df.columns = [''.join(map(str, col)) for col in df.columns]
print(df)
Output::
question1 question2 question3 answer1 answer2 answer3
projectid
1 'q1' 'q2' 'q3' 'str1' 'str2' 'str3'
2 'q1' 'q3' NaN 'str4' 'str6' NaN