pandas.DataFrame: How to merge rows with a common column value in the same pandas.DataFrame

Question:

I have a pandas.DataFrame that looks like that:

index projectid question answer
0 1 ‘q1’ ‘str1’
1 1 ‘q2’ ‘str2’
2 1 ‘q3’ ‘str3’
3 2 ‘q1’ ‘str4’
4 2 ‘q3’ ‘str6’

And I would like to format it like that:

index projectid question1 answer1 question2 answer2 question3 answer3
0 1 ‘q1’ ‘str1’ ‘q2’ ‘str2’ ‘q3’ ‘str3’
1 2 ‘q1’ ‘str4’ None None ‘q3’ ‘str6’

Not every project has the same number of question but questions are shared for each project. So when a specific question isn’t in a project, I would like cells to be filled up with None values.

I didn’t found any way to do it with join or concat, but I don’t know how to properly use it.

I would like to improve my pandas skills so my question is:
Is there any way to do it with pandas treatment or doing it manually by treating my DataFrames with iterrows is the only way ?

Thank you !

Asked By: AxRab

||

Answers:

You can use cumcount before pivoting to get your suffixes:

df['idx'] = df.groupby('projectid').cumcount() + 1
df = df.pivot(index='projectid',columns='idx')[['question','answer']]
df.columns = [''.join(map(str, col)) for col in df.columns]
print(df)

Output::

          question1 question2 question3 answer1 answer2 answer3
projectid
1              'q1'      'q2'      'q3'  'str1'  'str2'  'str3'
2              'q1'      'q3'       NaN  'str4'  'str6'     NaN
Answered By: Tranbi
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.