Lengthening a Pandas DataFrame by setting column headers as row values and adding a value column
Question:
I am a bit stuck on how to reshape my DataFrame into a shape that gives me more flexibility.
My current dataframe is as follows.
import pandas as pd

Original_df = pd.DataFrame([['Action', 1, 5, 3],
                            ['Comedy', 2, 4, 6],
                            ['Drama', 3, 2, 10],
                            ['Crime', 1, 6, 6],
                            ['Documentary', 2, 9, 3]],
                           columns=['Genre', 'Bob', 'Sara', 'Peter'])
Original_df.head()
The shape I want my dataframe to be in is as follows:
Wanted_df = pd.DataFrame([['Action', 'Bob', 1],
['Comedy', 'Bob', 2],
['Drama', 'Bob', 3],
['Crime', 'Bob', 1],
['Documentary', 'Bob', 2],
['Action', 'Sara', 5],
['Comedy', 'Sara', 4],
['Drama', 'Sara', 2],
['Crime', 'Sara', 6],
['Documentary', 'Sara', 9],
['Action', 'Peter', 3],
['Comedy', 'Peter', 6],
['Drama', 'Peter', 10],
['Crime', 'Peter', 6],
['Documentary', 'Peter', 3]],
columns=['Genre', 'Name', 'Count'])
Wanted_df.head()
One method I have tried is concatenating in a loop:
df_movies_genre_frequency_test = df_movies_genre_frequency[['index']]
for user in users:
    df_movies_genre_frequency_test = pd.concat([df_movies_genre_frequency_test, df_movies_genre_frequency[['index', user]]])
df_movies_genre_frequency_test.head(40)
I have also tried df.melt(…).
Any help on how to solve this would be very much appreciated.
Answers:
In my opinion, pandas.melt() will do the job if you set Genre as the identifier column with id_vars=['Genre']:
df.melt(id_vars=['Genre'], var_name='Name', value_name='Count')
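A minimal sketch of what each `melt` argument does, using the question's data (`long_df` and `subset` are illustrative names, not from the original): the `id_vars` columns are kept as identifiers, every remaining column is unpivoted into a `var_name`/`value_name` pair, and the optional `value_vars` argument limits which columns get unpivoted.

```python
import pandas as pd

# Same wide table as in the question: one row per genre, one column per person.
df = pd.DataFrame([['Action', 1, 5, 3],
                   ['Comedy', 2, 4, 6],
                   ['Drama', 3, 2, 10],
                   ['Crime', 1, 6, 6],
                   ['Documentary', 2, 9, 3]],
                  columns=['Genre', 'Bob', 'Sara', 'Peter'])

# 'Genre' stays as-is; every other column header becomes a value in 'Name'
# and the cell underneath it becomes the matching value in 'Count'.
long_df = df.melt(id_vars=['Genre'], var_name='Name', value_name='Count')

# value_vars restricts which columns are unpivoted (here only Bob and Sara).
subset = df.melt(id_vars=['Genre'], value_vars=['Bob', 'Sara'],
                 var_name='Name', value_name='Count')

print(long_df.shape)  # (15, 3): 5 genres x 3 people
print(subset.shape)   # (10, 3): 5 genres x 2 people
```

Because the person columns are discovered from the headers, the same call keeps working if more people are added to the wide table.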
Example
import pandas as pd

df = pd.DataFrame([['Action', 1, 5, 3],
                   ['Comedy', 2, 4, 6],
                   ['Drama', 3, 2, 10],
                   ['Crime', 1, 6, 6],
                   ['Documentary', 2, 9, 3]],
                  columns=['Genre', 'Bob', 'Sara', 'Peter'])
df.melt(id_vars=['Genre'], var_name='Name', value_name='Count')
Output
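| | Genre | Name | Count |
|---|---|---|---|
| 0 | Action | Bob | 1 |
| 1 | Comedy | Bob | 2 |
| 2 | Drama | Bob | 3 |
| 3 | Crime | Bob | 1 |
| 4 | Documentary | Bob | 2 |
| 5 | Action | Sara | 5 |
| 6 | Comedy | Sara | 4 |
| 7 | Drama | Sara | 2 |
| 8 | Crime | Sara | 6 |
| 9 | Documentary | Sara | 9 |
| 10 | Action | Peter | 3 |
| 11 | Comedy | Peter | 6 |
| 12 | Drama | Peter | 10 |
| 13 | Crime | Peter | 6 |
| 14 | Documentary | Peter | 3 |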
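As a sanity check, here is a sketch (the names `wanted`, `looped` and `wide_again` are made up for illustration) that confirms the `melt` output is exactly the question's wanted frame. It also shows the question's loop-and-concat idea working once every per-person slice is given the same column names before concatenating, and, as an extra step not in the original answer, pivots the long table back to the wide shape to show the reshape is reversible.

```python
import pandas as pd

# Wide input from the question.
df = pd.DataFrame([['Action', 1, 5, 3],
                   ['Comedy', 2, 4, 6],
                   ['Drama', 3, 2, 10],
                   ['Crime', 1, 6, 6],
                   ['Documentary', 2, 9, 3]],
                  columns=['Genre', 'Bob', 'Sara', 'Peter'])

# Desired long output from the question (same rows as Wanted_df).
wanted = pd.DataFrame([['Action', 'Bob', 1], ['Comedy', 'Bob', 2],
                       ['Drama', 'Bob', 3], ['Crime', 'Bob', 1],
                       ['Documentary', 'Bob', 2],
                       ['Action', 'Sara', 5], ['Comedy', 'Sara', 4],
                       ['Drama', 'Sara', 2], ['Crime', 'Sara', 6],
                       ['Documentary', 'Sara', 9],
                       ['Action', 'Peter', 3], ['Comedy', 'Peter', 6],
                       ['Drama', 'Peter', 10], ['Crime', 'Peter', 6],
                       ['Documentary', 'Peter', 3]],
                      columns=['Genre', 'Name', 'Count'])

# 1) melt, as in the answer: matches the wanted frame exactly.
melted = df.melt(id_vars=['Genre'], var_name='Name', value_name='Count')
pd.testing.assert_frame_equal(melted, wanted)

# 2) Loop-and-concat: rename each person's column to 'Count' and add a
#    constant 'Name' column so every piece has identical columns.
pieces = []
for person in ['Bob', 'Sara', 'Peter']:
    piece = df[['Genre', person]].rename(columns={person: 'Count'})
    piece.insert(1, 'Name', person)
    pieces.append(piece)
looped = pd.concat(pieces, ignore_index=True)
pd.testing.assert_frame_equal(looped, wanted)

# 3) Round trip: pivot the long table back to one column per person,
#    restoring the original row and column order.
wide_again = (melted.pivot(index='Genre', columns='Name', values='Count')
                    .reindex(index=df['Genre'], columns=['Bob', 'Sara', 'Peter'])
                    .rename_axis(columns=None)
                    .reset_index())
pd.testing.assert_frame_equal(wide_again, df)
```

The loop version gives the same result, but `melt` does the renaming and stacking in a single call, which is why it is the simpler choice here.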