Melting data in pandas python
Question:
I have a data frame with 11 columns in wide format. One column is a unique ID, 5 columns represent social media apps, and 5 columns represent how often those apps are used (frequency). All the values are categorical: the first 5 columns hold 0 or 1, indicating whether someone uses the app, and the other 5 columns hold values such as "alot", "few times a day", and "very often". I want a plot showing what percentage of each frequency each social media app has. I have tried melting the data but it gets very confusing. It would be very kind of you to help me. Thanks in advance!
This is what it looks like
Id | SNS 1 | SNS 2 | SNS 3 | SNS 4 | SNS 5 | Freq 1 | Freq 2 | Freq 3 | Freq 4 | Freq 5 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0 | 0 | 1 | 1 | Alot | N/A | N/A | Often | Often |
2 | 0 | 1 | 1 | 1 | 0 | N/A | Often | Alot | Few times | N/A |
I want it to look like
Id | SNS | Values | Freq |
---|---|---|---|
1 | SNS 1 | 1 | Alot |
1 | SNS 2 | 0 | N/A |
1 | SNS 3 | 0 | N/A |
1 | SNS 4 | 1 | Often |
1 | SNS 5 | 1 | Often |
Answers:
You could check out pd.wide_to_long as it is designed to handle cases like this a little more effectively than melt.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Id': [1, 2],
                   'SNS 1': [1, 0],
                   'SNS 2': [0, 1],
                   'SNS 3': [0, 1],
                   'SNS 4': [1, 1],
                   'SNS 5': [1, 0],
                   'Freq 1': ['Alot', np.nan],
                   'Freq 2': [np.nan, 'Often'],
                   'Freq 3': [np.nan, 'Alot'],
                   'Freq 4': ['Often', 'Few times'],
                   'Freq 5': ['Often', np.nan]})
df = pd.wide_to_long(df, stubnames=['SNS ','Freq '], i='Id', j='s').reset_index()
df.columns = ['Id','SNS','Values','Freq']
df['SNS'] = 'SNS ' + df['SNS'].astype(str)
df = df.sort_values(by=['Id','SNS'])
print(df)
Output
Id SNS Values Freq
0 1 SNS 1 1 Alot
2 1 SNS 2 0 NaN
4 1 SNS 3 0 NaN
6 1 SNS 4 1 Often
8 1 SNS 5 1 Often
1 2 SNS 1 0 NaN
3 2 SNS 2 1 Often
5 2 SNS 3 1 Alot
7 2 SNS 4 1 Few times
9 2 SNS 5 0 NaN
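Since the end goal is a plot of frequency percentages per app, the long frame produced above can be aggregated with `pd.crosstab`. A minimal sketch, assuming you only want to count rows where a frequency was actually recorded (`crosstab` drops the NaN entries for non-users):

```python
import pandas as pd
import numpy as np

# Long-format frame, as produced by the wide_to_long answer above
df = pd.DataFrame({
    'Id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'SNS': ['SNS 1', 'SNS 2', 'SNS 3', 'SNS 4', 'SNS 5'] * 2,
    'Values': [1, 0, 0, 1, 1, 0, 1, 1, 1, 0],
    'Freq': ['Alot', np.nan, np.nan, 'Often', 'Often',
             np.nan, 'Often', 'Alot', 'Few times', np.nan],
})

# Percentage of each frequency per app, among users of that app.
# normalize='index' makes each row sum to 1; NaN rows are excluded.
pct = pd.crosstab(df['SNS'], df['Freq'], normalize='index') * 100
print(pct)

# pct.plot(kind='bar', stacked=True)  # uncomment to draw the stacked bar plot
```

With only these two respondents, SNS 4 comes out 50% "Often" / 50% "Few times", while SNS 1 is 100% "Alot".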
I see that there’s already a much better and less convoluted answer provided, but I’d just like to contribute what I’ve come up with.
import pandas as pd
df = pd.DataFrame([[1, 1, 0, 0, 1, 1, 'Alot', 'N/A', 'N/A', 'Often', 'Often'],
                   [2, 0, 1, 1, 1, 0, 'N/A', 'Often', 'Alot', 'Few Times', 'N/A']],
                  columns=['Id', 'SNS 1', 'SNS 2', 'SNS 3', 'SNS 4', 'SNS 5',
                           'Freq 1', 'Freq 2', 'Freq 3', 'Freq 4', 'Freq 5'])
# Melt it
melted_df = df.melt(id_vars='Id', var_name='SNS_Freq', value_name='Values')
# Obtain the social media ID, to later match 'SNS 1' to 'Freq 1', and so on
melted_df['SNS_Freq_ID'] = melted_df['SNS_Freq'].apply(lambda x: x.split()[-1])
# Split the melted_df into two separate dataframes based on SNS or Freq
SNS_df = melted_df[melted_df['SNS_Freq'].str.startswith('SNS')].copy()
SNS_df.rename(columns={'SNS_Freq':'SNS'}, inplace=True)
Freq_df = melted_df[melted_df['SNS_Freq'].str.startswith('Freq')].copy()
Freq_df.rename(columns={'Values':'Freq'}, inplace=True)
Freq_df.drop('SNS_Freq', axis=1, inplace=True)
# Merge them back
final_df = pd.merge(left=SNS_df, right=Freq_df, how='inner', on=['Id','SNS_Freq_ID'])
final_df.drop('SNS_Freq_ID', axis=1, inplace=True)
final_df.sort_values(['Id','SNS'], inplace=True)
final_df.reset_index(inplace=True, drop=True)
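For comparison, the split-and-merge above can be condensed into two melts with positional alignment. This is a sketch, not the author's code, and it relies on `melt` emitting one block of `len(df)` rows per listed column, in order, so the "SNS i" block lines up with the "Freq i" block:

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 0, 0, 1, 1, 'Alot', 'N/A', 'N/A', 'Often', 'Often'],
                   [2, 0, 1, 1, 1, 0, 'N/A', 'Often', 'Alot', 'Few Times', 'N/A']],
                  columns=['Id', 'SNS 1', 'SNS 2', 'SNS 3', 'SNS 4', 'SNS 5',
                           'Freq 1', 'Freq 2', 'Freq 3', 'Freq 4', 'Freq 5'])

sns_cols = [f'SNS {i}' for i in range(1, 6)]
freq_cols = [f'Freq {i}' for i in range(1, 6)]

# Melt the SNS columns, then paste in the matching Freq values by position;
# both melts produce rows in the same (column, Id) order.
long = df.melt(id_vars='Id', value_vars=sns_cols, var_name='SNS', value_name='Values')
long['Freq'] = df.melt(id_vars='Id', value_vars=freq_cols)['value'].values

long = long.sort_values(['Id', 'SNS']).reset_index(drop=True)
print(long)
```

The positional trick only holds while the SNS and Freq columns share the same suffix order, so the merge-based version in the answer above is the safer choice for messier column layouts.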