Melting data in pandas python
Question:
I have a data frame with 11 columns in wide format. One column is a unique ID, 5 columns represent social media apps, and 5 columns represent how often those apps are used (frequency). All the values are categorical: the first 5 columns hold 0 or 1, indicating whether someone uses the app, and the other 5 columns hold values such as "alot", "few times a day", and "very often". I want a plot showing what percentage of each frequency each social media app has. I have tried melting the data but it gets very confusing. It would be very kind of you to help me. Thanks in advance!
This is what it looks like
Id | SNS 1 | SNS 2 | SNS 3 | SNS 4 | SNS 5 | Freq 1 | Freq 2 | Freq 3 | Freq 4 | Freq 5 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0 | 0 | 1 | 1 | Alot | N/A | N/A | Often | Often |
2 | 0 | 1 | 1 | 1 | 0 | N/A | Often | Alot | Few times | N/A |
I want it to look like
Id | SNS | Values | Freq |
---|---|---|---|
1 | SNS 1 | 1 | Alot |
1 | SNS 2 | 0 | N/A |
1 | SNS 3 | 0 | N/A |
1 | SNS 4 | 1 | Often |
1 | SNS 5 | 1 | Often |
Answers:
You could check out pd.wide_to_long as it is designed to handle cases like this a little more effectively than melt.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Id': [1, 2],
                   'SNS 1': [1, 0],
                   'SNS 2': [0, 1],
                   'SNS 3': [0, 1],
                   'SNS 4': [1, 1],
                   'SNS 5': [1, 0],
                   'Freq 1': ['Alot', np.nan],
                   'Freq 2': [np.nan, 'Often'],
                   'Freq 3': [np.nan, 'Alot'],
                   'Freq 4': ['Often', 'Few times'],
                   'Freq 5': ['Often', np.nan]})
df = pd.wide_to_long(df, stubnames=['SNS ','Freq '], i='Id', j='s').reset_index()
df.columns = ['Id','SNS','Values','Freq']
df['SNS'] = 'SNS ' + df['SNS'].astype(str)
df = df.sort_values(by=['Id','SNS'])
print(df)
Output
Id SNS Values Freq
0 1 SNS 1 1 Alot
2 1 SNS 2 0 NaN
4 1 SNS 3 0 NaN
6 1 SNS 4 1 Often
8 1 SNS 5 1 Often
1 2 SNS 1 0 NaN
3 2 SNS 2 1 Often
5 2 SNS 3 1 Alot
7 2 SNS 4 1 Few times
9 2 SNS 5 0 NaN
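Since the end goal is a plot of frequency percentages per app, the long frame produced above can be aggregated with `pd.crosstab`. A minimal sketch, assuming you only want to count rows where a frequency was actually recorded (`crosstab` drops the NaN entries for non-users):

```python
import pandas as pd
import numpy as np

# Long-format frame, as produced by the wide_to_long answer above
df = pd.DataFrame({
    'Id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'SNS': ['SNS 1', 'SNS 2', 'SNS 3', 'SNS 4', 'SNS 5'] * 2,
    'Values': [1, 0, 0, 1, 1, 0, 1, 1, 1, 0],
    'Freq': ['Alot', np.nan, np.nan, 'Often', 'Often',
             np.nan, 'Often', 'Alot', 'Few times', np.nan],
})

# Percentage of each frequency per app, among users of that app.
# normalize='index' makes each row sum to 1; NaN rows are excluded.
pct = pd.crosstab(df['SNS'], df['Freq'], normalize='index') * 100
print(pct)

# pct.plot(kind='bar', stacked=True)  # uncomment to draw the stacked bar plot
```

With only these two respondents, SNS 4 comes out 50% "Often" / 50% "Few times", while SNS 1 is 100% "Alot".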
I see that there’s already a much better and less convoluted answer provided, but I’d just like to contribute what I’ve come up with.
import pandas as pd
df = pd.DataFrame([[1, 1, 0, 0, 1, 1, 'Alot', 'N/A', 'N/A', 'Often', 'Often'],
                   [2, 0, 1, 1, 1, 0, 'N/A', 'Often', 'Alot', 'Few Times', 'N/A']],
                  columns=['Id', 'SNS 1', 'SNS 2', 'SNS 3', 'SNS 4', 'SNS 5',
                           'Freq 1', 'Freq 2', 'Freq 3', 'Freq 4', 'Freq 5'])
# Melt it
melted_df = df.melt(id_vars='Id', var_name='SNS_Freq', value_name='Values')
# Obtain the social media ID, to later match 'SNS 1' to 'Freq 1', and so on
melted_df['SNS_Freq_ID'] = melted_df['SNS_Freq'].apply(lambda x: x.split()[-1])
# Split the melted_df into two separate dataframes based on SNS or Freq
SNS_df = melted_df[melted_df['SNS_Freq'].str.startswith('SNS')].copy()
SNS_df.rename(columns={'SNS_Freq':'SNS'}, inplace=True)
Freq_df = melted_df[melted_df['SNS_Freq'].str.startswith('Freq')].copy()
Freq_df.rename(columns={'Values':'Freq'}, inplace=True)
Freq_df.drop('SNS_Freq', axis=1, inplace=True)
# Merge them back
final_df = pd.merge(left=SNS_df, right=Freq_df, how='inner', on=['Id','SNS_Freq_ID'])
final_df.drop('SNS_Freq_ID', axis=1, inplace=True)
final_df.sort_values(['Id','SNS'], inplace=True)
final_df.reset_index(inplace=True, drop=True)
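For comparison, the split-and-merge above can be condensed into two melts with positional alignment. This is a sketch, not the author's code, and it relies on `melt` emitting one block of `len(df)` rows per listed column, in order, so the "SNS i" block lines up with the "Freq i" block:

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 0, 0, 1, 1, 'Alot', 'N/A', 'N/A', 'Often', 'Often'],
                   [2, 0, 1, 1, 1, 0, 'N/A', 'Often', 'Alot', 'Few Times', 'N/A']],
                  columns=['Id', 'SNS 1', 'SNS 2', 'SNS 3', 'SNS 4', 'SNS 5',
                           'Freq 1', 'Freq 2', 'Freq 3', 'Freq 4', 'Freq 5'])

sns_cols = [f'SNS {i}' for i in range(1, 6)]
freq_cols = [f'Freq {i}' for i in range(1, 6)]

# Melt the SNS columns, then paste in the matching Freq values by position;
# both melts produce rows in the same (column, Id) order.
long = df.melt(id_vars='Id', value_vars=sns_cols, var_name='SNS', value_name='Values')
long['Freq'] = df.melt(id_vars='Id', value_vars=freq_cols)['value'].values

long = long.sort_values(['Id', 'SNS']).reset_index(drop=True)
print(long)
```

The positional trick only holds while the SNS and Freq columns share the same suffix order, so the merge-based version in the answer above is the safer choice for messier column layouts.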