Equivalent of R/tidyverse 'spread' and 'gather' in Python/pandas?
Question:
For example:
Data A:
y female male
1 2 3
4 5 6
I want to ‘gather’ it to this:
y gender value
1 female 2
1 male 3
4 female 5
4 male 6
It’s easy in R. What about Python pandas?
Answers:
The opposite operation (the spread version) is called cast in R’s reshape2 and pivot in pandas. For the given data, try melt, which works much like reshape2’s melt:
import pandas as pd
pd.melt(dt, id_vars="y")
Here dt is your input table.
Output:
#y variable value
#1 female 2
#4 female 5
#1 male 3
#4 male 6
Try the melt function from pandas (pd.melt).
Use id_vars to name the identifier column(s) that stay as they are; value_vars to name the columns to gather; var_name to name the new column that holds the former column names; and value_name to name the new column that holds the values.
Look at this example:
# Import the pandas module
import pandas as pd
# Define the data frame
DF = pd.DataFrame({'y': [1, 4], 'female': [2, 5], 'male': [3, 6]})
# Gather/melt the data frame
pd.melt(DF, id_vars='y', value_vars=['female', 'male'],
        var_name='gender', value_name='value')
This is what the output looks like:
y gender value
0 1 female 2
1 4 female 5
2 1 male 3
3 4 male 6
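The melted rows come out grouped by column (all female rows, then all male rows), while the question lists them grouped by y. A minimal sketch of one way to get the question's row order, assuming a stable sort on y is acceptable; the names long_df and ordered are made up for illustration:

```python
import pandas as pd

DF = pd.DataFrame({'y': [1, 4], 'female': [2, 5], 'male': [3, 6]})

# Gather/melt as in the example above
long_df = pd.melt(DF, id_vars='y', value_vars=['female', 'male'],
                  var_name='gender', value_name='value')

# A stable sort on 'y' groups the rows by y while keeping the
# female-then-male order that melt produced within each group
ordered = long_df.sort_values('y', kind='stable').reset_index(drop=True)
print(ordered)
```

With the sample data this should list the rows as 1 female, 1 male, 4 female, 4 male, matching the layout asked for in the question.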
Gather:
df1 = df.melt(id_vars='y')
df1
Spread (passing values='value' keeps the resulting columns single-level):
df2 = df1.pivot(index='y', columns='variable', values='value')
df2
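For a full round trip back to the original layout, a rough sketch along these lines may help. It assumes df is the sample table from the question, and the name wide is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'y': [1, 4], 'female': [2, 5], 'male': [3, 6]})

# Gather: wide -> long (default column names are 'variable' and 'value')
df1 = df.melt(id_vars='y')

# Spread: long -> wide, one column per distinct entry in 'variable'
df2 = df1.pivot(index='y', columns='variable', values='value')

# Turn the 'y' index back into a regular column and clear the
# leftover 'variable' label on the columns axis
wide = df2.reset_index()
wide.columns.name = None
print(wide)
```

Note that pivot raises an error when an index/columns pair occurs more than once, so this sketch only applies when each (y, variable) combination is unique, as it is here.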
How about this:
from datar import f
from datar.tibble import tribble
from datar.tidyr import pivot_longer
df = tribble(
f.y, f.female, f.male,
1, 2, 3,
4, 5, 6
)
pivot_longer(df, [f.female, f.male], names_to="gender")
# y gender value
# 0 1 female 2
# 1 4 female 5
# 2 1 male 3
# 3 4 male 6
I am the author of the datar package. Please feel free to submit issues if you have any questions about using it.