Plot multiple dataframes in one plot with facet_wrap
Question:
I have a dataset df that looks like this:
ID Week VarA VarB VarC VarD
s001 w1 2 5 4 7
s001 w2 4 5 2 3
s001 w3 7 2 0 1
s002 w1 4 0 9 8
s002 w2 1 5 2 5
s002 w3 7 3 6 0
s001 w1 6 5 7 9
s003 w2 2 0 1 0
s003 w3 6 9 3 4
For each ID, I am trying to plot its progress by Week for every variable (VarB, VarC, VarD), with VarA as the reference data.
I reshape the data with df.melt(), which gives the long format below, and the code that follows works.
ID Week Var Value
s001 w1 VarA 2
s001 w2 VarA 4
s001 w3 VarA 7
s002 w1 VarA 4
s002 w2 VarA 1
s002 w3 VarA 7
s001 w1 VarA 6
s003 w2 VarA 2
s003 w3 VarA 6
s001 w1 VarB 5
s001 w2 VarB 5
...
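The exact melt call is not shown in the question. A minimal sketch of one call that produces this long format, assuming ID and Week are kept as identifier columns and the new columns are named Var and Value (the argument values here are inferred from the table above, not taken from the original code):

```python
import pandas as pd

# Sample data copied from the wide table in the question.
rows = [
    ("s001", "w1", 2, 5, 4, 7),
    ("s001", "w2", 4, 5, 2, 3),
    ("s001", "w3", 7, 2, 0, 1),
    ("s002", "w1", 4, 0, 9, 8),
    ("s002", "w2", 1, 5, 2, 5),
    ("s002", "w3", 7, 3, 6, 0),
    ("s001", "w1", 6, 5, 7, 9),
    ("s003", "w2", 2, 0, 1, 0),
    ("s003", "w3", 6, 9, 3, 4),
]
df = pd.DataFrame(rows, columns=["ID", "Week", "VarA", "VarB", "VarC", "VarD"])

# Keep ID and Week as identifiers; stack VarA..VarD into Var/Value pairs.
df_melt = df.melt(id_vars=["ID", "Week"], var_name="Var", value_name="Value")

print(df_melt.head())
```

Without id_vars, melt would also stack the ID and Week columns into the value column, which does not match the table shown.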
Code:
for id in idlist:
    # get VarA into new df
    newdf = df_melt[df_melt.Var == 'VarA']
    # remove rows with VarA so it won't be included in facet_wrap()
    tmp = df_melt[df_melt.Var != 'VarA']
    # parentheses let the chained "+" expression span several lines
    plot2 = (ggplot() + ggtitle(id) + labs(x='Week', y='Value')
             + geom_point(newdf[newdf['ID'] == id], aes(x='Week', y='Value'))
             + geom_point(tmp[tmp['ID'] == id], aes(x='Week', y='Value', color='Var'))
             + theme(axis_text_x=element_text(rotation=45)))
    print(plot2)
However, when I add facet_wrap('Var', ncol=3, scales='free'), I get the error below:
IndexError: arrays used as indices must be of integer (or boolean) type
I also couldn't connect the points with a line using geom_line().
Is this because different dataframes are used? Is there a way to use multiple geom_point() layers with different dataframes together with facet_wrap in one ggplot object?
Answers:
The error in the question is caused by a bug in plotnine, which the following code reproduces. The bug has been fixed, and the fix will be in the next release of plotnine.
import pandas as pd
from plotnine import *

df1 = pd.DataFrame({
    'x': list("abc"),
    'y': [1, 2, 3],
    'g': list("AAA")
})
df2 = pd.DataFrame({
    'x': list("abc"),
    'y': [4, 5, 6],
    'g': list("AAB")
})

(ggplot(aes("x", "y"))
 + geom_point(df1)
 + geom_point(df2)
 + facet_wrap("g", scales="free_x")
)
In addition to the bug fix mentioned by @has2k1, I found a way to add VarA as a reference data point: rename the Var column in the reference dataframe to something else. The two dataframes then no longer share that column name, so facet_wrap applies to only one of them.
for pt in idlist:
    # get VarA into new df; rename returns a copy, which avoids
    # the pandas warning from an inplace rename on a slice
    newdf = df_melt[df_melt.Var == 'VarA'].rename(columns={'Var': 'RefVar'})
    # remove rows with VarA so it won't be included in facet_wrap()
    tmp = df_melt[df_melt.Var != 'VarA']
    # parentheses let the chained "+" expression span several lines
    plot2 = (ggplot()
             + geom_point(tmp[tmp['ID'] == pt], aes(x='Week', y='Value', color='Var'))
             + facet_wrap('Var', ncol=1, scales='free')
             + geom_point(newdf[newdf['ID'] == pt], aes(x='Week', y='Value'))
             + labs(x='Week', y='Value') + ggtitle(pt)
             + theme(axis_text_x=element_text(rotation=45),
                     subplots_adjust={'hspace': 0.6}))
    print(plot2)