Python Pandas: Pivot table: aggfunc concatenate instead of np.size or np.sum
Question:
I have some entries in a DataFrame like this:
name,age,phonenumber
A,10,Phone1
A,10,Phone2
B,21,PhoneB1
B,21,PhoneB2
C,23,PhoneC
Here is what I am trying to get as the result of the pivot table:
name,age,phonenumbers,phonenocount
A,10,"Phone1,Phone2",2
B,21,"PhoneB1,PhoneB2",2
C,23,"PhoneC",1
I was trying something like this:
pd.pivot_table(phonedf, index=['name','age','phonenumbers'], values=['phonenumbers'], aggfunc=np.size)
However, I want the phone numbers to be concatenated as part of the aggfunc.
Any suggestions?
Answers:
You can use the agg function after the groupby:
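# Named aggregation (pandas >= 0.25). Passing a dict of new column names
# to SeriesGroupBy.agg raises "nested renamer is not supported" in pandas >= 1.0.
df.groupby(['name', 'age'])['phonenumber'].agg(
    phonenumber=lambda x: ','.join(x),
    phonecount='nunique',
)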
#                 phonenumber  phonecount
# name age
# A    10       Phone1,Phone2           2
# B    21     PhoneB1,PhoneB2           2
# C    23              PhoneC           1
Or a shorter version according to @root and @Jon Clements:
df.groupby(['name', 'age'])['phonenumber'].agg(phonenumber=','.join, phonecount='nunique')
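A self-contained sketch of the groupby approach on the question's sample data, assuming pandas 0.25 or later (named aggregation). The output names phonenumbers and phonenocount are taken from the desired result in the question. It uses 'size' rather than 'nunique' for the count: 'size' counts rows per group, while 'nunique' counts distinct numbers, so the two differ only if the same number is listed twice for one person.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['A', 'A', 'B', 'B', 'C'],
    'age': [10, 10, 21, 21, 23],
    'phonenumber': ['Phone1', 'Phone2', 'PhoneB1', 'PhoneB2', 'PhoneC'],
})

# One row per (name, age): the joined numbers plus the number of rows
result = (
    df.groupby(['name', 'age'])['phonenumber']
      .agg(phonenumbers=','.join, phonenocount='size')
      .reset_index()  # turn name/age back into ordinary columns
)
print(result)
```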
This answer comes from here:
https://medium.com/@enricobergamini/creating-non-numeric-pivot-tables-with-python-pandas-7aa9dfd788a7
Kudos to Enrico Bergamini for writing about this. I was struggling with this too.
Define the input first.
import pandas as pd
df = pd.DataFrame({'name':['a','a','b','b','c'],
                   'age':[10, 10, 21, 21, 23],
                   'phonenumber':['phone1', 'phone2', 'phoneb1', 'phoneb2',
                                  'phonec']})
Then use pandas pivot_table to reshape it the way you want.
temp = pd.pivot_table(df, index=['name', 'age'], values='phonenumber',
aggfunc=[len, lambda x: ",".join(str(v) for v in x)])
Output:
                len         <lambda>
        phonenumber      phonenumber
name age
a    10           2    phone1,phone2
b    21           2  phoneb1,phoneb2
c    23           1           phonec
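A runnable sketch of the pivot_table approach above. pivot_table labels each entry of an aggfunc list with the function's __name__, which is why the lambda shows up as <lambda>; passing a named function (the hypothetical concat below) gives a readable label up front.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'b', 'b', 'c'],
                   'age': [10, 10, 21, 21, 23],
                   'phonenumber': ['phone1', 'phone2', 'phoneb1', 'phoneb2', 'phonec']})

def concat(values):
    """Hypothetical helper: join a group's values into one comma-separated string."""
    return ','.join(str(v) for v in values)

# Column labels become (function __name__, value column) tuples,
# e.g. ('len', 'phonenumber') and ('concat', 'phonenumber')
temp = pd.pivot_table(df, index=['name', 'age'], values='phonenumber',
                      aggfunc=[len, concat])
print(temp)
```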
If you want to drop the function-name level of the column MultiIndex, use this:
temp.columns = temp.columns.droplevel()
After the function names are dropped, both columns are labeled phonenumber, so rename them:
temp.columns = ['count', 'concat']
temp is now:
          count           concat
name age
a    10       2    phone1,phone2
b    21       2  phoneb1,phoneb2
c    23       1           phonec
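If you want the flat layout from the question, with name and age as ordinary columns, one option (an alternative to droplevel, not part of the original answer) is to join each (function, value) tuple into a single label and then call reset_index. Unlike droplevel, this cannot leave two columns with the same name. A minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'b', 'b', 'c'],
                   'age': [10, 10, 21, 21, 23],
                   'phonenumber': ['phone1', 'phone2', 'phoneb1', 'phoneb2', 'phonec']})

temp = pd.pivot_table(df, index=['name', 'age'], values='phonenumber',
                      aggfunc=[len, lambda x: ','.join(str(v) for v in x)])

# Join each (function, value) tuple into one label, e.g. 'len_phonenumber'
temp.columns = ['_'.join(col) for col in temp.columns]

# Move name/age out of the index to get a flat, CSV-like table
flat = temp.reset_index()
print(flat)
```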
This pivot table uses df as the data and Phone as the index, and concatenates each phone's Code values into a single string. After aggregating, I used a list comprehension to rename the resulting columns:
import pandas as pd

fp = pd.pivot_table(data=df, index=["Phone"], values=["Code"], aggfunc=[len, lambda x: ", ".join(str(v) for v in x)])
# Column labels are tuples like ('len', 'Code'), so match on 'Code', the column passed in values
fp.columns = ["# of Codes" if str(column) == "('len', 'Code')" else str(column) for column in fp.columns.tolist()]
fp.columns = ["Spec Code" if str(column) == "('<lambda>', 'Code')" else str(column) for column in fp.columns.tolist()]
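A sketch of the same rename using a dict keyed by the column tuples instead of comparing their string forms. The answer does not show its df, so the Phone/Code data below is hypothetical. Keying on the tuple makes a mismatch (for example, matching against ('len', 'NewCode') when the values column is Code) easier to spot, and both renames happen in one pass.

```python
import pandas as pd

# Hypothetical data: the answer only tells us df has 'Phone' and 'Code' columns
df = pd.DataFrame({'Phone': ['555-0100', '555-0100', '555-0199'],
                   'Code': ['X1', 'X2', 'Y7']})

fp = pd.pivot_table(data=df, index=['Phone'], values=['Code'],
                    aggfunc=[len, lambda x: ', '.join(str(v) for v in x)])

# Map each (function name, value column) tuple straight to its new label;
# any column not in the dict falls back to its string form
new_names = {('len', 'Code'): '# of Codes', ('<lambda>', 'Code'): 'Spec Code'}
fp.columns = [new_names.get(col, str(col)) for col in fp.columns]
print(fp)
```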