Python Pandas : Pivot table : aggfunc concatenate instead of np.size or np.sum

Question:

I have some entries in a DataFrame like this:

name,age,phonenumber
A,10,Phone1
A,10,Phone2
B,21,PhoneB1
B,21,PhoneB2
C,23,PhoneC

Here is what I am trying to get as the result of the pivot table:

name,age,phonenumbers,phonenocount
A,10,"Phone1,Phone2",2
B,21,"PhoneB1,PhoneB2",2
C,23,"PhoneC",1

I tried something like:

pd.pivot_table(phonedf, index=['name','age','phonenumbers'], values=['phonenumbers'], aggfunc=np.size)

However, I want the phone numbers to be concatenated by the aggfunc instead of just counted.
Any suggestions?

Answers:

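You can use agg after the groupby. Passing a renaming dict to agg on a single column was removed in pandas 1.0, so use named aggregation (pandas 0.25+):

df.groupby(['name', 'age']).agg(
    phonenumber=('phonenumber', lambda x: ','.join(x)),
    phonecount=('phonenumber', pd.Series.nunique),
)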

#               phonenumber  phonecount
# name  age     
#    A   10   Phone1,Phone2           2
#    B   21 PhoneB1,PhoneB2           2
#    C   23          PhoneC           1

Or a shorter version, as suggested by @root and @Jon Clements:

# named aggregation (pandas 0.25+): output_column=(input_column, function)
df.groupby(['name', 'age']).agg(
    phonenumber=('phonenumber', ','.join),
    phonecount=('phonenumber', 'nunique'),
)
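
For reference, here is a self-contained sketch of the groupby approach that produces the flat table asked for in the question. The DataFrame is rebuilt from the question's sample rows, and the output column names `phonenumbers` and `phonenocount` are taken from the desired result there. Counting with `'size'` (every row) rather than `'nunique'` (distinct values only) is an assumption about what the asker wants; the two differ only when a phone number repeats within a group.

```python
import pandas as pd

# Sample rows from the question.
phonedf = pd.DataFrame({
    "name": ["A", "A", "B", "B", "C"],
    "age": [10, 10, 21, 21, 23],
    "phonenumber": ["Phone1", "Phone2", "PhoneB1", "PhoneB2", "PhoneC"],
})

# Named aggregation: output_column=(input_column, function).
result = (
    phonedf.groupby(["name", "age"])
    .agg(
        phonenumbers=("phonenumber", ",".join),   # concatenate the strings
        phonenocount=("phonenumber", "size"),     # number of rows per group
    )
    .reset_index()  # turn name/age back into ordinary columns
)
print(result)
```

If the phone column could contain non-string values, `",".join` would fail; mapping with `str` first, as the pivot_table answers in this thread do, avoids that.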
Answered By: Psidom

This answer comes from here:
https://medium.com/@enricobergamini/creating-non-numeric-pivot-tables-with-python-pandas-7aa9dfd788a7

Kudos to Enrico Bergamini for writing about this. I was struggling with this too.

Define the input first.

df = pd.DataFrame({'name':['a','a','b','b','c'], 
                   'age':[10, 10, 21, 21, 23], 
                   'phonenumber':['phone1', 'phone2', 'phoneb1', 'phoneb2',
                                  'phonec']})

Use pandas pivot_table to re-shape as you want.

temp = pd.pivot_table(df, index=['name', 'age'], values='phonenumber',
                      aggfunc=[len, lambda x: ",".join(str(v) for v in x)])

Output:

                 len         <lambda>
         phonenumber      phonenumber
name age                             
a    10            2    phone1,phone2
b    21            2  phoneb1,phoneb2
c    23            1           phonec

If you want to drop the top level of the column MultiIndex (the function names), use this:
temp.columns = temp.columns.droplevel()

After you drop the functions from the column index, you can rename them easily.

temp.columns = ['count', 'concat']

The resulting table is:

          count           concat
name age                        
a    10       2    phone1,phone2
b    21       2  phoneb1,phoneb2
c    23       1           phonec
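
Putting the steps above together, a minimal end-to-end sketch. It uses the same `df` as defined in this answer and assigns the flat column names directly, which replaces the two-level column index in one step, so the separate `droplevel` call is optional. The final `reset_index()` is an addition not in the original answer; it turns `name` and `age` back into ordinary columns, which is closer to the layout shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "a", "b", "b", "c"],
    "age": [10, 10, 21, 21, 23],
    "phonenumber": ["phone1", "phone2", "phoneb1", "phoneb2", "phonec"],
})

temp = pd.pivot_table(
    df,
    index=["name", "age"],
    values="phonenumber",
    aggfunc=[len, lambda x: ",".join(str(v) for v in x)],
)

# Columns come back in aggfunc order:
# ('len', 'phonenumber'), ('<lambda>', 'phonenumber')
temp.columns = ["count", "concat"]

# Move the name/age index levels back into regular columns.
flat = temp.reset_index()
print(flat)
```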
Answered By: Foggy

This pivot table uses df as the data and Phone as the index, and concatenates the Code values for each phone into a single string. After aggregating, I used list comprehensions to rename the resulting columns:

fp = pd.pivot_table(data=df, index=["Phone"], values=["Code"],
                    aggfunc=[len, lambda x: ", ".join(str(v) for v in x)])
# Column labels are tuples such as ('len', 'Code'); compare against their string form.
fp.columns = ["# of Codes" if str(column) == "('len', 'Code')" else str(column) for column in fp.columns.tolist()]
fp.columns = ["Spec Code" if str(column) == "('<lambda>', 'Code')" else str(column) for column in fp.columns.tolist()]
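
Comparing against `str(column)` is fragile, since it depends on the exact text of the tuple. A sketch of the same rename using a dict keyed by the column tuples themselves follows. The sample `df` (with columns `Phone` and `Code`) is hypothetical, invented only to make the snippet runnable, since the answer does not show its data.

```python
import pandas as pd

# Hypothetical sample data; the answer does not show its DataFrame.
df = pd.DataFrame({
    "Phone": ["555-0100", "555-0100", "555-0199"],
    "Code": ["A1", "B2", "C3"],
})

fp = pd.pivot_table(
    data=df,
    index=["Phone"],
    values=["Code"],
    aggfunc=[len, lambda x: ", ".join(str(v) for v in x)],
)

# Map each (function name, value column) tuple to a readable label.
labels = {
    ("len", "Code"): "# of Codes",
    ("<lambda>", "Code"): "Spec Code",
}
# Unmapped columns fall back to their string form, as in the answer.
fp.columns = [labels.get(col, str(col)) for col in fp.columns.tolist()]
print(fp)
```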
Answered By: Golden Lion