python pandas pivot_table count frequency in one column

Question:

I am still new to Python pandas’ pivot_table and would like to ask a way to count frequencies of values in one column, which is also linked to another column of ID. The DataFrame looks like the following.

import pandas as pd
df = pd.DataFrame({'Account_number':[1,1,2,2,2,3,3],
                   'Product':['A', 'A', 'A', 'B', 'B','A', 'B']
                  })

For the output, I’d like to get something like the following:

                Product
                A      B
Account_number           
      1         2      0
      2         1      2
      3         1      1

So far, I tried this code:

df.pivot_table(rows = 'Account_number', cols= 'Product', aggfunc='count')

This code gives me the two same things. What is the problems with the code above? A part of the reason why I am asking this question is that this DataFrame is just an example. The real data that I am working on has tens of thousands of account_numbers.

Asked By: midtownguru

||

Answers:

You need to specify the aggfunc as len:

In [11]: df.pivot_table(index='Account_number', columns='Product', 
                        aggfunc=len, fill_value=0)
Out[11]:
Product         A  B
Account_number
1               2  0
2               1  2
3               1  1

It looks like count, is counting the instances of each column (Account_number and Product), it’s not clear to me whether this is a bug…

Answered By: Andy Hayden

In new version of Pandas, slight modification is required. I had to spend some time figuring out so just wanted to add that here so that someone can directly use this.

df.pivot_table(index='Account_number', columns='Product', aggfunc=len,
               fill_value=0)
Answered By: PagMax

You can use count:

df.pivot_table(index='Account_number', columns='Product', aggfunc='count')
Answered By: Rui Wang

Solution: Use aggfunc='size'

Using aggfunc=len or aggfunc='count' like all the other answers on this page will not work for DataFrames with more than three columns. By default, pandas will apply this aggfunc to all the columns not found in index or columns parameters.

For instance, if we had two more columns in our original DataFrame defined like this:

df = pd.DataFrame({'Account_number':[1, 1, 2 ,2 ,2 ,3 ,3], 
                   'Product':['A', 'A', 'A', 'B', 'B','A', 'B'], 
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

Output:

   Account_number Product  Price  Quantity
0               1       A     10       100
1               1       A     10       100
2               2       A     10       100
3               2       B     10       100
4               2       B     10       100
5               3       A     10       100
6               3       B     10       100

If you apply the current solutions to this DataFrame, you would get the following:

df.pivot_table(index='Account_number',
               columns='Product',
               aggfunc=len,
               fill_value=0)

Output:

                  Price    Quantity   
Product            A  B        A  B
Account_number                     
1                  2  0        2  0
2                  1  2        1  2
3                  1  1        1  1

Solution

Instead, use aggfunc='size'. Since size always returns the same number for each column, pandas does not call it on every single column and just does it once.

df.pivot_table(index='Account_number', 
               columns='Product',
               aggfunc='size',
               fill_value=0)

Output:

Product         A  B
Account_number      
1               2  0
2               1  2
3               1  1
Answered By: Ted Petrou

I know this question is about pivot_table but for the problem given in the question, we can use crosstab:

out = pd.crosstab(df['Account_number'], df['Product'])

Output:

Product         A  B
Account_number      
1               2  0
2               1  2
3               1  1
Answered By: user7864386
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.