Pandas – Get column value counts as new columns in dataframe

Question

I have a pandas dataframe that looks like this:

Type	Status
typeA	New
typeA	Working
typeA	Working
typeA	Closed
typeA	Closed
typeA	Closed
typeB	New
typeB	Working
typeC	Closed
typeC	Closed
typeC	Closed

I’d like to group the dataframe by the ‘Type’ field and get the count of each status as a column, like so:

Type	New	Working	Closed
typeA	1	2	3
typeB	1	1	0
typeC	0	0	3

I’d also like columns for statuses that could exist (I have a list of all possibilities) but may not be represented in the input dataframe, so the final result would be something like this:

Type	New	Working	Closed	Escalate
typeA	1	2	3	0
typeB	1	1	0	0
typeC	0	0	3	0

I’m able to get the counts per status by using:

closureCodeCounts = closureCodes.groupby(['type','status'],as_index=False).size()

I’ve also tried:

closureCodeCounts = closureCodeCounts.groupby('type').value_counts()
closureCodeCounts = closureCodeCounts.unstack()

But nothing seems to come out right.

I’m pretty lost. What’s the best way to do this?
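To try the answers below, the sample data from the table above can be rebuilt with a minimal sketch like this. It assumes the column names are Type and Status, as in the table; the question's own snippets use lowercase type and status, so adjust to match your real frame. The name closureCodes is taken from the question.

```python
import pandas as pd

# Sample data copied from the question's table (column names assumed: Type, Status).
closureCodes = pd.DataFrame({
    'Type': ['typeA'] * 6 + ['typeB'] * 2 + ['typeC'] * 3,
    'Status': ['New', 'Working', 'Working', 'Closed', 'Closed', 'Closed',
               'New', 'Working', 'Closed', 'Closed', 'Closed'],
})
print(closureCodes)
```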

Asked By: s3p1a

Answer 1

Try as follows:

Use pd.crosstab to reach the first stage of your desired output.
For the second stage, I am assuming that the list you mention does contain all possible values. If so, we can apply df.reindex along axis=1 (the columns) to add the missing possibilities as columns.
As mentioned by BeRT2me in the comments, we can use the fill_value parameter inside df.reindex to populate the new columns with zeros (instead of default NaN values).

import pandas as pd

possible_statuses = ['New','Working','Closed','Escalate']

res = (pd.crosstab(closureCodes.Type, closureCodes.Status)
       .reindex(possible_statuses, axis=1, fill_value=0))

print(res)

Status  New  Working  Closed  Escalate
Type                                  
typeA     1        2       3         0
typeB     1        1       0         0
typeC     0        0       3         0
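To see what fill_value changes, here is a small sketch that rebuilds the sample data from the question (column names Type and Status assumed) and reindexes with and without it. Without fill_value, the newly added column holds NaN rather than 0.

```python
import pandas as pd

# Sample data from the question (column names assumed: Type, Status).
closureCodes = pd.DataFrame({
    'Type': ['typeA'] * 6 + ['typeB'] * 2 + ['typeC'] * 3,
    'Status': ['New', 'Working', 'Working', 'Closed', 'Closed', 'Closed',
               'New', 'Working', 'Closed', 'Closed', 'Closed'],
})
possible_statuses = ['New', 'Working', 'Closed', 'Escalate']

counts = pd.crosstab(closureCodes.Type, closureCodes.Status)

# Without fill_value, the newly added 'Escalate' column is filled with NaN.
with_nan = counts.reindex(possible_statuses, axis=1)
# With fill_value=0, the new column is filled with 0 instead.
with_zero = counts.reindex(possible_statuses, axis=1, fill_value=0)

print(with_nan['Escalate'].tolist())   # [nan, nan, nan]
print(with_zero['Escalate'].tolist())  # [0, 0, 0]
```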

An alternative approach to reach the first stage could be as follows:

Use df.groupby with value_counts (DataFrameGroupBy.value_counts requires pandas 1.4 or later) and chain df.unstack.

res = (closureCodes.groupby('Type')
       .value_counts()
       .unstack(fill_value=0)
       .reindex(possible_statuses, axis=1, fill_value=0))

print(res)

Status  New  Working  Closed  Escalate
Type                                  
typeA     1        2       3         0
typeB     1        1       0         0
typeC     0        0       3         0

This is, of course, pretty close to what you were trying to do in the first place (but you don’t need the intermediate closureCodeCounts).

"Cosmetic" additions:

res.columns.name = None  # to get rid of "Status" as `columns.name`
res.index.name = None  # likewise, to get rid of "Type" as `index.name`
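If you prefer to keep everything in one chain, DataFrame.rename_axis can clear both axis names instead of the two assignments. A sketch on the same sample data (rebuilt here, column names assumed from the question's table):

```python
import pandas as pd

# Sample data from the question (column names assumed: Type, Status).
closureCodes = pd.DataFrame({
    'Type': ['typeA'] * 6 + ['typeB'] * 2 + ['typeC'] * 3,
    'Status': ['New', 'Working', 'Working', 'Closed', 'Closed', 'Closed',
               'New', 'Working', 'Closed', 'Closed', 'Closed'],
})
possible_statuses = ['New', 'Working', 'Closed', 'Escalate']

res = (pd.crosstab(closureCodes.Type, closureCodes.Status)
       .reindex(possible_statuses, axis=1, fill_value=0)
       .rename_axis(index=None, columns=None))  # clears both axis names

print(res)
```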

Answered By: ouroboros1

Answer 2

You can use a pivot table to reshape your grouped DataFrame (the closureCodeCounts from your first snippet):

closureCodeCounts = pd.pivot_table(closureCodeCounts, values='size', index=['type'], columns='status').fillna(0)

Then, similar to @ouroboros1's answer, reindex your DataFrame to add the missing columns:

possible_statuses = ['New','Working','Closed','Escalate']
result = closureCodeCounts.reindex(columns=possible_statuses, fill_value=0)
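Put together, a minimal end-to-end sketch of this approach, assuming lowercase column names type and status (as in the question's groupby snippet) and pandas 1.1 or later, where groupby(..., as_index=False).size() returns a DataFrame with a size column:

```python
import pandas as pd

# Assumption: lowercase column names, as in the question's groupby snippet.
closureCodes = pd.DataFrame({
    'type': ['typeA'] * 6 + ['typeB'] * 2 + ['typeC'] * 3,
    'status': ['New', 'Working', 'Working', 'Closed', 'Closed', 'Closed',
               'New', 'Working', 'Closed', 'Closed', 'Closed'],
})

# With as_index=False, size() returns a DataFrame with a 'size' column (pandas 1.1+).
closureCodeCounts = closureCodes.groupby(['type', 'status'], as_index=False).size()

# One row per type, one column per observed status; missing pairs become 0.
closureCodeCounts = pd.pivot_table(closureCodeCounts, values='size', index=['type'],
                                   columns='status').fillna(0)

# Add statuses that never occur in the data, filled with 0.
possible_statuses = ['New', 'Working', 'Closed', 'Escalate']
result = closureCodeCounts.reindex(columns=possible_statuses, fill_value=0)
print(result)
```

Because of fillna(0), some counts may come back as floats; chaining .astype(int) converts them back to integers.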

Answered By: Inderpartap Cheema

Answer 3

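import numpy as np
import pandas as pd

val = df.groupby(['Type']).value_counts()
ax = pd.MultiIndex.from_tuples(val.index)  # one column per (Type, Status) pair
df = pd.DataFrame(np.nan, index=[0], columns=ax)
for i in range(len(val)):
    df.loc[0, ax[i]] = val.iloc[i]  # .iloc makes the positional lookup explicit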

typeA			typeB		typeC
Closed	Working	New	New	Working	Closed
3.0	2.0	1.0	1.0	1.0	3.0
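The element-by-element loop can likely be avoided: since val is a Series indexed by (Type, Status) pairs, turning it into a one-column frame and transposing gives essentially the same single wide row, with integer counts instead of floats. A sketch, with the sample data rebuilt and column names assumed from the question's table:

```python
import pandas as pd

# Sample data from the question (column names assumed: Type, Status).
df = pd.DataFrame({
    'Type': ['typeA'] * 6 + ['typeB'] * 2 + ['typeC'] * 3,
    'Status': ['New', 'Working', 'Working', 'Closed', 'Closed', 'Closed',
               'New', 'Working', 'Closed', 'Closed', 'Closed'],
})

val = df.groupby(['Type']).value_counts()  # counts indexed by (Type, Status)

# to_frame() gives a one-column frame; .T turns it into one row whose
# columns are the (Type, Status) pairs; reset_index labels that row 0.
wide_row = val.to_frame().T.reset_index(drop=True)
print(wide_row)
```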

Answered By: Michael

Answer 4

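Convert Status to a categorical whose categories are all the possible statuses, so that statuses that never occur in the data (like 'Escalate') still get a column.

Then, we’ll make a simple pivot table (or use one of the equivalent alternatives):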

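import pandas as pd

# Declare every possible status, observed or not, as a category.
df['Status'] = pd.Categorical(df['Status'], ['New', 'Working', 'Closed', 'Escalate'])

# Using a pivot table (observed=False keeps unobserved categories):
out = df.pivot_table(index='Type', columns='Status', aggfunc='size', observed=False)

# Or, using a groupby:
out = df.groupby(['Type', 'Status'], observed=False).size().unstack('Status')

# Or, making a crosstab:
out = pd.crosstab(df.Type, df.Status, dropna=False)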

print(out)

Output:

Status  New  Working  Closed  Escalate
Type
typeA     1        2       3         0
typeB     1        1       0         0
typeC     0        0       3         0
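To see why the categorical conversion matters, here is a sketch comparing crosstab on the plain string column with crosstab on the categorical one (sample data rebuilt from the question's table; possible_statuses is the list of all statuses):

```python
import pandas as pd

# Sample data from the question (column names assumed: Type, Status).
df = pd.DataFrame({
    'Type': ['typeA'] * 6 + ['typeB'] * 2 + ['typeC'] * 3,
    'Status': ['New', 'Working', 'Working', 'Closed', 'Closed', 'Closed',
               'New', 'Working', 'Closed', 'Closed', 'Closed'],
})
possible_statuses = ['New', 'Working', 'Closed', 'Escalate']

# Plain string column: only statuses that actually occur become columns.
plain = pd.crosstab(df.Type, df.Status)

# Categorical column: every declared category becomes a column, in category order.
df['Status'] = pd.Categorical(df['Status'], categories=possible_statuses)
cat = pd.crosstab(df.Type, df.Status, dropna=False)

print(list(plain.columns))  # ['Closed', 'New', 'Working']
print(list(cat.columns))    # ['New', 'Working', 'Closed', 'Escalate']
```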

Answered By: BeRT2me
