Pandas – Get column value counts as new columns in dataframe

Question:

I have a pandas dataframe that looks like this:

Type Status
typeA New
typeA Working
typeA Working
typeA Closed
typeA Closed
typeA Closed
typeB New
typeB Working
typeC Closed
typeC Closed
typeC Closed

I’d like to group the dataframe by the ‘Type’ field and get the count of each status as a column, like so:

Type New Working Closed
typeA 1 2 3
typeB 1 1 0
typeC 0 0 3

I’d also like columns for statuses that could exist (I have a list of all possibilities) but may not be represented in the input dataframe, so the final result would be something like this:

Type New Working Closed Escalate
typeA 1 2 3 0
typeB 1 1 0 0
typeC 0 0 3 0

I’m able to get the counts per status by using:

closureCodeCounts = closureCodes.groupby(['type','status'],as_index=False).size()

I’ve also tried

closureCodeCounts = closureCodeCounts.groupby('type').value_counts()
closureCodeCounts = closureCodeCounts.unstack()

But nothing seems to come out right.

I’m pretty lost. What’s the best way to do this?

Asked By: s3p1a


Answers:

Try as follows:

  • Use pd.crosstab to reach the first stage of your desired output.
  • For the second stage, I am assuming that the list you mention indeed contains all possible values. If so, we can apply df.reindex to axis=1 to add the missing possibilities as columns.
  • As mentioned by BeRT2me in the comments, we can use the fill_value parameter inside df.reindex to populate the new columns with zeros (instead of default NaN values).

possible_statuses = ['New','Working','Closed','Escalate']

res = (pd.crosstab(closureCodes.Type, closureCodes.Status)
       .reindex(possible_statuses, axis=1, fill_value=0))

print(res)

Status  New  Working  Closed  Escalate
Type                                  
typeA     1        2       3         0
typeB     1        1       0         0
typeC     0        0       3         0
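For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample rows from the question are loaded into a DataFrame named closureCodes with columns Type and Status; the construction of that frame is my own and not part of the original post.

```python
import pandas as pd

# Sample data copied from the question (the construction itself is an assumption).
closureCodes = pd.DataFrame({
    "Type": ["typeA"] * 6 + ["typeB"] * 2 + ["typeC"] * 3,
    "Status": ["New", "Working", "Working", "Closed", "Closed", "Closed",
               "New", "Working",
               "Closed", "Closed", "Closed"],
})

# Every status that may occur, including ones absent from the data.
possible_statuses = ["New", "Working", "Closed", "Escalate"]

# Count each Type/Status pair, then force the full column set in a fixed order.
# Columns missing from the crosstab ("Escalate") are created and filled with 0.
res = (pd.crosstab(closureCodes["Type"], closureCodes["Status"])
       .reindex(possible_statuses, axis=1, fill_value=0))

print(res)
```

Besides adding missing statuses, reindex also fixes the column order, which crosstab would otherwise sort alphabetically.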

An alternative approach to reach the first stage could be as follows:

res = (closureCodes.groupby('Type')
       .value_counts()
       .unstack(fill_value=0)
       .reindex(possible_statuses, axis=1, fill_value=0))

print(res)

Status  New  Working  Closed  Escalate
Type                                  
typeA     1        2       3         0
typeB     1        1       0         0
typeC     0        0       3         0

This is, of course, pretty close to what you were trying to do in the first place (but you don’t need the intermediate closureCodeCounts).


"Cosmetic" additions:

res.columns.name = None # to get rid of "Status" as `columns.name`
res.index.name = None # similar for `index`
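If you would rather have Type as an ordinary column, as in the desired output in the question, one possible sketch is to drop the columns name and reset the index. The res frame below is a hand-built stand-in for the result above, and flat is a hypothetical name.

```python
import pandas as pd

# Hand-built stand-in for the `res` table (counts indexed by Type).
res = pd.DataFrame(
    {"New": [1, 1, 0], "Working": [2, 1, 0],
     "Closed": [3, 0, 3], "Escalate": [0, 0, 0]},
    index=pd.Index(["typeA", "typeB", "typeC"], name="Type"),
)
res.columns.name = "Status"

# Remove the "Status" label from the columns axis, then turn the
# Type index into a regular column with a default integer index.
flat = res.rename_axis(None, axis=1).reset_index()

print(flat)
```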
Answered By: ouroboros1

You can use pd.pivot_table to pivot your grouped DataFrame from long to wide format:

closureCodeCounts = pd.pivot_table(closureCodeCounts, values = 'size', index=['type'], columns = 'status').fillna(0)

Then, as in @ouroboros1's answer, reindex the DataFrame to add the missing columns:

possible_statuses = ['New','Working','Closed','Escalate']
result = closureCodeCounts.reindex(columns=possible_statuses, fill_value=0)
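Put together, the pivot route might look like the sketch below. It assumes lowercase column names type and status, as in the asker's groupby call, and pandas 1.1 or later, where groupby(..., as_index=False).size() returns a DataFrame with a size column. The name wide, aggfunc="sum", and the final astype(int) are my additions.

```python
import pandas as pd

# Sample data from the question, with the lowercase column names used in the asker's code.
closureCodes = pd.DataFrame({
    "type": ["typeA"] * 6 + ["typeB"] * 2 + ["typeC"] * 3,
    "status": ["New", "Working", "Working", "Closed", "Closed", "Closed",
               "New", "Working",
               "Closed", "Closed", "Closed"],
})

# Long format: one row per observed (type, status) pair, count in column "size".
closureCodeCounts = closureCodes.groupby(["type", "status"], as_index=False).size()

# Wide format: pairs that never occur have no row to pivot, so fill them with 0.
wide = pd.pivot_table(closureCodeCounts, values="size", index="type",
                      columns="status", aggfunc="sum", fill_value=0)

# Add statuses that never occur at all and cast the counts back to integers.
possible_statuses = ["New", "Working", "Closed", "Escalate"]
result = wide.reindex(columns=possible_statuses, fill_value=0).astype(int)

print(result)
```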
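Answered By: Inderpartap Cheema

This puts every observed (Type, Status) count in a single row under two-level columns:

import numpy as np
import pandas as pd
# Count each (Type, Status) pair; the result is a Series with a MultiIndex.
val = df.groupby(['Type']).value_counts()
ax = pd.MultiIndex.from_tuples(val.axes[0])
# One-row frame whose columns are the (Type, Status) pairs, filled by position.
df = pd.DataFrame(np.nan, index=[0], columns=ax)
for i in range(len(val)): df.loc[0, ax[i]] = val.iloc[i]

typeA                  typeB          typeC
Closed  Working  New   New  Working   Closed
3.0     2.0      1.0   1.0  1.0       3.0

Answered By: Michael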

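Convert Status to a categorical that lists every possible value. Unused categories (here, Escalate) can then be kept as columns, so no reindex is needed.

Then, any of the following builds the table:

df.Status = pd.Categorical(df.Status, ['New', 'Working', 'Closed', 'Escalate'])

# observed=False keeps unused categories; pass it explicitly, since
# relying on the default is deprecated in pandas 2.1+.

# Using a pivot table:
out = df.pivot_table(index='Type', columns='Status', aggfunc='size', observed=False)

# Or, using a groupby:
out = df.groupby(['Type', 'Status'], observed=False).size().unstack('Status')

# Or, making a crosstab:
out = pd.crosstab(df.Type, df.Status, dropna=False)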

print(out)

Output:

Status  New  Working  Closed  Escalate
Type
typeA     1        2       3         0
typeB     1        1       0         0
typeC     0        0       3         0
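A runnable sketch of the groupby variant, assuming the question's sample data is in a frame named df. The frame construction and the statuses list name are my additions, and observed=False is passed explicitly so the result does not depend on the pandas version's default.

```python
import pandas as pd

# Sample data copied from the question (the construction itself is an assumption).
df = pd.DataFrame({
    "Type": ["typeA"] * 6 + ["typeB"] * 2 + ["typeC"] * 3,
    "Status": ["New", "Working", "Working", "Closed", "Closed", "Closed",
               "New", "Working",
               "Closed", "Closed", "Closed"],
})

# Declare every possible status; the category order becomes the column order.
statuses = ["New", "Working", "Closed", "Escalate"]
df["Status"] = pd.Categorical(df["Status"], categories=statuses)

# observed=False makes groupby emit a group for every category,
# so unused statuses such as "Escalate" show up with a count of 0.
out = (df.groupby(["Type", "Status"], observed=False)
         .size()
         .unstack("Status", fill_value=0))

print(out)
```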
Answered By: BeRT2me