Pandas – Get column value counts as new columns in dataframe
Question:
I have a pandas dataframe that looks like this:
Type | Status |
---|---|
typeA | New |
typeA | Working |
typeA | Working |
typeA | Closed |
typeA | Closed |
typeA | Closed |
typeB | New |
typeB | Working |
typeC | Closed |
typeC | Closed |
typeC | Closed |
I’d like to group the dataframe by the ‘Type’ field and get the count of each status as a column, like so:
Type | New | Working | Closed |
---|---|---|---|
typeA | 1 | 2 | 3 |
typeB | 1 | 1 | 0 |
typeC | 0 | 0 | 3 |
I’d also like columns for statuses that could exist (I have a list of all possibilities) but may not be represented in the input dataframe, so the final result would be something like this:
Type | New | Working | Closed | Escalate |
---|---|---|---|---|
typeA | 1 | 2 | 3 | 0 |
typeB | 1 | 1 | 0 | 0 |
typeC | 0 | 0 | 3 | 0 |
I’m able to get the counts per status by using:
closureCodeCounts = closureCodes.groupby(['type','status'],as_index=False).size()
I’ve also tried
closureCodeCounts = closureCodeCounts.groupby('type').value_counts()
closureCodeCounts = closureCodeCounts.unstack()
But nothing seems to come out right.
I’m pretty lost. What’s the best way to do this?
Answers:
Try as follows:
- Use `pd.crosstab` to reach the first stage of your desired output.
- For the second stage, I am assuming that the list you mention indeed contains all possible values. If so, we can apply `df.reindex` along `axis=1` to add the missing possibilities as columns.
- As mentioned by BeRT2me in the comments, we can use the `fill_value` parameter of `df.reindex` to populate the new columns with zeros (instead of the default `NaN` values).
possible_statuses = ['New','Working','Closed','Escalate']
res = (pd.crosstab(closureCodes.Type, closureCodes.Status)
         .reindex(possible_statuses, axis=1, fill_value=0))
print(res)
Status New Working Closed Escalate
Type
typeA 1 2 3 0
typeB 1 1 0 0
typeC 0 0 3 0
An alternative approach to reach the first stage could be as follows:
- Use `df.groupby` with `value_counts` and chain `df.unstack`.
res = (closureCodes.groupby('Type')
         .value_counts()
         .unstack(fill_value=0)
         .reindex(possible_statuses, axis=1, fill_value=0))
print(res)
Status New Working Closed Escalate
Type
typeA 1 2 3 0
typeB 1 1 0 0
typeC 0 0 3 0
This is, of course, pretty close to what you were trying to do in the first place (but you don’t need the intermediate `closureCodeCounts`).
"Cosmetic" additions:
res.columns.name = None # to get rid of "Status" as `columns.name`
res.index.name = None # similar for `index`
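For anyone who wants to run the crosstab approach end to end, here is a minimal self-contained sketch. The sample data is copied from the question; the helper name `status_counts` is hypothetical and only there to bundle the two steps.

```python
import pandas as pd

# Sample data copied from the question's table.
closureCodes = pd.DataFrame({
    "Type": ["typeA"] * 6 + ["typeB"] * 2 + ["typeC"] * 3,
    "Status": ["New", "Working", "Working", "Closed", "Closed", "Closed",
               "New", "Working",
               "Closed", "Closed", "Closed"],
})

possible_statuses = ["New", "Working", "Closed", "Escalate"]


def status_counts(frame, statuses):
    """Hypothetical helper: count each status per type, one column per entry in `statuses`."""
    counts = pd.crosstab(frame["Type"], frame["Status"])
    # reindex orders the columns like `statuses` and adds missing ones filled with 0
    return counts.reindex(columns=statuses, fill_value=0)


res = status_counts(closureCodes, possible_statuses)
print(res)
```

Note that `reindex` keeps only the labels you pass, so a status that occurs in the data but is absent from `possible_statuses` would be dropped from the result without any warning.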
You can use a pivot table to reshape your grouped DataFrame from long to wide format:
closureCodeCounts = pd.pivot_table(closureCodeCounts, values = 'size', index=['type'], columns = 'status').fillna(0)
Then, as in @ouroboros1's answer, reindex your DataFrame to add the missing columns.
possible_statuses = ['New','Working','Closed','Escalate']
result = closureCodeCounts.reindex(columns=possible_statuses, fill_value=0)
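A runnable sketch of this pivot-table route, starting from the long-format counts in the question. It assumes lowercase column names `type` and `status`, as in the asker's own snippet (the sample table shows them capitalized). The final `astype(int)` is my addition: the missing combinations can turn the counts into floats along the way.

```python
import pandas as pd

# Sample data from the question, with the lowercase column names used in the asker's code.
closureCodes = pd.DataFrame({
    "type": ["typeA"] * 6 + ["typeB"] * 2 + ["typeC"] * 3,
    "status": ["New", "Working", "Working", "Closed", "Closed", "Closed",
               "New", "Working",
               "Closed", "Closed", "Closed"],
})

# Long format: one row per (type, status) pair, with the count in a 'size' column.
closureCodeCounts = closureCodes.groupby(["type", "status"], as_index=False).size()

# Wide format: one row per type, one column per status seen in the data.
wide = pd.pivot_table(closureCodeCounts, values="size",
                      index="type", columns="status", fill_value=0)

possible_statuses = ["New", "Working", "Closed", "Escalate"]
# Add the statuses that never occur, then convert the counts back to integers.
result = wide.reindex(columns=possible_statuses, fill_value=0).astype(int)
print(result)
```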
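import numpy as np
# Count each (Type, Status) pair; the result is a Series with a two-level index
val = df.groupby(['Type']).value_counts()
ax = pd.MultiIndex.from_tuples(val.axes[0])
# One-row frame whose columns are the (Type, Status) pairs (this overwrites df)
df = pd.DataFrame(np.nan, index=[0], columns=ax)
for i in range(len(val)):
    df.loc[0, ax[i]] = val.iloc[i]  # positional access via .iloc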
typeA | typeA | typeA | typeB | typeB | typeC |
---|---|---|---|---|---|
Closed | Working | New | New | Working | Closed |
3.0 | 2.0 | 1.0 | 1.0 | 1.0 | 3.0 |
Convert `Status` to a categorical. Then, we’ll make a simple pivot table:
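df.Status = pd.Categorical(df.Status, ['New', 'Working', 'Closed', 'Escalate'])
# observed=False keeps unused categories such as 'Escalate'; it is passed
# explicitly because recent pandas versions deprecate relying on the default.
# Using a pivot table:
out = df.pivot_table(index='Type', columns='Status', aggfunc='size', observed=False)
# Or, using a groupby:
out = df.groupby(['Type', 'Status'], observed=False).size().unstack('Status')
# Or, making a crosstab:
out = pd.crosstab(df.Type, df.Status, dropna=False)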
print(out)
Output:
Status New Working Closed Escalate
Type
typeA 1 2 3 0
typeB 1 1 0 0
typeC 0 0 3 0
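Here is a self-contained sketch of the categorical approach using the groupby variant, with the sample data from the question. Passing `observed=False` explicitly is my assumption about how to keep unused categories across pandas versions, since the default for that parameter is being changed.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Type": ["typeA"] * 6 + ["typeB"] * 2 + ["typeC"] * 3,
    "Status": ["New", "Working", "Working", "Closed", "Closed", "Closed",
               "New", "Working",
               "Closed", "Closed", "Closed"],
})

possible_statuses = ["New", "Working", "Closed", "Escalate"]

# The category list fixes both the set of columns and their order.
df["Status"] = pd.Categorical(df["Status"], categories=possible_statuses)

# observed=False asks groupby to report every category, including unused ones.
out = (df.groupby(["Type", "Status"], observed=False)
         .size()
         .unstack("Status"))
print(out)
```

With this design the list of possible statuses lives in the column's dtype, so no separate `reindex` step is needed afterwards.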