Python Pandas: Convert ".value_counts" output to dataframe

Question

Hi, I want to get the counts of the unique values in a dataframe column. value_counts does this, but I want to use its output somewhere else. How can I convert the .value_counts output to a pandas dataframe? Here is some example code:

import pandas as pd
df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)
print(value_counts)
print(type(value_counts))

output is:

2    3
1    2
Name: a, dtype: int64
<class 'pandas.core.series.Series'>

What I need is a dataframe like this:

unique_values  counts
2              3
1              2

Thank you.

Asked By: s900n

Answer 1

Use rename_axis to name the index (that name becomes the column name) and then reset_index:

df = value_counts.rename_axis('unique_values').reset_index(name='counts')
print(df)
   unique_values  counts
0              2       3
1              1       2

Or, if you need a one-column DataFrame, use Series.to_frame:

df = value_counts.rename_axis('unique_values').to_frame('counts')
print(df)
               counts
unique_values        
2                   3
1                   2
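
For reference, here is a minimal self-contained sketch of both variants, starting from the dataframe in the question. The names counts_flat and counts_indexed are illustrative only and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts()

# Variant 1: the unique values become a regular column, the counts get an explicit name.
counts_flat = value_counts.rename_axis('unique_values').reset_index(name='counts')

# Variant 2: the unique values stay in the index, the counts are the only column.
counts_indexed = value_counts.rename_axis('unique_values').to_frame('counts')

# Resetting the index of variant 2 gives the same table as variant 1.
assert counts_indexed.reset_index().equals(counts_flat)
print(counts_flat)
```

Because both variants set the names explicitly, the result should not depend on the default names that value_counts assigns.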

Answered By: jezrael

Answer 2

I just ran into the same problem, so here are my thoughts.

Warning

When you work with pandas data structures, you have to be aware of the return type of each call.
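
A quick way to check this is to print the type at each step. The sketch below reuses the dataframe from the question and only shows which calls return a Series and which return a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts()

print(type(value_counts))                # a pandas Series
print(type(value_counts.to_frame()))     # a pandas DataFrame
print(type(value_counts.reset_index()))  # also a pandas DataFrame
```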

Another solution here

As @jezrael mentioned, pandas provides the pd.Series.to_frame API.

Step 1

You can also wrap the pd.Series in a pd.DataFrame directly:

df_value_counts = pd.DataFrame(value_counts) # wrap pd.Series in pd.DataFrame

Then you have a pd.DataFrame with a single column named 'a', and the unique values become the index (in pandas 2.0 and later, the column is named 'count' instead):

Input:  print(df_value_counts.index.values)
Output: [2 1]

Input:  print(df_value_counts.columns)
Output: Index(['a'], dtype='object')

Step 2

What now?

If you want to set new column names, first turn the index into a regular column with reset_index().

Then assign a list of names to df.columns:

df_value_counts = df_value_counts.reset_index()
df_value_counts.columns = ['unique_values', 'counts']

Now you have what you need.

Output:

       unique_values    counts
    0              2         3
    1              1         2

Full Answer here

import pandas as pd

df = pd.DataFrame({'a':[1, 1, 2, 2, 2]})
value_counts = df['a'].value_counts(dropna=True, sort=True)

# solution here
df_val_counts = pd.DataFrame(value_counts)
df_value_counts_reset = df_val_counts.reset_index()
df_value_counts_reset.columns = ['unique_values', 'counts'] # change column names

Answered By: WY Hsu

Answer 3

I'll throw my hat in as well. This is essentially the same as @wy-hsu's solution, but in function form:

import pandas as pd


def value_counts_df(df, col):
    """
    Return df[col].value_counts() as a DataFrame.

    Parameters
    ----------
    df : Pandas Dataframe
        Dataframe on which to run value_counts(), must have column `col`.
    col : str
        Name of column in `df` for which to generate counts

    Returns
    -------
    Pandas Dataframe
        Returned dataframe will have a single column named "count" which contains the
        value_counts() result for each unique value of df[col]. The index name of this dataframe is `col`.

    Example
    -------
    >>> value_counts_df(pd.DataFrame({'a':[1, 1, 2, 2, 2]}), 'a')
       count
    a
    2      3
    1      2
    """
    df = pd.DataFrame(df[col].value_counts())
    df.index.name = col
    df.columns = ['count']
    return df
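
A possible variant, sketched here under the assumption that you would rather not rely on the default names (pandas 2.0 changed the names that value_counts assigns): set the index name and the column name explicitly with rename_axis and to_frame. The function name value_counts_frame and the count_name parameter are hypothetical.

```python
import pandas as pd


def value_counts_frame(df, col, count_name='count'):
    """Return df[col].value_counts() as a one-column DataFrame indexed by `col`."""
    counts = df[col].value_counts()
    # Name the index and the single column explicitly.
    return counts.rename_axis(col).to_frame(count_name)


result = value_counts_frame(pd.DataFrame({'a': [1, 1, 2, 2, 2]}), 'a')
print(result)
```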

Answered By: Constantino

Answer 4

pd.DataFrame(
    df.groupby(['groupby_col'])['column_to_perform_value_count'].value_counts()
).rename(
    columns={'old_column_name': 'new_column_name'}
).reset_index()
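
The snippet above uses placeholder column names. Here is a small runnable sketch of the same idea with made-up data, where 'group' and 'value' stand in for the placeholders. It names the counts with Series.rename before calling reset_index, which is a slightly different route from the one in the answer.

```python
import pandas as pd

# Made-up data; 'group' and 'value' stand in for the placeholder column names.
df = pd.DataFrame({
    'group': ['x', 'x', 'x', 'y', 'y'],
    'value': [1, 1, 2, 3, 3],
})

counts = (
    df.groupby('group')['value']
      .value_counts()
      .rename('counts')  # name the counts explicitly
      .reset_index()     # move 'group' and 'value' out of the index
)
print(counts)
```

Renaming the counts first avoids a clash between the name of the counts and the 'value' index level when the index is reset.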

Answered By: parul sharma

Answer 5

An example that selects a subset of columns from a dataframe, groups them, applies value_counts per group, names the value_counts column Count, and displays the first n groups.

# Select 5 columns (A..E) from a dataframe (data_df).
# Sort on A,B. groupby B. Display first 3 groups.
df = data_df[['A','B','C','D','E']].sort_values(['A','B'])
g = df.groupby(['B'])
for k, gg in list(g)[:3]: # display first 3 groups
    display(k, gg.value_counts().to_frame('Count').reset_index()) # display() is the IPython/Jupyter helper
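
To try this outside a notebook, here is a sketch with a toy stand-in for data_df (the values are made up and only three columns are used). It uses print instead of display and collects the per-group frames in a dict named frames, which is an illustrative name.

```python
import pandas as pd

# Toy stand-in for data_df; the values are made up for illustration.
data_df = pd.DataFrame({
    'A': [1, 1, 2, 2],
    'B': ['x', 'x', 'y', 'y'],
    'C': [0, 0, 0, 1],
})

frames = {}
for key, group in data_df.sort_values(['A', 'B']).groupby('B'):
    # DataFrame.value_counts counts distinct rows (available from pandas 1.1).
    frames[key] = group.value_counts().to_frame('Count').reset_index()
    print(key)
    print(frames[key])
```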

Answered By: BSalita