I want to create a new frequency column for each column in a pandas dataframe

Question

Let’s say I have a dataframe like this:

colors	animals
yellow	cat
yellow	cat
red	cat
red	cat
blue	cat

I want to add, for each existing column, a new column showing how frequently each value occurs:

colors	colors_frequency	animals	animals_frequency
yellow	40%	cat	100%
yellow	40%	cat	100%
red	40%	cat	100%
red	40%	cat	100%
blue	20%	cat	100%

I tried

overview = list()
for column in df.columns:
    series = (df[column].value_counts(normalize=True, dropna=True)*100)
    overview.append(series)

# overview list
o_colors = overview[0]
o_animals = overview[1]

df['animals_frequency'] = o_animals

If I try

df.info()

it returns

Column	Non-Null Count	Dtype
animals_frequency	0 non-null	float64

Asked By: Luciana Coelho

Answer 1

A simple approach is to calculate the relative frequency of each column's values and then join these frequencies back to the original DataFrame.

for col in df.columns:
    # Get relative frequency
    frequency = df[col].value_counts(normalize=True)

    # Format frequency as percent values, rounded to two decimals
    frequency_percent = (frequency*100).round(2).astype(str) + '%'

    # Join frequency values to original DF
    frequency_percent.name = f"{col}_frequency"
    df = df.merge(frequency_percent.to_frame(), left_on=col, right_index=True)
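
To try this on the data from the question, here is a self-contained sketch of the same loop. The sample frame is rebuilt from the question; the helper name add_frequency_columns and the how="left" argument are my own assumptions, not part of the answer above. With the default inner join, rows whose value is NaN would be dropped, so a left join is the safer choice if the data may contain missing values.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "colors": ["yellow", "yellow", "red", "red", "blue"],
    "animals": ["cat"] * 5,
})


def add_frequency_columns(frame):
    """Hypothetical helper: return a copy with one <col>_frequency column per column."""
    result = frame.copy()
    for col in frame.columns:
        # Relative frequency of each distinct value (sums to 1.0)
        frequency = frame[col].value_counts(normalize=True)

        # Format as percent strings, rounded to two decimals
        frequency_percent = (frequency * 100).round(2).astype(str) + "%"
        frequency_percent.name = f"{col}_frequency"

        # Look up each row's value in the frequency index;
        # how="left" keeps every original row, including NaN values
        result = result.merge(
            frequency_percent.to_frame(),
            left_on=col,
            right_index=True,
            how="left",
        )
    return result


out = add_frequency_columns(df)
print(out)
```

Note that the values come out as strings such as 40.0% rather than 40%, because the rounded float is converted with astype(str).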

Answered By: Erik Fubel

Answer 2

For each column, compute the frequency of each distinct element. Then use replace to map each element to its corresponding frequency.

for column in df.columns:
    mapping = df[column].value_counts(normalize=True, dropna=True) * 100
    df[f"{column}_frequency"] = df[column].replace(mapping)
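
A variant of the same idea, in case it helps: Series.map does the same value-to-frequency lookup as replace. This is only a sketch on the question's data; the "{:.0f}%" format is my assumption, chosen to match the 40% style in the desired output, and it would print nan% for missing values.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "colors": ["yellow", "yellow", "red", "red", "blue"],
    "animals": ["cat", "cat", "cat", "cat", "cat"],
})

# Snapshot the column names first, since the loop adds new columns
for column in list(df.columns):
    # Percentage share of each distinct value, indexed by the value itself
    share = df[column].value_counts(normalize=True) * 100

    # Look up each row's value in `share`, then format it as a whole percent
    df[f"{column}_frequency"] = df[column].map(share).map("{:.0f}%".format)

print(df)
```

The lookup by value is the key step. It is also why the attempt in the question produced only nulls: assigning the value_counts result directly aligns on the index, and the row labels 0 to 4 never match a label such as cat.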

Answered By: Lewan
