I want to create a new frequency column for each column in a pandas dataframe

Question:

Let’s say I have a dataframe like this:

colors animals
yellow cat
yellow cat
red cat
red cat
blue cat

I want to create a column for each column showing the frequency in which each value happens:

colors colors_frequency animals animals_frequency
yellow 40% cat 100%
yellow 40% cat 100%
red 40% cat 100%
red 40% cat 100%
blue 20% cat 100%

I tried

frequency = list()
for column in df.columns:
     series = (df[column].value_counts(normalize=True, dropna=True)*100)
     overview.append(series)

#overview list
o_colors = overview[0] 
o_animals = overview[1]

df['animals_frequency'] = o_animals

If I try

df.info()

it returns

Column Non-Null Count Dtype
animals_frequency 0 non-null float64
Asked By: Luciana Coelho

||

Answers:

A simple approach is to calculate the relative frequency for each column values and then join these frequencies back to the original DataFrame.

for col in df.columns:
    # Get relative frequency
    frequency = df[col].value_counts(normalize=True)

    # Format frequency as percent values, rounded to two decimals
    frequency_percent = (frequency*100).round(2).astype(str) + '%'

    # Join frequency values to original DF
    frequency_percent.name = f"{col}_frequency"
    df = df.merge(frequency_percent.to_frame(), left_on=col, right_index=True)
Answered By: Erik Fubel

For each column, compute the frequency of each distinct element. Then, use replace to map each element to the corresponding frequency

for column in df.columns:
    mapping = df[column].value_counts(normalize=True, dropna=True) * 100
    df[f"{column}_frequency"] = df[column].replace(mapping)
Answered By: Lewan