I want to create a new frequency column for each column in a pandas dataframe
Question:
Let’s say I have a dataframe like this:
colors
animals
yellow
cat
yellow
cat
red
cat
red
cat
blue
cat
I want to create a column for each column showing the frequency in which each value happens:
colors
colors_frequency
animals
animals_frequency
yellow
40%
cat
100%
yellow
40%
cat
100%
red
40%
cat
100%
red
40%
cat
100%
blue
20%
cat
100%
I tried
frequency = list()
for column in df.columns:
series = (df[column].value_counts(normalize=True, dropna=True)*100)
overview.append(series)
#overview list
o_colors = overview[0]
o_animals = overview[1]
df['animals_frequency'] = o_animals
If I try
df.info()
it returns
Column
Non-Null
Count
Dtype
animals_frequency
0
non-null
float64
Answers:
A simple approach is to calculate the relative frequency for each column values and then join these frequencies back to the original DataFrame.
for col in df.columns:
# Get relative frequency
frequency = df[col].value_counts(normalize=True)
# Format frequency as percent values, rounded to two decimals
frequency_percent = (frequency*100).round(2).astype(str) + '%'
# Join frequency values to original DF
frequency_percent.name = f"{col}_frequency"
df = df.merge(frequency_percent.to_frame(), left_on=col, right_index=True)
For each column, compute the frequency of each distinct element. Then, use replace
to map each element to the corresponding frequency
for column in df.columns:
mapping = df[column].value_counts(normalize=True, dropna=True) * 100
df[f"{column}_frequency"] = df[column].replace(mapping)
Let’s say I have a dataframe like this:
colors | animals |
---|---|
yellow | cat |
yellow | cat |
red | cat |
red | cat |
blue | cat |
I want to create a column for each column showing the frequency in which each value happens:
colors | colors_frequency | animals | animals_frequency |
---|---|---|---|
yellow | 40% | cat | 100% |
yellow | 40% | cat | 100% |
red | 40% | cat | 100% |
red | 40% | cat | 100% |
blue | 20% | cat | 100% |
I tried
frequency = list()
for column in df.columns:
series = (df[column].value_counts(normalize=True, dropna=True)*100)
overview.append(series)
#overview list
o_colors = overview[0]
o_animals = overview[1]
df['animals_frequency'] = o_animals
If I try
df.info()
it returns
Column | Non-Null | Count | Dtype |
---|---|---|---|
animals_frequency | 0 | non-null | float64 |
A simple approach is to calculate the relative frequency for each column values and then join these frequencies back to the original DataFrame.
for col in df.columns:
# Get relative frequency
frequency = df[col].value_counts(normalize=True)
# Format frequency as percent values, rounded to two decimals
frequency_percent = (frequency*100).round(2).astype(str) + '%'
# Join frequency values to original DF
frequency_percent.name = f"{col}_frequency"
df = df.merge(frequency_percent.to_frame(), left_on=col, right_index=True)
For each column, compute the frequency of each distinct element. Then, use replace
to map each element to the corresponding frequency
for column in df.columns:
mapping = df[column].value_counts(normalize=True, dropna=True) * 100
df[f"{column}_frequency"] = df[column].replace(mapping)