Pandas get the most frequent values of a column
Question:
I have this dataframe:
0 name data
1 alex asd
2 helen sdd
3 alex dss
4 helen sdsd
5 john sdadd
I am trying to get the most frequent value or values (in this case, values).
What I do is:
dataframe['name'].value_counts().idxmax()
but it returns only one value, alex, even though helen appears two times as well.
Answers:
Here’s one way:
df['name'].value_counts()[df['name'].value_counts() == df['name'].value_counts().max()]
which prints:
helen 2
alex 2
Name: name, dtype: int64
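A minimal sketch of the same filter that computes value_counts only once. The sample frame below is rebuilt from the question; the variable names counts, top and most_frequent are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "name": ["alex", "helen", "alex", "helen", "john"],
    "data": ["asd", "sdd", "dss", "sdsd", "sdadd"],
})

# Count each name once, then keep every entry tied for the maximum
counts = df["name"].value_counts()
top = counts[counts == counts.max()]

# Sorted so the result does not depend on how ties are ordered
most_frequent = sorted(top.index)
print(most_frequent)
```

Storing the counts in a variable avoids scanning the column three times, which matters on large frames.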
By using mode
df.name.mode()
Out[712]:
0 alex
1 helen
dtype: object
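As a small runnable sketch (the Series is rebuilt from the question's name column), mode returns a Series holding every value tied for the highest count, in sorted order:

```python
import pandas as pd

# Name column rebuilt from the question
names = pd.Series(["alex", "helen", "alex", "helen", "john"], name="name")

# mode() always returns a Series, even when there is a single winner
modes = names.mode()
modes_list = modes.tolist()
print(modes_list)
```

Because the result is a Series, take modes.iloc[0] if only a single value is wanted.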
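You could use value_counts to count the occurrences of each name in the name column. Call it on the column itself; passing pd.value_counts to .apply runs it on every element separately instead of on the column as a whole.
dataframe['name'].value_counts()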
Not Obvious, But Fast
f, u = pd.factorize(df.name.values)
counts = np.bincount(f)
u[counts == counts.max()]
array(['alex', 'helen'], dtype=object)
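A self-contained version of the factorize approach, with the imports it needs. This is a sketch that assumes the column has no missing values: factorize encodes missing entries as -1, and np.bincount rejects negative input.

```python
import numpy as np
import pandas as pd

# Name column rebuilt from the question
names = pd.Series(["alex", "helen", "alex", "helen", "john"])

# codes holds one integer per row; uniques maps each integer back to its label
codes, uniques = pd.factorize(names.to_numpy())

# Count how often each integer code occurs
counts = np.bincount(codes)

# Keep every label whose count equals the maximum
winners = uniques[counts == counts.max()]
print(winners)
```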
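You could try idxmax
like this:
dataframe['name'].value_counts().idxmax()
Out[13]: 'alex'
value_counts returns a pandas.core.series.Series of counts, and idxmax returns the index label of the largest count. In current pandas versions argmax returns the integer position rather than the label, so idxmax is the one to use here. On a tie it returns only the first label.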
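To make the difference between a label and a position concrete, here is a short sketch on the question's data, assuming a current pandas version. Which of the tied names comes first in value_counts can vary, so no specific label is assumed.

```python
import pandas as pd

# Name column rebuilt from the question
names = pd.Series(["alex", "helen", "alex", "helen", "john"])
counts = names.value_counts()

# idxmax gives the index label of the first maximum
label = counts.idxmax()

# argmax gives the integer position of the first maximum
position = counts.argmax()

print(label, position)
```

Since value_counts sorts in descending order by default, the position is 0 and counts.index[position] is the same label that idxmax returns.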
You can use this to get the count of every value in a particular column, sorted from most to least frequent:
df['name'].value_counts()
To get the n most frequent values, just subset .value_counts() and grab the index:
# get top 10 most frequent names
n = 10
dataframe['name'].value_counts()[:n].index.tolist()
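A runnable sketch of the same idea on hypothetical data with distinct counts, using head(n) for the positional subset:

```python
import pandas as pd

# Hypothetical data: alex x3, helen x2, john x1
names = pd.Series(["alex"] * 3 + ["helen"] * 2 + ["john"])

n = 2
# value_counts sorts by count, descending, so the first n rows are the top n
top_n = names.value_counts().head(n).index.tolist()
print(top_n)
```

head(n) is always positional, so it stays unambiguous even when the values being counted are themselves integers.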
To get the top 5:
dataframe['name'].value_counts()[0:5]
To get the top five most common names:
dataframe['name'].value_counts().head()
df['name'].value_counts()[:5].sort_values(ascending=False)
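value_counts returns a pandas.core.series.Series of counts, and sort_values(ascending=False) puts the highest values first. Note that value_counts already sorts in descending order by default, so the extra sort only matters if that default was changed.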
My best solution to get the first is:
df['my_column'].value_counts().idxmax()
Simply use this:
dataframe['name'].value_counts().nlargest(n)
The functions for the largest and smallest frequencies are:
nlargest() for the n most frequent values
nsmallest() for the n least frequent values
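A short sketch of both functions on the question's data. The keep="all" argument is an addition not mentioned above: it makes nlargest return every entry tied with the last one kept, which addresses the tie in the original question.

```python
import pandas as pd

# Name column rebuilt from the question
names = pd.Series(["alex", "helen", "alex", "helen", "john"])
counts = names.value_counts()

# keep="all" returns every entry tied for the top spot, not just one
top_with_ties = counts.nlargest(1, keep="all")

# The single least frequent value
least = counts.nsmallest(1)

print(top_with_ties)
print(least)
```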
Use:
df['name'].mode()
or
df['name'].value_counts().idxmax()
n is the number of most frequent items to return:
n = 2
a = dataframe['name'].value_counts()[:n].index.tolist()
dataframe["name"].value_counts()[a]
I had a similar issue. The most compact answer to get, say, the top n most frequent values (head defaults to 5) is:
df["column_name"].value_counts().head(n)
Identifying the top 5, for example, using value_counts:
top5 = df['column'].value_counts()
Listing the contents of top5:
top5[:5]
Getting the top 5 most common last names with pandas:
df['name'].apply(lambda name: name.split()[-1]).value_counts()[:5]
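The same extraction can be sketched with the vectorized .str accessor instead of apply with a lambda. The full names below are hypothetical sample data, and the last whitespace-separated token is assumed to be the last name.

```python
import pandas as pd

# Hypothetical full names
people = pd.Series([
    "Alex Smith", "Helen Jones", "John Smith",
    "Mary Jones", "Ann Smith", "Bob Lee",
])

# Split on whitespace and keep the final token of each name
last_names = people.str.split().str[-1]

# Count the last names and keep the two most common
top_last = last_names.value_counts().head(2)
print(top_last)
```

Unlike the lambda version, the .str accessor passes missing values through as NaN instead of raising an error.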
This gives the top five most common names:
df['name'].value_counts().nlargest(5)