Uncomfortable output of mode() in pandas Dataframe
Question:
I have a dataframe with several columns (the features).
>>> print(df)
   col1  col2
a     1     1
b     2     2
c     3     3
d     3     2
I would like to compute the mode of one of them. This is what happens:
>>> print(df['col1'].mode())
0    3
dtype: int64
I would like to output simply the value 3.
This behaviour is quite strange, considering that the following very similar code works:
>>> print(df['col1'].mean())
2.25
So two questions: why does this happen? How can I obtain the pure mode value as it happens for the mean?
Answers:
Because Series.mode() can return multiple values. Consider the following DataFrame:
In [77]: df
Out[77]:
   col1  col2
a     1     1
b     2     2
c     3     3
d     3     2
e     2     3
In [78]: df['col1'].mode()
Out[78]:
0    2
1    3
dtype: int64
From docstring:
Empty if nothing occurs at least 2 times. Always returns Series
even if only one value.
If you want to choose the first value:
In [83]: df['col1'].mode().iloc[0]
Out[83]: 2
In [84]: df['col1'].mode()[0]
Out[84]: 2
I agree that it's too cumbersome:
df['col1'].mode().values[0]
mode() returns all values that tie for the most frequent value.
To support that, it must return a collection, which takes the form of a DataFrame or Series.
For example, if you had a series:
[2, 2, 3, 3, 5, 5, 6]
Then the most frequent values occur twice. The result would then be the series [2, 3, 5], since each of these occurs twice.
If you want to collapse this into a single value, you can access the first value, compute the max() or min(), or do whatever makes most sense for your application.
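As a rough illustration of those options, using the example series from this answer (the variable names are chosen here only for the sketch):

```python
import pandas as pd

s = pd.Series([2, 2, 3, 3, 5, 5, 6])
modes = s.mode()           # Series holding the tied values 2, 3 and 5

smallest = modes.min()     # 2
largest = modes.max()      # 5
first = modes.iloc[0]      # 2, since mode() returns its values in sorted order

print(smallest, largest, first)
```

Which of these is appropriate depends on the data; for ordered categories the smallest or largest tied value may be meaningful, while for arbitrary labels any choice is a tie-break.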
A series has exactly one mean(), but it can have more than one mode().
For example, for <2,2,2,3,3,3,4,4,4,5,6,7,8> the modes are 2, 3 and 4.
Since there can be several values, the output must be indexed.
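One way to see the tie directly is to count occurrences with value_counts(). The sketch below uses the same numbers as the example; the variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 7, 8])

counts = s.value_counts()                     # occurrences of each distinct value
tied = counts[counts == counts.max()].index   # values sharing the highest count

print(sorted(tied.tolist()))   # [2, 3, 4]
print(s.mode().tolist())       # [2, 3, 4]
print(s.mean())                # a single float, unlike the mode
```

Three values share the top count of 3, so there is no single "pure" mode to return, whereas the mean is always one number.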