Find the unique values in a column and then sort them

Question:

I have a pandas dataframe. I want to print the unique values of one of its columns in ascending order. This is how I am doing it:

import pandas as pd
df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
a = df['A'].unique()
print(a.sort())

The problem is that I am getting a None for the output.

Asked By: MAS

Answers:

sort() sorts the array in place and returns None, so there is nothing to print from the call itself:

In [54]:
df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
a = df['A'].unique()
a.sort()
a

Out[54]:
array([1, 2, 3, 6, 8], dtype=int64)

So you have to print a separately, after the call to sort().

Eg.:

In [55]:
df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
a = df['A'].unique()
a.sort()
print(a)

[1 2 3 6 8]
Answered By: EdChum

sorted(iterable): Return a new sorted list from the items in iterable.


CODE

import pandas as pd
df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
a = df['A'].unique()
print(sorted(a))

OUTPUT

[1, 2, 3, 6, 8]
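
Note that sorted(a) returns a list whose elements are still NumPy integer scalars, and on NumPy 2.0 and later those print as np.int64(1) rather than 1. A minimal sketch of one way to get plain Python integers, assuming that is what you want (the name values is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 2, 6, 2, 8]})

# unique() returns a NumPy array; tolist() converts its elements to plain Python ints
values = sorted(df['A'].unique().tolist())
print(values)  # [1, 2, 3, 6, 8]
```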
Answered By: Vineet Kumar Doshi

I would suggest using numpy's sort, since that is what pandas uses in the background anyway:

import numpy as np
np.sort(df.A.unique())

But doing it all in pandas is valid as well.
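
As a small sketch of the NumPy route (variable names are illustrative): np.sort returns a sorted copy instead of sorting in place, and np.unique removes duplicates and returns the values already sorted, so it can replace both steps.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 2, 6, 2, 8]})

# np.sort returns a new sorted array and leaves its input unchanged
sorted_copy = np.sort(df['A'].unique())

# np.unique deduplicates and sorts in a single call
sorted_unique = np.unique(df['A'].to_numpy())

print(sorted_copy)    # [1 2 3 6 8]
print(sorted_unique)  # [1 2 3 6 8]
```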

Answered By: Challensois

You can also use drop_duplicates() instead of unique(). It returns a Series rather than a NumPy array:

df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
a = df['A'].drop_duplicates()
a = a.sort_values()  # Series.sort() was removed from pandas; sort_values() returns a sorted copy
print(a)
Answered By: Meloun

I prefer the one-liner:

print(sorted(df['Column Name'].unique()))
Answered By: MDMoore313

Came across the question myself today. I think the reason that your code returns ‘None’ (exactly what I got by using the same method) is that

a.sort()

calls the sort method to modify the array a in place. A method that mutates its object like this returns None, so there is no sorted result to print from the call itself. To see the result you have to use print(a) afterwards.

My solution, as I tried to keep everything in pandas:

pd.Series(df['A'].unique()).sort_values()
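
One detail worth knowing about this approach: sort_values() keeps each value's original index label, so the index of the result is no longer in order. A minimal sketch, assuming you want a clean 0..n-1 index (the names s and clean are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 2, 6, 2, 8]})

s = pd.Series(df['A'].unique()).sort_values()
print(s.index.tolist())  # [0, 2, 1, 3, 4] - labels travel with the values

# reset_index(drop=True) discards the old labels and renumbers from 0
clean = s.reset_index(drop=True)
print(clean.tolist())    # [1, 2, 3, 6, 8]
```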
Answered By: Bowen Liu

Another way is to use the set data type.

Some characteristics of sets: they are unordered, they can hold mixed data types, their elements cannot be repeated, and they are mutable.

Solving your question:

df = pd.DataFrame({'A':[1,1,3,2,6,2,8]})
sorted(set(df.A))

The result is a list:

[1, 2, 3, 6, 8]

Fastest code for large data frames:

df['A'].drop_duplicates().sort_values()
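
Which variant is fastest can depend on the data and on the pandas and NumPy versions, so it is worth measuring on your own frame. A rough sketch using timeit; the test data (200,000 integers drawn from 1,000 distinct values) and the names big and candidates are assumptions made up for illustration:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 200,000 integers drawn from 1,000 distinct values
rng = np.random.default_rng(0)
big = pd.DataFrame({'A': rng.integers(0, 1000, size=200_000)})

candidates = {
    'drop_duplicates + sort_values': lambda: big['A'].drop_duplicates().sort_values(),
    'unique + np.sort': lambda: np.sort(big['A'].unique()),
    'sorted(set(...))': lambda: sorted(set(big['A'])),
}

for name, fn in candidates.items():
    # Average wall-clock time over three runs of each candidate
    seconds = timeit.timeit(fn, number=3) / 3
    print(f'{name}: {seconds:.4f} s per run')
```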
Answered By: Serge Stroobandt

Surprised no one suggested this:

df['A'].sort_values().unique()
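
This works because unique() returns values in order of first appearance, so deduplicating an already sorted column keeps the sorted order. It does sort the whole column, duplicates included, before removing them. A short sketch, with the descending variant added as an illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 2, 6, 2, 8]})

# Sort first, then deduplicate; unique() preserves the order it sees
ascending = df['A'].sort_values().unique()
print(ascending)   # [1 2 3 6 8]

# The same idea in descending order
descending = df['A'].sort_values(ascending=False).unique()
print(descending)  # [8 6 3 2 1]
```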
Answered By: russhoppa