Getting a list of unique values within a pandas column

Question

Can you please help me with the following issue? Imagine I have the following df:

import pandas as pd

data = {
    'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
}
df = pd.DataFrame(data)

How can I create a list of the unique values stored in column 'A'? I want something like this:

list_A = ['A1', 'B2', 'C', 'A2', 'A9', 'A3', 'A4', 'Z', 'A5']

Asked By: Alberto Alvarez


Answer 1

Assuming you define "values" as the comma-separated substrings, you can split, explode, and use unique:

list_A = df['A'].str.split(r',\s*').explode().unique().tolist()

Output: ['A1', 'B2', 'C', 'A2', 'A9', 'A3', 'A4', 'Z', 'A5']
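To make the chain easier to follow, here is a minimal sketch that keeps each intermediate result in its own variable (the names split and exploded are illustrative, not part of the answer). It assumes the question's df and shows that explode repeats index labels, one row per list element, before unique() deduplicates in order of first appearance.

```python
import pandas as pd

data = {
    'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
}
df = pd.DataFrame(data)

# Step 1: split each string on a comma plus optional whitespace -> Series of lists
split = df['A'].str.split(r',\s*')

# Step 2: explode puts each list element on its own row (index labels repeat)
exploded = split.explode()

# Step 3: unique() keeps first-appearance order; tolist() returns a plain list
list_A = exploded.unique().tolist()
print(list_A)
```

Printing exploded on its own is a quick way to see the 12 individual values before deduplication.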

Answered By: mozway

Answer 2

The code applies a lambda function to column 'A' to remove any whitespace from its strings.

Next, the code uses the str.split() method to split the strings in column 'A' on the ',' delimiter, resulting in a column of lists.

Finally, the code uses a list comprehension to flatten the list of lists into a single list, and then uses set() to build a set containing the unique elements of that list. The set is then printed to the console. Note that a set does not preserve the order in which values first appear.
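The answer's own code is not reproduced here, so the following is only a hypothetical reconstruction of the three described steps. It assumes the lambda strips spaces with str.replace, and the names cleaned, lists, flat and unique_A are illustrative.

```python
import pandas as pd

data = {
    'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
}
df = pd.DataFrame(data)

# Step 1 (assumed): a lambda removes every space from each string in column 'A'
cleaned = df['A'].apply(lambda s: s.replace(' ', ''))

# Step 2: split on ',' to get a Series whose elements are lists
lists = cleaned.str.split(',')

# Step 3: flatten with a list comprehension, then deduplicate with set()
flat = [item for sublist in lists for item in sublist]
unique_A = set(flat)
print(unique_A)  # element order is arbitrary because sets are unordered
```

Because the result is a set, wrap it in sorted() or use an order-preserving method if the order matters.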

Answered By: Ludo Schmidt

Answer 3

Convert column A into lists stored in a new column C. For example, instead of 'A1, B2, C', we will have ['A1', 'B2', 'C'].

df['C'] = df['A'].str.split(r',\s*')

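.str is the accessor for pandas' vectorized string methods; it does not convert the column to strings (a purely numeric column raises an AttributeError, and non-string entries in an object column give NaN). .split(r',\s*') splits each string wherever it finds a comma followed by zero or more whitespace characters (\s*).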
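A small sketch of why the regular expression helps, using made-up strings with irregular spacing (not from the question). It relies on pandas treating a multi-character pattern in str.split as a regular expression when regex is left at its default; on pandas 1.4+ you can also pass regex=True explicitly.

```python
import re
import pandas as pd

pattern = r',\s*'  # a comma followed by zero or more whitespace characters

# A literal ', ' separator misses the commas that have no space after them
print('A1,B2,   C'.split(', '))  # ['A1,B2', '  C']

# The regex copes with missing or repeated spaces
parts = re.split(pattern, 'A1,B2,   C')
print(parts)  # ['A1', 'B2', 'C']

# pandas applies the same pattern element-wise
s = pd.Series(['A1,B2,   C', 'A4, Z'])
result = s.str.split(pattern).tolist()
print(result)  # [['A1', 'B2', 'C'], ['A4', 'Z']]
```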

Finding the sorted unique values of the converted column (a set is unordered, so wrap it in sorted() to get a sorted list):

sorted(set(df['C'].explode()))
# ['A1', 'A2', 'A3', 'A4', 'A5', 'A9', 'B2', 'C', 'Z']
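If you prefer to stay within pandas, a sorted list of unique values can also be built with drop_duplicates() and sort_values(). This is a minimal sketch that re-creates only column 'A' and sorts the strings lexicographically.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z']})
exploded = df['A'].str.split(r',\s*').explode()

# pandas-only route: drop repeated values, then sort lexicographically
sorted_unique = exploded.drop_duplicates().sort_values().tolist()
print(sorted_unique)  # ['A1', 'A2', 'A3', 'A4', 'A5', 'A9', 'B2', 'C', 'Z']

# Same result as the pure-Python sorted(set(...)) route
assert sorted_unique == sorted(set(exploded))
```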

If sorting is not important and you want them in the order of their first appearance:

list(df['C'].explode().unique())
# ['A1', 'B2', 'C', 'A2', 'A9', 'A3', 'A4', 'Z', 'A5']
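For comparison, a sketch of the same order-preserving deduplication with plain Python: dict.fromkeys keeps the first occurrence of each key, and dicts preserve insertion order on Python 3.7+. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z']})
exploded = df['A'].str.split(r',\s*').explode()

# Series.unique() returns values in order of first appearance
via_pandas = exploded.unique().tolist()

# Pure-Python equivalent: dict keys keep insertion order (Python 3.7+)
via_dict = list(dict.fromkeys(exploded))

print(via_pandas)
print(via_dict)
```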

Answered By: Hadij
