Getting a list of unique values within a pandas column
Question:
Can you please help me with the following issue? Imagine I have the following df:
import pandas as pd
data = {
    'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
}
df = pd.DataFrame(data)
How can I create a list of the unique values stored in column 'A'? I want something like this:
list_A = ['A1', 'B2', 'C', 'A2', 'A9', 'A3', 'A4', 'Z', 'A5']
Answers:
The code applies a lambda function to the 'A' column to remove any whitespace from its strings.
Next, the code uses the str.split() method to split the strings in column 'A' on the delimiter ',', resulting in a column of lists.
Finally, the code uses a list comprehension to flatten the list of lists into a single list, then passes that list to set() to keep only its unique elements, and prints the resulting set.
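The answer's code itself is not included above. A minimal sketch of the three steps it describes might look like the following; the use of s.replace(' ', '') inside the lambda (which removes only spaces, not tabs or newlines) and the variable names are assumptions for illustration. Because a set has no guaranteed order, the printed order may differ from run to run.

```python
import pandas as pd

data = {
    'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
}
df = pd.DataFrame(data)

# Step 1 (assumed form): a lambda that strips the spaces from every string in 'A'
no_spaces = df['A'].apply(lambda s: s.replace(' ', ''))

# Step 2: split each string on ',' -> a column of lists
lists = no_spaces.str.split(',')

# Step 3: flatten the list of lists with a comprehension, then deduplicate with set()
unique_values = set([item for sublist in lists for item in sublist])
print(unique_values)
```

Wrap the result in sorted() if you need a deterministic order.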
Converting column A to the desired list (new column C). In this case, instead of 'A1, B2, C', we will have ['A1', 'B2', 'C'].
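df['C'] = df['A'].str.split(r',\s*')
.str gives access to pandas' vectorized string methods. It does not convert the values to strings: it raises an error on a column with no string values, and non-string elements in a mixed column come back as NaN, so call .astype(str) first if that can happen. .split(r',\s*') splits each string wherever it finds a comma (,) followed by zero or more whitespace characters (\s*).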
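The backslash in the pattern matters. Since pandas compiles multi-character patterns as Python regular expressions, the standard re module can be used to sketch the difference on one cell: with the backslash, \s* consumes the spaces after each comma; without it, s* matches only literal 's' characters, so the spaces stay attached to the pieces.

```python
import re

text = 'A1, B2, C'

# Intended pattern: a comma followed by zero or more whitespace characters
fixed = re.split(r',\s*', text)

# Backslash lost: a comma followed by zero or more literal 's' characters
broken = re.split(',s*', text)

print(fixed)   # ['A1', 'B2', 'C']
print(broken)  # ['A1', ' B2', ' C']
```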
Finding the unique values of the converted column in sorted order (a set has no guaranteed order, so sort it explicitly):
sorted(set(df['C'].explode()))
# ['A1', 'A2', 'A3', 'A4', 'A5', 'A9', 'B2', 'C', 'Z']
If sorting is not important and you want the values in order of first appearance:
list(df['C'].explode().unique())
# ['A1', 'B2', 'C', 'A2', 'A9', 'A3', 'A4', 'Z', 'A5']
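Putting the steps of this answer together, a self-contained sketch might look like this. The variable names sorted_unique and ordered_unique are illustrative; .tolist() is used instead of list() so the result holds plain Python strings rather than NumPy objects.

```python
import pandas as pd

data = {
    'A': ['A1, B2, C', 'A2, A9, C', 'A3', 'A4, Z', 'A5, A1, Z'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
}
df = pd.DataFrame(data)

# Split each string on a comma plus any following whitespace -> column of lists
df['C'] = df['A'].str.split(r',\s*')

# explode() gives one row per list element
exploded = df['C'].explode()

# Unique values, sorted lexicographically
sorted_unique = sorted(set(exploded))

# Unique values in order of first appearance
ordered_unique = exploded.unique().tolist()

print(sorted_unique)   # ['A1', 'A2', 'A3', 'A4', 'A5', 'A9', 'B2', 'C', 'Z']
print(ordered_unique)  # ['A1', 'B2', 'C', 'A2', 'A9', 'A3', 'A4', 'Z', 'A5']
```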