How to get the unique string values in a column with numbers and characters in Python

Question:

I want to print the unique values found in this column, keeping the string parts and not the numerical ones.
I only want the part of each value that comes before the special character (when there is one), and I don’t want the second part of the string.
For example, for the row "lala :59 lzenvke", I don’t want to take "lzenvke" into account, only "lala".

import pandas as pd

data1 = {
    'column_with_names': ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF', 'lo', 'lala :59 lzenvke', 'la', 'lala', 'lalalo'],
}

df1 = pd.DataFrame(data1)

print(df1)

The expected output was attached as an image (not reproduced here).

Asked By: yoopiyo

Answers:

Here is one way to go about it.

Assumption: rows that don’t contain a colon (:) are also included in the result set.

import numpy as np

# split each value on the first whitespace character or colon (n=1 limits it to one split)
# expand=True returns the pieces as columns; take the first one (column 0)
# find the unique values using np.unique
# finally create a DataFrame

pd.DataFrame(np.unique(df1['column_with_names'].str.split(r'[\s:]', n=1, expand=True)[0]))
        0
0      la
1    lala
2  lala56
3  lalalo
4      li
5      lo
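
The one-liner above can be unpacked step by step. This is only a sketch: it assumes the pattern is meant to split on the first whitespace character or colon, and the intermediate names (parts, first_tokens, unique_names) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (with a comma between every element)
df1 = pd.DataFrame({
    'column_with_names': ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF',
                          'lo', 'lala :59 lzenvke', 'la', 'lala', 'lalalo'],
})

# Step 1: split each value once, at the first whitespace character or colon.
# Rows without a separator keep their full text in column 0.
parts = df1['column_with_names'].str.split(r'[\s:]', n=1, expand=True)

# Step 2: column 0 holds the text before the first separator
first_tokens = parts[0]

# Step 3: np.unique drops duplicates and returns the values sorted
unique_names = np.unique(first_tokens)

# Step 4: wrap the result in a DataFrame for display
result = pd.DataFrame(unique_names)
print(result)
```

Passing n=1 as a keyword matters on recent pandas versions, where only the pattern may be given positionally.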

If you only need to consider the rows that contain a colon:

# same as above, but keep only the rows that contain a colon beforehand
(pd.DataFrame(
    np.unique(df1.loc[df1['column_with_names'].str.contains(':'), 'column_with_names']
              .str.split(r'[\s:]', n=1, expand=True)[0])))
        0
0    lala
1  lala56
2      li
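
If digits should be dropped as well (one possible reading of "not the numerical ones" in the question), a pandas-only variant could extract just the leading letters. This is a sketch under that assumption: it treats a name as the leading run of ASCII letters, and the names leading_letters and names are hypothetical.

```python
import pandas as pd

# Sample data from the question (with a comma between every element)
df1 = pd.DataFrame({
    'column_with_names': ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF',
                          'lo', 'lala :59 lzenvke', 'la', 'lala', 'lalalo'],
})

# Assumption: a "name" is the leading run of ASCII letters, so digits,
# whitespace and ':' all end it ('lala56 : javcejhv' becomes 'lala')
leading_letters = df1['column_with_names'].str.extract(r'^([A-Za-z]+)', expand=False)

# dropna() discards rows that do not start with a letter;
# unique() keeps first-seen order, so sort for a stable listing
names = sorted(leading_letters.dropna().unique())
print(names)
```

With this reading, 'lala56' collapses into 'lala', so the result has one entry fewer than the split-based approach.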
Answered By: Naveed