How to get the unique string values in a column containing numbers and characters in Python
Question:
I want to print the unique values in this column, but not the numerical parts.
I only want to output the part of each value that comes before the special characters (when there are any); I don't want the second part of the string.
For example, for the row "lala :59 lzenvke" I don't want to take "lzenvke" into account, only "lala".
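The rule described above (keep only what comes before the first separator) can be sketched in plain Python with the standard `re` module. This is a minimal sketch that assumes "special characters" means whitespace or a colon; `first_part` is a hypothetical helper name, not something from the question.

```python
import re

def first_part(value: str) -> str:
    """Hypothetical helper: return the text before the first whitespace or colon."""
    # maxsplit=1: only the first separator matters, the rest of the string is discarded
    return re.split(r"[\s:]", value, maxsplit=1)[0]

values = ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF', 'lo',
          'lala :59 lzenvke', 'la', 'lala', 'lalalo']

print(first_part("lala :59 lzenvke"))           # lala
print(sorted({first_part(v) for v in values}))  # ['la', 'lala', 'lala56', 'lalalo', 'li', 'lo']
```

Values without a separator, such as `'lo'`, are returned unchanged because `re.split` then yields a single-element list.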
import pandas as pd
data1 = {
    'column_with_names': ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF', 'lo', 'lala :59 lzenvke', 'la', 'lala', 'lalalo'],
}
df1 = pd.DataFrame(data1)
print(df1)
The expected output would be:
Answers:
Here is one way to do it.
Assumption: rows that don't contain ':' are also included in the result set.
import numpy as np

# split each value on the first whitespace or colon (n=1) and expand into columns
# take the first column (the part before the separator)
# find the unique values with np.unique (which also sorts them)
# finally wrap the result in a DataFrame
pd.DataFrame(np.unique(df1['column_with_names'].str.split(r'[\s:]', n=1, expand=True)[0]))

        0
0      la
1    lala
2  lala56
3  lalalo
4      li
5      lo
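The same idea can be sketched with pandas alone, without NumPy. This assumes the same separator rule (first whitespace or colon); `.str[0]` picks the first element of each split list, and `drop_duplicates` keeps the order of first appearance instead of sorting as `np.unique` does.

```python
import pandas as pd

data1 = {
    'column_with_names': ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF', 'lo',
                          'lala :59 lzenvke', 'la', 'lala', 'lalalo'],
}
df1 = pd.DataFrame(data1)

# split on the first whitespace or colon; each row becomes a list of at most two parts
first_parts = df1['column_with_names'].str.split(r'[\s:]', n=1, regex=True).str[0]

# drop_duplicates keeps the first occurrence of each value, in original row order
unique_names = first_parts.drop_duplicates().tolist()
print(unique_names)  # ['lala', 'lala56', 'li', 'lo', 'la', 'lalalo']
```

Wrap the result in `sorted(...)` if you want the same ordering as the `np.unique` version.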
If you only need to consider the rows that contain a colon:
# same as above, except keep only the rows containing ':' first (regex=False: plain substring match)
mask = df1['column_with_names'].str.contains(':', regex=False)
pd.DataFrame(np.unique(df1.loc[mask, 'column_with_names'].str.split(r'[\s:]', n=1, expand=True)[0]))

        0
0    lala
1  lala56
2      li
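Both versions above still keep `lala56`, because the digits are attached to the name. If "not the numerical ones" in the question means such digits should be dropped as well, one possible variant (an assumption about the intent, not part of the original answer) is to extract only the leading run of letters with `Series.str.extract`:

```python
import pandas as pd

df1 = pd.DataFrame({
    'column_with_names': ['lala :56 javcejhv', 'lala56 : javcejhv', 'li :lo 7TUF', 'lo',
                          'lala :59 lzenvke', 'la', 'lala', 'lalalo'],
})

# capture the leading run of ASCII letters; a digit, space or ':' ends the match
# expand=False returns a Series instead of a one-column DataFrame
names = df1['column_with_names'].str.extract(r'^([A-Za-z]+)', expand=False)

# dropna() discards rows that do not start with a letter (none in this sample)
unique_names = sorted(names.dropna().unique())
print(unique_names)  # ['la', 'lala', 'lalalo', 'li', 'lo']
```

The `[A-Za-z]` class is an assumption that names are plain ASCII letters; widen it if the real data contains accented characters.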