Regex that filters the "Steps" strings that start with a single integer number
Question:
So I have a pandas.Series as such
s = pd.Series(['1-Onboarding + Retorno', '1.1-Onboarding escolha de bot',
'2-Seleciona produto', '3-Informa localizacao e cpf',
'3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP'],name = 'Steps')
0 1-Onboarding + Retorno
1 1.1-Onboarding escolha de bot
2 2-Seleciona produto
3 3-Informa localizacao e cpf
4 3.1-CPF valido (V.2.0)
5 3.2-Obtencao de CEP
The idea here is to filter the Series so that I keep only the strings that start with a single number (1, 2, 3), not a sub-step such as 1.1. The expected result would be:
s = pd.Series(['1-Onboarding + Retorno',
'2-Seleciona produto', '3-Informa localizacao e cpf'],name = 'Steps')
0 1-Onboarding + Retorno
1 2-Seleciona produto
2 3-Informa localizacao e cpf
Name: Steps, dtype: object
Any ideas on how I could do that? I am having difficulty formulating the regex. I know I should use str.contains to build such a filter in pandas:
s.str.contains('',regex = True)
Answers:
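We can use str.contains
here:
s_out = s[s.str.contains(r'^\d+-', regex=True)]
Note that s is already a Series, so it is indexed directly (s["Steps"] would fail), and the digit class must be written \d. The resulting Series s_out
will contain only the steps whose value begins with a major (integer) step number followed directly by a hyphen.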
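As a minimal, self-contained sketch of the filter described above, using the sample data from the question (the names mask and filtered are only illustrative):

```python
import pandas as pd

s = pd.Series(
    ['1-Onboarding + Retorno', '1.1-Onboarding escolha de bot',
     '2-Seleciona produto', '3-Informa localizacao e cpf',
     '3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP'],
    name='Steps',
)

# ^\d+- : one or more digits at the start of the string,
# immediately followed by a hyphen (so "1.1-" does not match)
mask = s.str.contains(r'^\d+-', regex=True)

# Keep the matching rows and renumber the index from 0
filtered = s[mask].reset_index(drop=True)
print(filtered)
```

The reset_index(drop=True) call is optional; it is there only so the index runs 0, 1, 2 as in the expected output from the question.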
You can also use a plain loop:
l = []
for i in range(len(s)):
    # keep strings that contain a hyphen but no dot anywhere
    if '-' in s[i] and '.' not in s[i]:
        l.append(s[i])
new_s = pd.Series(l)
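One caveat with the substring check: it looks for a dot anywhere in the string, not just in the leading number. The sketch below compares it with a prefix match; the row '4-Resumo (V.2.0)' is a hypothetical value that is not in the original data, added only to show the difference.

```python
import pandas as pd

steps = pd.Series(
    ['1-Onboarding + Retorno',
     '3.1-CPF valido (V.2.0)',
     '4-Resumo (V.2.0)'],   # hypothetical row: integer step, dot later in the text
    name='Steps',
)

# Substring check: has a hyphen and no dot anywhere in the string
by_substring = steps[[('-' in v and '.' not in v) for v in steps]]

# Prefix check: str.match anchors at the start, so only the leading number matters
by_prefix = steps[steps.str.match(r'\d+-')]

print(by_substring.tolist())
print(by_prefix.tolist())
```

With the data in the question both approaches give the same rows, so the loop is fine there; the prefix match is just the safer choice if step descriptions can contain dots.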