Regex to filter "steps" strings that start with a single (non-dotted) step number

Question:

I have a pandas.Series like this:

import pandas as pd

s = pd.Series(['1-Onboarding + Retorno', '1.1-Onboarding escolha de bot',
               '2-Seleciona produto', '3-Informa localizacao e cpf',
               '3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP'], name='Steps')

0           1-Onboarding + Retorno
1    1.1-Onboarding escolha de bot
2              2-Seleciona produto
3      3-Informa localizacao e cpf
4           3.1-CPF valido (V.2.0)
5              3.2-Obtencao de CEP

The idea is to filter the Series so that I keep only the strings whose step number is a single integer, i.e. not a sub-step such as 1.1 or 3.2. The expected result is:

s = pd.Series(['1-Onboarding + Retorno',
                  '2-Seleciona produto', '3-Informa localizacao e cpf'],name = 'Steps')

0         1-Onboarding + Retorno
1            2-Seleciona produto
2    3-Informa localizacao e cpf
Name: Steps, dtype: object

Any ideas on how I could do that? I am having difficulty writing the regex. I know I should use something like this to build such a filter in pandas:

s.str.contains('', regex=True)
Asked By: INGl0R1AM0R1


Answers:

We can use str.contains here:

# s is the Series from the question; ^\d+- means "digits at the very start, then a hyphen"
df_out = s[s.str.contains(r'^\d+-', regex=True)]

The resulting Series df_out contains only the steps that begin with a whole (integer) step number immediately followed by a hyphen, so sub-steps such as 1.1 or 3.2 are excluded.
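A minimal, self-contained sketch of the same regex filter, assuming the Series from the question. It also shows that Series.str.match, which anchors at the start of the string by itself, gives the same mask without the leading ^. The reset_index(drop=True) call is an extra step so the index reads 0, 1, 2 as in the question's expected output.

```python
import pandas as pd

s = pd.Series(['1-Onboarding + Retorno', '1.1-Onboarding escolha de bot',
               '2-Seleciona produto', '3-Informa localizacao e cpf',
               '3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP'], name='Steps')

# ^ anchors at the start, \d+ is one or more digits, and the hyphen must come
# right after them, so "1.1-..." fails because a dot follows the first "1".
mask = s.str.contains(r'^\d+-', regex=True)

# Keep the matching rows and renumber the index from 0 (the name 'Steps' is kept).
df_out = s[mask].reset_index(drop=True)
print(df_out)

# str.match only matches at the beginning of each string, so ^ is implicit there.
mask_match = s.str.match(r'\d+-')
```

This should print the three top-level steps from the question's expected output. The key point is that the digits and the hyphen must be adjacent, which is what rules out the dotted sub-steps.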

Answered By: Tim Biegeleisen

You can also do it without a regex, using a plain loop:

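# Keep strings that contain a hyphen and no dot anywhere.
# Caveat: this also drops a top-level step whose description contains a dot.
l = []
for step in s:
    if '-' in step and '.' not in step:
        l.append(step)
new_s = pd.Series(l, name=s.name)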
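The loop above checks for a dot anywhere in the string. A vectorized sketch that looks only at the step number itself (the text before the first hyphen) avoids the loop and does not depend on the description text. This swaps in Series.str.split plus Series.str.isdigit instead of a regex; the extra example string '4-Valida v.2' used in the comment is hypothetical, not from the question.

```python
import pandas as pd

s = pd.Series(['1-Onboarding + Retorno', '1.1-Onboarding escolha de bot',
               '2-Seleciona produto', '3-Informa localizacao e cpf',
               '3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP'], name='Steps')

# Split once on the first hyphen and take the part before it (the step number).
prefix = s.str.split('-', n=1).str[0]

# Keep rows whose step number is made only of digits: "3" passes, "3.1" does not.
# A hypothetical step like '4-Valida v.2' would still be kept, because only the
# prefix is checked, not the description.
new_s = s[prefix.str.isdigit()].reset_index(drop=True)
print(new_s)
```

Strings with no hyphen at all keep their whole text as the "prefix", so they pass only if the entire string is digits.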
Answered By: phœnix