Regex: split the string at \n but skip the first one if it is \n\n

Question:

I want to split some strings in Python by separating at \n and use them in that format, but some of those strings have unexpected newlines and I want to ignore them.

TO CLARIFY: Both examples have only one string.

For example this is a regular string with no unexpected newlines:

Step 1
Cut peppers into strips.
Step 2
Heat a non-stick skillet over medium-high heat. Add peppers and cook on stove top for about 5 minutes.
Step 3
Toast the wheat bread and then spread hummus, flax seeds, and spinach on top
Step 4
Lastly add the peppers. Enjoy!

but some of them are like this:

Step 1
Using a fork, mash up the tuna really well until the consistency is even.

Step 2
Mix in the avocado until smooth.

Step 3
Add salt and pepper to taste. Enjoy!

I have to say I am new to regex, so if the solution is obvious, please forgive me.

Edit: Here is my regex

    stepOrder = []
    # STEPS
    txtSteps = re.split("\n",directions.text)
    listOfLists = [[] for i in range(len(txtSteps)) if i % 2 == 0]
    for i in range(len(listOfLists)):
        listOfLists[i] = [txtSteps[i*2],txtSteps[i*2+1]]
    recipe["steps"] = listOfLists
    print(listOfLists)

directions.text holds one of the example strings I gave above. I can share what it is too, but I think it’s irrelevant.

Asked By: Burak Saraç

Answers:

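import re

# Read the whole file; the with-block closes it automatically.
with open("your_file_name") as f:
    content = f.read()

for line in content.split("\n"):
    # Skip empty lines, i.e. the unexpected extra newlines.
    if re.match("^$", line):
        continue
    print(line)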
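
The same blank-line filtering can be applied to the question's data and followed by the pairing step. This is only a minimal sketch: the helper name pair_steps and the shortened sample strings are made up for illustration, and it assumes every step is exactly one "Step N" line followed by one line of text.

```python
def pair_steps(text):
    """Hypothetical helper: pair each 'Step N' line with the line after it."""
    # Drop blank (or whitespace-only) lines before pairing.
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    # Walk the remaining lines two at a time: header, then description.
    return [[lines[i], lines[i + 1]] for i in range(0, len(lines) - 1, 2)]


# Shortened stand-ins for the two kinds of input described in the question.
regular = "Step 1\nCut peppers into strips.\nStep 2\nLastly add the peppers. Enjoy!"
spaced = "Step 1\nMash up the tuna.\n\nStep 2\nMix in the avocado until smooth.\n"

print(pair_steps(regular))
print(pair_steps(spaced))
```

Both inputs produce the same [header, text] layout, because the empty lines are removed before the pairing happens.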
Answered By: hofe

You can solve this problem by matching the following regex:

(?<=\d\n).*

Basically, it matches any characters on the same line (.*) that are preceded by one digit (\d) and one newline character (\n).
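
To see the lookbehind at work, here is a small self-contained check. The sample text is copied from the question's second example (the one with blank lines), and the variable names are only illustrative.

```python
import re

# Sample input taken from the question's second example,
# where an empty line separates the steps.
text = (
    "Step 1\n"
    "Using a fork, mash up the tuna really well until the consistency is even.\n"
    "\n"
    "Step 2\n"
    "Mix in the avocado until smooth.\n"
    "\n"
    "Step 3\n"
    "Add salt and pepper to taste. Enjoy!"
)

# (?<=\d\n) is a lookbehind: a match may only start right after a digit
# followed by a newline. The .* then takes the rest of that line, because
# "." does not match "\n" by default.
steps = re.findall(r"(?<=\d\n).*", text)

for step in steps:
    print(step)
```

The empty lines never follow a digit directly, so they are never matched. Note that this relies on the "Step N" header being the only line that ends in a digit.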

Your whole Python snippet then becomes simplified as follows using the re.findall method:

# STEPS
steps = re.findall(r"(?<=\d\n).*", directions.text)
out = [[{'order':i+1, 'step': step}] for i, step in enumerate(steps)]
Answered By: lemon