How to select text ignoring line breaks

Question:

We’re working on a project with the Bible stored as a text file. I’m having trouble selecting whole sentences, because they are split across line breaks (every sentence ends with a period).

Example from the file:

1:9 And God said, Let the waters under the heaven be gathered together
unto one place, and let the dry land appear: and it was so.

1:10 And God called the dry land Earth; and the gathering together of
the waters called he Seas: and God saw that it was good.

1:11 And God said, Let the earth bring forth grass, the herb yielding
seed, and the fruit tree yielding fruit after his kind, whose seed is
in itself, upon the earth: and it was so.

My code works line by line, and I don’t know how to do it any other way.

Here’s my code:

import re

with open("bible.txt") as data:
    for line in data:
        y=re.findall(r"(^.[0-9]:.[0-9].*.)", line)
        print(y)
Asked By: Rahma Begag


Answers:

The simplest way to search the whole Bible, ignoring the newlines, is to read the whole file into one string and replace the newlines with spaces.

import re

with open("bible.txt") as data:
    bible = data.read().replace('\n', ' ')

You’re then going to run into some issues with your regular expression. First, ^ now matches only at the very beginning of the string. Second, .* is greedy, meaning it will gobble up as much as possible; in this case it would match nearly the whole Bible instead of just the first sentence. The non-greedy version is .*?. You can also replace [0-9] with the shorthand \d, and use {1,2} to specify that you want to match either one or two digits. Finally, the closing period needs to be escaped as \. so that it matches a literal period rather than any character. With that, your code would look like this:

import re

with open("bible.txt") as data:
    bible = data.read().replace('\n', ' ')

sentences = re.findall(r"(\d{1,2}:\d{1,2}.*?\.)", bible)
# Printing only the first few sentences, since there will be a LOT of
# them.
print(sentences[:10])
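
To see the difference between greedy and non-greedy matching without loading the whole file, here is a small sketch that runs both patterns against two of the sample verses from the question. The variable names (sample, flat, greedy, lazy) are only for illustration.

```python
import re

# Two verses copied from the question, including the blank line between them.
sample = (
    "1:9 And God said, Let the waters under the heaven be gathered together\n"
    "unto one place, and let the dry land appear: and it was so.\n"
    "\n"
    "1:10 And God called the dry land Earth; and the gathering together of\n"
    "the waters called he Seas: and God saw that it was good.\n"
)

# Flatten the text so a sentence is no longer split across lines.
flat = sample.replace("\n", " ")

# Greedy: .* runs to the LAST period, so both verses end up in one match.
greedy = re.findall(r"\d{1,2}:\d{1,2}.*\.", flat)

# Non-greedy: .*? stops at the FIRST period after each verse reference.
lazy = re.findall(r"\d{1,2}:\d{1,2}.*?\.", flat)

print(len(greedy))  # 1
print(len(lazy))    # 2
for sentence in lazy:
    print(sentence)
```

On this sample the greedy pattern returns a single match covering both verses, while the non-greedy one returns each verse separately.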

Now, you are going to have some other corner cases to think about as you work on this assignment; here are some of them I foresee. I see you’re using the NKJV translation, so I’ll quote from that as well.

What about sentences that end in something other than a period, for example a quotation mark, exclamation mark, or question mark?

Luke 1:20 "But behold, you will be mute and not able to speak until the day these things take place, because you did not believe my words which will be fulfilled in their own time."
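
One possible way to handle this, as a rough sketch: end the match on any of ., ! or ?, optionally followed by a closing quotation mark. The text below is made up purely to exercise the pattern, and the sketch assumes straight double quotes; a file with curly quotes would need them added to the pattern.

```python
import re

# Made-up text (NOT real verses) with three different sentence endings.
text = '1:1 Who is there? 1:2 He said, "Follow Me." 1:3 Behold!'

# [.!?] accepts any of the three terminators; "? also takes a closing
# double quote when one directly follows the terminator.
pattern = r'\d{1,2}:\d{1,2}.*?[.!?]"?'

found = re.findall(pattern, text)
for sentence in found:
    print(sentence)
```

Note that this still returns only the first sentence after each verse reference.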

What happens when you encounter a verse that contains more than one sentence?

Matthew 9:9 As Jesus passed on from there, He saw a man named Matthew sitting at the tax office. And He said to him, "Follow Me." So he arose and followed Him.
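
A sketch of one approach, assuming the verse has already been flattened to a single line and is prefixed with a chapter:verse reference as in your file: pull off the reference first, then split the rest into sentences. The helper name split_verse and the SENTENCE pattern are my own and not part of any library.

```python
import re

# A sentence: a run of non-terminator characters, one terminator,
# and an optional closing double quote.
SENTENCE = re.compile(r'[^.!?]+[.!?]"?')

def split_verse(verse):
    """Return (reference, [sentences]) for one flattened verse string."""
    match = re.match(r"(\d{1,2}:\d{1,2})\s+(.*)", verse, re.DOTALL)
    ref, body = match.groups()
    return ref, [s.strip() for s in SENTENCE.findall(body)]

verse = ('9:9 As Jesus passed on from there, He saw a man named Matthew '
         'sitting at the tax office. And He said to him, "Follow Me." '
         'So he arose and followed Him.')

ref, sentences = split_verse(verse)
print(ref)
for s in sentences:
    print("-", s)
```

Keeping the reference separate from the sentence list means every sentence can still be traced back to its verse.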

What about a single sentence that spans multiple verses?

John 2:24 But Jesus did not commit Himself to them, because He knew all men, 25 and had no need that anyone should testify of man, for He knew what was in man.
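
If whole sentences matter more than verse boundaries, one option is to strip the verse references first and only then split on sentence terminators. The sketch below rewrites the example above in the chapter:verse style of your file ("2:25" instead of a bare "25"), which is an assumption about how such a verse would appear there.

```python
import re

# The John example, re-marked in the file's chapter:verse style (assumed).
text = ("2:24 But Jesus did not commit Himself to them, because He knew "
        "all men, 2:25 and had no need that anyone should testify of man, "
        "for He knew what was in man.")

# Drop every verse reference together with the whitespace after it.
without_refs = re.sub(r"\d{1,2}:\d{1,2}\s+", "", text)

# Split what remains into sentences.
sentences = [s.strip() for s in re.findall(r'[^.!?]+[.!?]"?', without_refs)]
print(sentences)
```

The trade-off is that the references are thrown away, so this only fits if you don't need to know which verse a sentence came from.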

Answered By: CrazyChucky