How to extract all the sentences containing the word "apple" from a txt file
Question:
I have tried this using regex, loops, and simple functions as well, but cannot figure anything out. Here are the snippets I have tried:
import re
fp = open("apple.txt")
re.findall(r"([^.]*?apple[^.]*.)",fp)
Answers:
with open("./WABA-crash.txt", "r") as infile:
    # Join the file into one string, dropping line breaks
    content = infile.read().replace('\n', '')

# Split on periods and strip surrounding whitespace from each sentence
sentences = list(map(str.strip, content.split(".")))

with open("./resultFile.txt", "w") as outfile:
    for result in sentences:
        if 'police' in result or 'Police' in result:
            outfile.write(f'{result}.\n')
There is no need for regex. It might not be the cleanest answer, but it works; that is the beauty of Python. When working with files, I recommend using with open(...) instead of a bare open(), since it automatically closes the file for you. Otherwise you would need to call the close() method yourself once you are done with the file.
Hope this helps you! Enjoy
Instead of using a regex, which can get quite complicated to make robust (you need to account for many different kinds of sentence forms), a good solution is to use an NLP library like NLTK or spaCy.
Here is how to tokenize with nltk:
from nltk import tokenize

# Requires the Punkt sentence tokenizer: run nltk.download('punkt') once
with open("WABA-crash.txt") as file:
    content = file.read()

sentences = tokenize.sent_tokenize(content)
police_sentences = [x for x in sentences if "police" in x]
print(police_sentences)
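Note that the filter above, like the first answer, is case-sensitive, so a sentence starting with "Police" would be missed. A small tweak (not part of the original answer) is to lowercase each sentence before the membership test:

```python
# Example tokenized sentences (stand-ins for sent_tokenize output)
sentences = ["The Police arrived.", "A quiet street.", "Call the police!"]

# Lowercasing each sentence matches "police", "Police", "POLICE" alike
police_sentences = [s for s in sentences if "police" in s.lower()]
print(police_sentences)  # → ['The Police arrived.', 'Call the police!']
```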