How to store a quoted string that contains two words as a single list item?

Question:

I wrote some search code, and I want to store whatever is between " " as a single item in a list. How can I do that? In this case I have three lists, but the second one, should, is not what I want.

import re

message='read read read'

others = ' '.join(re.split(r'\(.*\)', message))
others_split = others.split()

to_compile = re.compile(r'.*\((.*)\).*')
to_match = to_compile.match(message)
ors_string = to_match.group(1)

should = ors_string.split(' ')

must = [term for term in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message) if term and not term.startswith('-')]

must_not = [term for term in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message) if term and term.startswith('-')]
must_not = [s.replace("-", "") for s in must_not]

print(f'must: {must}')
print(f'should: {should}')
print(f'must_not: {must_not}')

Output:

must: ['read', '"find find"', 'within', '"plane"']
should: ['"exactly', 'needed"', 'empty']
must_not: ['russia', '"destination good"']

Wanted result:

must: ['read', '"find find"', 'within', '"plane"']
should: ['"exactly needed"', 'empty'] <---
must_not: ['russia', '"destination good"']

After I edited the message, I get the following error. How should I handle it?

Traceback (most recent call last):
    ors_string = to_match.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
Asked By: Mia Lexa

Answers:

Your should list is built by splitting on whitespace (should = ors_string.split(' ')), which is why the quoted phrase is split into two items. The following code gives the output you requested, but I'm not sure it solves your problem for future inputs.

import re

message = 'read "find find":within("exactly needed" OR empty) "plane" -russia -"destination good"'

others = ' '.join(re.split(r'\(.*\)', message))
others_split = others.split()

to_compile = re.compile(r'.*\((.*)\).*')
to_match = to_compile.match(message)
ors_string = to_match.group(1)

# Split on OR instead of whitespace, then strip the whitespace
# that the split leaves around each term.
should = [word.strip() for word in ors_string.split('OR')]

must = [term for term in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message) if term and not term.startswith('-')]

must_not = [term for term in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message) if term and term.startswith('-')]
must_not = [s.replace("-", "") for s in must_not]

print(f'must: {must}')
print(f'should: {should}')
print(f'must_not: {must_not}')
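The code above still raises the AttributeError from the question when the message has no parentheses, because match() returns None in that case. A possible way to handle both issues at once is sketched below. The helper name parse_should is hypothetical, and the sketch assumes that only the first parenthesised group matters and that phrases are always wrapped in double quotes; treat it as one option rather than the definitive fix.

```python
import re


def parse_should(message):
    """Return the terms inside the first (...) group, keeping quoted phrases whole."""
    match = re.search(r'\((.*?)\)', message)
    if match is None:
        # No parenthesised group in the message: return an empty list
        # instead of calling .group() on None.
        return []
    # A token is either a double-quoted phrase or a run of non-space characters.
    tokens = re.findall(r'"[^"]*"|\S+', match.group(1))
    # Drop the OR separators and keep only the terms.
    return [token for token in tokens if token != 'OR']


message = 'read "find find":within("exactly needed" OR empty) "plane" -russia -"destination good"'
print(f'should: {parse_should(message)}')           # should: ['"exactly needed"', 'empty']
print(f'should: {parse_should("read read read")}')  # should: []
```

Tokenising with a pattern instead of splitting on the literal string 'OR' also avoids cutting through words that happen to contain those two letters.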

Answered By: StianBot