string splitting with multiple words

Question:

I want to split a string into a list while keeping some multi-word elements together. In my case, the string should be split on whitespace as usual, except that text between parentheses should not be split (e.g. '(word1 word2) word3 (word4 word5)' should result in ['word1 word2', 'word3', 'word4 word5']).
I found a regex pattern that gets all elements between parentheses, added some loops to it, and this is what I have so far:

import re

def get_queries(s):
    parentheses_queries = re.findall(r'\((.*?)\)', s)

    if not parentheses_queries:
        return s.split()
    for q in parentheses_queries:
        if f'({q})' in s:
            s = s.replace(q, '')
    
    queries = s.strip().split()
    
    i = 0
    while '()' in queries:
        queries[queries.index('()')] = parentheses_queries[i]
        i += 1
    return queries
    
s = '(word1 word2) word3 (word4 word5)'
print(get_queries(s))

But I think there's a much more efficient way to do it. Any ideas?

Asked By: Oussama Blgrim

||

Answers:

Using the regex from How to split by commas that are not within parentheses? you could do:

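import re

s = '(word1 word2) word3 (word4 word5)'
# Split on spaces that are not followed by a closing parenthesis before the next opening one
s_split = [x.replace("(", "").replace(")", "") for x in re.split(r" \s*(?![^()]*\))", s)]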

However, it depends on how you want to handle nested parentheses (if you have any in your text).
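
If nesting does matter, one possible alternative to the regex is a small character scanner that tracks parenthesis depth. This is only a sketch under assumptions: the function name split_outside_parens is made up for illustration, inner parentheses are kept as literal text, only the outermost pair is stripped, and an unmatched closing parenthesis is silently dropped.

```python
def split_outside_parens(text):
    """Split on whitespace at depth 0; keep each top-level (...) group as one item."""
    tokens = []    # finished items
    current = []   # characters of the item being built
    depth = 0      # current parenthesis nesting depth

    for ch in text:
        if ch == "(":
            if depth > 0:
                current.append(ch)  # inner parenthesis: keep it as text
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # assumption: ignore unmatched ')'
            if depth > 0:
                current.append(ch)  # inner parenthesis: keep it as text
        elif ch.isspace() and depth == 0:
            # Whitespace outside any parentheses ends the current item
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)

    if current:
        tokens.append("".join(current))
    return tokens


print(split_outside_parens('(word1 word2) word3 (word4 word5)'))
print(split_outside_parens('a (b (c d) e) f'))
```

For the string in the question this gives the same three items as the regex version, and for the nested input the whole outer group 'b (c d) e' stays together as one item.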

Answered By: Nik