Hierarchical string delimiting not splitting

Question:

I am trying to create a function to parse a string based on multiple delimiters, but in a hierarchical format: i.e., try the first delimiter, then the second, then the third, etc.

This question seemingly provides a solution, specifically linking this comment.

# Split the team names, with a hierarchical delimiter
def split_new(inp, delims=['VS', '/ ' ,'/']):
    # https://stackoverflow.com/questions/67574893/python-split-string-by-multiple-delimiters-following-a-hierarchy
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2: 
            return result
        else:
            return [inp] # If nothing worked, return the input  

test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW' ]

for ts in test_strs:
    res = split_new(ts)
    print(res)

"""
Output:
['STACK/ OVERFLOW']
['STACK #11/00 ', ' OVERFLOW']
['STACK/OVERFLOW']

Expected:
['STACK',' OVERFLOW']
['STACK #11/00 ', ' OVERFLOW']
['STACK', 'OVERFLOW']

"""

However, my results are not as expected. What am I missing?

Asked By: artemis


Answers:

Execute the "nothing worked" fallback AFTER trying all delimiters:

for d in delims:
    result = inp.split(d, maxsplit=1)
    if len(result) == 2: 
        return result
return [inp] # If nothing worked, return the input  
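
A minimal sketch of the same fix as a complete function, for illustration only. It keeps the original `else` but attaches it to the `for` loop rather than to the `if`: a `for ... else` block runs only when the loop finishes without a `break`, and since a successful split returns from the function, the fallback is reached only after every delimiter has failed. The tuple default is an assumption made here to avoid a mutable default argument; it is not part of the original code.

```python
def split_new(inp, delims=("VS", "/ ", "/")):
    """Split on the first delimiter (in priority order) that occurs in inp."""
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2:
            return result
    else:
        # Reached only when the loop ran to completion, i.e. no delimiter matched.
        return [inp]


test_strs = ["STACK/ OVERFLOW", "STACK #11/00 VS OVERFLOW", "STACK/OVERFLOW"]
results = [split_new(ts) for ts in test_strs]
for res in results:
    print(res)
```

Note that with this delimiter order the first string is split on `'/ '`, which consumes the space, so the second element comes back as `'OVERFLOW'` without a leading space.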
Answered By: VPfB

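As an alternative, instead of looping over the delimiters, you could use a single regex pattern with an alternation (|). Here /(?!\d) matches a slash that is not followed by a digit, so the / in #11/00 is left alone:

import re

test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW' ]
pattern = r"/(?!\d)|VS"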
for s in test_strs:
    print(re.split(pattern, s))

Output

['STACK', ' OVERFLOW']
['STACK #11/00 ', ' OVERFLOW']
['STACK', 'OVERFLOW']
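
One caveat worth sketching: an alternation does not carry the delimiter priority of the loop. `re.split` cuts at every match, scanning left to right, so a string that contains two delimiters is handled differently. The example below uses the pattern `/(?!\d)|VS` and a hypothetical input `'A/B VS C'`; the helper name `split_by_priority` is made up here and simply mirrors the loop-based fix.

```python
import re

pattern = r"/(?!\d)|VS"


def split_by_priority(inp, delims=("VS", "/ ", "/")):
    # Loop-based version: try each delimiter in order, split once on the first hit.
    for d in delims:
        parts = inp.split(d, maxsplit=1)
        if len(parts) == 2:
            return parts
    return [inp]


sample = "A/B VS C"  # hypothetical input containing both '/' and 'VS'

regex_all = re.split(pattern, sample)                # splits at every match
regex_first = re.split(pattern, sample, maxsplit=1)  # splits at the leftmost match
by_priority = split_by_priority(sample)              # splits on 'VS' because it is tried first

print(regex_all)    # ['A', 'B ', ' C']
print(regex_first)  # ['A', 'B VS C']
print(by_priority)  # ['A/B ', ' C']
```

For the three test strings in the question the two approaches agree apart from whitespace, so the regex is fine if no input mixes delimiters in this way.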
Answered By: The fourth bird

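This happens because the function always returns on the first iteration of the loop. When the split on 'VS' does not succeed, the else branch immediately returns [inp], so the remaining delimiters are never tried.

The right way of doing it is to move the fallback return outside the loop: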

def split_new(inp, delims=['VS', '/ ' ,'/']):
    for d in delims:
        result = inp.split(d, maxsplit=1)
        if len(result) == 2: 
            return result
        
    return [inp] # If nothing worked, return the input  

test_strs = ['STACK/ OVERFLOW', 'STACK #11/00 VS OVERFLOW', 'STACK/OVERFLOW' ]

for ts in test_strs:
    res = split_new(ts)
    print(res)
Answered By: sameer aggarwal