How to parse custom operators inside an evaluable Python string?

Question:

Given a formula stored as a string, such as:

str_formula = "x > 0 AND y < 5 AND 'this AND that' in my_string"

where the Python operator "&" has been replaced by "AND" (the "AND" inside the quoted string was not affected), how can I revert the substitution to get the original formula:

python_formula = "x > 0 & y < 5 & 'this AND that' in my_string"

I already have a terrible-looking solution (it works for both ' and " strings), but as you can see it is not elegant at all. I was wondering whether there is an easier way of doing this, maybe using ast, or some trick that assigns the operator to the variable "AND" (the goal is to evaluate the expression).

formula = 'x > 0 AND y < 5 AND "this AND that" in my_string'

# invert the "AND" operators outside of quotes
inverted_formula = ''
in_quote = False
i = 0
while i < len(formula):
    # check if we are inside a quote
    if formula[i] == "'" or formula[i] == '"':
        inverted_formula += formula[i]
        in_quote = not in_quote
        i += 1
        continue

    # check if we have an "AND" operator outside of quotes
    if formula[i:i+3] == "AND" and not in_quote:
        inverted_formula += "&"
        i += 3
        continue

    # copy the current character to the inverted formula string
    inverted_formula += formula[i]
    i += 1

print(inverted_formula)

Another option, not fully functional yet because it only supports ", would be a regex:

import re

# match a whole-word AND that is followed by an even number of double quotes
pattern = re.compile(r'\bAND\b(?=([^"]*"[^"]*")*[^"]*$)')
inverted_formula = re.sub(pattern, '&', formula)

But I’m looking for an easier solution that works for both single- and double-quoted strings, as I may need to change more operators, such as OR and NOT to | and ~, and I’m not sure my solution will work in more complex cases. Any ideas?

Harder examples:

f = "x > 0 AND y < 5 AND 'this AND tha\'t' in my_string AND 'this AND tha\'t' in my_string"
f2 = 'x > 0 AND y < 5 AND "this AND tha\'t" in my_string AND "this AND tha\'t" in my_string'
f3 = "'this AND tha\'t' in my_string AND 'this AND that' in my_string"
Asked By: Ziur Olpa

Answers:

Here is a simple example that works for the proposed string.

f = "x > 0 AND y < 5 AND 'this AND that' in my_string"
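q = [i for i, c in enumerate(f) if c in ["'", '"']]  # indexes of the quote characters
# replace before the opening quote and from the closing quote onwards; keep the quoted block as is
r = f[:q[0]].replace('AND', '&') + f[q[0]:q[1]] + f[q[1]:].replace('AND', '&')
print(r)  # x > 0 & y < 5 & 'this AND that' in my_string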

In short, you first find the quote characters, then split the string at their positions and apply str.replace() to every block except the ones between the quote indexes.

You can expand on this and generalise the solution into a proper function.
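One way such a function might look is sketched below. It is not part of the original answer: the name replace_outside_quotes, the mapping argument, and the use of a regular expression to locate the quoted blocks (instead of a list of quote indexes) are assumptions made for illustration. It also matches operators as whole words, so a name like BRAND is left alone.

```python
import re

# A quoted block: an opening quote, then escaped characters or anything that is
# neither a backslash nor that quote, then the matching closing quote.
STRING_LITERAL = re.compile(r"""('(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*")""")


def replace_outside_quotes(formula: str, mapping: dict[str, str]) -> str:
    """Replace whole-word custom operators, leaving quoted blocks untouched."""
    operators = re.compile(r"\b(?:" + "|".join(map(re.escape, mapping)) + r")\b")
    # With one capturing group, re.split returns the unquoted blocks at even
    # positions and the quoted blocks at odd positions.
    blocks = STRING_LITERAL.split(formula)
    return "".join(
        block if i % 2 else operators.sub(lambda m: mapping[m.group(0)], block)
        for i, block in enumerate(blocks)
    )


formula = r"x > 0 AND y < 5 AND 'this AND tha\'t' in my_string OR NOT z"
print(replace_outside_quotes(formula, {"AND": "&", "OR": "|", "NOT": "~"}))
# x > 0 & y < 5 & 'this AND tha\'t' in my_string | ~ z
```

Passing the operators as a dictionary keeps the quote handling in one place when more substitutions are needed later.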

As the OP posted more complex examples, here is an adaptation of the code that solves those as well. Note that to be able to deal with escaped characters, I needed to input raw strings instead of standard ones.

examples = [
    r"x > 0 AND y < 5 AND 'this AND tha\'t' in my_string AND 'this AND tha\'t' in my_string",
    r'x > 0 AND y < 5 AND "this AND tha\'t" in my_string AND "this AND tha\'t" in my_string',
    r"'this AND tha\'t' in my_string AND 'this AND that' in my_string"
]

for f in examples:
    # indexes of the quote characters that are not escaped with a backslash
    q = [i for i, c in enumerate(f) if c in ["'", '"'] and f[i-1] != '\\']
    assert len(q) % 2 == 0, 'SyntaxError: found uneven number of quotes'
    r = f[:q[0]].replace('AND', '&')
    for i, n in enumerate(q):
        try:
            s, e = q[i], q[i+1]
            if not i % 2:
                r += f[s:e]  # inside a quoted block: copy unchanged
            else:
                r += f[s:e].replace('AND', '&')  # between two quoted blocks
        except IndexError:
            r += f[e:].replace('AND', '&')  # tail after the last closing quote

    print('original:', f)
    print('replaced:', r)
    print()

Output:

original: x > 0 AND y < 5 AND 'this AND tha\'t' in my_string AND 'this AND tha\'t' in my_string
replaced: x > 0 & y < 5 & 'this AND tha\'t' in my_string & 'this AND tha\'t' in my_string

original: x > 0 AND y < 5 AND "this AND tha\'t" in my_string AND "this AND tha\'t" in my_string
replaced: x > 0 & y < 5 & "this AND tha\'t" in my_string & "this AND tha\'t" in my_string

original: 'this AND tha\'t' in my_string AND 'this AND that' in my_string
replaced: 'this AND tha\'t' in my_string & 'this AND that' in my_string
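
The raw-string remark above can be checked directly. In a regular string literal, Python consumes the backslash of \' while parsing the literal, so the code never sees it; in a raw literal the backslash survives and a test on the preceding character can find it. A tiny check (the variable names are only illustrative):

```python
plain = "tha\'t"   # regular literal: \' collapses to a single quote character
raw = r"tha\'t"    # raw literal: the backslash is kept as its own character

print(len(plain), len(raw))        # 5 6
print("\\" in plain, "\\" in raw)  # False True
```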
Answered By: alec_djinn

In the end I wrote another answer that doesn’t (directly) count quote characters or escape sequences, because it relies on the ast module:

import ast

def separate_by_index_and_join(string_to_split: str, indices: list[int], join: str) -> str:
    # drop the character at each index and glue the remaining pieces together with `join`
    return join.join([string_to_split[0:indices[0]]] + [string_to_split[i+1:j] for i, j in zip(indices, indices[1:] + [None])])

def formula_operator_replace(formula: str, custom_operator: str, python_native_operator: str) -> str:
    # the sentinel is a second operator, different from the target one
    sentinel = " | " if python_native_operator != "|" else " & "
    python_valid_formula = ast.parse(formula.replace(f" {custom_operator} ", f" {python_native_operator} "))
    python_valid_formula2 = ast.parse(formula.replace(f" {custom_operator} ", sentinel))

    for node, node2 in zip(ast.walk(python_valid_formula), ast.walk(python_valid_formula2)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if f" {python_native_operator} " in node.value:
                # element-wise comparison of two strings of the same length:
                # they differ only where the custom operator was replaced
                split_index = [i for i, (a, b) in enumerate(zip(node.value, node2.value)) if a != b]
                if split_index:
                    # put the custom operator back inside the string literal
                    node.value = separate_by_index_and_join(node.value, split_index, custom_operator)

    return ast.unparse(python_valid_formula)

Basically, we replace the custom operator with the one we need, and then use the built-in ast module to find the string literals in the expression and revert the replacement inside them, using a sentinel value for the comparison.
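
To make the sentinel step concrete, here is a small stand-alone sketch of what happens to a single string literal. The helper name restore_literal is hypothetical, and the two input strings are written out by hand rather than taken from a parsed formula; the original literal is assumed to have been 'a & b AND c'.

```python
def restore_literal(with_native: str, with_sentinel: str, custom_operator: str) -> str:
    """Put the custom operator back where the two replaced versions differ."""
    # Positions where the two versions disagree are exactly the characters
    # that came from the custom operator; a "&" that was in the literal from
    # the start is identical in both versions and is left alone.
    split_index = [i for i, (a, b) in enumerate(zip(with_native, with_sentinel)) if a != b]
    if not split_index:
        return with_native
    pieces = [with_native[:split_index[0]]]
    pieces += [with_native[i + 1:j] for i, j in zip(split_index, split_index[1:] + [None])]
    return custom_operator.join(pieces)


with_native = "a & b & c"      # literal after replacing " AND " with " & "
with_sentinel = "a & b | c"    # same literal after replacing " AND " with " | "
print(restore_literal(with_native, with_sentinel, "AND"))  # a & b AND c
```

Without the second, sentinel-based version there would be no way to tell the "&" that was already in the literal from the one introduced by the replacement.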

formula = r"x > 0 AND y < 5 AND 'this & AND tha\'t' in my_string and 'this AND tha\'t' in my_string"
formula = 'x > 0 AND y < 5 AND "this AND tha\'t" in my_string AND "this AND tha\'t" in my_string'
formula_operator_replace(formula, "AND", "&")

'x > 0 & y < 5 & "this AND tha\'t" in my_string & "this AND tha\'t" in my_string'
Answered By: Ziur Olpa