# Parsing / reformatting a tokenized list in Python

## Question:

I have lists of tokens of the form

"(a OR b) AND c OR d AND c"

or

"(a OR b) AND c OR (d AND c)"

I want to reformat these tokens into a string in prefix form. The expressions can be of arbitrary form, like `(a or b) and c or (d and c)`, and for that example the result should be:

{{or {and {or a b} c} {and d c}}}
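For reference, one way to turn strings like the ones above into token lists such as `x` and `y` is a small regular-expression tokenizer. This is a minimal sketch that assumes operands are runs of word characters and that `AND`/`OR` are written as separate words; the names `tokenize` and `TOKEN_RE` are hypothetical.

```python
import re

# Hypothetical tokenizer. Assumption: a token is either a parenthesis or a
# run of word characters ([A-Za-z0-9_]). Whitespace and any other characters
# are silently skipped.
TOKEN_RE = re.compile(r"\(|\)|\w+")


def tokenize(text):
    """Return the tokens in `text`, e.g. '(a OR b)' -> ['(', 'a', 'OR', 'b', ')']."""
    return TOKEN_RE.findall(text)


print(tokenize("(a OR b) AND c OR (d AND c)"))
# ['(', 'a', 'OR', 'b', ')', 'AND', 'c', 'OR', '(', 'd', 'AND', 'c', ')']
```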

I have code that works for some token lists, but not for others:

```
def parse_expression(tokens):
    if len(tokens) == 1:
        return tokens[0]
    # Find the top-level operator (either 'AND' or 'OR'), scanning right to left
    parens = 0
    for i in range(len(tokens) - 1, -1, -1):
        token = tokens[i]
        if token == ')':
            parens += 1
        elif token == '(':
            parens -= 1
        elif parens == 0 and token in {'AND', 'OR'}:
            op = token
            break
    else:
        # Runs only if the loop finished without hitting `break`
        print('Invalid expression')
    # Recursively parse the sub-expressions
    left_tokens = tokens[:i]
    right_tokens = tokens[i+1:]
    print(f"{i} left {left_tokens}")
    print(f"{i} right {right_tokens}")
    if op == 'AND':
        left = parse_expression(left_tokens)
        right = parse_expression(right_tokens)
        return f'(and {left} {right})'
    else:
        left = parse_expression(left_tokens)
        right = parse_expression(right_tokens)
        return f'(or {left} {right})'

x = ['x', 'AND', 'y', 'AND', 'z', 'AND', '(', '(', 'a', 'AND', 'b', ')', 'OR', '(', 'c', 'AND', 'd', ')', ')']
y = ['x', 'AND', 'y', 'AND', 'z', 'AND', '(', 'w', 'AND', 'q', ')']
print(parse_expression(x))  # raises UnboundLocalError
```

It seems to work without parentheses, but not when I use them.

When I try to reformat these with the parser, I keep getting

```
Traceback (most recent call last):
File "./prog.py", line 41, in <module>
File "./prog.py", line 29, in parse_expression
File "./prog.py", line 27, in parse_expression
UnboundLocalError: local variable 'op' referenced before assignment
```

What am I doing wrong?

## Answers:

The problem you have is with the way you evaluate sub-expressions.

Consider the following sub-expression (the rightmost part of x):

`['(', '(', 'a', 'AND', 'b', ')', 'OR', '(', 'c', 'AND', 'd', ')', ')']`

Your approach is to find the top-level operator, split the tokens around it, and move the operator to the front.

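In this example, the operator is `OR`. You detect an operator by checking whether it sits at parenthesis depth 0. Here, however, the `OR` is at depth 1, because the entire sub-expression is wrapped in an extra pair of parentheses. The loop therefore runs to completion without hitting `break`, the `else` clause prints `Invalid expression`, execution continues, and `if op == 'AND':` raises the `UnboundLocalError` you see, because `op` was never assigned. (`i` itself is still defined; after the loop finishes it is simply left at `0`.)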
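To see this concretely, the sketch below repeats the right-to-left depth counting from `parse_expression` and records the depth at which each `AND`/`OR` is seen. `operator_depths` is a hypothetical helper for illustration only.

```python
def operator_depths(tokens):
    """Scan right to left, like parse_expression, and record
    (index, operator, depth) for every AND/OR token.
    Hypothetical helper for illustration only."""
    depths = []
    parens = 0
    for i in range(len(tokens) - 1, -1, -1):
        token = tokens[i]
        if token == ')':
            parens += 1
        elif token == '(':
            parens -= 1
        elif token in {'AND', 'OR'}:
            depths.append((i, token, parens))
    return depths


sub = ['(', '(', 'a', 'AND', 'b', ')', 'OR', '(', 'c', 'AND', 'd', ')', ')']
print(operator_depths(sub))
# [(9, 'AND', 2), (6, 'OR', 1), (3, 'AND', 2)]
```

No operator appears at depth 0, so the original loop can never reach its `break`.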

One way to fix this is to retry with the outer parentheses removed, *if* no top-level operator was found on the first pass *and* the outermost tokens are an opening and a closing parenthesis. Here is a hack fix:

```
def parse_expression(tokens):
    if not tokens:
        raise ValueError('Invalid expression: empty operand')
    if len(tokens) == 1:
        return tokens[0]
    # Find the top-level operator (either 'AND' or 'OR'), scanning right to left
    parens = 0
    for i in range(len(tokens) - 1, -1, -1):
        token = tokens[i]
        if token == ')':
            parens += 1
        elif token == '(':
            parens -= 1
            if parens < 0:
                raise ValueError('Invalid expression: unbalanced parentheses')
        elif parens == 0 and token in {'AND', 'OR'}:
            op = token
            break
    else:
        # No top-level operator: if the whole expression is wrapped in
        # parentheses, strip them and try again
        if tokens[0] == '(' and tokens[-1] == ')':
            return parse_expression(tokens[1:-1])
        raise ValueError('Invalid expression')
    # Recursively parse the sub-expressions on either side of the operator
    left = parse_expression(tokens[:i])
    right = parse_expression(tokens[i+1:])
    # op is 'AND' or 'OR'; emit it in lower case in front of its operands
    return f'({op.lower()} {left} {right})'
```
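The `tokens[0] == '('` and `tokens[-1] == ')'` test is enough for well-formed input, but it does not confirm that the first parenthesis is the one closed by the last token. If you want malformed lists such as `['(', 'a', ')', '(', 'b', ')']` to be rejected at that point, a stricter check could look like this (the name `wrapped_in_parens` is hypothetical, not part of the code above):

```python
def wrapped_in_parens(tokens):
    """Return True if tokens[0] is '(' and its matching ')' is tokens[-1].
    Hypothetical helper, not part of the original answer."""
    if len(tokens) < 2 or tokens[0] != '(' or tokens[-1] != ')':
        return False
    depth = 0
    for i, token in enumerate(tokens):
        if token == '(':
            depth += 1
        elif token == ')':
            depth -= 1
        # Depth back at 0 before the last token: the first '(' closed early
        if depth == 0 and i < len(tokens) - 1:
            return False
    return depth == 0
```

It would replace the two-token comparison in the `else` branch, i.e. `if wrapped_in_parens(tokens):`.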

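I'm pretty sure there is a cleaner algorithm for this, but this at least works for the token lists in your question, and you can continue from there. One caveat: it always splits on the rightmost top-level operator and gives `AND` no higher precedence than `OR`, so an unparenthesized mix such as `(a OR b) AND c OR d AND c` is grouped as `(and (or (and (or a b) c) d) c)` rather than the output you want.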
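One cleaner option is a small recursive-descent parser with one function per precedence level. This is a sketch under stated assumptions, not the method used above: it assumes `AND` binds more tightly than `OR` (which is what the desired output `{or {and {or a b} c} {and d c}}` implies), treats both operators as left-associative, and emits the braced prefix format from the question. The function names are hypothetical.

```python
def to_prefix(tokens):
    """Convert a token list to braced prefix notation, e.g.
    ['a', 'OR', 'b', 'AND', 'c'] -> '{or a {and b c}}'.
    Assumes AND binds tighter than OR and both are left-associative."""
    pos = 0  # index of the next unread token

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError('Unexpected end of expression')
        pos += 1
        return token

    def parse_or():
        # or_expr := and_expr ('OR' and_expr)*
        node = parse_and()
        while peek() == 'OR':
            advance()
            node = f'{{or {node} {parse_and()}}}'
        return node

    def parse_and():
        # and_expr := atom ('AND' atom)*
        node = parse_atom()
        while peek() == 'AND':
            advance()
            node = f'{{and {node} {parse_atom()}}}'
        return node

    def parse_atom():
        # atom := '(' or_expr ')' | operand
        token = advance()
        if token == '(':
            node = parse_or()
            if advance() != ')':
                raise ValueError("Expected ')'")
            return node
        if token in {'AND', 'OR', ')'}:
            raise ValueError(f'Unexpected token {token!r}')
        return token

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f'Unexpected token {tokens[pos]!r}')
    return result


expr = ['(', 'a', 'OR', 'b', ')', 'AND', 'c', 'OR', 'd', 'AND', 'c']
print(to_prefix(expr))  # {or {and {or a b} c} {and d c}}
```

Because precedence is encoded in the grammar itself, there is no need to search for a "top-level" operator or to strip outer parentheses. The question's target shows one extra outer pair of braces; if that is intended, wrap the result, e.g. `'{' + to_prefix(tokens) + '}'`.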

I hope that helps. 🙂