String parsing/replacement in Python (using regular expressions)

Question:

I am trying to write a parser that converts a string to a query format, and I am stuck on a string replacement that depends on matching a pattern.

I can't figure out the regular expression for it.

I have an input string like this:

ip_query_string = "CITY == 'Mumbai' & LOCATION in ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']"

# Note the ' & ' after CITY == 'Mumbai' and the ' in ' after LOCATION.
# There is also another ' & ' and another ' in ' inside the quoted values of the in-condition.

#My output should be:
op_query_string = "CITY == 'Mumbai' AND LOCATION IN ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']"

# If I find ' & ' or ' in ' (with a space before and after), I have to replace them with ' AND ' and ' IN ' respectively. For that alone, ip_query_string.replace(' & ', ' AND ').replace(' in ', ' IN ') would work, but see the next point.
# If they appear inside the quoted values of an in-condition, such as 'Harrys Bar & Cafe: Mumbai' or 'Hard Rock Cafe in Mumbai', they must not be replaced. Keep them as they are.
# In op_query_string above, the & and the in inside the in-condition values are unchanged.

Please help me work out the logic.

Alternatively, what would the regex pattern be (if & or in is enclosed in single quotes along with other characters, don't replace it; otherwise replace it)?

Asked By: user7079832


Answers:

I got it working in a somewhat odd way (it may not be Pythonic), but it works.

def rplc_str(s):
    # Split on single quotes so each quoted value becomes its own list item.
    sp = s.split("'")
    print('After split ==', sp)
    # Replace ' & ' only in pieces that start with ' &' or '] &'.
    sp1 = [x.replace(' & ', ' AND ') if x.startswith((' &', '] &')) else x for x in sp]
    print('After replacing & ==', sp1)
    # Replace ' in ' only in pieces that end with ' [' (the start of a value list).
    sp2 = [x.replace(' in ', ' IN ') if x.endswith(' [') else x for x in sp1]
    print('After replacing in ==', sp2)
    return "'".join(sp2)

ip_str = "CITY == 'Mumbai' & LOCATION in ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']"
op_str = rplc_str(ip_str)
print(op_str)
# After split == ['CITY == ', 'Mumbai', ' & LOCATION in [', 'Harrys Bar & Cafe: Mumbai', ',', 'Hard Rock Cafe in Mumbai', ']']
# After replacing & == ['CITY == ', 'Mumbai', ' AND LOCATION in [', 'Harrys Bar & Cafe: Mumbai', ',', 'Hard Rock Cafe in Mumbai', ']']
# After replacing in == ['CITY == ', 'Mumbai', ' AND LOCATION IN [', 'Harrys Bar & Cafe: Mumbai', ',', 'Hard Rock Cafe in Mumbai', ']']
# CITY == 'Mumbai' AND LOCATION IN ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']
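
The split-on-quote idea can be made a little more general. After splitting on the single quote, the pieces at even positions are outside quotes and the pieces at odd positions are the quoted values, so it may be enough to rewrite only the even ones. This is a minimal sketch, assuming the quotes are balanced and never escaped; the name replace_outside_quotes is made up for illustration.

```python
def replace_outside_quotes(s):
    # Hypothetical helper. Assumes balanced, unescaped single quotes:
    # even indexes are outside quotes, odd indexes are quoted values.
    parts = s.split("'")
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace(' & ', ' AND ').replace(' in ', ' IN ')
    return "'".join(parts)

ip_str = "CITY == 'Mumbai' & LOCATION in ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']"
print(replace_outside_quotes(ip_str))
# CITY == 'Mumbai' AND LOCATION IN ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']
```

This drops the startswith/endswith checks, so it does not depend on where the operators sit relative to the brackets.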

Hope it helps someone, but I am still waiting for a more Pythonic answer (I mean one using regular expressions).

Answered By: user7079832

A short solution using the re.sub() function:

import re

ip_query_string = "CITY == 'Mumbai' & LOCATION in ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']"
op_query_string = re.sub(r'^([^[]+?)(in)', r'\1IN', re.sub(r'^([^[]+?)(&)', r'\1AND', ip_query_string))

print(op_query_string)

The output:

CITY == 'Mumbai' AND LOCATION IN ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']

Answered By: RomanPerekhrest
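
One limitation of the two-substitution regex is worth seeing in action. Each pattern is anchored at the start of the string and lazily stops at the first & or in before the first [, so a quoted value that contains " in " ahead of the real operator gets rewritten instead. The sketch below wraps the same two substitutions in a function named convert (a name chosen here for illustration) and runs it on such an input.

```python
import re

def convert(s):
    # Same two substitutions as the re.sub() answer, applied in the same order.
    s = re.sub(r'^([^[]+?)(&)', r'\1AND', s)
    return re.sub(r'^([^[]+?)(in)', r'\1IN', s)

tricky = "CITY == 'Mumbai in Goa' & LOCATION in ['Hard Rock Cafe in Mumbai']"
print(convert(tricky))
# CITY == 'Mumbai IN Goa' AND LOCATION in ['Hard Rock Cafe in Mumbai']
```

The "in" inside 'Mumbai in Goa' is uppercased and the real operator after LOCATION is left alone, which is why the quoted values need to be protected first.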

Here is an approach that builds on @RomanPerekhrest's answer. It first pulls out the quoted arguments you don't want altered, makes the replacements on what is left, and finally puts the string back together. That way the regex can't alter parts of the string that it shouldn't.

import re

string = "CITY == 'Mumbai in Goa' & LOCATION in ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']"

# Arguments not meant to be altered by the regex (each single-quoted value)
arguments = [x[0] for x in re.findall(r"('([a-zA-Z:&]+\s?)+')+", string)]

# The string without the arguments: each one is replaced by a {} placeholder
negated = re.sub(r"('([a-zA-Z:&]+\s?)+')+", '{}', string)

# The altered string using @RomanPerekhrest's regex solution
converted = re.sub(r'^([^[]+?)(in)', r'\1IN', re.sub(r'^([^[]+?)(&)', r'\1AND', negated))

# Unpacking the arguments back into the altered string
new_string = converted.format(*arguments)
print(new_string)

The output:

CITY == 'Mumbai in Goa' AND LOCATION IN ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']

Answered By: E Joseph
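
The remove, replace, and restore steps could also be folded into a single re.sub() pass. The idea is to put the quoted value first in an alternation so that it is matched whole and returned unchanged, while a bare ' & ' or ' in ' is looked up in a replacement table. This is only a sketch, assuming values are single-quoted and contain no embedded or escaped single quotes; TOKEN, REPLACEMENTS and to_query are names invented for this example.

```python
import re

# Hypothetical names for illustration only.
REPLACEMENTS = {' & ': ' AND ', ' in ': ' IN '}

# The quoted-value branch comes first, so anything inside quotes is
# consumed as one match before the operator branches can see it.
TOKEN = re.compile(r"'[^']*'| & | in ")

def to_query(s):
    # Quoted values are not keys in REPLACEMENTS, so they are returned as-is.
    return TOKEN.sub(lambda m: REPLACEMENTS.get(m.group(0), m.group(0)), s)

string = "CITY == 'Mumbai in Goa' & LOCATION in ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']"
print(to_query(string))
# CITY == 'Mumbai in Goa' AND LOCATION IN ['Harrys Bar & Cafe: Mumbai','Hard Rock Cafe in Mumbai']
```

Because '[^']*' accepts any character other than a quote, this variant also handles values containing digits or punctuation, which the character class [a-zA-Z:&] would not match.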