Split every occurrence of Key=Value pairs in a string where the values include one or more spaces

Question:

I have a situation where users can enter commands with optional key/value pairs, and a value may contain spaces.

Here are four different forms of user input, where each key and value are separated by an = sign and some values contain spaces:

"cmd=create-folder    name=SelfServe - Test ride"

"cmd=create-folder    name=SelfServe - Test ride server=prd"

"cmd=create-folder  name=cert - Test ride   server=dev site=Service"

"cmd=create-folder   name=cert - Test ride   server=dev site=Service permission=locked"

Requirement:
I am trying to parse this string and split it into a dictionary based on the keys and values present in the string.

If the user enters the first form, it should produce a dictionary like:

query_dict = {
    'cmd': 'create-folder',
    'name': 'SelfServe - Test ride'
}

If the user enters the second form, the additional key/value pair should be added:

query_dict = {
    'cmd': 'create-folder',
    'name': 'SelfServe - Test ride',
    'server': 'prd'
}

If the user enters the third form, it should produce:

query_dict = {
    'cmd': 'create-folder',
    'name': 'cert - Test ride',
    'server': 'dev',
    'site': 'Service'
}

The fourth form should produce a dictionary with the keys and values split like this:

query_dict = {
    'cmd': 'create-folder',
    'name': 'cert - Test ride',
    'server': 'dev',
    'site': 'Service',
    'permission': 'locked'
}

The idea is to parse a string where keys and values are separated by the = symbol and where values can contain one or more spaces, and to extract every matching key/value pair.

I have tried multiple approaches, but I can't figure out a single generic regular expression that can match/extract any string with this kind of pattern.

Appreciate your help.

I tried several patterns, each mapped to a different possible user input, but that is not a scalable approach.
Example:

I created three patterns to match three varieties of user input, but it would be nice to have one generic pattern that can match any combination of key=value pairs in a string (I am hard-coding the keys in the pattern, which is not ideal):

'(cmd=create-folder).*(name=.*).*',
'(cmd=create-folder).*(name=.*).*(server=.*).*',
'(cmd=create-folder).*(name=.*).*(server=.*).*(site=.*)'
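To see why this approach does not scale, here is a minimal sketch that runs the first hard-coded pattern against the third sample input. Because `.*` is greedy, the `(name=.*)` group swallows every pair that follows `name=`, so each extra key would need yet another pattern. The variable names `s` and `m` are illustrative only.

```python
import re

s = "cmd=create-folder  name=cert - Test ride   server=dev site=Service"

# First hard-coded pattern from the question. Both .* are greedy, so the
# (name=.*) group runs all the way to the end of the string.
m = re.match(r'(cmd=create-folder).*(name=.*).*', s)
print(m.group(2))  # name=cert - Test ride   server=dev site=Service
```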
Asked By: jam


Answers:

Try with the following regex:

(\S+)=([^=]+?)(?=\s\S+=|$)

Regex Explanation:

  • (\S+): first group holds the key, one or more non-whitespace characters
  • =: followed by an equal sign
  • ([^=]+?): second group holds the value, one or more non-equal characters (as few as possible)
  • (?=\s\S+=|$): followed by either a whitespace character + a run of non-whitespace characters + =, or the end of the string
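To tie the bullet list to the pattern itself, the same regex can be written with `re.VERBOSE`, which lets each piece carry its own comment. This is a sketch assuming Python's `re` module; the check at the end only confirms that the verbose and compact forms find the same pairs on one sample input from the question.

```python
import re

# Same regex as above, written with re.VERBOSE: unescaped whitespace in the
# pattern is ignored and '#' starts a comment.
pattern = re.compile(r"""
    (\S+)            # group 1: the key, one or more non-whitespace characters
    =                # the literal equal sign
    ([^=]+?)         # group 2: the value, as few non-'=' characters as possible
    (?=\s\S+=|$)     # stop before whitespace + next key + '=', or at end of string
""", re.VERBOSE)

compact = re.compile(r'(\S+)=([^=]+?)(?=\s\S+=|$)')

s = "cmd=create-folder    name=SelfServe - Test ride server=prd"
print(pattern.findall(s) == compact.findall(s))  # True
```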


Note: Here the assumption is that your key (the left-hand side of the pair) won't contain spaces.


You can then use this Python code to retrieve your groups:

import re

strings = [
    "cmd=create-folder    name=SelfServe - Test ride",
    "cmd=create-folder    name=SelfServe - Test ride server=prd",
    "cmd=create-folder  name=cert - Test ride   server=dev site=Service",
    "cmd=create-folder   name=cert - Test ride   server=dev site=Service permission=locked"
]

pattern = r'(\S+)=([^=]+?)(?=\s\S+=|$)'

for string in strings:
    print(string)
    # findall returns one (key, value) tuple per match
    for match in re.findall(pattern, string):
        print(f'Group1: {match[0]} \t Group2: {match[1]}')
    print()
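The loop above only prints the groups. Since the question asks for a dictionary, one way to finish the job is a dict comprehension over `re.findall`. Note that `([^=]+?)` can keep whitespace that sits between a value and the next key (for example `'create-folder   '` when several spaces separate the pairs), so this sketch strips each value; the `.strip()` step and the helper name `parse_pairs` are additions for illustration, not part of the original answer.

```python
import re

pattern = r'(\S+)=([^=]+?)(?=\s\S+=|$)'

def parse_pairs(s):
    # Hypothetical helper: strip() removes the run of spaces the lazy value
    # group can leave in front of the next key.
    return {key: value.strip() for key, value in re.findall(pattern, s)}

print(parse_pairs("cmd=create-folder   name=cert - Test ride   server=dev site=Service permission=locked"))
# {'cmd': 'create-folder', 'name': 'cert - Test ride', 'server': 'dev', 'site': 'Service', 'permission': 'locked'}
```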


Answered By: lemon

I would suggest using re.split, and then zip to feed the dict constructor:

import re

def get_dict(s):
    # Split on "key=" (plus any whitespace before it); the capture group keeps the keys
    parts = re.split(r"\s*(\w+)=", s)
    return dict(zip(parts[1::2], parts[2::2]))

Example runs:

print(get_dict("cmd=create-folder    name=SelfServe - Test ride"))
print(get_dict("cmd=create-folder    name=SelfServe - Test ride server=prd"))
print(get_dict("cmd=create-folder  name=cert - Test ride   server=dev site=Service"))
print(get_dict("cmd=create-folder   name=cert - Test ride   server=dev site=Service permission=locked"))

Outputs:

{'cmd': 'create-folder', 'name': 'SelfServe - Test ride'}
{'cmd': 'create-folder', 'name': 'SelfServe - Test ride', 'server': 'prd'}
{'cmd': 'create-folder', 'name': 'cert - Test ride', 'server': 'dev', 'site': 'Service'}
{'cmd': 'create-folder', 'name': 'cert - Test ride', 'server': 'dev', 'site': 'Service', 'permission': 'locked'}
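Because no key is hard-coded, the split approach handles arbitrary keys in any order, which was the asker's main concern. One limitation worth knowing is that keys must match `\w+` (letters, digits, underscore). The sketch below repeats `get_dict` from above so it runs on its own; both input strings, including the hyphenated key `dry-run`, are hypothetical examples.

```python
import re

def get_dict(s):
    parts = re.split(r"\s*(\w+)=", s)
    return dict(zip(parts[1::2], parts[2::2]))

# No key is hard-coded, so arbitrary keys in any order are picked up.
print(get_dict("server=dev cmd=create-folder name=a b"))
# {'server': 'dev', 'cmd': 'create-folder', 'name': 'a b'}

# Caveat: a hypothetical hyphenated key such as "dry-run" is not matched by \w+,
# so the split happens at "run=" and "dry-" leaks into the previous value.
print(get_dict("cmd=create-folder dry-run=yes"))
# {'cmd': 'create-folder dry-', 'run': 'yes'}
```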

Explanation

Using this input as example:

"cmd=create-folder    name=SelfServe - Test ride"

The split regex identifies these parts:

"cmd=create-folder    name=SelfServe - Test ride"
 ^^^^             ^^^^^^^^^

The strings that are not matched by it will end up in the result, so we have these:

 "", "create-folder", "SelfServe - Test ride"

The first string is empty, because it is what precedes the first match.

Now, as the regex has a capture group, the string captured by that group is also returned in the result list, at odd indices. So parts ends up like this:

 ["", "cmd", "create-folder", "name", "SelfServe - Test ride"]

The keys we are interested in occur at odd indices. We can get those with parts[1::2], where 1 is the starting index and 2 is the step.

The corresponding values for those keys occur at even indices, ignoring the empty string at index 0. So we get those with parts[2::2]. With the call to zip, we pair those keys and values together as we want them.

Finally, the dict constructor can take an argument with key/value pairs, which is exactly what that zip call provides.
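The walk-through above can be reproduced step by step; this sketch simply prints each intermediate value for the first sample input.

```python
import re

s = "cmd=create-folder    name=SelfServe - Test ride"

parts = re.split(r"\s*(\w+)=", s)
print(parts)          # ['', 'cmd', 'create-folder', 'name', 'SelfServe - Test ride']

keys = parts[1::2]    # odd indices: the captured keys
values = parts[2::2]  # even indices from 2: the text between matches
print(list(zip(keys, values)))  # [('cmd', 'create-folder'), ('name', 'SelfServe - Test ride')]

print(dict(zip(keys, values)))  # {'cmd': 'create-folder', 'name': 'SelfServe - Test ride'}
```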

Answered By: trincot