Comma separated number after specific substring in middle of string

Question

I need to extract a sequence of comma-separated numbers that follows a specific substring. It works fine when the substring is at the beginning of the string, but not when it's in the middle.

The regex 'Port': .([0-9]+) works fine with the example below to get the value 2.

String example:

{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}

But I need to get the Field value. I don't care whether it comes back as '[2,2]' or 2,2 (string or number).

I made various attempts with a regex calculator, but couldn't find a solution that returns the value after a substring in the middle of the text. Any ideas? Please help. Thanks ahead, Nir

Asked By: nir


Answer 1

This looks like a print()ed Python dict; can you use ast.literal_eval() to bring it back into a dictionary?

>>> import ast
>>> d = ast.literal_eval("""{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}""")
>>> d
{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]', 'bar': '[9, 9]'}
>>> d["Array"]
'[0, 0]'
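
Since the question asks for the Field value specifically, the same approach can be taken one step further. The following is a minimal sketch, assuming the input is always a valid Python dict literal and that the Field value is itself a list literal stored as a string; the variable names are illustrative only.

```python
import ast

s = "{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}"

record = ast.literal_eval(s)                  # outer string -> dict
field_text = record["Field"]                  # the string '[2,2]'
field_numbers = ast.literal_eval(field_text)  # inner string -> list of ints
field_csv = ",".join(str(n) for n in field_numbers)

print(field_numbers)
print(field_csv)
```

This avoids regex entirely, at the cost of failing with an exception if the text is not a well-formed literal.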

Answered By: ti7

Answer 2

I found the regex to be like this; not sure if that's what you want:

import re

string = "{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}"

output = re.findall(r"'Field': '\[([0-9]+),([0-9]+)\]'", string)

print(output)

output:

[('2', '2')]

If you want it as a string:

output = str(output).replace('[','').replace(']','').replace('(','').replace(')','').replace(' ','').replace("'",'')
print(output)

output:

2,2
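
As an alternative to the chain of replace() calls, the captured groups can be joined directly. This is a sketch under the assumption that the string contains at most one Field entry; the names as_string and as_numbers are illustrative.

```python
import re

string = "{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}"

# One capture group per number, with the literal brackets escaped.
matches = re.findall(r"'Field': '\[([0-9]+),([0-9]+)\]'", string)

# findall returns a list of tuples, one tuple per match.
if matches:
    as_string = ",".join(matches[0])            # join the captured digits with a comma
    as_numbers = [int(n) for n in matches[0]]   # or convert them to integers
    print(as_string)
    print(as_numbers)
```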

EDIT:

If I understood what you want, this might work: it creates a new dataframe with a single column called 'Field', which you can then append to your own dataframe.

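import re
import pandas as pd

values = []

def get_values(mdict, values):
    # Capture the two numbers inside 'Field': '[n,m]'
    pattern = r"'Field': '\[([0-9]+),([0-9]+)\]'"
    output = re.findall(pattern, mdict)
    # Turn [('2', '2')] into the plain string 2,2
    output = str(output).replace('[','').replace(']','').replace('(','').replace(')','').replace(' ','').replace("'",'')
    values.append(output)

# df is your existing dataframe; its 'param' column holds the dict-like strings
for x in df['param']:
    get_values(str(x), values)

df_temp = pd.DataFrame(values, columns=['Field'])

# DataFrame.append() was removed in pandas 2.0 and never modified df in place
df = pd.concat([df, df_temp])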
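
The per-row extraction can also be isolated into a small function that does not depend on pandas, which makes it easier to test. This is a sketch only: extract_field and FIELD_PATTERN are hypothetical names, the pattern is loosened to allow optional spaces, and the sample rows other than the first are made up for illustration.

```python
import re

# Hypothetical helper; unlike the pattern above, it tolerates optional whitespace.
FIELD_PATTERN = re.compile(r"'Field':\s*'\[([0-9]+),\s*([0-9]+)\]'")

def extract_field(text):
    """Return the Field value as 'n,m', or None when no Field entry is found."""
    m = FIELD_PATTERN.search(text)
    return ",".join(m.groups()) if m else None

rows = [
    "{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}",
    "{'Port': '3', 'Field': '[4, 7]'}",   # illustrative row with a space after the comma
    "{'Port': '5'}",                      # illustrative row with no Field key
]

values = [extract_field(r) for r in rows]
print(values)
```

With a dataframe, the same function could be applied as df['param'].astype(str).map(extract_field), which keeps rows without a Field entry as missing values instead of empty strings.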

Answered By: docksdocks

Answer 3

If you just want 2,2 for the Field value you can use a single capture group.

Note that you don’t have to escape the ' : , and ]

'Field':\s+'\[([0-9]+,\s*[0-9]+)]'

'Field': Match literally
\s+'\[ Match 1+ whitespace chars and '[
( Capture group 1
  [0-9]+,\s*[0-9]+ Match 1+ digits, a comma, optional whitespace chars and 1+ digits
) Close group 1
]' Match literally
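
The breakdown above can be written directly into the pattern using Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. A minimal sketch; the name FIELD_RE and the second test string are illustrative assumptions.

```python
import re

FIELD_RE = re.compile(
    r"""
    'Field':        # the key, matched literally
    \s+'\[          # 1+ whitespace chars, then the opening quote and bracket
    (               # start capture group 1
      [0-9]+        #   1+ digits
      ,\s*          #   a comma and optional whitespace
      [0-9]+        #   1+ digits
    )               # end capture group 1
    ]'              # closing bracket and quote (no escape needed for the bracket)
    """,
    re.VERBOSE,
)

s = "{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}"

m = FIELD_RE.search(s)
if m:
    print(m.group(1))
```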


Example code

import re

pattern = r"'Field':\s+'\[([0-9]+,[0-9]+)]'"

s = "{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}"

m = re.search(pattern, s)
if m:
    print(m.group(1))

Output

2,2

If you want to get all the values where the fields are between single quotes you can use a conditional matching the ] only when there is a [

'[^']+':\s+'(\[)?([0-9]+(?:,\s*[0-9]+)*)(?(1)])'

Then you can get the capture group 2 value.

Example:

import re

pattern = r"'[^']+':\s+'(\[)?([0-9]+(?:,\s*[0-9]+)*)(?(1)])'"
s = "{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}"
matches = re.finditer(pattern, s)

for matchNum, match in enumerate(matches, start=1):
    print(match.group(2))

Output

2
0, 0
2,2
0, 0
9, 9
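
If the key names are needed along with the values, one possible variation is to capture the key as well and build a dictionary. This is a sketch, not part of the original pattern: the extra capture group around the key is an assumption added here, which shifts the optional bracket to group 2 (so the conditional becomes (?(2)...)) and the numbers to group 3.

```python
import re

# Group 1: key name, group 2: optional opening bracket, group 3: the numbers.
pattern = re.compile(r"'([^']+)':\s+'(\[)?([0-9]+(?:,\s*[0-9]+)*)(?(2)])'")

s = "{'Port': '2', 'Array': '[0, 0]', 'Field': '[2,2]', 'foo': '[0, 0]' , 'bar': '[9, 9]'}"

# Map each key to its list of integers; int() ignores the spaces around the digits.
parsed = {
    m.group(1): [int(n) for n in m.group(3).split(",")]
    for m in pattern.finditer(s)
}

print(parsed["Field"])
```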

Answered By: The fourth bird
