Regular expression to match only JSON output
Question:
I need to extract valid JSON from output that varies. Sometimes it is:
[
{
"parameter1" : "value1",
"parameter2" : "value2"
}
]
/*
bla bla
*/
Sometimes it is different:
{
"parameter3" : "value3",
"parameter4" : "value4"}
/*
another blah
*/
BUT sometimes it has no comment section (valid JSON without /*).
So this multi-line string cannot be parsed as JSON because of the extra characters after the data.
How can I get rid of these characters, if they exist, using the re module in Python 3?
I tried to use this regexp match
^((.|\n)*?)\/\*
but when there is no comment it does not work at all.
Answers:
You could look for the first and last curly braces and slice between them; no regex needed. Note that for the first sample this yields the inner object, not the enclosing list.
import json
main_data = """[
{
"parameter1" : "value1",
"parameter2" : "value2"
}
]
/*
bla bla
*/
"""
json_data = json.loads(main_data[main_data.find('{'):main_data.rfind('}')+1])
print(json_data)
Output:
{'parameter1': 'value1', 'parameter2': 'value2'}
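If the enclosing list should be kept, the same slicing idea can start from whichever of '{' or '[' comes first. The following is only a sketch: slice_json is a hypothetical helper name, and it assumes the trailing comment never contains a closing brace or bracket of the same kind.

```python
import json


def slice_json(text):
    """Return the text from the first '{' or '[' to the last matching closer.

    Hypothetical helper; assumes no '}' or ']' appears after the JSON data.
    """
    # Positions of the first opening brace and bracket (-1 means absent).
    candidates = [i for i in (text.find('{'), text.find('[')) if i != -1]
    if not candidates:
        raise ValueError("no JSON object or array found")
    start = min(candidates)
    # Pick the closer that matches whichever opener came first.
    closer = '}' if text[start] == '{' else ']'
    end = text.rfind(closer)
    if end < start:
        raise ValueError("no closing delimiter found")
    return text[start:end + 1]


sample = """[
{
"parameter1" : "value1",
"parameter2" : "value2"
}
]
/*
bla bla
*/
"""
print(json.loads(slice_json(sample)))
# [{'parameter1': 'value1', 'parameter2': 'value2'}]
```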
You could also use the decode error to find where the extra characters start, which should work for more complex scenarios. JSONDecodeError has attributes (msg and pos) that help with this.
try:
    json_data = json.loads(main_data)
except json.JSONDecodeError as e:
    # 'Extra data' means a complete JSON value was parsed and e.pos is where the rest begins.
    if e.msg == 'Extra data':
        json_data = json.loads(main_data[:e.pos])
    else:
        raise
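A related option in the standard library is json.JSONDecoder.raw_decode, which parses one JSON value from the start of a string and returns it together with the index where it stopped, so trailing text never raises an error. A minimal sketch, assuming the JSON always comes first; parse_leading_json is a hypothetical helper name.

```python
import json


def parse_leading_json(text):
    """Parse the JSON value at the start of text; return (value, remaining text).

    Hypothetical helper; assumes the JSON data comes before any extra text.
    """
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace, so strip it first.
    stripped = text.lstrip()
    value, end = decoder.raw_decode(stripped)
    return value, stripped[end:]


value, rest = parse_leading_json('{"parameter3" : "value3"}\n/*\nanother blah\n*/\n')
print(value)
# {'parameter3': 'value3'}
```

This works the same way whether or not a comment follows the data, so no try/except is needed for the 'Extra data' case.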
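Older version, which parses the position out of the error message string instead:
try:
    json_data = json.loads(main_data)
except json.JSONDecodeError as e:
    error_message = str(e)
    if error_message.startswith("Extra data:"):
        # The message ends with "(char N)"; N is the offset of the extra data.
        json_data = json.loads(main_data[:int(error_message[error_message.rfind('(char ')+6:-1])])
    else:
        raise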
Alternatively, use regex matching on the error message to make it clearer:
import re
try:
    json_data = json.loads(main_data)
except json.JSONDecodeError as e:
    # Raw string; the literal parentheses around "char N" must be escaped.
    expected_error = r"^Extra data: line \d+ column \d+ \(char (\d+)\)$"
    error_message = str(e)
    match = re.match(expected_error, error_message)
    if match:
        json_data = json.loads(main_data[:int(match.group(1))])
    else:
        raise
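Since the question asked specifically about the re module, here is a sketch of a pattern that removes a trailing /* ... */ block only when one is present, so input without a comment passes through unchanged. It assumes the JSON strings themselves never contain "/*"; TRAILING_COMMENT and strip_trailing_comment are hypothetical names.

```python
import json
import re

# One /* ... */ block followed only by whitespace up to the end of the text.
# re.DOTALL lets '.' match newlines, replacing the (.|\n) construct.
TRAILING_COMMENT = re.compile(r"/\*.*?\*/\s*\Z", re.DOTALL)


def strip_trailing_comment(text):
    """Remove a trailing /* ... */ comment if there is one (hypothetical helper)."""
    return TRAILING_COMMENT.sub("", text)


with_comment = '{\n"parameter3" : "value3",\n"parameter4" : "value4"}\n/*\nanother blah\n*/\n'
without_comment = '{"parameter3" : "value3"}'

print(json.loads(strip_trailing_comment(with_comment)))
# {'parameter3': 'value3', 'parameter4': 'value4'}
print(json.loads(strip_trailing_comment(without_comment)))
# {'parameter3': 'value3'}
```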