Regexp finding duplicated character conditionally in incorrectly formatted JSON

Question:

Suppose I have the following string:

example = '{"Hello": "world", "key_1": ""Double quote text"", "key_2": ""}'

This is not a properly created JSON object, thus when I try to load it:

json.loads(example)

I get an error:
JSONDecodeError: Expecting ',' delimiter: line 1 column 29 (char 28)

I've obviously tried example.replace('""', '"'), but this fixes key_1 and breaks key_2.
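
To make the failure concrete, here is a minimal sketch of what the blanket replacement does. It assumes key_2 is meant to be an ordinary key/value pair with an empty string value ("key_2": ""); the variable names naive and naive_ok are only illustrative.

```python
import json

# Assumption: key_2 is an ordinary key/value pair with an empty string value.
example = '{"Hello": "world", "key_1": ""Double quote text"", "key_2": ""}'

# Replacing every doubled quote also collapses the legitimate empty string "".
naive = example.replace('""', '"')
print(naive)

try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError as err:
    # key_2 is now followed by a single, unterminated quote.
    naive_ok = False
    print("still invalid:", err)
```

The value of key_1 is repaired, but the empty string after key_2 loses its closing quote, so the result still does not parse.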

How can I remove the double quote in key_1, without removing it from key_2?

Asked By: adrpino

||

Answers:

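You can use this pattern with Python's re.sub (after import re):

re.sub(r'(""([^"]*)"")', r'"\2"', example)

This pattern finds any ""string"" sequence, that is, text wrapped in doubled quotes on both sides, and replaces it with "string". The bare "" after key_2 is left alone because it is not followed by a second pair of doubled quotes.

A cleaner pattern would be this: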

'("{2}([^"]*)"{2})'
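
As a rough end-to-end sketch of the approach, the snippet below applies a slightly simplified variant of the pattern and then loads the result. It assumes key_2 is an ordinary pair ("key_2": ""), uses + rather than * so that at least one character must sit between the doubled quotes, and introduces the names DOUBLED and fix_doubled_quotes purely for illustration.

```python
import json
import re

# Assumption: key_2 is an ordinary key/value pair with an empty string value.
example = '{"Hello": "world", "key_1": ""Double quote text"", "key_2": ""}'

# Two quotes, a captured run of one or more non-quote characters, two quotes.
DOUBLED = re.compile(r'""([^"]+)""')

def fix_doubled_quotes(text):
    # Rewrite each doubled-quote-wrapped run as a normally quoted string.
    return DOUBLED.sub(r'"\1"', text)

fixed = fix_doubled_quotes(example)
data = json.loads(fixed)
print(fixed)
print(data)
```

This is a heuristic rather than a real parser: two empty strings separated only by non-quote characters, such as ["", ""], would also match and be corrupted, so it is best suited to input where that shape cannot occur.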
Answered By: Mojtaba