replace characters in a json file

Question:

I made a mistake while manipulating 100 JSON files. I’m not sure what happened, but most of the files now end with a random number of their last characters repeated.
Is there a way to clean a JSON file by deleting characters, starting from the last one, until the file is valid JSON again?

Asked By: LBedo

Answers:

You can use regular expressions. An alternative would be string manipulation, but in this case regex is quicker to write, especially for one-time-use code.

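import re

files = ['a.json', 'b.json', ...]  # populate as needed

for filename in files:
    with open(filename, 'r') as file:
        content = file.read()

    # Keep everything up to the last '}]}}' that is followed by extra characters
    match = re.match(r'([\s\S]+}]}})[\s\S]+?', content)
    if match is None:
        continue  # pattern not found (e.g. no trailing junk): leave this file unchanged
    new_content = match.group(1)

    with open(filename, 'w') as file:
        file.write(new_content)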

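This regex has several parts.
[\s\S] matches any character (whereas . does not match a newline unless the re.DOTALL flag is set).
The greedy [\s\S]+ matches as much as possible, and the lazy [\s\S]+? matches as little as possible. Since re.match does not have to reach the end of the string, the lazy part only needs to match one character of the trailing text we don’t want; its job is to make sure there is something after the part we keep.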
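
To make the character class and the greedy/lazy distinction concrete, here is a small demonstration on made-up strings (the sample text is only for illustration):

```python
import re

text = 'line one\nline two'

# '.' stops at the newline; '[\s\S]' matches any character, newline included
dot = re.match(r'.+', text).group()
any_char = re.match(r'[\s\S]+', text).group()
print(repr(dot))       # 'line one'
print(repr(any_char))  # 'line one\nline two'

# Greedy '+' takes as much as it can; lazy '+?' settles for as little as possible
greedy = re.match(r'[\s\S]+', 'abc').group()
lazy = re.match(r'[\s\S]+?', 'abc').group()
print(greedy, lazy)  # abc a
```

Passing flags=re.DOTALL (or re.S) to re.match lets . match newlines too, so r'(.+}]}}).+?' with that flag is an equivalent way to write the answer’s pattern.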

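We then parenthesise the part we want to keep, ([\s\S]+}]}}), extract it using .group(1), and write it back to the file. Because that group is greedy, it extends to the last }]}} that is still followed by at least one character.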
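
Here is the same pattern applied to a made-up corrupted string, assuming (as the pattern itself does) that the valid JSON ends in }]}} and the junk is a repeat of the last few characters:

```python
import re

# Hypothetical file content: valid JSON ending in '}]}}', followed by a repeated ']}}'
corrupted = '{"data": {"items": [{"id": 1}]}}]}}'

match = re.match(r'([\s\S]+}]}})[\s\S]+?', corrupted)
cleaned = match.group(1)
print(cleaned)  # {"data": {"items": [{"id": 1}]}}
```

The final }]}} in the corrupted string has nothing after it, so the match falls back to the previous one and the repeated tail is dropped. If a file has no trailing junk, re.match usually returns None (unless }]}} also appears earlier in the file, in which case the match would cut the file short), so it is worth checking for None before calling .group(1).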

For more information, see Reference – What does this regex mean?, and in future I would suggest manipulating JSON using the built-in json module.
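
Following that suggestion, here is a sketch of a regex-free alternative using the standard json module. It relies on json.JSONDecoder.raw_decode, which parses one JSON value from the beginning of a string and returns it together with the index where parsing stopped; everything after that index is the junk. This amounts to what the question asks for (dropping characters from the end until the text is valid JSON) without needing to know how each file ends. The helper names clean_json_text and clean_file are made up for this example, and it assumes the valid part of each file is intact, with extra characters only appended after it.

```python
import json


def clean_json_text(text):
    """Keep only the first complete JSON value in text and drop whatever follows it."""
    stripped = text.lstrip()  # raw_decode does not skip leading whitespace
    # raw_decode returns (parsed value, index just past the end of that value)
    _value, end = json.JSONDecoder().raw_decode(stripped)
    return stripped[:end]


def clean_file(filename):
    """Hypothetical helper: rewrite filename in place without its trailing junk."""
    with open(filename, 'r') as file:
        content = file.read()
    cleaned = clean_json_text(content)
    # Only rewrite when something changed (note: a final newline is dropped too)
    if cleaned != content:
        with open(filename, 'w') as file:
            file.write(cleaned)


print(clean_json_text('{"data": {"items": [{"id": 1}]}}]}}'))
# {"data": {"items": [{"id": 1}]}}
```

If the beginning of a file is not a complete JSON value, raw_decode raises json.JSONDecodeError, which flags files that need a manual look instead of silently truncating them.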

Answered By: Mous