Merge multiple JSON files into one file by using Python (stream twitter)

Question:

I’ve pulled data from Twitter. Currently, the data is in multiple files and I could not merge it into one single file.

Note: all files are in JSON format.

The code I have used is here and here.

It has been suggested that I use glob to combine the JSON files.

I wrote this code based on some tutorials about merging JSON files with Python:

from glob import glob
import json
import pandas as pd

with open('Desktop/json/finalmerge.json', 'w') as f:
    for fname in glob('Desktop/json/*.json'):  # every .json file in Desktop/json
        with open(fname) as j:
            f.write(str(j.read()))
            f.write('\n')

I successfully merged all the files, and the result is finalmerge.json.

Then I loaded it as suggested in several threads:

df_lines = pd.read_json('finalmerge.json', lines=True)
df_lines
1000000*23 columns 

What should I do next to get each feature into a separate column?

I’m not sure what’s wrong with the JSON files. I checked the merged file and found that it is not valid JSON. What should I do to load it as a data frame?

The reason I am asking this is that I have very basic Python knowledge and all the answers to similar questions that I have found are way more complicated than I can understand. Please help this new Python user to convert multiple JSON files to one JSON file.

Asked By: ML Moh


Answers:

I think the problem is that your files are not really JSON (or, more precisely, they are structured as JSON Lines, also called JSONL: one JSON object per line). You have two ways of proceeding:

  1. you could read every file as a text file and merge them line by line
  2. you could convert them to valid JSON (add an opening square bracket at the beginning of the file, a comma after every JSON element except the last, and a closing square bracket at the end).
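The second option can be sketched roughly as follows. This is only a minimal sketch, assuming every non-empty line in your files is one complete JSON object. Instead of inserting brackets and commas by hand, it parses each line and lets the json module write the array. The function name jsonl_files_to_json_array and its parameters are made up for illustration.

```python
import json
import os
from glob import glob


def jsonl_files_to_json_array(pattern, out_path):
    """Parse every non-empty line of every matching file and write one JSON array.

    Hypothetical helper: assumes each non-empty line is a complete JSON object.
    Returns the number of records written.
    """
    records = []
    for fname in sorted(glob(pattern)):
        # Skip a previous output file if it happens to match the pattern
        if os.path.abspath(fname) == os.path.abspath(out_path):
            continue
        with open(fname, encoding="utf-8") as src:
            for line in src:
                line = line.strip()
                if line:  # ignore blank lines between records
                    records.append(json.loads(line))
    with open(out_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst)  # writes [obj, obj, ...] as valid JSON
    return len(records)
```

You would call it as jsonl_files_to_json_array('Desktop/json/*.json', 'finalmerge.json'). Note that it keeps all records in memory at once, which may be a problem for very large collections of tweets.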

Try following this question and let me know if it solves your problem: Loading JSONL file as JSON objects

You can also try to edit your code this way:

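from glob import glob

# Write the merged file outside Desktop/json so glob() does not pick it up as an input
with open('finalmerge.json', 'w') as f:
    for fname in glob('Desktop/json/*.json'):
        with open(fname) as j:
            f.write(j.read())
            f.write('\n')  # newline so the next file starts on its own line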

Every line of the merged file will then be a separate JSON element.
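To check that the merged file really has one JSON element per line, you could read it back line by line. The following is a minimal sketch using only the standard library; read_json_lines is a hypothetical helper name, and it assumes the file is UTF-8 encoded.

```python
import json


def read_json_lines(path):
    """Return a list with one parsed object per non-empty line of the file.

    Hypothetical helper: raises ValueError naming the first line that is
    not valid JSON, which helps locate where a merge went wrong.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are harmless, skip them
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}, line {line_number}: not valid JSON ({exc})"
                ) from exc
    return records
```

If this runs without an error, the resulting list of dictionaries can be passed to pandas.json_normalize(records), which should spread nested fields into separate columns (a nested user name would become a column such as user.name).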

Answered By: emanuele_maruzzi