JSON to CSV: keys to header issue

Question:

I am trying to convert a very long JSON file to CSV. I’m currently trying to use the code below to accomplish this.

import json
import csv
 
with open('G:userjsondata.json') as json_file:
    jsondata = json.load(json_file)
 
data_file = open('G:userjsonoutput.csv', 'w', newline='')
csv_writer = csv.writer(data_file)
 
count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())
 
data_file.close()

This code accomplishes writing all the data to a CSV, However only takes the keys for from the first JSON line to use as the headers in the CSV. This would be fine, but further in the JSON there are more keys to used. This causes the values to be disorganized. I was wondering if anyone could help me find a way to get all the possible headers and possibly insert NA when a line doesn’t contain that key or values for that key.

The JSON file is similar to this:


[

    {"time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "action": "TAKEN", "Country": "Germany", "Purchased": "YES", ...},
    {"time": "1984-10-04:4:00", "dateOfevent": "1984-10-04", "action": "NOTTAKEN", "Country": "Germany", "Purchased": "NO", ...},
    {"type": "A4", "time": "1984-11-04:4:00", "dateOfevent": "1984-11-04", "Country": "Germany", "typeOfevent": "H7", ...},
    {...},
    {...},

]

I’ve searched for possible solutions all over, but was unable to find anyone having a similar issue.

Asked By: Thomus H

||

Answers:

Could you try to read it normally, and then cobert it to csv using .to_csv like this:

df = pd.read_json('G:userjsondata')
#df = pd.json_normalize(df['Column Name']) #if you want to normalize it


dv.to_csv('example.csv')
Answered By: Hisham

If want to use csv and json modules to do this then can do it in two passes. First pass collects the keys for the CSV file and second pass writes the rows to CSV file. Also, must use a DictWriter since the keys differ in the different records.

import json
import csv

with open('jsondata.json') as json_file:
    jsondata = json.load(json_file)

# stage 1 - populate column names from JSON
keys = []
for data in jsondata:
    for k in data.keys():
        if k not in keys:
            keys.append(k)

# stage 2 - write rows to CSV file
with open('jsonoutput.csv', 'w', newline='') as fout:
    csv_writer = csv.DictWriter(fout, fieldnames=keys)
    csv_writer.writeheader()
    for data in jsondata:
        csv_writer.writerow(data)
Answered By: CodeMonkey
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.