How to convert csv to json

Question:

I have the following csv file:

Topic,Characteristics,Total
Population and dwellings,Population-2016,183314
Population and dwellings,Population-2011,175779
Population and dwellings,Population percentage change,4.3
Age characteristics,0 to 14 years,30670
Age characteristics,0 to 4 years,9275
Age characteristics,5 to 9 years,10475

I would like to output a json file such that each unique ‘Topic’ is a key, and the value is a dictionary of ‘Characteristic’: ‘Total’, i.e. the output would be:

{
    "Population and dwellings": {
        "Population-2016": 183314,
        "Population-2011": 175779,
        "Population percent change": 4.3
    },
    "Age characteristics": {
        "0 to 14 years": 30670,
        "0 to 4 years": 9275,
        "5 to 9 years": 10475
    }
}

How can I do this properly? All the methods I have tried so far overwrite each other, any help would be appreciated. Thanks.

Asked By: Abdullah Hadi

||

Answers:

You can use csv module to read the file and dict.setdefault to group elements:

import csv

out = {}
with open("your_file.csv", "r") as f_in:
    reader = csv.reader(f_in)
    next(reader)  # skip headers
    for topic, characteristics, total in reader:
        out.setdefault(topic, {})[characteristics] = float(total)

print(out)

Prints:

{
    "Population and dwellings": {
        "Population-2016": 183314.0,
        "Population-2011": 175779.0,
        "Population percentage change": 4.3,
    },
    "Age characteristics": {
        "0 to 14 years": 30670.0,
        "0 to 4 years": 9275.0,
        "5 to 9 years": 10475.0,
    },
}

To output JSON from out you can do then:

import json

print(json.dumps(out, indent=4))
Answered By: Andrej Kesely

My solution is similar to Andrej, but I use defaultdict to simplify the code.

import collections
import csv
import json


out = collections.defaultdict(lambda: collections.defaultdict(float))

with open("data.csv") as stream:
    next(stream)  # Skip the header
    reader = csv.reader(stream)
    for topic, characteristics, total in reader:
        out[topic][characteristics] += float(total)

print(json.dumps(out, indent=4))

Output:

{
    "Population and dwellings": {
        "Population-2016": 183314.0,
        "Population-2011": 175779.0,
        "Population percentage change": 4.3
    },
    "Age characteristics": {
        "0 to 14 years": 30670.0,
        "0 to 4 years": 9275.0,
        "5 to 9 years": 10475.0
    }
}
Answered By: Hai Vu