Convert JSON to SQLite in Python – How to map JSON keys to database columns properly?

Question:

I want to convert a JSON file I created to a SQLite database.

My intention is to decide later which data container and entry point is best: JSON (data entry via a text editor) or SQLite (data entry via spreadsheet-like GUIs such as SQLiteStudio).

My JSON file looks like this (it contains traffic data from some crossroads in my city):

...
"2011-12-17 16:00": {
    "local": "Av. Protásio Alves; esquina Ramiro Barcelos",
    "coord": "-30.036916,-51.208093",
    "sentido": "bairro-centro",
    "veiculos": "automotores",
    "modalidade": "semaforo 50-15",
    "regime": "típico",
    "pistas": "2+c",
    "medicoes": [
        [32, 50],
        [40, 50],
        [29, 50],
        [32, 50],
        [35, 50]
        ]
    },
"2011-12-19 08:38": {
    "local": "R. Fernandes Vieira; esquina Protásio Alves",
    "coord": "-30.035535,-51.211079",
    "sentido": "único",
    "veiculos": "automotores",
    "modalidade": "semáforo 30-70",
    "regime": "típico",
    "pistas": "3",
    "medicoes": [
        [23, 30],
        [32, 30],
        [33, 30],
        [32, 30]
        ]
    }
...

I have created a database with a one-to-many relation using these lines of Python code:

import sqlite3

db = sqlite3.connect("fluxos.sqlite")
c = db.cursor()

c.execute('''create table medicoes
         (timestamp text primary key,
          local text,
          coord text,
          sentido text,
          veiculos text,
          modalidade text,
          pistas text)''')

c.execute('''create table valores
         (id integer primary key,
          quantidade integer,
          tempo integer,
          foreign key (id) references medicoes(timestamp))''')

The problem is that, when I was preparing to insert the rows of actual data with something like c.execute("insert into medicoes values(?,?,?,?,?,?,?)" % keys), I realized that, since the dict loaded from the JSON file has no particular order, its values do not map directly onto the column order of the database.

So I ask: which strategy or method should I use to programmatically read the keys from each “block” in the JSON file (in this case “local”, “coord”, “sentido”, “veiculos”, “modalidade”, “regime”, “pistas” and “medicoes”), create the database with the columns in that same order, and then insert the rows with the proper values?

I have fair experience with Python but am just beginning with SQL, so I would like some advice about good practices rather than a ready-made recipe.

Asked By: heltonbiker


Answers:

You have this Python code:

c.execute("insert into medicoes values(?,?,?,?,?,?,?)" % keys)

which I think should be

c.execute("insert into medicoes values (?,?,?,?,?,?,?)", keys)

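since the % operator does string formatting and expects format codes such as %s in the string to its left, whereas the ? placeholders are filled in by sqlite3 itself from the sequence passed as the second argument to execute().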
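
To make the difference concrete, here is a minimal sketch using a throwaway in-memory database. The table demo and its two columns are made up for illustration and are not part of your schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # throwaway in-memory database
db.execute("create table demo (a text, b text)")  # hypothetical table

values = ("x", "y")

# String formatting: the query contains no % codes, so % raises TypeError.
try:
    "insert into demo values (?,?)" % values
    formatting_error = None
except TypeError as exc:
    formatting_error = exc

# Parameter binding: sqlite3 substitutes the values for the ? marks itself.
db.execute("insert into demo values (?,?)", values)
rows = db.execute("select a, b from demo").fetchall()

print(formatting_error)
print(rows)
```

Binding the values this way also means sqlite3 takes care of quoting, so text containing apostrophes or semicolons is stored as-is.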

Now all you need to make this work is for keys to be a tuple (or list) containing the values for the new row of the medicoes table, in the same order as the table's columns. Consider the following Python code:

import json

with open('xxx.json', encoding='utf-8') as f:
    traffic = json.load(f)

# List the JSON keys in the same order as the table's columns
columns = ['local', 'coord', 'sentido', 'veiculos', 'modalidade', 'pistas']
for timestamp, data in traffic.items():
    # Look up each value by name, so the key order in the file does not matter
    keys = (timestamp,) + tuple(data[col] for col in columns)
    print(keys)

On Python 3.7 or later, where dicts keep insertion order, your sample data gives:

('2011-12-17 16:00', 'Av. Protásio Alves; esquina Ramiro Barcelos', '-30.036916,-51.208093', 'bairro-centro', 'automotores', 'semaforo 50-15', '2+c')
('2011-12-19 08:38', 'R. Fernandes Vieira; esquina Protásio Alves', '-30.035535,-51.211079', 'único', 'automotores', 'semáforo 30-70', '3')

which would seem to be the tuples you require.

You could add the necessary sqlite code with something like this:

import json
import sqlite3

with open('xxx.json', encoding='utf-8') as f:
    traffic = json.load(f)
db = sqlite3.connect("fluxos.sqlite")

query = "insert into medicoes values (?,?,?,?,?,?,?)"
columns = ['local', 'coord', 'sentido', 'veiculos', 'modalidade', 'pistas']
c = db.cursor()
for timestamp, data in traffic.items():
    # Build the row in table-column order: timestamp first, then the rest
    keys = (timestamp,) + tuple(data[col] for col in columns)
    c.execute(query, keys)
c.close()
db.commit()  # without a commit the inserted rows are not saved
db.close()
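
Another way to avoid depending on order at all is sqlite3's named placeholders, which take a dict instead of a tuple. The following is only a sketch: it uses an in-memory database and a hand-written one-entry sample dict standing in for your JSON file, and the name stored is just for illustration.

```python
import sqlite3

# Hypothetical one-entry sample standing in for the loaded JSON file.
traffic = {
    "2011-12-17 16:00": {
        "local": "Av. Protásio Alves; esquina Ramiro Barcelos",
        "coord": "-30.036916,-51.208093",
        "sentido": "bairro-centro",
        "veiculos": "automotores",
        "modalidade": "semaforo 50-15",
        "regime": "típico",
        "pistas": "2+c",
        "medicoes": [[32, 50], [40, 50]],
    }
}

db = sqlite3.connect(":memory:")
db.execute("""create table medicoes
              (timestamp text primary key, local text, coord text,
               sentido text, veiculos text, modalidade text, pistas text)""")

columns = ["timestamp", "local", "coord", "sentido",
           "veiculos", "modalidade", "pistas"]
# Produces "... (timestamp, local, ...) values (:timestamp, :local, ...)"
query = "insert into medicoes ({0}) values ({1})".format(
    ", ".join(columns), ", ".join(":" + col for col in columns))

for timestamp, data in traffic.items():
    # Keep only the keys that are table columns, then add the timestamp.
    row = {col: data[col] for col in columns if col != "timestamp"}
    row["timestamp"] = timestamp
    db.execute(query, row)  # values are matched to placeholders by name
db.commit()

stored = db.execute("select timestamp, pistas from medicoes").fetchall()
print(stored)
```

Because each value is looked up by name, the order of the keys in the dict and the order of the columns in the table no longer need to agree.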

Edit: if you don’t want to hard-code the list of columns, you could do something like this:

import json

with open('xxx.json', encoding='utf-8') as f:
    traffic = json.load(f)

# Take the first entry and use its keys as the column list
someitem = next(iter(traffic.values()))
columns = list(someitem.keys())
print(columns)

On Python 3.7 or later, with your sample data, this prints:

['local', 'coord', 'sentido', 'veiculos', 'modalidade', 'regime', 'pistas', 'medicoes']

You could use it with something like this:

import json
import sqlite3

db = sqlite3.connect('fluxos.sqlite')
with open('xxx.json', encoding='utf-8') as f:
    traffic = json.load(f)

# Use the keys of the first entry as the column list
someitem = next(iter(traffic.values()))
columns = list(someitem.keys())
columns.remove('medicoes')  # the measurements belong in the valores table
columns.remove('regime')    # the medicoes table has no such column

# Name the columns explicitly so the order of the values is unambiguous
query = "insert into medicoes (timestamp,{0}) values (?{1})"
query = query.format(",".join(columns), ",?" * len(columns))
print(query)

c = db.cursor()
for timestamp, data in traffic.items():
    keys = (timestamp,) + tuple(data[col] for col in columns)
    c.execute(query, keys)  # pass the values to bind to the placeholders
c.close()
db.commit()  # save the inserted rows
db.close()

On Python 3.7 or later, the query this code prints for your sample data is:

insert into medicoes (timestamp,local,coord,sentido,veiculos,modalidade,pistas) values (?,?,?,?,?,?,?)
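
The code above only fills the medicoes table. For the “many” side of the relation, one possible sketch follows. It rests on assumptions that are not in your question: the valores table gets a separate column (here called medicao, a made-up name) holding the parent's timestamp, instead of reusing id as the foreign key, and each [x, y] pair in the JSON is taken to be (quantidade, tempo). It uses an in-memory database and a trimmed sample dict.

```python
import sqlite3

# Hypothetical trimmed sample: only the measurements of one entry.
traffic = {
    "2011-12-19 08:38": {
        "medicoes": [[23, 30], [32, 30], [33, 30], [32, 30]],
    }
}

db = sqlite3.connect(":memory:")
# Parent table, with the other columns omitted for brevity.
db.execute("create table medicoes (timestamp text primary key)")
# Assumed layout: medicao points back at the parent row's timestamp.
db.execute("""create table valores
              (id integer primary key,
               medicao text references medicoes(timestamp),
               quantidade integer,
               tempo integer)""")

for timestamp, data in traffic.items():
    db.execute("insert into medicoes (timestamp) values (?)", (timestamp,))
    # One child row per measurement pair, each tagged with its timestamp.
    pairs = [(timestamp, quantidade, tempo)
             for quantidade, tempo in data["medicoes"]]
    db.executemany(
        "insert into valores (medicao, quantidade, tempo) values (?,?,?)",
        pairs)
db.commit()

count = db.execute(
    "select count(*) from valores where medicao = ?",
    ("2011-12-19 08:38",)).fetchone()[0]
print(count)
```

Leaving id out of the insert lets SQLite assign it automatically, since it is declared as integer primary key.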
Answered By: srgerg