Targeting specific values from JSON API and inserting into Postgresql, using Python
Question:
Right now i am able to connect to the url api and my database. I am trying to insert data from the url to the postgresql database using psycopg2. I dont fully understand how to do this, and this is all i could come up with to do this.
import urllib3
import json
import certifi
import psycopg2
from psycopg2.extras import Json
http = urllib3.PoolManager(
cert_reqs='CERT_REQUIRED',
ca_certs=certifi.where())
url = '<API-URL>'
headers = urllib3.util.make_headers(basic_auth='<user>:<passowrd>')
r = http.request('GET', url, headers=headers)
data = json.loads(r.data.decode('utf-8'))
def insert_into_table(data):
for item in data['issues']:
item['id'] = Json(item['id'])
with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
with conn.cursor() as cursor:
query = """
INSERT into
Countries
(revenue)
VALUES
(%(id)s);
"""
cursor.executemany(query, data)
conn.commit()
insert_into_table(data)
So this code give me a TypeError: string indices must be integers
on cursor.executemany(query, data)
So i know that json.loads brings back a type object and that json.dumps brings a type string . I wasn’t sure which one i should be using. and i know i am completely missing something on how im targeting the ‘id’ value, and inserting it into the query.
Also a little about the API, it is very large and complex and eventually i’ll have to go down multiple trees to grab certain values, here is an example of what i’m pulling from.
I am trying to grab “id” under “issues” and not “issue type”
{
"expand": "<>",
"startAt": 0,
"maxResults": 50,
"total": 13372,
"issues": [
{
"expand": "<>",
"id": "41508",
"self": "<>",
"key": "<>",
"fields": {
"issuetype": {
"self": "<>",
"id": "1",
"description": "<>",
"iconUrl": "<>",
"name": "<>",
"subtask": <>,
"avatarId": <>
},
Answers:
If you’re trying to get just the id and insert it into your table, you should try
ids = []
for i in data['issues']:
ids.append(i['id'])
Then you can pass your ids
list to you cursor.executemany
function.
The issue you have is not in the way you are parsing your JSON, it occurs when you try to insert it into your table using cursor.executemany()
.
data
is a single object, Are you attempting to insert all of the data your fetch returns into your table all at once? Or are you trying to insert a specific part of the data (a list of issue IDs)?
You are passing data
into your cursor.executemany
call. data
is an object. I believe you wish to pass data.issues
which is the list of issues that you modified.
If you only wish to insert the ids into the table try this:
def insert_into_table(data):
with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
with conn.cursor() as cursor:
query = """
INSERT into
Countries
(revenue)
VALUES
(%(id)s);
"""
for item in data['issues']:
item['id'] = Json(item['id'])
cursor.execute(query, item['id')
conn.commit()
insert_into_table(data)
If you wish keep the efficiency of using cursor.executemany()
You need create an array of the IDs, as the current object structure doesn’t arrange them the way the cursor.executemany()
requires.
First, extract ids
into a list of tuples:
ids = list((item['id'],) for item in data['issues'])
# example ids: [('41508',), ('41509',)]
Next use the function extras.execute_values():
from psycopg2 import extras
query = """
INSERT into Countries (revenue)
VALUES %s;
"""
extras.execute_values(cursor, query, ids)
Why I was getting type errors?
The second argument of the function executemany(query, vars_list) should be a sequence while data
is an object which elements cannot be accessed by integer indexes.
Why to use execute_values()
instead of executemany()
?
Because of performance, the first function executes a single query with multiple arguments, while the second one executes as many queries as arguments.
Note, that by default the third argument of execute_values()
is a list of tuples, so we extracted ids
just in this way.
If you have to insert values into more than one column, each tuple in the list should contain all the values for a single inserted row, example:
values = list((item['id'], item['key']) for item in data['issues'])
query = """
INSERT into Countries (id, revenue)
VALUES %s;
"""
extras.execute_values(cur, query, values)
Right now i am able to connect to the url api and my database. I am trying to insert data from the url to the postgresql database using psycopg2. I dont fully understand how to do this, and this is all i could come up with to do this.
import urllib3
import json
import certifi
import psycopg2
from psycopg2.extras import Json
http = urllib3.PoolManager(
cert_reqs='CERT_REQUIRED',
ca_certs=certifi.where())
url = '<API-URL>'
headers = urllib3.util.make_headers(basic_auth='<user>:<passowrd>')
r = http.request('GET', url, headers=headers)
data = json.loads(r.data.decode('utf-8'))
def insert_into_table(data):
for item in data['issues']:
item['id'] = Json(item['id'])
with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
with conn.cursor() as cursor:
query = """
INSERT into
Countries
(revenue)
VALUES
(%(id)s);
"""
cursor.executemany(query, data)
conn.commit()
insert_into_table(data)
So this code give me a TypeError: string indices must be integers
on cursor.executemany(query, data)
So i know that json.loads brings back a type object and that json.dumps brings a type string . I wasn’t sure which one i should be using. and i know i am completely missing something on how im targeting the ‘id’ value, and inserting it into the query.
Also a little about the API, it is very large and complex and eventually i’ll have to go down multiple trees to grab certain values, here is an example of what i’m pulling from.
I am trying to grab “id” under “issues” and not “issue type”
{
"expand": "<>",
"startAt": 0,
"maxResults": 50,
"total": 13372,
"issues": [
{
"expand": "<>",
"id": "41508",
"self": "<>",
"key": "<>",
"fields": {
"issuetype": {
"self": "<>",
"id": "1",
"description": "<>",
"iconUrl": "<>",
"name": "<>",
"subtask": <>,
"avatarId": <>
},
If you’re trying to get just the id and insert it into your table, you should try
ids = []
for i in data['issues']:
ids.append(i['id'])
Then you can pass your ids
list to you cursor.executemany
function.
The issue you have is not in the way you are parsing your JSON, it occurs when you try to insert it into your table using cursor.executemany()
.
data
is a single object, Are you attempting to insert all of the data your fetch returns into your table all at once? Or are you trying to insert a specific part of the data (a list of issue IDs)?
You are passing data
into your cursor.executemany
call. data
is an object. I believe you wish to pass data.issues
which is the list of issues that you modified.
If you only wish to insert the ids into the table try this:
def insert_into_table(data):
with psycopg2.connect(database='test3', user='<username>', password='<password>', host='localhost') as conn:
with conn.cursor() as cursor:
query = """
INSERT into
Countries
(revenue)
VALUES
(%(id)s);
"""
for item in data['issues']:
item['id'] = Json(item['id'])
cursor.execute(query, item['id')
conn.commit()
insert_into_table(data)
If you wish keep the efficiency of using cursor.executemany()
You need create an array of the IDs, as the current object structure doesn’t arrange them the way the cursor.executemany()
requires.
First, extract ids
into a list of tuples:
ids = list((item['id'],) for item in data['issues'])
# example ids: [('41508',), ('41509',)]
Next use the function extras.execute_values():
from psycopg2 import extras
query = """
INSERT into Countries (revenue)
VALUES %s;
"""
extras.execute_values(cursor, query, ids)
Why I was getting type errors?
The second argument of the function executemany(query, vars_list) should be a sequence while data
is an object which elements cannot be accessed by integer indexes.
Why to use
execute_values()
instead ofexecutemany()
?
Because of performance, the first function executes a single query with multiple arguments, while the second one executes as many queries as arguments.
Note, that by default the third argument of execute_values()
is a list of tuples, so we extracted ids
just in this way.
If you have to insert values into more than one column, each tuple in the list should contain all the values for a single inserted row, example:
values = list((item['id'], item['key']) for item in data['issues'])
query = """
INSERT into Countries (id, revenue)
VALUES %s;
"""
extras.execute_values(cur, query, values)