How do I get the data extracted from API to my database

Question:

I am currently working on a project to build a database on professor’s research paper database.
This is my first time building a database(never had experience with MYSQL) and I am learning as I am doing it.

I was able to use an api to get the data, for example:

{"authorId": "1773022", "url": "https://www.semanticscholar.org/author/1773022", "papers": [{"paperId": "1253d2704580a74e776ae211602cfde71532c057", "title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning"}, {"paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6", "title": "Task-Driven Learning of Spatial Combinations of Visual Features"}, {"paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662", "title": "Statistical Learning of Visual Feature Hierarchies"}]}

How would I use python to turn this into a table so I can use it to build my database?

I am trying to make a table where columns are:
Paper ID|Title|

Asked By: Sang Park

||

Answers:

Firstly download MySQL and run the below SQL on the MySQL database to create your MYSQL table

CREATE TABLE papers
(
paperId varchar(1024),
title varchar(4000)
);

Then below py code, connects to your MySQL database and inserts the records. Just modify the host,user,password,database to your database. Sorry don’t have MySQL to test it but it should work. I used pandas to convert the API list to a dataframe so its easier to work with when inserting into the database table, hope it helps

import pandas as pd
import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

mycursor = mydb.cursor()

data = {"authorId":"1773022","url":"https://www.semanticscholar.org/author/1773022","papers":[{"paperId": "1253d2704580a74e776ae211602cfde71532c057", "title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning"},{"paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6", "title": "Task-Driven Learning of Spatial Combinations of Visual Features"},{"paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662", "title": "Statistical Learning of Visual Feature Hierarchies"}]}
df = pd.DataFrame(data)
papersdf = pd.DataFrame(df['papers'].tolist())

for index, row in papersdf.iterrows():
    sql = print('INSERT INTO papers(paperId,title) VALUES("' + row['paperId'] + '","' + row['title'] + '")')
    mycursor.execute(sql)
    mydb.commit()
    print(mycursor.rowcount, "record inserted.")
Answered By: Tikkaty

From https://www.w3schools.com/python/python_mysql_getstarted.asp


Installation

python -m pip install mysql-connector-python

Overview

Create a connection:

import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword"
)

Create a cursor to interact with the connection you made, then create a Database:

mycursor = mydb.cursor()

mycursor.execute("CREATE DATABASE mydatabase")

mydb.close()

After the database has been created, you can start connecting with it like so:

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

Note that you don’t have to actually close your connection and reopen it to connect to that database, yet I don’t see the docs mentioning anything about interacting with that specific database after you’ve created it, so I’m going to close it after I create the database…

Create the table with the proper datatypes and constraints:

mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")

Then insert into it and commit the entries:

sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)

mydb.commit()

Close the connection:

mydb.close()

Your Specific Case

This is the data you have provided:

data = {
    "authorId": "1773022",
    "url": "https://www.semanticscholar.org/author/1773022",
    "papers": [
        {
            "paperId": "1253d2704580a74e776ae211602cfde71532c057",
            "title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning"
        }, 
        {
            "paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6",
            "title": "Task-Driven Learning of Spatial Combinations of Visual Features"
        },
        {
            "paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662",
            "title": "Statistical Learning of Visual Feature Hierarchies"
        }
    ]
}

Granted that I don’t know all the details, but based on the data given, I’d assume that you would want a table for:

  1. Authors – id (auto increment pk), authorId (varchar), url (varchar),
  2. Papers – id (auto increment pk), authorId (varchar fk), paperId, title (varchar)

Modify this as you please:

import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

mycursor = mydb.cursor()

# create the tables if they don't exist
authors = False
papers = False
mycursor.execute("SHOW TABLES")

for e in mycursor.fetchall():
    if "authors" == e[0]:
        authors = True
    if "papers" == e[0]:
        papers = True

if not authors:
    mycursor.execute("CREATE TABLE authors (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), url VARCHAR(255))")

if not papers:
    mycursor.execute("CREATE TABLE papers (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), paperId VARCHAR(255), title VARCHAR(255))")

# insert into them - recall that "data" is what gets returned by your api. I am assuming 1 entry.
a = data["authorId"]
u = data["url"]
ps = data["papers"]

# I am going to check if the authorId already exists.
mycursor.execute("SELECT * FROM authors WHERE authorId = '" + a + "'")
as = mycursor.fetchall()
if len(as) == 0:
    sql = "INSERT INTO authors (authorId, url) VALUES (%s, %s)"
    val = (a, u)
    mycursor.execute(sql, val)
    mydb.commit()

# for each paper in papers
for p in ps:
    # Consider this: Suppose you had multiple authors for 1 paper. Provided that they are entered into the authors table (which they should be), you can add an entry to the papers table for each author.
    # check to see if the paper is already in the database with the specific author, paperId, and title
    mycursor.execute("SELECT * FROM papers WHERE authorId = '" + a + "' AND paperId = '" + p["paperId"] + "' AND title = '" + p["title"] + "'")
    pc = mycursor.fetchall()
    # if they are no entries, insert it
    if len(pc) == 0:
        sql = "INSERT INTO papers (authorId, paperId, title) VALUES (%s, %s, %s)"
        val = (a, p["paperId"], p["title"])
        mycursor.execute(sql, val)
        mydb.commit()

mydb.close()

Side note:

I have not tested this. I will be amazed myself if this works first try. If you come across any errors, I can do my best to trouble shoot.


Pandas Dataframe .to_sql Method

Just in case this thread blows up with a bunch of answers, this answer on this thread uses a nice method from the pandas dataframe library and instead of plagiarizing it, I’d rather give credit where credit is due. I’d also like to include the docs and this detailed answer. It’s probably a lot easier and is going to be a lot closer to an ORM (as I’ve mentioned below here) than my crude attempt to make one.


My Crude Attempt at an ORM

Once again, I have not tested this. It may not work, but the idea is there.
Please check out this article: What are some good Python ORM solutions?

Save this as Models.py

import mysql.connector
import os


# --------------------------------------------------
# General Model
# --------------------------------------------------

# table columns is a dictionary that looks like:
# table_columns = {"column name": "constaints"}
# ex: table_columns = {"authorId": "VARCHAR(255)", "next col": "VARCHAR(255)"}, etc.
# data is the same as table_columns, but contains the data to be added for 1 row
class Model:
    def __init__(self, table_columns={}, table_name=None, data=None, host=os.environ["mysqlhost"], user=os.environ["mysqluser"], password=os.environ["mysqlpassword"], database=os.environ["database"]):
        self.db = None
        self.cursor = None
        self.table_name = table_name
        self.table_columns = table_columns
        self.data = data
        if host is not None and user is not None and password is not None and database is not None:
            try:
                self.db = mysql.connector.connect(
                    host=host,
                    user=user,
                    password=password,
                    database=database
                )
            except Exception as e:
                print(e)
                #print("One or more credentials were incorrect! Could not connect to you database!")
        if self.db is not None:
            self.cursor = self.db.cursor()
            if table_name is not None:
                self.cursor.execute("SHOW TABLES;")
                for e in self.cursor.fetchall():
                    if e[0] != self.table_name:
                        pk = [[e, self.table_columns[e]] for e in self.table_columns if "primary key" in e.lower()]
                        if len(pk) == 1:
                            pk = " ".join(pk) + ", "
                            del self.table_columns[pk[0]]
                        else:
                            pk = ""
                        try:
                            table_string = 'CREATE TABLE "' + self.table_name + '"(' + pk + ", ".join([" ".join(['"' + "_".join(c.split()) + '"', self.table_columns[c].upper()]) for c in self.table_columns]) + ');'
                            self.cursor.execute(table_string)
                            print("Created table with name: " + self.table_name)
                        except Exception as e:
                            self.db.rollback()
                            print(e)

    def insert(self):
        if self.data is not None:
            pkname = ""
            try:
                self.cursor.execute('SHOW KEYS FROM "(%s)" WHERE Key_name = (%s);', (self.table_name, 'PRIMARY'))
                pkname = self.cursor.fetchall()[0]
                if pkname in self.table_columns and pkname not in self.data:
                    del self.table_columns[pkname]
                elif pkname not in self.table_columns and pkname in self.data:
                    del  self.table_columns[pkname]
            except Exception as e:
                print("Could not get primary key name!")
                print(e)
            try:
                self.cursor.execute('SHOW COLUMNS FROM "' + self.table_name + '";')
                self.table_columns = {e: "" for e in self.cursor.fetchall()}
            except Exception as e:
                self.db.rollback()
                print("Could not find table with name " + self.table_name)
                print(e)
            flag = True
            for e in self.data:
                if e not in self.table_columns:
                    flag = False
            if flag:
                if len(self.data) == len(self.table_columns):
                    col = ["'" + e + "'" if e[0] != "'" and e[-1] !="'" else e for e in self.data]
                    data = [self.data[e] for e in self.data]
                    sql = "INSERT INTO %s (%s) VALUES (%s)"
                    val = ('"' + self.table_name + '"', ", ".join(col), ", ".join(data))
                    try:
                        self.cursor.execute(sql, val)
                        self.save()
                    except Exception as e:
                        print("Could not insert into " + self.table_name)
                        print(e)
            else:
                print("Found unexpected data. Try an insert or update query.")

    def save(self):
        committed = False
        try:
            self.db.commit()
            committed = True
        except Exception as e:
            self.db.rollback()
            print(e)
        if committed:
            self.db.close()
            self.db = None
            self.cursor = None

Usage:

import os
import Model as m

os.environ["mysqlhost"] = "host"
os.environ["mysqluser"] = "mysqluser"
os.environ["mysqlpassword"] = "password"
os.environ["database"] = "database"

# Presumably, this is exactly what your API returns.
data = {
    "authorId": "118985833",
    "url": "semanticscholar.org/author/118985833",
    "papers": [
        {
            "paperId": "0b698cd2269d9b52c75dbdf1994dbc9a98fb16c8",
            "title": "The development of self-care workshop to prevent vicarious traumatization in training therapists working with sexually exploited children"
        }
    ]
}

data_for_author = {"authorId": data["authorId"], "url": data["url"]}

# table_name=None
# table_columns={}
# data=None
model = m.Model(table_name="authors", data=data_for_author)
model.insert()

This may not even work – but I did it to give you some hope. There are easier ways to insert data into your database. You need to look into ORMs – the django (a webserver library for python) library has one native to it, that makes it super easy to manage your database.

Answered By: Shmack