How to preserve leading zeros with read_json in Python

Question:

I get JSON from a URL, and I then want to load it into a DataFrame and insert it into a SQL table:

import requests
import json
import pandas as pd
import pyodbc 
from sqlalchemy import create_engine

# Make the database connection
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=SomeServer;'
                      'Database=SomeDatabase;'
                      'Trusted_Connection=yes;')

# Set the url to given endpoint
url = "https://someurl.com/query/Extract"

# Connect to the endpoint with credentials; requests returns a Response object
myResponse = requests.get(url,auth=("someUser", "somePassword"), verify=True)

# Keep the raw response body (bytes) in a variable
MyResponse = myResponse.content
# Load the var into a dataframe
df = pd.read_json(MyResponse)

# Create cursor using the connection above
cursor = conn.cursor()
# Truncate the load table
cursor.execute("TRUNCATE TABLE load_Sometable")
# Insert each row of the dataframe into the load table
df = df.fillna(value=0)
print(df)
for index, row in df.iterrows():
    cursor.execute(
        "INSERT INTO SomeDatabase.dbo.load_SomeTable(MaterialCode, QtyUnits, PalletNo, CustomerPallet, CustomerCarton, Markets, BlockID, BlockStatus) "
        "values(?,?,?,?,?,?,?,?)",
        str(int(row.MaterialCode)), row.QtyUnits, str(row.PalletNo), row.CustomerPallet,
        row.CustomerCarton, row.Markets, row.BlockID, row.BlockStatus)

# Commit and close
conn.commit()
cursor.close()
conn.close()

In the resulting JSON, some pallet numbers are prefixed with zeros, for example 00123456789.

If I print the raw response, the JSON still has the leading zeros on the pallet numbers. After loading it into a DataFrame with df = pd.read_json(MyResponse), the leading zeros are dropped.

I have searched a lot. read_csv, among others, has options such as converters, but I can't find anything equivalent for read_json.
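read_json has no converters parameter like read_csv, but a similar level of control is possible by parsing the body with the standard json module and building the DataFrame directly: pd.DataFrame does not coerce numeric-looking strings, so values arrive exactly as they appear in the JSON and any conversion is left to you (for example with pd.to_numeric on selected columns). This is a swapped-in technique rather than a read_json option. The sketch assumes the endpoint returns a list of records, and the sample payload is hypothetical.

```python
import json

import pandas as pd

# Hypothetical payload standing in for myResponse.content (a list of records).
payload = b'[{"PalletNo": "00123456789", "QtyUnits": 12}, {"PalletNo": "00223456789", "QtyUnits": 7}]'

# json.loads accepts UTF-8 bytes and leaves JSON strings as Python str objects.
records = json.loads(payload)

# The DataFrame constructor does not convert numeric-looking strings,
# so PalletNo keeps its leading zeros.
df = pd.DataFrame(records)

print(df)
```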

Can anyone assist?

Asked By: opperman.eric


Answers:

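As @Jonathan Leon mentioned in the comments, you can solve this by passing the column types to read_json through its dtype parameter. By default read_json infers the types itself, so numeric-looking strings are converted to integers and lose their leading zeros.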

Example:

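import io

import pandas as pd

json_str = '{"data":["00123456789","00223456789"]}'

# Wrapping the literal string in StringIO avoids the FutureWarning pandas 2.1+ emits for literal JSON
df = pd.read_json(io.StringIO(json_str), dtype={'data': str})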

Output df:

          data
0  00123456789
1  00223456789
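If no column in the payload needs type inference, read_json can also be told not to infer at all: with dtype=False, string values are left as strings in every column. A minimal sketch on the same sample string:

```python
import io

import pandas as pd

json_str = '{"data":["00123456789","00223456789"]}'

# dtype=False switches off dtype inference for all columns, so the
# JSON strings are kept exactly as they appear in the payload.
df = pd.read_json(io.StringIO(json_str), dtype=False)

print(df)
```

This is coarser than a per-column dict: every string column is left untouched, while JSON numbers still come through as numbers because the JSON parser already produced them. Date parsing is governed separately by the convert_dates parameter.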

Without the dtype argument, pandas infers an integer type for the column and strips the leading zeros; the same call with dtype omitted results in:

        data
0  123456789
1  223456789
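Applied to the question, a sketch might pass only the identifier columns as str and let pandas infer the rest. The payload below is hypothetical: its field names follow the question's INSERT statement, but its shape (a list of records) and its values are assumptions. With the real endpoint, io.StringIO(myResponse.text) would take the place of the sample string.

```python
import io

import pandas as pd

# Hypothetical response body: field names from the question's INSERT statement,
# values invented for illustration.
body = """[
  {"MaterialCode": "00042", "QtyUnits": 12, "PalletNo": "00123456789"},
  {"MaterialCode": "00077", "QtyUnits": 7, "PalletNo": "00223456789"}
]"""

# Keep the identifier columns as text; QtyUnits is still inferred as a number.
df = pd.read_json(
    io.StringIO(body),
    orient="records",
    dtype={"PalletNo": str, "MaterialCode": str},
)

for row in df.itertuples(index=False):
    # Pass the strings through unchanged: str(int(row.MaterialCode)), as in
    # the question's loop, would strip the leading zeros again.
    print(row.MaterialCode, row.QtyUnits, row.PalletNo)
```

Note that the question's loop converts MaterialCode with str(int(...)), which drops leading zeros even when read_json has preserved them, so passing identifier-like strings straight to cursor.execute is the safer choice.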
Answered By: above_c_level