How to preserve leading zeros with read_json in Python
Question:
I get JSON from a URL. I then want to put it into a dataframe and insert it into a SQL table:
import requests
import json
import pandas as pd
import pyodbc
from sqlalchemy import create_engine
# Make the database connection
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=SomeServer;'
                      'Database=SomeDatabase;'
                      'Trusted_Connection=yes;')
# Set the url to given endpoint
url = "https://someurl.com/query/Extract"
# Connect to the endpoint with credentials; the result is a requests Response object
myResponse = requests.get(url,auth=("someUser", "somePassword"), verify=True)
# Keep the raw response body (bytes containing the JSON text)
MyResponse = myResponse.content
# Load the var into a dataframe
df = pd.read_json(MyResponse)
# Create cursor using the connection above
cursor = conn.cursor()
# Truncate the load table
cursor.execute("TRUNCATE TABLE load_Sometable")
# Insert each row of the dataframe into the load table
df = df.fillna(value=0)
print(df)
for index, row in df.iterrows():
    cursor.execute(
        "INSERT INTO SomeDatabase.dbo.load_SomeTable "
        "(MaterialCode, QtyUnits, PalletNo, CustomerPallet, CustomerCarton, Markets, BlockID, BlockStatus) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        str(int(row.MaterialCode)), row.QtyUnits, str(row.PalletNo), row.CustomerPallet,
        row.CustomerCarton, row.Markets, row.BlockID, row.BlockStatus,
    )
# Commit and close
conn.commit()
cursor.close()
conn.close()
Some pallet numbers in the resulting JSON have leading zeros, for example 00123456789.
If I print the variable myResponse, the JSON still has the leading zeros on the pallet numbers, but after loading it into a DataFrame with df = pd.read_json(MyResponse), the leading zeros are dropped.
I have searched a lot. read_csv, among others, has options such as converters, but I can't find anything equivalent for read_json.
Can anyone assist?
Answers:
As mentioned by @Jonathan Leon in the comments, you can solve this by passing the dtype argument to read_json.
Example:
import pandas as pd
json_str = '{"data":["00123456789","00223456789"]}'
df = pd.read_json(json_str, dtype={'data': str})
Output df:
data
0 00123456789
1 00223456789
Without the dtype argument, read_json falls back to its default type inference and converts the digit strings to integers, which strips the leading zeros. pd.read_json(json_str)
results in:
data
0 123456789
1 223456789
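Applied to the question's own pipeline, only the PalletNo column needs the hint, because dtype also accepts a per-column dict and columns not listed are still inferred as usual. The sketch below is a minimal illustration under assumptions: the endpoint is assumed to return a list of records, and the payload string is a made-up stand-in for myResponse.text. It wraps the text in io.StringIO because pandas 2.1 and later deprecate passing a literal JSON string to read_json.

```python
from io import StringIO

import pandas as pd

# Hypothetical payload standing in for myResponse.text; the real field
# names and layout of the endpoint's JSON are assumptions here.
payload = """[
  {"MaterialCode": "4711", "QtyUnits": 12, "PalletNo": "00123456789"},
  {"MaterialCode": "4712", "QtyUnits": 30, "PalletNo": "00223456789"}
]"""

# Force PalletNo to stay a string; every other column is still type-inferred.
df = pd.read_json(StringIO(payload), dtype={"PalletNo": str})

print(df["PalletNo"].tolist())
print(df.dtypes)
```

In the original script this would become df = pd.read_json(StringIO(myResponse.text), dtype={"PalletNo": str}), after which the existing str(row.PalletNo) in the insert loop passes the zero-padded value through unchanged.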
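Another option, not taken from the original answer, is to skip read_json's type inference altogether: parse the JSON with the standard json module (in the question's script, myResponse.json() returns the same structure) and pass the records to the DataFrame constructor, which does not convert strings to numbers. This is a sketch under the same assumption as above: the payload is a hypothetical stand-in for the real response.

```python
import json

import pandas as pd

# Hypothetical payload standing in for the real response body.
payload = (
    '[{"PalletNo": "00123456789", "QtyUnits": 12},'
    ' {"PalletNo": "00223456789", "QtyUnits": 30}]'
)

# json.loads keeps JSON strings as Python str values.
records = json.loads(payload)

# The DataFrame constructor does not coerce strings to numbers,
# so PalletNo keeps its leading zeros without any dtype hint.
df = pd.DataFrame(records)

print(df["PalletNo"].tolist())
```

The trade-off is that nothing is inferred from strings: any other numeric fields that arrive as JSON strings stay strings too, whereas dtype={"PalletNo": str} limits the change to one column. pd.read_json(..., dtype=False) has a similar effect, since it disables dtype inference for all data columns.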