Dynamically update pandas NaN based on field type in django model
Question:
I need to save data from csv file to django models.
The data comes from external api so I have no control on its structure.
In my schema, I allowed all fields to be nullable.
This is my script
text = f"{path}/report.csv"
df = pd.read_csv(text)
row_iter = df.iterrows()
for index, row in row_iter:
rows = {key.replace("-", "_"): row.pop(key) for key in row.keys()}
# print(f"rows {rows}")
# default_values = {
# "amazon-order-id",merchant-order-id,purchase-date,last-updated-date,order-status,fulfillment-channel,sales-channel,order-channel,ship-service-level,product-name,sku,asin,item-status,quantity,currency,item-price,item-tax,shipping-price,shipping-tax,gift-wrap-price,gift-wrap-tax,item-promotion-discount,ship-promotion-discount,ship-city,ship-state,ship-postal-code,ship-country,promotion-ids,is-business-order,purchase-order-number,price-designation,is-iba,order-invoice-type
# }
sb, created = Order.objects.update_or_create(
sellingpartner_customer=c,
amazon_order_id=rows["amazon_order_id"],
sku=rows["sku"],
asin=rows["asin"],
defaults={**rows},
)
However, since some of the csv fields has empty values, pandas will replace it with NaN
value, this is where django returns an error
django.db.utils.OperationalError: (1054, "Unknown column 'NaN' in 'field list'")
I tried replace empty values as empty string("")
df.fillna("", inplace=True)
But django will return an error for fields that are not charfields
django.core.exceptions.ValidationError: ['“” value must be a decimal number.']
My question is, how do you handle empty values from csv file in pandas so for example if the field type is boolean, pandas will just replace empty value with boolean False for django boolean type fields, 0
for empty decimal types, and just blank for empty charfields, etc ?
Answers:
Nullable Django models won’t take np.nan
or other Pandas-compatible not-a-number
objects. It expects taking None
as in stock Python. When you have nan
values, before you save them to Django, just replace them with None
to avoid the validation error.
Edit. The straight-forward implementation of replacing your np.nan
objects with None
would be:
df.replace(np.nan, None)
This does not act "in-place". Pass inplace=True
as another parameter or otherwise re-assign the output to df
.
I need to save data from csv file to django models.
The data comes from external api so I have no control on its structure.
In my schema, I allowed all fields to be nullable.
This is my script
text = f"{path}/report.csv"
df = pd.read_csv(text)
row_iter = df.iterrows()
for index, row in row_iter:
rows = {key.replace("-", "_"): row.pop(key) for key in row.keys()}
# print(f"rows {rows}")
# default_values = {
# "amazon-order-id",merchant-order-id,purchase-date,last-updated-date,order-status,fulfillment-channel,sales-channel,order-channel,ship-service-level,product-name,sku,asin,item-status,quantity,currency,item-price,item-tax,shipping-price,shipping-tax,gift-wrap-price,gift-wrap-tax,item-promotion-discount,ship-promotion-discount,ship-city,ship-state,ship-postal-code,ship-country,promotion-ids,is-business-order,purchase-order-number,price-designation,is-iba,order-invoice-type
# }
sb, created = Order.objects.update_or_create(
sellingpartner_customer=c,
amazon_order_id=rows["amazon_order_id"],
sku=rows["sku"],
asin=rows["asin"],
defaults={**rows},
)
However, since some of the csv fields has empty values, pandas will replace it with NaN
value, this is where django returns an error
django.db.utils.OperationalError: (1054, "Unknown column 'NaN' in 'field list'")
I tried replace empty values as empty string("")
df.fillna("", inplace=True)
But django will return an error for fields that are not charfields
django.core.exceptions.ValidationError: ['“” value must be a decimal number.']
My question is, how do you handle empty values from csv file in pandas so for example if the field type is boolean, pandas will just replace empty value with boolean False for django boolean type fields, 0
for empty decimal types, and just blank for empty charfields, etc ?
Nullable Django models won’t take np.nan
or other Pandas-compatible not-a-number
objects. It expects taking None
as in stock Python. When you have nan
values, before you save them to Django, just replace them with None
to avoid the validation error.
Edit. The straight-forward implementation of replacing your np.nan
objects with None
would be:
df.replace(np.nan, None)
This does not act "in-place". Pass inplace=True
as another parameter or otherwise re-assign the output to df
.