Dynamically update pandas NaN based on field type in django model

Question:

I need to save data from csv file to django models.
The data comes from external api so I have no control on its structure.
In my schema, I allowed all fields to be nullable.

This is my script

   text = f"{path}/report.csv"
        df = pd.read_csv(text)
        row_iter = df.iterrows()

        for index, row in row_iter:
            rows = {key.replace("-", "_"): row.pop(key) for key in row.keys()}
            # print(f"rows {rows}")
            # default_values = {
            #     "amazon-order-id",merchant-order-id,purchase-date,last-updated-date,order-status,fulfillment-channel,sales-channel,order-channel,ship-service-level,product-name,sku,asin,item-status,quantity,currency,item-price,item-tax,shipping-price,shipping-tax,gift-wrap-price,gift-wrap-tax,item-promotion-discount,ship-promotion-discount,ship-city,ship-state,ship-postal-code,ship-country,promotion-ids,is-business-order,purchase-order-number,price-designation,is-iba,order-invoice-type
            # }

            sb, created = Order.objects.update_or_create(
                sellingpartner_customer=c,
                amazon_order_id=rows["amazon_order_id"],
                sku=rows["sku"],
                asin=rows["asin"],
                defaults={**rows},
            )

However, since some of the csv fields has empty values, pandas will replace it with NaN value, this is where django returns an error

   django.db.utils.OperationalError: (1054, "Unknown column 'NaN' in 'field list'")

I tried replace empty values as empty string("")

   df.fillna("", inplace=True)

But django will return an error for fields that are not charfields

   django.core.exceptions.ValidationError: ['“” value must be a decimal number.']

My question is, how do you handle empty values from csv file in pandas so for example if the field type is boolean, pandas will just replace empty value with boolean False for django boolean type fields, 0 for empty decimal types, and just blank for empty charfields, etc ?

Asked By: ira

||

Answers:

Nullable Django models won’t take np.nan or other Pandas-compatible not-a-number objects. It expects taking None as in stock Python. When you have nan values, before you save them to Django, just replace them with None to avoid the validation error.


Edit. The straight-forward implementation of replacing your np.nan objects with None would be:

df.replace(np.nan, None)

This does not act "in-place". Pass inplace=True as another parameter or otherwise re-assign the output to df.

Answered By: ifly6
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.