Pydantic error when reading data from JSON

Question:

I am writing code that reads data from a .json file and loads it into a pydantic model.

Here is the Python code:

import json
import pydantic
from typing import Optional, List

class Car(pydantic.BaseModel):
    manufacturer: str
    model: str
    date_of_manufacture: str
    date_of_sale: str
    number_plate: str
    price: float
    type_of_fuel: Optional[str]
    location_of_sale: Optional[str]

    

def load_data() -> None:
    with open("./data.json") as file:
        data = json.load(file)
    cars: List[Car] = [Car(**item) for item in data]
    print(cars[0])
    
if __name__ == "__main__":
    load_data()

And here is the JSON data:

[
    {
        "manufacturer": "BMW",
        "model": "i8",
        "date_of_manufacture": "14/06/2021",
        "date_of_sale": "19/11/2022",
        "number_plate": "ND21WHP",
        "price": "100,000",
        "type_of_fuel": "electric",
        "location_of_sale": "Leicester, England"
    },
    {
        "manufacturer": "Audi",
        "model": "TT RS",
        "date_of_manufacture": "22/02/2019",
        "date_of_sale": "12/08/2021",
        "number_plate": "LR69FOW",
        "price": "67,000",
        "type_of_fuel": "petrol",
        "location_of_sale": "Manchester, England"
    }
]

And this is the error I am getting:

File "pydanticmain.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Car
price
  value is not a valid float (type=type_error.float)

I have tried adding '.00' to the end of the price strings, but I get the same error.

Asked By: TaranJS


Answers:

You need to remove the quotes around the numbers, along with the comma, since the values are being read as strings that cannot be converted to a float.

"price": "100,000" should be:
"price": 100000
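
As a quick check of the difference, here is a minimal sketch using only the standard json module (the variable names are illustrative and not part of the question's code). It shows that a quoted price arrives as a str, while an unquoted one arrives as a number:

```python
import json

# Quoted value: json parses it as a Python str, comma included.
quoted = json.loads('{"price": "100,000"}')

# Unquoted value: json parses it as a Python int.
unquoted = json.loads('{"price": 100000}')

print(type(quoted["price"]).__name__)    # str
print(type(unquoted["price"]).__name__)  # int
```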

Answered By: Crimp City

You could also change the thousands-separator comma (,) to an underscore (_) and keep the string.

Pydantic then takes care of the str-to-float conversion.
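
This relies on Python's built-in float() accepting underscores between digits (PEP 515, Python 3.6+). A small sketch with the prices from the question, assuming the string ends up being passed to float() (the variable names are illustrative):

```python
# Underscores between digits are valid in numeric strings passed to float().
price_with_underscore = float("100_000")
price_with_decimals = float("67_000.00")

print(price_with_underscore)  # 100000.0
print(price_with_decimals)    # 67000.0
```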

Answered By: Paul

The problem comes from the fact that the default Pydantic validator for float simply tries to coerce the string value to float (as @Paul mentioned). And float("100,000") leads to a ValueError.

I am surprised no one suggested this, but if you don’t control the source JSON data, you can easily solve this issue by writing your own little validator to properly format the string (or parse the number properly yourself):

from typing import Optional

from pydantic import BaseModel, validator

class Car(BaseModel):
    manufacturer: str
    model: str
    date_of_manufacture: str
    date_of_sale: str
    number_plate: str
    price: float
    type_of_fuel: Optional[str]
    location_of_sale: Optional[str]

    @validator("price", pre=True)
    def adjust_number_format(cls, v: object) -> object:
        if isinstance(v, str):
            return v.replace(",", "")
        return v

The pre=True is important to make the adjustment before the default field validator receives the value. I purposefully did it like this to show that you don’t need to convert the str to a float yourself, but you could of course do that too:

...
    @validator("price", pre=True)
    def parse_number(cls, v: object) -> object:
        if isinstance(v, str):
            return float(v.replace(",", ""))
        return v

Both of these work and require no changes in the JSON document.
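
To try the clean-up step on its own, without Pydantic, here is a minimal sketch in plain Python. The function name strip_thousands_separator is hypothetical and only mirrors the body of the validators above:

```python
def strip_thousands_separator(value: object) -> object:
    """Hypothetical helper: remove commas from strings, pass other types through."""
    if isinstance(value, str):
        return value.replace(",", "")
    return value


# The raw string cannot be converted, which is the error from the question.
try:
    float("100,000")
    raw_string_converts = True
except ValueError:
    raw_string_converts = False

# After the clean-up, the same string converts without trouble.
cleaned_price = float(strip_thousands_separator("100,000"))

# Values that are not strings are returned unchanged.
untouched_price = strip_thousands_separator(67000.0)

print(raw_string_converts)  # False
print(cleaned_price)        # 100000.0
print(untouched_price)      # 67000.0
```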


Finally, if you have (or expect to have) multiple number-like fields and know that any of them may contain such oddly formatted strings, you could generalize that validator like this (a different class is used for demo purposes):

from pydantic import BaseModel, validator
from pydantic.fields import ModelField


class Car2(BaseModel):
    model: str
    price: float
    year: int
    numbers: list[float]

    @validator("*", pre=True, each_item=True)
    def format_number_string(cls, v: object, field: ModelField) -> object:
        if issubclass(field.type_, (float, int)) and isinstance(v, str):
            return v.replace(",", "")
        return v


if __name__ == "__main__":
    car = Car2.parse_obj({
        "model": "foo",
        "price": "100,000",
        "year": "2,010",
        "numbers": ["1", "3.14", "10,000"]
    })
    print(car)  # model='foo' price=100000.0 year=2010 numbers=[1.0, 3.14, 10000.0]

Answered By: Daniil Fajnberg