Is there a way to convert a file path field to a parsed model in-place?

Question:

I have two models. The second has a file path field referencing a file whose contents are described by the first model. Is it possible to expand the file contents in place (replace the file path with the parsed model)?

Sample models:

from pydantic import BaseModel, FilePath


class FirstModel(BaseModel):
    str_data: str
    num_list: list[int | float]


class SecondModel(BaseModel):
    some_other_field: str
    first_model: FilePath

Sample data:

{
  "str_data": "Some string data up in here",
  "num_list": [1, 2, 3.14]
}

Desired result:

>>> SecondModel(some_other_field="Other field data", first_model="path/to/data.json")
SecondModel(some_other_field="Other field data", first_model=FirstModel(str_data="Some string data up in here", num_list=[1, 2, 3.14]))

So initially I would like the first_model field to be given as a file path, which is then parsed so that the field ends up with type FirstModel. Is this possible?

I’ve tried different approaches using validators, subclassing the first model, and custom root types.

Asked By: howardj99


Answers:

First of all, the field type should reflect what you actually want to end up with after you parse data with your model. So the annotation for first_model should not be FilePath, but FirstModel.

Then it is still possible to "normally" initialize SecondModel by providing either a dictionary with the correct key-value pairs to first_model or an actual instance of FirstModel. But you can also write a custom field validator with pre=True that takes care of the case where someone provides a file path instead of "valid" data.

There are a few ways to achieve this. The simplest approach that I can think of is to first assume that the value is a valid file path that can be opened and read. If that succeeds, we can assume the contents can be directly parsed via FirstModel. If it fails, we just return the value unchanged and let the default validators take care of the rest.

Assume we have the following data in a file called test.json in our current working directory:

{
  "str_data": "foo",
  "num_list": [1, 2, 3.14]
}

Here is a working implementation:

from pathlib import Path

from pydantic import BaseModel, validator


class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]


class SecondModel(BaseModel):
    some_other_field: str
    first_model: FirstModel

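    @validator("first_model", pre=True)
    def load_json_to_first_model(cls, v: object) -> object:
        # Runs before the default validation (pre=True), so v is the raw input.
        try:
            contents = Path(str(v)).read_text()
        except (TypeError, OSError, ValueError):
            # Not a readable file (ValueError covers paths with a null byte):
            # return the value unchanged for the default validators to handle.
            return v
        return FirstModel.parse_raw(contents)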


if __name__ == "__main__":
    obj = SecondModel.parse_obj({
        "some_other_field": "bar",
        "first_model": "test.json",
    })
    print(obj)

Output:

some_other_field='bar' first_model=FirstModel(str_data='foo', num_list=[1.0, 2.0, 3.14])
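
The implementation above uses the Pydantic v1 API (validator with pre=True, parse_raw, parse_obj). If you are on Pydantic v2, the same idea might look roughly like the sketch below. It assumes Pydantic v2 is installed; the isinstance check is an addition of this sketch, not part of the original answer.

```python
from pathlib import Path

from pydantic import BaseModel, field_validator


class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]


class SecondModel(BaseModel):
    some_other_field: str
    first_model: FirstModel

    @field_validator("first_model", mode="before")
    @classmethod
    def load_json_to_first_model(cls, v: object) -> object:
        # Only strings and Path objects are treated as candidate file paths
        # (an assumption of this sketch); dicts and instances pass through.
        if not isinstance(v, (str, Path)):
            return v
        try:
            contents = Path(v).read_text()
        except (OSError, ValueError):
            # Not a readable file: let the regular validation report the error.
            return v
        return FirstModel.model_validate_json(contents)
```

With test.json as above, SecondModel.model_validate({"some_other_field": "bar", "first_model": "test.json"}) should then give the same result as the v1 version.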

If we provide an invalid path or the file cannot be opened, the error we get will simply come from the default validator telling us that first_model is not a valid dictionary. You can customize this further in your custom validator if you want, for example by differentiating how you handle PermissionError and FileNotFoundError instead of catching the base OSError.
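
As a rough sketch of that kind of customization, the file handling could be moved into a small standard-library helper that reports a missing file and a permission problem separately. The helper name read_text_or_none and the error messages are made up for this illustration.

```python
from pathlib import Path
from typing import Optional


def read_text_or_none(value: object) -> Optional[str]:
    """Return the file contents if value is a readable path.

    Returns None when value should go to the default validators unchanged.
    Hypothetical helper, not part of Pydantic.
    """
    if not isinstance(value, (str, Path)):
        return None  # e.g. a dict or a FirstModel instance
    try:
        return Path(value).read_text()
    except FileNotFoundError:
        # A ValueError raised inside a validator is reported by Pydantic
        # as a validation error for that field.
        raise ValueError(f"file not found: {value}") from None
    except PermissionError:
        raise ValueError(f"no permission to read: {value}") from None
    except (OSError, ValueError):
        return None  # any other failure: fall back to default validation
```

Inside the validator you would then call contents = read_text_or_none(v), return v if contents is None, and otherwise return FirstModel.parse_raw(contents).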

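Side note: to static type checkers, a type union of float | int is equivalent to plain float, because PEP 484 treats int as acceptable wherever float is expected, even though there is technically no subclass relationship. This means you can omit the int. Pydantic will then cast all values to float, as the output above shows. (See the Pydantic documentation on that matter.)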

Answered By: Daniil Fajnberg