Is there a way to convert a file path field to a parsed model in-place?
Question:
I have two models. The second has a file path field that references a file whose contents are described by the first model. Is it possible to expand the file contents in place (that is, replace the file path with the parsed model)?
Sample models:
from pydantic import BaseModel, FilePath

class FirstModel(BaseModel):
    str_data: str
    num_list: list[int | float]

class SecondModel(BaseModel):
    some_other_field: str
    first_model: FilePath
Sample data:
{
    "str_data": "Some string data up in here",
    "num_list": [1, 2, 3.14]
}
Desired result:
>>> SecondModel(some_other_field="Other field data", first_model="path/to/data.json")
SecondModel(some_other_field="Other field data", first_model=FirstModel(str_data="Some string data up in here", num_list=[1, 2, 3.14]))
So initially I would like the first_model field to be given as a file path, which is then parsed so that the field ends up holding a FirstModel. Is this possible?
I’ve tried different approaches using validators, subclassing the first model, and custom root types.
Answers:
First of all, the field type should reflect what you actually want to end up with after you parse data with your model. So the annotation for first_model should not be FilePath, but FirstModel.
With that annotation you can still initialize SecondModel "normally", by passing first_model either a dictionary with the correct key-value pairs or an actual instance of FirstModel. But you can also write a custom field validator with pre=True that handles the case where someone provides a file path instead of "valid" data.
There are a few ways to achieve this. The simplest approach I can think of is to first assume that the value is a valid file path that can be opened and read. If that succeeds, we assume the contents can be parsed directly via FirstModel. If it fails, we return the value unchanged and let the default validators take care of the rest.
Assume we have the following data in a file called test.json in our current working directory:
{
    "str_data": "foo",
    "num_list": [1, 2, 3.14]
}
Here is a working implementation:
from pathlib import Path

from pydantic import BaseModel, validator

class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]

class SecondModel(BaseModel):
    some_other_field: str
    first_model: FirstModel

    @validator("first_model", pre=True)
    def load_json_to_first_model(cls, v: object) -> object:
        # Treat the value as a file path first.
        try:
            contents = Path(str(v)).read_text()
        except (TypeError, OSError):
            # Not a readable file: hand the value to the default validation.
            return v
        return FirstModel.parse_raw(contents)

if __name__ == "__main__":
    obj = SecondModel.parse_obj({
        "some_other_field": "bar",
        "first_model": "test.json",
    })
    print(obj)
Output:
some_other_field='bar' first_model=FirstModel(str_data='foo', num_list=[1.0, 2.0, 3.14])
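To illustrate the claim that a dictionary or a FirstModel instance still works alongside a file path, here is a minimal sketch using the same Pydantic v1-style API as above. The temporary directory and the names payload, from_path, from_dict and from_instance are only illustrative assumptions, not part of the original setup.

```python
import json
import tempfile
from pathlib import Path

from pydantic import BaseModel, validator


class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]


class SecondModel(BaseModel):
    some_other_field: str
    first_model: FirstModel

    @validator("first_model", pre=True)
    def load_json_to_first_model(cls, v: object) -> object:
        try:
            contents = Path(str(v)).read_text()
        except (TypeError, OSError):
            return v  # not a readable file, so pass the value through
        return FirstModel.parse_raw(contents)


# Illustrative data; written to a throwaway file so the sketch is self-contained.
payload = {"str_data": "foo", "num_list": [1, 2, 3.14]}

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "test.json"
    json_path.write_text(json.dumps(payload))
    # Case 1: a file path, which the validator reads and parses.
    from_path = SecondModel(some_other_field="bar", first_model=str(json_path))

# Case 2: a plain dictionary, which the validator returns unchanged.
from_dict = SecondModel(some_other_field="bar", first_model=payload)

# Case 3: an existing FirstModel instance, also returned unchanged.
from_instance = SecondModel(some_other_field="bar", first_model=FirstModel(**payload))

print(from_path.first_model)
print(from_dict.first_model)
print(from_instance.first_model)
```

In all three cases first_model ends up as a FirstModel. In cases 2 and 3 the string form of the value does not name an existing file, so the read fails and the value is returned unchanged.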
If we provide an invalid path or the file cannot be opened, the error we get will simply come from the default validator, telling us that first_model is not a valid dictionary. You can customize this further in your custom validator if you want, for example by handling PermissionError and FileNotFoundError differently instead of catching the base OSError.
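One possible way to do that customization is sketched below. It is an assumption about how you might want it to behave, not the only option: strings and Path objects are always treated as file paths, and each failure mode gets its own message. The error texts and the sample path are made up for illustration.

```python
from pathlib import Path

from pydantic import BaseModel, ValidationError, validator


class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]


class SecondModel(BaseModel):
    some_other_field: str
    first_model: FirstModel

    @validator("first_model", pre=True)
    def load_json_to_first_model(cls, v: object) -> object:
        # Assumption: only str and Path values are meant as file paths;
        # dictionaries and FirstModel instances pass through untouched.
        if not isinstance(v, (str, Path)):
            return v
        try:
            contents = Path(v).read_text()
        except FileNotFoundError as exc:
            raise ValueError(f"file not found: {v}") from exc
        except PermissionError as exc:
            raise ValueError(f"no permission to read: {v}") from exc
        except OSError as exc:
            # Any other I/O problem, e.g. the path points to a directory.
            raise ValueError(f"could not read {v}: {exc}") from exc
        return FirstModel.parse_raw(contents)


# A dictionary is still accepted as before.
ok = SecondModel(some_other_field="bar", first_model={"str_data": "foo", "num_list": [1]})

# A missing file now produces a specific validation message.
try:
    SecondModel(some_other_field="bar", first_model="no/such/dir/missing.json")
except ValidationError as exc:
    message = str(exc)
else:
    message = ""
print(message)
```

Raising ValueError inside the validator lets Pydantic wrap it in its usual ValidationError, so callers only need to catch one exception type. The trade-off is that a string is never passed on to the default validation any more.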
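Side note: for static typing purposes, a union of float | int effectively reduces to float, because int is accepted wherever float is annotated even though there is technically no subclass relationship between the two. This means you can omit the int and annotate the field as list[float]. All values will then be cast to float. (See the Pydantic documentation on that matter.)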
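A quick check of that casting behavior, as a small sketch (the variable names obj and element_types are only illustrative):

```python
from pydantic import BaseModel


class FirstModel(BaseModel):
    str_data: str
    num_list: list[float]


# Integers are accepted for a list[float] field and stored as floats.
obj = FirstModel(str_data="foo", num_list=[1, 2, 3.14])
element_types = {type(x).__name__ for x in obj.num_list}

print(obj.num_list)
print(element_types)
```

This is consistent with the output shown earlier, where 1 and 2 came back as 1.0 and 2.0.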