Custom JSON serializer for JSON columns in SQLAlchemy
Question:
I have the following ORM object (simplified):

import datetime as dt

from sqlalchemy import create_engine, Integer, Column, DateTime
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Metrics(Base):
    __tablename__ = 'metrics'

    id = Column(Integer, primary_key=True)
    ts = Column(DateTime, default=dt.datetime.now())
    computed_values = Column(JSONB)
    dates = Column(JSONB)

entry = Metrics(computed_values={'foo': 12.3, 'bar': 45.6},
                dates=[dt.date.today()])

engine = create_engine('postgresql://postgres:postgres@localhost:5432/my_schema')

with Session(engine, future=True) as session:
    session.add(entry)
    session.commit()
Each row has:
- id: the primary key
- ts: timestamp of when the row was inserted
- computed_values: the actual JSONB data to be stored
- dates: JSONB storing the list of dates for which the data was calculated
While I have no issues with the computed_values column, the datetime.date objects in the list inside the dates column cannot be serialized by the default SQLAlchemy JSON serializer.
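The failure is easy to reproduce outside SQLAlchemy: by default, SQLAlchemy serializes JSON columns with the standard library's `json.dumps`, which rejects `datetime.date`:

```python
import datetime as dt
import json

# The stdlib encoder has no handler for date objects,
# so this raises TypeError.
try:
    json.dumps([dt.date(2021, 1, 2)])
except TypeError as exc:
    print(exc)  # Object of type date is not JSON serializable
```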
My thought is to redefine the serializer behavior for date objects for that exact column. To do that I have to either define my own custom JSON serializer or use a ready-made one, like orjson. Since I'm likely to encounter many other JSON serialization issues on this project, I'd prefer the latter.
Digging into the JSONB class and its superclasses, I thought the following should do the trick:
class Metrics(Base):
    __tablename__ = 'metrics'
    # ---%<--- snip ---%<---
    dates = Column(JSONB(json_serializer=lambda obj: orjson.dumps(obj, option=orjson.OPT_NAIVE_UTC)))
    # ---%<--- snip ---%<---
but it didn't:

  File "metrics.py", line 30, in Metrics
    dates = Column(JSONB(json_serializer=lambda obj: orjson.dumps(obj, option=orjson.OPT_NAIVE_UTC)))
TypeError: __init__() got an unexpected keyword argument 'json_serializer'
What am I doing wrong, and how do I properly define a custom SQLAlchemy serializer for JSON (and JSONB) columns?
Answers:
It looks like you can get what you want by modifying your create_engine() call.
From the docstring in SQLAlchemy:
Custom serializers and deserializers are specified at the dialect level,
that is using :func:`_sa.create_engine`. The reason for this is that when
using psycopg2, the DBAPI only allows serializers at the per-cursor
or per-connection level. E.g.::
    engine = create_engine("postgresql://scott:tiger@localhost/test",
                           json_serializer=my_serialize_fn,
                           json_deserializer=my_deserialize_fn
                           )
So the resulting code should be as follows:
import datetime as dt

import orjson
from sqlalchemy import create_engine, Integer, Column, DateTime
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Metrics(Base):
    __tablename__ = 'metrics'

    id = Column(Integer, primary_key=True)
    # Pass the callable itself (not `dt.datetime.now()`),
    # so the timestamp is evaluated per insert, not once at import time.
    ts = Column(DateTime, default=dt.datetime.now)
    computed_values = Column(JSONB)
    dates = Column(JSONB)

entry = Metrics(computed_values={'foo': 12.3, 'bar': 45.6},
                dates=[dt.date.today()])

def orjson_serializer(obj):
    """
    Note that `orjson.dumps()` returns bytes, while SQLAlchemy expects a string,
    hence the `decode()` call.
    """
    return orjson.dumps(obj, option=orjson.OPT_SERIALIZE_NUMPY | orjson.OPT_NAIVE_UTC).decode()

engine = create_engine('postgresql://postgres:postgres@localhost:5432/my_schema',
                       json_serializer=orjson_serializer,
                       json_deserializer=orjson.loads)

with Session(engine, future=True) as session:
    session.add(entry)
    session.commit()