Custom json serializer for JSON column in SQLAlchemy

Question:

I have following ORM object (simplified):

import datetime as dt

from sqlalchemy import create_engine, Integer, Column, DateTime
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Metrics(Base):
    __tablename__ = 'metrics'

    id = Column(Integer, primary_key=True)
    ts = Column(DateTime, default=dt.datetime.now())
    computed_values = Column(JSONB)
    dates = Column(JSONB)


entry = Metrics(computed_values={'foo': 12.3, 'bar':45.6},
                dates=[dt.date.today()])

engine = create_engine('postgresql://postgres:postgres@localhost:5432/my_schema')
with Session(engine, future=True) as session:
    session.add(entry)
    session.commit()

Each row has:

  • id primary key
  • ts timestamp when row was inserted
  • computed_values actual JSONB data to be stored
  • dates JSONB to store a list of dates for which the data was calculated.

While I have no issues with the computed_values column, the datetime.date objects in the list inside dates column cannot be serialized by default SQLAlchemy JSON serializer.

My thought is to redefine serializer behavior for date object for that exact column. To do that I have to either define my own custom JSON serializer, or use some ready-made one, like orjson. Since I’m likely to encounter many other JSON serialization issues on the project I’d prefer the latter.

Digging into the JSONB class and it’s superclasses, I thought that following should do the trick:

class Metrics(Base):
    __tablename__ = 'metrics'

    # ---%<--- snip ---%<---
    dates = Column(JSONB(json_serializer=lambda obj: orjson.dumps(obj, option=orjson.OPT_NAIVE_UTC)))

    # ---%<--- snip ---%<---

but it didn’t:

File "metrics.py", line 30, in Metrics
    dates = Column(JSONB(json_serializer=lambda obj: orjson.dumps(obj, option=orjson.OPT_NAIVE_UTC)))
TypeError: __init__() got an unexpected keyword argument 'json_serializer'

What am I doing wrong and how to properly define custom SQLAlchemy serializer for JSON (and JSONB) Columns?

Asked By: Eduard Sukharev

||

Answers:

Looks like you should be able to get what you want by modifying your create_engine statement.

From the docstring in SQLAlchemy:

Custom serializers and deserializers are specified at the dialect level,
that is using :func:`_sa.create_engine`.  The reason for this is that when
using psycopg2, the DBAPI only allows serializers at the per-cursor
or per-connection level.   E.g.::
    engine = create_engine("postgresql://scott:tiger@localhost/test",
                            json_serializer=my_serialize_fn,
                            json_deserializer=my_deserialize_fn
                    )

So resulting code should be as following:

import datetime as dt

import orjson

from sqlalchemy import create_engine, Integer, Column, DateTime
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Metrics(Base):
    __tablename__ = 'metrics'

    id = Column(Integer, primary_key=True)
    ts = Column(DateTime, default=dt.datetime.now())
    computed_values = Column(JSONB)
    dates = Column(JSONB)

entry = Metrics(computed_values={'foo': 12.3, 'bar':45.6},
                dates=[dt.date.today()])

def orjson_serializer(obj):
    """
        Note that `orjson.dumps()` return byte array, while sqlalchemy expects string, thus `decode()` call.
    """
    return orjson.dumps(obj, option=orjson.OPT_SERIALIZE_NUMPY | orjson.OPT_NAIVE_UTC).decode()


engine = create_engine('postgresql://postgres:postgres@localhost:5432/my_schema',
    json_serializer=orjson_serializer,
    json_deserializer=orjson.loads)
with Session(engine, future=True) as session:
    session.add(entry)
    session.commit()
Answered By: Randy