Create a Full Text Search index with SQLAlchemy on PostgreSQL

Question:

I need to create a PostgreSQL Full Text Search index in Python with SQLAlchemy. Here’s what I want in SQL:

CREATE TABLE person ( id INTEGER PRIMARY KEY, name TEXT );
CREATE INDEX person_idx ON person USING GIN (to_tsvector('simple', name));

How do I do the second part with SQLAlchemy when using the ORM?

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
Asked By: Markus Meskanen


Answers:

You can create the index using Index in __table_args__. I also use a function to build the tsvector expression, which keeps things tidy and reusable if more than one field is required. Something like this:

from sqlalchemy import Index, cast, func
from sqlalchemy.dialects import postgresql

def create_tsvector(*args):
    exp = args[0]
    for e in args[1:]:
        exp += ' ' + e
    return func.to_tsvector('english', exp)

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)

    __ts_vector__ = create_tsvector(
        cast(func.coalesce(name, ''), postgresql.TEXT)
    )

    __table_args__ = (
        Index(
            'idx_person_fts',
            __ts_vector__,
            postgresql_using='gin'
        ),  # trailing comma: __table_args__ must be a tuple
    )

Update:
A sample query that uses the index (corrected based on comments):

people = Person.query.filter(Person.__ts_vector__.match(expressions, postgresql_regconfig='english')).all()
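
To check that a definition like this produces the SQL from the question, the index can be compiled against the PostgreSQL dialect without connecting to a database. The following is a minimal sketch, assuming SQLAlchemy 1.4 or later; the Core table `person` is only a stand-in for the Person model, and the index uses the 'simple' configuration from the question.

```python
from sqlalchemy import Column, Index, Integer, MetaData, Table, Text, func
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex

metadata = MetaData()

# Stand-in for the Person model from the question.
person = Table(
    'person', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', Text),
)

# Functional GIN index; it attaches itself to the table of the column it references.
person_idx = Index(
    'person_idx',
    func.to_tsvector('simple', person.c.name),
    postgresql_using='gin',
)

# Render the CREATE INDEX statement as a string; no connection is needed.
ddl = str(CreateIndex(person_idx).compile(dialect=postgresql.dialect()))
print(ddl)
```

Printing the statement makes it easy to compare against the hand-written CREATE INDEX before running a migration.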
Answered By: sharez

The answer from @sharez is really useful (especially if you need to concatenate columns in your index). For anyone looking to create a tsvector GIN index on a single column, you can simplify the original answer's approach with something like:

from sqlalchemy import Column, Index, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func


Base = declarative_base()

class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    textsearch = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            func.to_tsvector('english', textsearch),
            postgresql_using='gin'
        ),
    )

Note that the comma following Index(...) in __table_args__ is not a style choice: the value of __table_args__ must be a tuple, dictionary, or None, and without the comma the parentheses do not create a tuple.
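
The difference is easy to check in plain Python: parentheses alone only group an expression, and it is the comma that makes a tuple. A minimal sketch, using a string as a placeholder for the real Index object:

```python
index = 'Index(...)'  # placeholder standing in for a real Index object

without_comma = (index)   # parentheses only group: this is still the string
with_comma = (index,)     # the trailing comma makes a one-element tuple

print(type(without_comma).__name__)  # str
print(type(with_comma).__name__)     # tuple
```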

If you do need to create a tsvector GIN index on multiple columns, here is another way to get there using text().

from sqlalchemy import Column, Index, Integer, String, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func


Base = declarative_base()

def to_tsvector_ix(*columns):
    s = " || ' ' || ".join(columns)
    return func.to_tsvector('english', text(s))

class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    atext = Column(String)
    btext = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            to_tsvector_ix('atext', 'btext'),
            postgresql_using='gin'
        ),
    )
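
One thing to watch with plain concatenation: in PostgreSQL, `||` returns NULL if either operand is NULL, so a row with one empty column would get no tsvector at all. A minimal sketch of a variant that wraps each column in coalesce follows; the helper name `tsvector_source` is hypothetical, and the column names are interpolated directly, so they must be trusted identifiers rather than user input.

```python
def tsvector_source(*columns):
    """Join column names with spaces in SQL, treating NULL values as ''."""
    return " || ' ' || ".join("coalesce({}, '')".format(c) for c in columns)


expr = tsvector_source('atext', 'btext')
print(expr)
# coalesce(atext, '') || ' ' || coalesce(btext, '')
```

The resulting string can be passed to text() in place of the plain concatenation built in to_tsvector_ix.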
Answered By: benvc

This has already been answered by @sharez and @benvc, but I needed to make it work with weights. Based on their answers, this is how I did it:

from sqlalchemy import Column, func, Index, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.operators import op

CONFIG = 'english'

Base = declarative_base()

def create_tsvector(*args):
    # Each argument is a (column, weight) pair; weights are 'A' through 'D'.
    field, weight = args[0]
    exp = func.setweight(func.to_tsvector(CONFIG, field), weight)
    for field, weight in args[1:]:
        # Concatenate the weighted tsvectors with PostgreSQL's || operator.
        exp = op(exp, '||', func.setweight(func.to_tsvector(CONFIG, field), weight))
    return exp

class Example(Base):
    __tablename__ = 'example'

    # A mapped class needs a primary key.
    id = Column(Integer, primary_key=True)
    foo = Column(String)
    bar = Column(String)

    __ts_vector__ = create_tsvector(
        (foo, 'A'),
        (bar, 'B')
    )

    __table_args__ = (
        Index('my_index', __ts_vector__, postgresql_using='gin'),
    )
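
Weights only pay off when results are ranked. The following is a minimal sketch of a ranked query, assuming SQLAlchemy 1.4 or later; it uses a Core table as a stand-in for the Example model, the search string is made up, and the statement is only compiled to SQL rather than executed.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, func, select
from sqlalchemy.dialects import postgresql

CONFIG = 'english'

# Stand-in for the Example model.
example = Table(
    'example', MetaData(),
    Column('id', Integer, primary_key=True),
    Column('foo', String),
    Column('bar', String),
)

# The same weighted expression as in the index: foo gets weight A, bar gets B.
ts_vector = func.setweight(func.to_tsvector(CONFIG, example.c.foo), 'A').op('||')(
    func.setweight(func.to_tsvector(CONFIG, example.c.bar), 'B')
)

# plainto_tsquery turns free text into a tsquery.
ts_query = func.plainto_tsquery(CONFIG, 'some search words')

stmt = (
    select(example.c.id)
    .where(ts_vector.op('@@')(ts_query))                 # full text match
    .order_by(func.ts_rank(ts_vector, ts_query).desc())  # best matches first
)

sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

The WHERE clause repeats the indexed expression, which is what gives the planner a chance to use the GIN index.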
Answered By: Thierry G.

Thanks for this question and answers.

I'd like to add a note for people who use Alembic's autogenerate to manage migrations: it does not seem to detect the creation of this index.

We may end up writing our own migration script, which looks like this:

"""add fts idx

Revision ID: e3ce1ce23d7a
Revises: 079c4455d54d
Create Date: 

"""

# revision identifiers, used by Alembic.
revision = 'e3ce1ce23d7a'
down_revision = '079c4455d54d'

from alembic import op
import sqlalchemy as sa


def upgrade():
    op.create_index('idx_content_fts', 'table_name',
            [sa.text("to_tsvector('english', content)")],
            postgresql_using='gin')


def downgrade():
    op.drop_index('idx_content_fts')
Answered By: Jing

Previous answers here were helpful in pointing in the right direction.
Below is a distilled and simplified approach using the ORM and the TSVectorType helper from sqlalchemy-utils (the helper is quite basic and can simply be copied and pasted to avoid an external dependency if needed: https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html).

Define a TSVECTOR column (TSVectorType) in your declarative ORM model, populated automatically from the source text field(s):

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+
from sqlalchemy_utils.types.ts_vector import TSVectorType
# ^-- https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html

Base = declarative_base()


class MyModel(Base):
    __tablename__ = 'mymodel'
    id = sa.Column(sa.Integer, primary_key=True)
    content = sa.Column(sa.String, nullable=False)

    content_tsv = sa.Column(
        TSVectorType("content", regconfig="english"),
        sa.Computed("to_tsvector('english', \"content\")", persisted=True))
    #      ^-- equivalent for SQL:
    #   COLUMN content_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "content")) STORED;

    __table_args__ = (
        # Indexing the TSVector column
        sa.Index("idx_mymodel_content_tsv", content_tsv, postgresql_using="gin"), 
    )

For additional details on querying using ORM, see https://stackoverflow.com/a/73999486/11750716 (there is an important difference between SQLAlchemy 1.4 and SQLAlchemy 2.0).
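
As a rough illustration of querying the stored column, here is a minimal sketch assuming SQLAlchemy 1.4 or later. It uses a Core table with the built-in TSVECTOR type as a stand-in for the model and for TSVectorType, the search string is made up, and the tsquery function is called explicitly so the generated SQL does not depend on what match() renders in a given SQLAlchemy version.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, func, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import TSVECTOR

# Stand-in for MyModel; plain TSVECTOR replaces TSVectorType in this sketch.
mymodel = Table(
    'mymodel', MetaData(),
    Column('id', Integer, primary_key=True),
    Column('content', String, nullable=False),
    Column('content_tsv', TSVECTOR),
)

# Match the stored tsvector against a tsquery built from free text.
stmt = select(mymodel.c.id).where(
    mymodel.c.content_tsv.op('@@')(
        func.plainto_tsquery('english', 'full text search')
    )
)

sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```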

Answered By: Jean Monet