SQLAlchemy: insert column data based on values in other columns/tables

Question:

I’m working on a project that parses JSON data and inserts it into tables. The data comes from multiple sources, and the values common to all of them are "game_ref" and "game_datetime". My goal is to take the "event_id" from the MLB table and insert it into the MGM table wherever the "game_datetime" and "game_ref" values in the two tables match. Can anyone point me toward the SQLAlchemy method (Core or ORM) best suited to this task?


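from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()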

class GamelistMLB(Base):
    __tablename__ = 'mlb_gamelist'
    
    event_id = Column(Integer, primary_key=True)
    game_ref = Column(String)
    game_datetime = Column(Integer)

    # one-to-many: a list of GamelistMGM rows; the backref adds GamelistMGM.mlb_gamelist
    mgm_id = relationship("GamelistMGM", backref="mlb_gamelist")

    
class GamelistMGM(Base):
    __tablename__ = 'mgm_gamelist'
    
    book_id = Column(Integer, primary_key=True)
    game_ref = Column(String)
    game_datetime = Column(Integer)
    event_id = Column(Integer, ForeignKey('mlb_gamelist.event_id'))


def create_tables(engine):
    Base.metadata.create_all(engine)
Asked By: Nick


Answers:

Assuming the data is already in the database, the approach below should work for you.
If, however, you can hook the solution into the data-load phase, it may be best to link the rows immediately on load.

In SQL terms, what you want to achieve is the following:

-- update the column `event_id` on MGM from MLB
UPDATE  mgm_gamelist
SET     event_id = mlb_gamelist.event_id
FROM    mlb_gamelist
WHERE   mgm_gamelist.event_id IS NULL  -- unless it is already set
        -- where the data row matches that of MLB
    AND mgm_gamelist.game_ref = mlb_gamelist.game_ref
    AND mgm_gamelist.game_datetime = mlb_gamelist.game_datetime

The same statement expressed with SQLAlchemy Core in Python:

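# assign aliases to the `Table` objects underlying the mapped classes to shorten the expression
tMGM = GamelistMGM.__table__
tMLB = GamelistMLB.__table__

# referencing tMLB in the WHERE clause makes SQLAlchemy render a multi-table UPDATE (UPDATE ... FROM on PostgreSQL)
statement = (
    tMGM.update()
    .where(tMGM.c.event_id.is_(None))  # only rows that are not linked yet
    .where(tMGM.c.game_ref == tMLB.c.game_ref)
    .where(tMGM.c.game_datetime == tMLB.c.game_datetime)
    .values(event_id=tMLB.c.event_id)
)
res = session.execute(statement)  # `session` is an existing Session bound to your engine
print(res.rowcount)  # number of MGM rows that were updated
session.commit()  # without a commit the update is rolled back when the session closes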

This "UPDATE ... FROM" form is not standard SQL, so it may not work on every RDBMS, but it does work with PostgreSQL.
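Where a backend cannot run "UPDATE ... FROM", a correlated subquery in the SET clause is a common, more portable way to express the same update. The sketch below swaps in that technique (it is not the answer's original statement) and runs it with Python's built-in sqlite3 module; the table and column names follow the question's models, while `link_events`, `make_demo_db` and the sample rows are hypothetical, for illustration only.

```python
import sqlite3

# Portable alternative to UPDATE ... FROM: look up the matching MLB event_id
# with a correlated subquery. The EXISTS filter skips MGM rows that have no
# MLB match, so they are neither touched nor counted in rowcount.
LINK_SQL = """
UPDATE mgm_gamelist
SET    event_id = (
           SELECT mlb_gamelist.event_id
           FROM   mlb_gamelist
           WHERE  mlb_gamelist.game_ref = mgm_gamelist.game_ref
             AND  mlb_gamelist.game_datetime = mgm_gamelist.game_datetime
       )
WHERE  mgm_gamelist.event_id IS NULL
  AND  EXISTS (
           SELECT 1
           FROM   mlb_gamelist
           WHERE  mlb_gamelist.game_ref = mgm_gamelist.game_ref
             AND  mlb_gamelist.game_datetime = mgm_gamelist.game_datetime
       )
"""


def link_events(conn: sqlite3.Connection) -> int:
    """Copy event_id from mlb_gamelist into matching, unlinked mgm_gamelist rows."""
    cur = conn.execute(LINK_SQL)
    count = cur.rowcount  # number of MGM rows updated
    conn.commit()
    return count


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mlb_gamelist (
            event_id INTEGER PRIMARY KEY, game_ref TEXT, game_datetime INTEGER);
        CREATE TABLE mgm_gamelist (
            book_id INTEGER PRIMARY KEY, game_ref TEXT, game_datetime INTEGER,
            event_id INTEGER REFERENCES mlb_gamelist(event_id));
    """)
    conn.executemany("INSERT INTO mlb_gamelist VALUES (?, ?, ?)",
                     [(101, "NYY@BOS", 1700000000), (102, "LAD@SF", 1700003600)])
    # book 3 has no MLB counterpart and should stay unlinked
    conn.executemany("INSERT INTO mgm_gamelist VALUES (?, ?, ?, NULL)",
                     [(1, "NYY@BOS", 1700000000), (2, "LAD@SF", 1700003600),
                      (3, "CHC@STL", 1700007200)])
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(link_events(conn))  # 2
    print(conn.execute(
        "SELECT book_id, event_id FROM mgm_gamelist ORDER BY book_id").fetchall())
    # [(1, 101), (2, 102), (3, None)]
```

In SQLAlchemy Core, the same subquery can be built with `select(tMLB.c.event_id).where(...).scalar_subquery()` (available from SQLAlchemy 1.4) and passed to `.values(event_id=...)`; it trades the join for one subquery evaluation per candidate row.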


Edit-1:
Linking the rows immediately, during the load phase, looks like this:

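# `_sample_json_input` stands for your parsed JSON: an iterable of dicts
# whose keys match the GamelistMGM columns
for _sample_json_row in _sample_json_input:
    mgm = GamelistMGM(**_sample_json_row)

    # try to find the MLB game to link to
    mlb = (
        session.query(GamelistMLB)
        .filter(GamelistMLB.game_ref == mgm.game_ref)
        .filter(GamelistMLB.game_datetime == mgm.game_datetime)
        .first()
    )
    if mlb:
        # this will update the `event_id` automatically on flush;
        # `mlb_gamelist` is the backref created by `GamelistMLB.mgm_id`
        mgm.mlb_gamelist = mlb  # NOTE: consider renaming `mgm_id` (e.g. to `mgm_games`) and the backref to `mlb`

    session.add(mgm)
session.commit()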
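The loop in Edit-1 issues one SELECT per incoming MGM row. If the MLB game list is small enough to hold in memory (an assumption), a variation is to build a lookup keyed on (game_ref, game_datetime) once and resolve each row from it. The sketch below shows only that matching step on plain dicts; `build_mlb_index`, `link_rows` and the sample records are hypothetical names, not part of SQLAlchemy or the original answer.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (game_ref, game_datetime): the fields both sources share
GameKey = Tuple[str, int]


def build_mlb_index(mlb_rows: Iterable[dict]) -> Dict[GameKey, int]:
    """Map (game_ref, game_datetime) -> MLB event_id, built once up front.

    If two MLB rows share the same key, the later one wins.
    """
    return {(r["game_ref"], r["game_datetime"]): r["event_id"] for r in mlb_rows}


def link_rows(mgm_rows: Iterable[dict], mlb_index: Dict[GameKey, int]) -> List[dict]:
    """Return copies of the MGM rows with event_id set where an MLB game matches."""
    linked = []
    for row in mgm_rows:
        event_id: Optional[int] = mlb_index.get((row["game_ref"], row["game_datetime"]))
        linked.append({**row, "event_id": event_id})  # None when there is no match
    return linked


if __name__ == "__main__":
    mlb = [{"event_id": 101, "game_ref": "NYY@BOS", "game_datetime": 1700000000}]
    mgm = [{"book_id": 1, "game_ref": "NYY@BOS", "game_datetime": 1700000000},
           {"book_id": 3, "game_ref": "CHC@STL", "game_datetime": 1700007200}]
    for row in link_rows(mgm, build_mlb_index(mlb)):
        print(row["book_id"], row["event_id"])  # "1 101", then "3 None"
```

Each resulting dict could then be passed to `GamelistMGM(**row)`; since `event_id` is a column on GamelistMGM, setting it directly links the row without going through the relationship.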
Answered By: van