SQLAlchemy: insert column data based on values in other columns/tables
Question:
I’m working on a project that parses JSON data and inserts it into tables. The data comes from multiple sources, and the fields that stay constant across sources, as you can see in the screenshot, are "game_ref" and "game_datetime". My goal is to take the "event_id" from the MLB table and insert it into the MGM table wherever the "game_datetime" and "game_ref" values match in both tables. Can anyone point me toward which SQLAlchemy approach (Core or ORM) would be best for this task?
# Imports assume SQLAlchemy 1.4 or newer.
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class GamelistMLB(Base):
    __tablename__ = 'mlb_gamelist'
    event_id = Column(Integer, primary_key=True)
    game_ref = Column(String)
    game_datetime = Column(Integer)
    mgm_id = relationship("GamelistMGM", backref="mlb_gamelist")

class GamelistMGM(Base):
    __tablename__ = 'mgm_gamelist'
    book_id = Column(Integer, primary_key=True)
    game_ref = Column(String)
    game_datetime = Column(Integer)
    event_id = Column(Integer, ForeignKey('mlb_gamelist.event_id'))

def create_tables(engine):
    Base.metadata.create_all(engine)
Answers:
Assuming you already have data in the database, the statement below should work for you.
If, however, you can hook into the data load phase, it may be better to do the linking immediately on load.
Basically, what you want to achieve in SQL terms is the following:
-- update the column `event_id` on MGM from MLB
UPDATE mgm_gamelist
SET event_id = mlb_gamelist.event_id
FROM mlb_gamelist
WHERE
    mgm_gamelist.event_id IS NULL  -- unless it is already set
    -- where the data row matches that of MLB
    AND mgm_gamelist.game_ref = mlb_gamelist.game_ref
    AND mgm_gamelist.game_datetime = mlb_gamelist.game_datetime
The same statement in Python, using SQLAlchemy Core:
# assign aliases to the `Table` objects underlying the mapped classes to shorten the expression
tMGM = GamelistMGM.__table__
tMLB = GamelistMLB.__table__
statement = (
    tMGM.update()
    .where(tMGM.c.event_id.is_(None))
    .where(tMGM.c.game_ref == tMLB.c.game_ref)
    .where(tMGM.c.game_datetime == tMLB.c.game_datetime)
    .values(event_id=tMLB.c.event_id)
)
res = session.execute(statement)
print(res.rowcount)  # number of rows that were updated
session.commit()  # persist the change
Note that this "UPDATE ... FROM" form is not supported by every RDBMS, but it does work with PostgreSQL.
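For a backend that rejects this form, one portable alternative is a correlated subquery. The following is a minimal sketch rather than part of the original answer. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; the sample rows are made up, and the relationship is left out of the models for brevity.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select, update
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class GamelistMLB(Base):
    __tablename__ = "mlb_gamelist"
    event_id = Column(Integer, primary_key=True)
    game_ref = Column(String)
    game_datetime = Column(Integer)


class GamelistMGM(Base):
    __tablename__ = "mgm_gamelist"
    book_id = Column(Integer, primary_key=True)
    game_ref = Column(String)
    game_datetime = Column(Integer)
    event_id = Column(Integer, ForeignKey("mlb_gamelist.event_id"))


engine = create_engine("sqlite://")  # in-memory database, for the demo only
Base.metadata.create_all(engine)

t_mgm = GamelistMGM.__table__
t_mlb = GamelistMLB.__table__

# Subquery that looks up the MLB event for the MGM row being updated.
match = (
    select(t_mlb.c.event_id)
    .where(t_mlb.c.game_ref == t_mgm.c.game_ref)
    .where(t_mlb.c.game_datetime == t_mgm.c.game_datetime)
    .correlate(t_mgm)
)
statement = (
    update(t_mgm)
    .where(t_mgm.c.event_id.is_(None))  # leave already linked rows alone
    .where(match.exists())              # skip rows with no MLB counterpart
    .values(event_id=match.scalar_subquery())
)

with Session(engine) as session:
    # Made-up sample data: one MGM row matches, one does not.
    session.add_all([
        GamelistMLB(event_id=101, game_ref="NYY@BOS", game_datetime=1650000000),
        GamelistMGM(book_id=1, game_ref="NYY@BOS", game_datetime=1650000000),
        GamelistMGM(book_id=2, game_ref="CHC@STL", game_datetime=1650000000),
    ])
    session.commit()

    updated = session.execute(statement).rowcount
    session.commit()

    result = session.execute(
        select(t_mgm.c.book_id, t_mgm.c.event_id).order_by(t_mgm.c.book_id)
    )
    linked = [tuple(row) for row in result]

print(updated, linked)
```

The `EXISTS` condition keeps the row count comparable to the join-based statement: without it, every unlinked MGM row would be touched and simply set to NULL again.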
Edit-1:
Linking immediately during the load phase is shown below:
for _sample_json_row in _sample_json_input:
    mgm = GamelistMGM(**_sample_json_row)
    # try to find the MLB row to link to
    mlb = (
        session.query(GamelistMLB)
        .filter(GamelistMLB.game_ref == mgm.game_ref)
        .filter(GamelistMLB.game_datetime == mgm.game_datetime)
        .first()
    )
    if mlb:
        # Assigning the related object sets `event_id` automatically on flush.
        # NOTE: with the models from the question, `mlb_gamelist` is the backref
        # on GamelistMGM (the `mgm_id` relationship itself lives on GamelistMLB);
        # consider renaming these to something clearer, e.g. `mlb` or `rel_mlb`.
        mgm.mlb_gamelist = mlb
    session.add(mgm)
session.commit()
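The loop above issues one SELECT per incoming row. If the MLB table is small enough to hold in memory, one alternative is to read the matching keys once and resolve each row with a dictionary lookup. The sketch below shows only that lookup step in plain Python. `build_index` and `link_rows` are hypothetical helpers, and the sample tuples stand in for the result of a query such as `session.query(GamelistMLB.game_ref, GamelistMLB.game_datetime, GamelistMLB.event_id).all()`.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, int]  # (game_ref, game_datetime)


def build_index(mlb_rows: Iterable[Tuple[str, int, int]]) -> Dict[Key, int]:
    """Map (game_ref, game_datetime) to event_id."""
    return {(ref, dt): event_id for ref, dt, event_id in mlb_rows}


def link_rows(json_rows: Iterable[dict], index: Dict[Key, int]) -> List[dict]:
    """Return copies of the rows with `event_id` filled in (None if no match)."""
    linked_rows = []
    for row in json_rows:
        key = (row["game_ref"], row["game_datetime"])
        event_id: Optional[int] = index.get(key)
        linked_rows.append({**row, "event_id": event_id})
    return linked_rows


# Made-up sample data standing in for the MLB table and the parsed JSON.
mlb_rows = [("NYY@BOS", 1650000000, 101), ("LAD@SFG", 1650003600, 102)]
json_rows = [
    {"book_id": 1, "game_ref": "NYY@BOS", "game_datetime": 1650000000},
    {"book_id": 2, "game_ref": "CHC@STL", "game_datetime": 1650000000},
]

linked = link_rows(json_rows, build_index(mlb_rows))
print(linked)
```

Each resulting dict can then be passed to `GamelistMGM(**row)` as in the loop above, since `event_id` is a regular column on that model.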