Polars read AWS RDS DB with a table containing column of type jsonb

Question:

I’m trying to read AWS RDS DB using the following method through polars:

df_rds_table_test = pl.read_database_uri(sql_query, uri)

Postgres DB contains a table with column name ‘json_message’ of type jsonb and other columns of type String. When the table is read by polars, it treats data type of json_message column as String.

Further, i’m using DuckDB on the polars dataframe to perform SQL operations.

SQL_QUERY_SRC_JOIN = "select id,json_message.amount as amount from df_rds_table_test where id=10020240501"

src_df = duckdb.sql(SQL_QUERY_SRC_JOIN).pl()

I’m getting an exception that states

duckdb.duckdb.BinderException: Binder Error: Cannot extract field 'amount' from expression "json_message" because it is not a struct, union, or json

Not sure if there’s a way that we can cast the datatype to jsonb in polars as its not supported. I tried casting json_message to struct but i’m getting error.

Also i tried casting json_message to type JSON in duckdb query which didn’t help neither.

Sample json_message:

{
  "amount": 0,
  "c_id": null,
  "branch": "0502",
  "user_id": "U999999"}

Great if someone could please help me to access the json string in the polars dataframe using duckdb.

Answers:

Are you sure that casting to json doesn’t work? It works in this example:

tbl = duckdb.sql("""
from (values
   ('{"amount": 0,"c_id": null, "branch": "0502", "user_id": "U999999"}')
)
select col0 as json_message
""")

┌────────────────────────────────────────────────────────────────────┐
│                            json_message                            │
│                              varchar                               │
├────────────────────────────────────────────────────────────────────┤
│ {"amount": 0,"c_id": null, "branch": "0502", "user_id": "U999999"} │
└────────────────────────────────────────────────────────────────────┘

duckdb.sql("""
    select (json_message::json).amount as amount from tbl
""")

┌────────┐
│ amount │
│  json  │
├────────┤
│ 0      │
└────────┘

or with polars:

df = tbl.pl()
(
    df
    .with_columns(pl.col("json_message").str.json_decode())
    .select(pl.col("json_message").struct.field("amount"))
)

┌────────┐
│ amount │
│ ---    │
│ i64    │
╞════════╡
│ 0      │
└────────┘
Answered By: Roman Pekar