Python variables in Jupyter SQL cells

Question:

I have a dataframe that needs to be joined with the result set of a query. The query uses a column from that dataframe to filter the data in the database.

data_list = list(df['needed_column'])

I would like to use this variable in a SQL query executed in a Jupyter SQL cell.

%%sql
SELECT
    column_1,
    column_2,
    column_3
FROM my_database.my_table
WHERE
    column_1 IN data_list

Is there any way to do this?

Asked By: drake10k


Answers:

A workaround is to build the query as a Python string and pass that variable to the inline %sql magic.

# Turn the Python list into a parenthesized SQL list, e.g. [1, 2] -> (1, 2)
data_list = str(list(df['needed_column'])).replace('[', '(').replace(']', ')')

query_string = f"""
SELECT
    column_1,
    column_2,
    column_3
FROM my_database.my_table
WHERE
    column_1 IN {data_list}
"""

result_set = %sql $query_string
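
Pasting values into the SQL text this way breaks if a value contains a quote or a bracket. Outside of the %sql magic, a different technique is to let the database driver bind the values through placeholders. The following is only a minimal sketch of that idea: it uses an in-memory SQLite database as a stand-in for my_database, and the sample rows and the contents of data_list are made up for illustration.

```python
import sqlite3

# Stand-in for list(df['needed_column']); these values are hypothetical.
data_list = [1, 3, 5]

# In-memory SQLite database, used only so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_1 INTEGER, column_2 TEXT, column_3 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(i, f"b{i}", f"c{i}") for i in range(1, 7)],
)

# One "?" placeholder per value; the driver takes care of quoting and escaping.
placeholders = ", ".join("?" for _ in data_list)
query_string = f"""
SELECT column_1, column_2, column_3
FROM my_table
WHERE column_1 IN ({placeholders})
ORDER BY column_1
"""

result_set = conn.execute(query_string, data_list).fetchall()
print(result_set)
```

The placeholder syntax depends on the driver (SQLite uses ?, some others use %s), so this would need adapting to the actual database.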
Answered By: drake10k

You can do this with JupySQL, which expands local Python variables written as {{variable}} inside a query; its documentation has a simple example of getting and using local variables.

So basically:

dynamic_limit = 5
dynamic_column = "island, sex"
%sql SELECT {{dynamic_column}} FROM penguins.csv LIMIT {{dynamic_limit}}
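
Since the placeholder text is inserted into the query as-is (as with dynamic_column above), the list from the question would first have to be turned into a string of SQL literals. A possible sketch follows; sql_literal and sql_in_list are hypothetical helper names, not part of JupySQL, and the sample values are made up. It only handles numbers and strings.

```python
import numbers


def sql_literal(value):
    """Render one value as a SQL literal: numbers as-is, anything else as a quoted string."""
    if isinstance(value, numbers.Real) and not isinstance(value, bool):
        return str(value)
    # Standard SQL escapes a single quote inside a string by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def sql_in_list(values):
    """Join the literals with commas, ready to be placed between parentheses."""
    return ", ".join(sql_literal(v) for v in values)


# Stand-in for list(df['needed_column']); these values are hypothetical.
data_list = ["Adelie", "O'Brien", 42]
in_values = sql_in_list(data_list)
print(in_values)

# In the notebook, the string could then be used like the variables above:
# %sql SELECT column_1 FROM my_database.my_table WHERE column_1 IN ({{in_values}})
```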
Answered By: Ido Michael