Can tqdm be used with Database Reads?

Question:

While reading large relations from a SQL database into a pandas dataframe, it would be nice to have a progress bar, because the number of tuples is known statically and the I/O rate could be estimated. It looks like the tqdm module has a tqdm_pandas function that reports progress when mapping functions over columns, but by default it does not report progress on I/O like this. Is it possible to use tqdm to make a progress bar on a call to pd.read_sql?

Asked By: seewalker


Answers:

Edit: This answer is misleading – chunksize has no effect on the database side of the operation. See the comments below.

You could use the chunksize parameter to do something like this:

import pandas as pd
from tqdm import tqdm

chunks = pd.read_sql('SELECT * FROM table', con=conn, chunksize=100)

# concatenate once at the end instead of growing df inside the loop,
# which would copy the accumulated frame on every iteration
df = pd.concat(chunk for chunk in tqdm(chunks))

I think this would use less memory as well.
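As the edit above notes, chunksize only controls how many rows pandas materializes at a time on the client side, but the chunked-read pattern itself is easy to try end to end. A minimal, self-contained sketch of it, using an in-memory SQLite database and a table name of my own choosing (`table_demo`) as a stand-in for a real server:

```python
import sqlite3

import pandas as pd
from tqdm import tqdm

# build a small throwaway table as a stand-in for a real database
conn = sqlite3.connect(":memory:")
pd.DataFrame({"x": range(1000)}).to_sql("table_demo", conn, index=False)

# read it back 100 rows at a time; tqdm ticks once per chunk,
# and pd.concat assembles the chunks into a single dataframe
chunks = pd.read_sql("SELECT * FROM table_demo", conn, chunksize=100)
df = pd.concat(chunk for chunk in tqdm(chunks))
```

With 1000 rows and chunksize=100 the bar advances ten times, once per fetched chunk.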

Answered By: Alex

Yes, you can!

Expanding the answer here and Alex's answer to include tqdm, we get:

import pandas as pd
from tqdm import tqdm

# get the total number of rows
q = "SELECT COUNT(*) FROM table"
total_rows = pd.read_sql_query(q, conn).values[0, 0]
# note that COUNT should not download the whole table;
# some engines prefer SELECT MAX(ROWID) or similar instead

# read the table with a tqdm status bar
q = "SELECT * FROM table"
rows_in_chunk = 1_000
chunks = pd.read_sql_query(q, conn, chunksize=rows_in_chunk)
df = tqdm(chunks, total=total_rows / rows_in_chunk)
df = pd.concat(df)

output example:

39%|███▉      | 99/254.787 [01:40<02:09,  1.20it/s]
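A variant of the same idea is to advance the bar by rows rather than by chunks, so the displayed total is the exact row count even when the last chunk is short. A self-contained sketch, again using an in-memory SQLite table (`table_demo`, an illustrative name) in place of a real database:

```python
import sqlite3

import pandas as pd
from tqdm import tqdm

# throwaway table standing in for a real database
conn = sqlite3.connect(":memory:")
pd.DataFrame({"x": range(2500)}).to_sql("table_demo", conn, index=False)

# exact row count for the bar's total
total_rows = pd.read_sql_query("SELECT COUNT(*) FROM table_demo", conn).values[0, 0]

parts = []
with tqdm(total=int(total_rows), unit="rows") as bar:
    for chunk in pd.read_sql_query("SELECT * FROM table_demo", conn,
                                   chunksize=1_000):
        parts.append(chunk)
        bar.update(len(chunk))  # advance by rows read, not by chunks

df = pd.concat(parts, ignore_index=True)
```

Here the chunks are 1000, 1000, and 500 rows, and the bar ends exactly at 2500/2500 rather than at a fractional chunk count.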
Answered By: lisrael1