Comparing Parquet file schema to DB schema in Python (including decimal precision)

Question:

If I have a Parquet file with columns of types such as Decimal(38, 22) or Decimal(20, 4), is there a way to compare them to the existing schema in a database from Python (for example, to check that a Decimal(38, 22) column corresponds to a column of type numeric(38, 22) in the db)? As far as I understand, pyarrow, and Python in general, read Decimal values as double. Is there a way to read the file, represent such values as Decimal, and compare the file to the db schema, including precision and scale?

Asked By: ekm0d


Answers:

You can use pyarrow to inspect the schema of a Parquet file and find the precision and scale of each decimal field:

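import pyarrow as pa
import pyarrow.parquet as pq

# Opening the file reads only its metadata; the schema is available without loading data.
parquet_file = pq.ParquetFile("table.parquet")
for field in parquet_file.schema_arrow:
    # Only decimal columns carry a precision and a scale.
    if pa.types.is_decimal(field.type):
        print(field.name, field.type.precision, field.type.scale)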
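
To go from inspecting the file to actually comparing it with the database, the decimal fields can be collected into a mapping and checked against the column metadata from the db. The following is only a minimal sketch: the helper names `parquet_decimals` and `compare_decimals`, the column names, and the query in the comment (PostgreSQL-style `information_schema.columns`) are assumptions for illustration, not part of the original answer.

```python
def parquet_decimals(path):
    """Return {column_name: (precision, scale)} for decimal columns in a Parquet file."""
    # Imported here so compare_decimals() below can be used without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = pq.read_schema(path)  # reads the file metadata only, not the data
    return {
        field.name: (field.type.precision, field.type.scale)
        for field in schema
        if pa.types.is_decimal(field.type)
    }


def compare_decimals(parquet_cols, db_cols):
    """Return a list of readable mismatches between two {name: (precision, scale)} mappings."""
    problems = []
    for name in sorted(parquet_cols.keys() | db_cols.keys()):
        pq_type = parquet_cols.get(name)
        db_type = db_cols.get(name)
        if pq_type is None:
            problems.append(f"{name}: missing in Parquet, database has numeric{db_type}")
        elif db_type is None:
            problems.append(f"{name}: missing in database, Parquet has decimal{pq_type}")
        elif pq_type != db_type:
            problems.append(f"{name}: Parquet decimal{pq_type} != database numeric{db_type}")
    return problems


# Hypothetical database side, e.g. built from the rows of
#   SELECT column_name, numeric_precision, numeric_scale
#   FROM information_schema.columns
#   WHERE table_name = 'my_table' AND data_type = 'numeric'
db_cols = {"price": (38, 22), "qty": (20, 4)}

# Stand-in for parquet_decimals("table.parquet"), so the sketch runs without a file.
parquet_cols = {"price": (38, 22), "qty": (20, 2)}

for problem in compare_decimals(parquet_cols, db_cols):
    print(problem)  # qty: Parquet decimal(20, 2) != database numeric(20, 4)
```

On the concern about doubles: pyarrow keeps Parquet decimals as a decimal type (`decimal128` for precisions up to 38), and converting such values to Python objects yields `decimal.Decimal` instances rather than floats, so precision and scale are preserved.
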
Answered By: 0x26res