fastparquet

Can a parquet file exceed 2.1GB?

Can a parquet file exceed 2.1GB? Question: I’m having an issue storing a large dataset (around 40GB) in a single parquet file. I’m using the fastparquet library to append pandas.DataFrames to this parquet dataset file. The following is a minimal example program that appends chunks to a parquet file until it crashes as the file-size …

Total answers: 1

Write nested parquet format from Python

Write nested parquet format from Python Question: I have a flat parquet file where one varchar column stores JSON data as a string, and I want to transform this data to a nested structure, i.e. the JSON data becomes nested parquet. I know the schema of the JSON in advance if this is of any …

Total answers: 1

Does any Python library support writing arrays of structs to Parquet files?

Does any Python library support writing arrays of structs to Parquet files? Question: I want to write data where some columns are arrays of strings or arrays of structs (typically key-value pairs) into a Parquet file for use in AWS Athena. After finding two Python libraries (Arrow and fastparquet) supporting writing to Parquet files I …

Total answers: 1

How to read partitioned parquet files from S3 using pyarrow in python

How to read partitioned parquet files from S3 using pyarrow in python Question: I’m looking for ways to read data from multiple partitioned directories from S3 using Python. data_folder/serial_number=1/cur_date=20-12-2012/abcdsd0324324.snappy.parquet data_folder/serial_number=2/cur_date=27-12-2012/asdsdfsd0324324.snappy.parquet pyarrow’s ParquetDataset module has the capability to read from partitions. So I have tried the following code: >>> import pandas as pd >>> import …

Total answers: 5