Pyarrow slice pushdown for Azure data lake
Question: I want to access Parquet files on an Azure data lake and retrieve only some of the rows. Here is a reproducible example, using a public dataset:

    import pyarrow.dataset as ds
    from adlfs import AzureBlobFileSystem

    abfs_public = AzureBlobFileSystem(account_name="azureopendatastorage")
    dataset_public = ds.dataset(
        'az://nyctlc/yellow/puYear=2010/puMonth=1/part-00000-tid-8898858832658823408-a1de80bd-eed3-4d11-b9d4-fa74bfbd47bc-426339-18.c000.snappy.parquet',
        filesystem=abfs_public)

The processing time is the same …