Can not read csv with pandas in azure functions with python

Question:

I have created an Azure Blob Storage Trigger in Azure function in python.
A CSV file adds in blob storage and I try to read it with pandas.

import logging
import pandas as pd

import azure.functions as func


def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob n"
                 f"Name: {myblob.name}n"
                 f"Blob Size: {myblob.length} bytes")

    df_new = pd.read_csv(myblob)
    print(df_new.head())

If I pass myblob to pd.read_csv, then I get UnsupportedOperation: read1

Python blob trigger function processed blob 
Name: samples-workitems/Data_26112022_080027.csv
Blob Size: None bytes
[2022-11-27T16:19:25.650Z] Executed 'Functions.BlobTrigger1' (Failed, Id=2df388f5-a8dc-4554-80fa-f809cfaeedfe, Duration=1472ms)
[2022-11-27T16:19:25.655Z] System.Private.CoreLib: Exception while executing function: Functions.BlobTrigger1. System.Private.CoreLib: Result: Failure
Exception: UnsupportedOperation: read1

If I pass myblob.read(),

df_new = pd.read_csv(myblob.read())

it gives TypeError: Expected file path name or file-like object, got <class 'bytes'> type

Python blob trigger function processed blob 
Name: samples-workitems/Data_26112022_080027.csv
Blob Size: None bytes
[2022-11-27T16:09:56.513Z] Executed 'Functions.BlobTrigger1' (Failed, Id=e3825c28-7538-4e30-bad2-2526f9811697, Duration=1468ms)
[2022-11-27T16:09:56.518Z] System.Private.CoreLib: Exception while executing function: Functions.BlobTrigger1. System.Private.CoreLib: Result: Failure
Exception: TypeError: Expected file path name or file-like object, got <class 'bytes'> type

From Azure functions Docs:

InputStream is File-like object representing an input blob.

From Pandas read_csv Docs:

read_csv takes filepath_or_bufferstr, path object or file-like object

So technically I should read this object. What piece of puzzle am I missing here?

Asked By: Shaida Muhammad

||

Answers:

If you refer to this article, it says that this piece of code will work. But this is recommended for smaller files as the entire files goes into memory. Not recommended to be used for larger files.

import logging
import pandas as pd

import azure.functions as func
from io import BytesIO

def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob n"
                 f"Name: {myblob.name}n"
                 f"Blob Size: {myblob.length} bytes")
    df_new = pd.read_csv(BytesIO(myblob.read()))
    print(df_new.head())
Answered By: Anupam Chand
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.