Having issues with my Excel to CSV conversion in AWS Lambda using the AWS Data Wrangler layer

Question:

I have a function that reads an Excel file into a DataFrame and then saves that DataFrame to an S3 bucket using the awswrangler to_csv function. The Excel file has data starting at different rows and columns.

My conversion code looks something like this:

import awswrangler as wr

df = wr.s3.read_excel(
    path='s3://bucket/filepath/',
    sheet_name='sheetname',
    header=5,
    index_col=False).iloc[:, 3:]

wr.s3.to_csv(df, path='s3://bucket/filepath/',
             dataset=True,
             filename_prefix='sheetname')

The code works, but I am having two issues:

  1. The CSV output contains an additional index column even though I have specified index_col = False.
  2. The output CSV filename comes with extra serial characters that I don't want; I just want something like sheetname.csv. I looked at the documentation and didn't find any good leads on how to do that.

The current output looks something like sheetname8f1e8wefdf15f4wgfe5fef46we54f.csv

Asked By: Murtaza Mohsin


Answers:

You can specify the full path, including the filename, in the path parameter and drop dataset = True and filename_prefix entirely. Writing in dataset mode is what generates those extra characters in the filename; filename_prefix only prepends your string to the generated name.

Also, you can drop the index column from the output by passing index = False to to_csv. Note that index_col = False in read_excel only controls how the Excel file is parsed; whether the DataFrame's index gets written to the CSV is controlled separately by index on the write side.

wr.s3.to_csv(df, path = "s3://bucket/filepath/sheetname.csv", 
                 index = False)
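
Putting it together, a minimal sketch of the read and the write under the same layout assumptions as the question (the file.xlsx key is a placeholder for the actual workbook object):

import awswrangler as wr

# Placeholder object key (file.xlsx) -- substitute the real workbook key.
# Skip the first 5 header rows and the first 3 columns, as in the question.
df = wr.s3.read_excel(
    path='s3://bucket/filepath/file.xlsx',
    sheet_name='sheetname',
    header=5).iloc[:, 3:]

# Single-file write: the full path gives the exact filename, and
# index=False keeps the DataFrame index out of the CSV.
# No dataset=True, no filename_prefix.
wr.s3.to_csv(df,
             path='s3://bucket/filepath/sheetname.csv',
             index=False)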
Answered By: Mark B