Writing pandas dataframe to S3 bucket (AWS)

Question:

I have an AWS Lambda function that queries an API and creates a dataframe. I want to write this dataframe to an S3 bucket as a CSV file. I am using:

import pandas as pd
import s3fs

df.to_csv('s3.console.aws.amazon.com/s3/buckets/info/test.csv', index=False)

I am getting an error:

No such file or directory: ‘s3.console.aws.amazon.com/s3/buckets/info/test.csv’

But that directory exists, because I am reading files from there. What is the problem here?

I’ve read the previous files like this:

import boto3

s3_client = boto3.client('s3')
s3_client.download_file('info', 'secrets.json', '/tmp/secrets.json')

How can I upload the whole dataframe to an S3 bucket?

Asked By: Jonas Palačionis


Answers:

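This

“s3.console.aws.amazon.com/s3/buckets/info/test.csv”

is not an S3 URI; it is a path in the AWS web console, so pandas treats it as a local file path. To save to S3 you need to pass an S3 URI of the form s3://<bucket_name>/<obj_key>. Moreover, you do not need to import s3fs (you only need it installed).

Just try: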

import pandas as pd

df = pd.DataFrame()
# df.to_csv("s3://<bucket_name>/<obj_key>")

# In your case
df.to_csv("s3://info/test.csv")

NOTE: The bucket must already exist on AWS S3; create it first if it does not.
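
To make the difference between the console path and an S3 URI concrete, here is a minimal sketch. The helper names (build_s3_uri, is_s3_uri, s3fs_installed) are hypothetical and only for illustration; they are not part of pandas or s3fs. It uses only the standard library, so it can be run before attempting the actual write.

```python
import importlib.util
from urllib.parse import urlparse


def build_s3_uri(bucket: str, key: str) -> str:
    """Return an s3:// URI for the given bucket and object key (hypothetical helper)."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def is_s3_uri(path: str) -> bool:
    """True only for paths of the form s3://<bucket>/<key>."""
    parsed = urlparse(path)
    return (
        parsed.scheme == "s3"
        and bool(parsed.netloc)           # bucket name
        and bool(parsed.path.strip("/"))  # object key
    )


def s3fs_installed() -> bool:
    """Check that s3fs can be found, without importing it."""
    return importlib.util.find_spec("s3fs") is not None


console_path = "s3.console.aws.amazon.com/s3/buckets/info/test.csv"
target = build_s3_uri("info", "test.csv")

print(is_s3_uri(console_path))  # the console path has no s3:// scheme
print(is_s3_uri(target))
print(s3fs_installed())
```

If is_s3_uri(target) is true and s3fs is installed, then df.to_csv(target, index=False) should write to the bucket, assuming the Lambda execution role is allowed to put objects there.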

Answered By: null

You can use boto3 package also for storing data to S3:

from io import StringIO  # python3 (or BytesIO for python2)
import boto3

bucket = 'info'  # already created on S3
csv_buffer = StringIO()
df.to_csv(csv_buffer)

s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
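
The same idea can be wrapped in a small function, which is convenient inside a Lambda handler. This is only a sketch: upload_dataframe_csv is a hypothetical name, the s3_client parameter is an assumption added so that a stub can be passed in for testing, and it uses the client's put_object call rather than the resource API shown above.

```python
from io import StringIO


def upload_dataframe_csv(df, bucket, key, s3_client=None):
    """Serialize df to CSV in memory and upload it as a single S3 object."""
    if s3_client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3
        s3_client = boto3.client("s3")

    csv_buffer = StringIO()
    df.to_csv(csv_buffer, index=False)  # index=False matches the call in the question

    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_buffer.getvalue().encode("utf-8"),
        ContentType="text/csv",
    )
    return f"s3://{bucket}/{key}"
```

Calling upload_dataframe_csv(df, 'info', 'df.csv') would then upload to the same location as the snippet above, without writing anything to /tmp.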

Answered By: wowkin2

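You can use AWS SDK for pandas (awswrangler), a library that extends pandas to work smoothly with AWS data stores.

import awswrangler as wr

# Write the dataframe to S3 as a CSV file
wr.s3.to_csv(df, path="s3://bucket/file.csv", index=False)

# Read it back into a dataframe
df = wr.s3.read_csv("s3://bucket/file.csv")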

The library is available in AWS Lambda with the addition of the layer called AWSSDKPandas-Python.