Is it possible to get the contents of an S3 file without downloading it using boto3?
Question:
I am working on a process to dump files from a Redshift database, and would prefer not to have to download the files locally in order to process the data. I saw that Java has a StreamingObject class that does what I want, but I haven't seen anything similar in boto3.
Answers:
If you have a mybucket S3 bucket containing a beer key, here is how to fetch the value without storing it in a local file:
import boto3
s3 = boto3.resource('s3')
print(s3.Object('mybucket', 'beer').get()['Body'].read())
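Note that .read() pulls the entire object into memory. For large objects you can consume the returned StreamingBody in fixed-size chunks instead. A minimal sketch of that pattern, using io.BytesIO as a stand-in for the body so it runs without AWS credentials (the real body from get()['Body'] exposes the same read(n) interface):

```python
import io

def process_in_chunks(body, chunk_size=1024):
    """Read a file-like body in fixed-size chunks instead of all at once."""
    total = 0
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)  # replace with real per-chunk processing
    return total

# io.BytesIO stands in for the StreamingBody returned by get()['Body']
print(process_in_chunks(io.BytesIO(b"x" * 5000)))  # 5000
```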
This may or may not be relevant to what you want to do, but for my situation one thing that worked well was using tempfile:
import tempfile
import boto3
bucket_name = '[BUCKET_NAME]'
key_name = '[OBJECT_KEY_NAME]'
s3 = boto3.resource('s3')
temp = tempfile.NamedTemporaryFile()
s3.Bucket(bucket_name).download_file(key_name, temp.name)
# do what you will with your file...
temp.close()
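One caveat with the snippet above: on some platforms (notably Windows) a NamedTemporaryFile cannot be reopened by name while it is still open. A sketch of the same lifecycle with delete=False and explicit cleanup; the local write here is a stand-in for the download_file call, so it runs without AWS credentials:

```python
import os
import tempfile

# The write below stands in for s3.Bucket(bucket_name).download_file(key_name, temp_path)
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as temp:
    temp_path = temp.name
    temp.write(b"object contents")

# Reopen by name once the handle is closed, which also works on Windows
with open(temp_path, "rb") as fh:
    contents = fh.read()
print(contents)

os.remove(temp_path)  # clean up explicitly since delete=False
```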
Here is the solution I actually use:
import boto3
s3_client = boto3.client('s3')
def get_content_from_s3(bucket: str, key: str) -> str:
    """Return the contents of an S3 object as a decoded string.

    param: bucket, s3 bucket
    param: key, path to the file, e.g. folder/subfolder/file.txt
    """
    s3_file = s3_client.get_object(Bucket=bucket, Key=key)['Body'].read()
    return s3_file.decode('utf-8').strip()
smart_open is a Python 3 library for efficient streaming of very large files from/to storages such as S3, GCS, Azure Blob Storage, HDFS, WebHDFS, HTTP, HTTPS, SFTP, or the local filesystem: https://pypi.org/project/smart-open/
import json

import boto3
import smart_open

client = boto3.client(service_name='s3',
                      aws_access_key_id=AWS_ACCESS_KEY_ID,
                      aws_secret_access_key=AWS_SECRET_KEY,
                      )
url = 's3://.............'
fin = smart_open.open(url, 'r', transport_params={'client': client})
for line in fin:
    data = json.loads(line)
    print(data)
fin.close()
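The line-by-line JSON parsing shown above works with any iterable of lines, which makes it easy to test in isolation. A quick sketch with io.StringIO standing in for the handle smart_open.open() returns:

```python
import io
import json

# io.StringIO stands in for the file handle returned by smart_open.open()
fin = io.StringIO('{"a": 1}\n{"a": 2}\n')
records = [json.loads(line) for line in fin]
fin.close()
print(records)  # [{'a': 1}, {'a': 2}]
```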