Writing directly into binary mode with csv.writer
Question:
I currently use the following code to create a binary file that I then upload directly to AWS S3. I was told it’s possible to write with csv.writer directly in binary mode and avoid the extra step with io.StringIO(). How does that work?
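import csv
import io
import boto3
s3 = boto3.client('s3')  # assumed: the original snippet does not show how s3 is created
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["a", "b", "c"])
# csv.writer produces text, so encode it into a binary buffer for the upload
buffer_2 = io.BytesIO(buffer.getvalue().encode())
BUCKET_NAME = 'fbprophet'
OBJECT_NAME = 'blah.csv'
s3.upload_fileobj(buffer_2, BUCKET_NAME, OBJECT_NAME)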
Answers:
What you’ve got there looks reasonable to me. The post you link to talks about writing to files, not in-memory streams. A file can be opened in either text or binary mode, which determines whether it operates on strings (str) or raw bytes (bytes). But the in-memory file-like objects from the io package aren’t as flexible: you have StringIO for strings, and BytesIO for bytes.
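As a small illustration of that split, the sketch below writes to both kinds of in-memory buffer. The variable names are made up for the example; only the standard library is used.

```python
import io

text_buf = io.StringIO()
text_buf.write("a,b,c\n")       # a text buffer accepts str

byte_buf = io.BytesIO()
byte_buf.write(b"a,b,c\n")      # a binary buffer accepts bytes

try:
    byte_buf.write("a,b,c\n")   # str into a binary buffer is rejected
except TypeError as exc:
    error = exc                 # kept only so the failure can be inspected
```

This is the same mismatch you would hit if you handed a BytesIO straight to csv.writer.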
Because csv requires a text stream (strings), and boto requires a binary stream (bytes), a conversion step is necessary.
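If you would rather not build the whole string first, one possible way to do the conversion is to wrap a BytesIO in io.TextIOWrapper, so that csv.writer sees a text stream while the encoded bytes land in the binary buffer. This is only a sketch: rows_to_bytes_buffer is a hypothetical helper name, and the upload call is left as a comment because it needs a configured S3 client.

```python
import csv
import io

def rows_to_bytes_buffer(rows, encoding="utf-8"):
    """Write rows as CSV into a BytesIO through a text wrapper (hypothetical helper)."""
    raw = io.BytesIO()
    # newline="" leaves line endings to the csv module, as its docs recommend.
    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
    csv.writer(text).writerows(rows)
    text.flush()    # push any buffered text down into the BytesIO
    text.detach()   # release the BytesIO so discarding the wrapper does not close it
    raw.seek(0)     # rewind so the consumer reads from the start
    return raw

buffer_2 = rows_to_bytes_buffer([["a", "b", "c"]])
# s3.upload_fileobj(buffer_2, BUCKET_NAME, OBJECT_NAME)  # as in the question
```

The encoding still happens, just incrementally inside the wrapper instead of in one encode() call at the end.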
I would recommend passing the encoding explicitly to encode(), though. In Python 3 it already defaults to UTF-8 rather than a system-dependent value, but spelling it out makes the intent obvious to readers:
buffer_2 = io.BytesIO(buffer.getvalue().encode('utf-8'))