How to upload a pandas DataFrame as a CSV stream without saving it to disk?
Question:
I want to upload a pandas dataframe to a server as a CSV file without saving it on disk. Is there a way to create a more or less "fake csv" file which pretends to be a real file?
Here is some example code:
First I get my data from a sql query and store it as a dataframe.
In the upload_ga_data function I want to have something with this logic:
media = MediaFileUpload('df',
                        mimetype='application/octet-stream',
                        resumable=False)
Full example:
from __future__ import print_function
from apiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.errors import HttpError
from apiclient.http import MediaFileUpload
import pymysql
import pandas as pd
con = x
ga_query = """
SELECT XXXXX
"""
df = pd.read_sql_query(ga_query,con)
df.to_csv('ga_export.csv', sep=',', encoding='utf-8', index=False)
def upload_ga_data():
    try:
        media = MediaFileUpload('ga_export.csv',
                                mimetype='application/octet-stream',
                                resumable=False)
        daily_upload = service.management().uploads().uploadData(
            accountId=accountId,
            webPropertyId=webPropertyId,
            customDataSourceId=customDataSourceId,
            media_body=media).execute()
        print("Upload was successful")
    except TypeError as error:
        # Handle errors in constructing a query.
        print('There was an error in constructing your query: %s' % error)
Answers:
The required behavior is possible using a stream:
to create a more or less “fake csv” file which pretends to be a real file
Python makes file objects (created with open) and streams (created with io.StringIO) behave similarly, so anywhere you can use a file object you can also use a string stream.
The easiest way to create a text stream is with open(), optionally
specifying an encoding:
f = open("myfile.txt", "r", encoding="utf-8")
In-memory text streams are also available as StringIO objects:
f = io.StringIO("some initial text data")
The text stream API is described in detail in the documentation of
TextIOBase.
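As a minimal sketch of that interchangeability (the helper function and file name here are only illustrative), a function written against a file object accepts an io.StringIO just as well:
import io

def count_lines(f):
    # Works with anything that exposes a file-like read() method.
    return len(f.read().splitlines())

fake_file = io.StringIO("line 1\nline 2\n")
print(count_lines(fake_file))  # 2
# count_lines(open("myfile.txt", encoding="utf-8")) would behave the same way.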
In pandas you can do it with any function that has a path_or_buf argument in its signature, such as to_csv:
DataFrame.to_csv(path_or_buf=None, sep=', ', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True, escapechar=None, decimal='.')
The following code exports a dummy DataFrame in CSV format into a string stream (not a physical file, but an in-memory buffer):
import io
import pandas as pd
df = pd.DataFrame(list(range(10)))
stream = io.StringIO()
df.to_csv(stream, sep=";")
When you want access to the stream content, just issue:
>>> stream.getvalue()
';0\n0;0\n1;1\n2;2\n3;3\n4;4\n5;5\n6;6\n7;7\n8;8\n9;9\n'
It returns the content without needing a real file.
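If you then want to hand the stream to code that reads from a file-like object, rewind it first; here is a small sketch reusing the example above:
import io
import pandas as pd

df = pd.DataFrame(list(range(10)))
stream = io.StringIO()
df.to_csv(stream, sep=";")

stream.seek(0)                              # rewind so the next read starts at the beginning
df_back = pd.read_csv(stream, sep=";", index_col=0)
print((df_back.values == df.values).all())  # True: same data, no file on disk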
Though the other answer is an excellent start, some readers may be confused about how to complete the OP's whole task. Here is a way to go from writing a DataFrame to a stream to preparing that stream for upload using Google's apiclient.http module. A key difference from the OP's attempt is that I pass the stream itself to a MediaIOBaseUpload instead of a MediaFileUpload. The file is assumed to be UTF-8 like the OP's file. This runs fine for me until the media is being uploaded, then I get an error:
self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 2313: ordinal not in range(128)
import io
import pandas as pd
from googleapiclient.errors import HttpError
from apiclient.http import MediaIOBaseUpload # Changed this from MediaFileUpload
df = pd.DataFrame(list(range(10)))
stream = io.StringIO()
# writing df to the stream instead of a file:
df.to_csv(stream, sep=',', encoding='utf-8', index=False)
try:
    media = MediaIOBaseUpload(stream,
                              mimetype='application/octet-stream',
                              resumable=False)
    #### Your upload logic here using media just created ####
except HttpError as error:
    #### Handle your errors in uploading here ####
    pass
Because I have a Unicode character, I developed this alternative code, which accomplishes the same thing but can handle Unicode characters.
import io
import pandas as pd
from googleapiclient.errors import HttpError
from apiclient.http import MediaIOBaseUpload # Changed this from MediaFileUpload
df = pd.DataFrame(list(range(10)))
records = df.to_csv(line_terminator='\r\n', index=False).encode('utf-8')
bytes = io.BytesIO(records)
try:
    media = MediaIOBaseUpload(bytes,
                              mimetype='application/octet-stream',
                              resumable=False)
    #### Your upload logic here using media just created ####
except HttpError as error:
    #### Handle your errors in uploading here ####
    pass
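For completeness, here is a hedged sketch of what the "upload logic" placeholder might look like when this in-memory approach is combined with the uploadData call from the question; service, accountId, webPropertyId and customDataSourceId are assumed to exist exactly as in the OP's code:
import io
from googleapiclient.http import MediaIoBaseUpload  # note the casing used by googleapiclient

def upload_ga_dataframe(df, service, accountId, webPropertyId, customDataSourceId):
    # Encode the CSV to bytes in memory so the HTTP layer never has to ASCII-encode text.
    records = df.to_csv(index=False).encode('utf-8')
    media = MediaIoBaseUpload(io.BytesIO(records),
                              mimetype='application/octet-stream',
                              resumable=False)
    # Same call as in the question, but fed from memory instead of a file on disk.
    return service.management().uploads().uploadData(
        accountId=accountId,
        webPropertyId=webPropertyId,
        customDataSourceId=customDataSourceId,
        media_body=media).execute()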
I used:
from googleapiclient.http import MediaIoBaseUpload
versus @Katherine’s:
from apiclient.http import MediaIOBaseUpload
But other than that, @Katherine's alternative solution worked perfectly for me as I was developing a solution to write a DataFrame to a CSV file in Google Drive, running from a Google Cloud Function.
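For readers taking that route, a hedged sketch of the Drive side might look like the following; drive_service is assumed to be an already-authorized Drive v3 service object, and the file name is only an example:
import io
from googleapiclient.http import MediaIoBaseUpload

def upload_csv_to_drive(df, drive_service, name='ga_export.csv'):
    # Build the CSV entirely in memory; nothing is written to local disk.
    data = io.BytesIO(df.to_csv(index=False).encode('utf-8'))
    media = MediaIoBaseUpload(data, mimetype='text/csv', resumable=False)
    file_metadata = {'name': name, 'mimeType': 'text/csv'}
    created = drive_service.files().create(body=file_metadata,
                                           media_body=media,
                                           fields='id').execute()
    return created['id']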