How to store Dataframe data to Firebase Storage?

Question:

Given a pandas DataFrame which contains some data, what is the best way to store this data to Firebase?

Should I convert the DataFrame to a local file (e.g. .csv, .txt) and then upload it to Firebase Storage, or is it also possible to store the pandas DataFrame directly without conversion? Or are there better practices for this?

Update 01/03 – So far I’ve come up with the solution below, which requires writing a csv file locally, reading it back in, uploading it, and then deleting the local file. I doubt, however, that this is the most efficient method, so I would like to know whether it can be done better and quicker.

import os
import firebase_admin
from firebase_admin import storage

# cert_json and config are placeholders for the service-account certificate
# and the app options (including the storage bucket).
cred   = firebase_admin.credentials.Certificate(cert_json)
app    = firebase_admin.initialize_app(cred, config)
bucket = storage.bucket(app=app)

def upload_df(df, data_id):
    """
    Upload a DataFrame as a csv to Firebase Storage
    :return: storage_ref
    """

    # Storage location + extension
    storage_ref = data_id + ".csv"

    # Store locally under the same name
    df.to_csv(storage_ref)

    # Upload to Firebase Storage
    blob = bucket.blob(storage_ref)
    with open(storage_ref, 'rb') as local_file:
        blob.upload_from_file(local_file)

    # Delete the local copy
    os.remove(storage_ref)

    return storage_ref
Asked By: JohnAndrews


Answers:

With python-firebase and to_dict():

from firebase import firebase

# Connect to your Realtime Database (the URL is a placeholder); this assumes
# any auth/headers you need are already taken care of.
firebase = firebase.FirebaseApplication('https://YOUR-PROJECT.firebaseio.com', None)

postdata = my_df.to_dict()

result = firebase.post('/my_endpoint', postdata, {'print': 'pretty'})
print(result)
# Snapshot info

You can get the data back using the snapshot info and the endpoint, and rebuild the df with from_dict(). You could adapt this approach to SQL or JSON, both of which pandas also supports.
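
For the read-back, a minimal sketch (assuming the same firebase connection object, and that the REST response exposes the generated key under 'name'):

import pandas as pd

# Fetch the stored dict back from the endpoint using the key returned by post().
stored = firebase.get('/my_endpoint', result['name'])

# Rebuild the DataFrame from the dict produced by to_dict().
restored_df = pd.DataFrame.from_dict(stored)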

Alternatively, depending on where your script executes from, you might consider treating Firebase as a database and using the db API from firebase_admin (check this out).
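
A minimal sketch of that Realtime Database route with firebase_admin.db (the credential file, database URL, and path below are placeholders):

import firebase_admin
from firebase_admin import credentials, db
import pandas as pd

# Placeholder credential file and database URL.
cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://YOUR-PROJECT.firebaseio.com"})

def save_df(df, ref_path):
    # Store the DataFrame as a nested dict under the given database path.
    db.reference(ref_path).set(df.to_dict())

def load_df(ref_path):
    # Read the dict back and rebuild the DataFrame.
    return pd.DataFrame.from_dict(db.reference(ref_path).get())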

As for whether this follows best practice, it’s difficult to say without knowing anything about your use case.

Answered By: Charles Landau

If you just want to reduce code length and skip the steps of creating and deleting local files, you can use upload_from_string:

import firebase_admin
from firebase_admin import storage

cred   = firebase_admin.credentials.Certificate(cert_json)
app    = firebase_admin.initialize_app(cred, config)
bucket = storage.bucket(app=app)

def upload_df(df, data_id):
    """
    Upload a Dataframe as a csv to Firebase Storage
    :return: storage_ref
    """
    storage_ref = data_id + '.csv'
    blob = bucket.blob(storage_ref)
    blob.upload_from_string(df.to_csv())

    return storage_ref

https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html#google.cloud.storage.blob.Blob.upload_from_string
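
Reading the CSV back is symmetric and also stays in memory; a sketch (the download_df helper and the index_col=0 choice are mine, not part of the original answer):

import io
import pandas as pd

def download_df(storage_ref):
    # Download the CSV content as bytes and parse it without writing to disk.
    blob = bucket.blob(storage_ref)
    csv_bytes = blob.download_as_string()
    # to_csv() wrote the index as the first column, so read it back as the index.
    return pd.read_csv(io.BytesIO(csv_bytes), index_col=0)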

Answered By: fcsr

After hours of figuring this out, the following solution works for me. You need to convert your csv data to bytes and then upload it.

import pyrebase
import pandas as pd

firebaseConfig = {
   "apiKey": "xxxxx",
   "authDomain": "xxxxx",
   "projectId": "xxxxx",
   "storageBucket": "xxxxx",
   "messagingSenderId": "xxxxx",
   "appId": "xxxxx",
   "databaseURL":"xxxxx"
}

firebase = pyrebase.initialize_app(firebaseConfig)

storage = firebase.storage()

df = pd.read_csv("/content/Future Prices.csv")

# here is the magic. Convert your csv file to bytes and then upload it
df_string = df.to_csv(index=False)
db_bytes = bytes(df_string, 'utf8')

fileName = "Future Prices.csv"

storage.child("predictions/" + fileName).put(db_bytes)
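
To read it back with Pyrebase, one hedged option is to fetch the download URL and hand it straight to pandas (this assumes your storage rules allow the read, or that you pass a valid token to get_url):

url = storage.child("predictions/" + fileName).get_url(None)
restored_df = pd.read_csv(url)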

That’s all. Happy coding!

Answered By: Muhammad Talha

I found that even for DataFrames of very modest size (below 100 KB!) it pays off to serialize them compactly before storing; I used pickle below. Your file is still available in the usual Firebase Storage this way, and you gain in memory and speed, both when writing and when reading.

import firebase_admin
from firebase_admin import credentials, storage
import pickle

cred = credentials.Certificate(json_cert_file)
firebase_admin.initialize_app(cred, {'storageBucket': 'YOUR_storageBucket (without gs://)'})
bucket = storage.bucket()

file_name = data_id + ".pkl"
blob = bucket.blob(file_name)

# write df to storage
blob.upload_from_string(pickle.dumps(df))

# read df from storage
df = pickle.loads(blob.download_as_string())
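
If you want explicit compression on top of the pickling, here is a small sketch using gzip from the standard library with the same blob calls (not from the original answer):

import gzip

# Compress the pickled DataFrame before uploading.
blob.upload_from_string(gzip.compress(pickle.dumps(df)))

# Decompress and unpickle on the way back.
df = pickle.loads(gzip.decompress(blob.download_as_string()))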
Answered By: RoyRos