Use seek, write and readline methods on a CSV file stored on Google Cloud Storage (bucket)

Question:

I have several functions in my Python script that work with a CSV file. They work on my local machine, but not when the same CSV file is stored in a Google Cloud Storage bucket. I need to keep track of my current position in the file, which is why I am using seek() and tell(). I tried the pandas library, but it has no such methods. Does anyone have a basic example of a Python script that reads a CSV stored in a GCP bucket with those methods?

def read_line_from_csv(position):
    #df = pandas.read_csv('gs://trends_service_v1/your_path.csv')
    # The with block closes the file on exit, so no explicit close() is needed
    with open('keywords.csv') as f:
        f.seek(position)
        keyword = f.readline()
        position = f.tell()
        return position, keyword


def save_new_position(current_position):
    # Write the position locally, then push the file to the bucket
    with open("position.csv", "w") as f:
        f.write(str(current_position))
    update_csv_bucket("position.csv")


def get_position_reader():
    try:
        with open('position.csv') as f:
            return int(f.readline())
    except OSError as e:
        print(e)
Asked By: Pierre56

||

Answers:

I don't think the official library has such capabilities.
You can download the file first, then open it and work with it normally.

Apart from the official library, you can use gcsfs, which implements the missing functionality:

import gcsfs

location = 0  # byte offset to jump to (placeholder value)
fs = gcsfs.GCSFileSystem(project='my-google-project')
with fs.open('my-bucket/my-file.txt', 'rb') as f:
    print(f.seek(location))
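
To connect this to the function in the question, here is a minimal sketch of read_line_from_csv rewritten to take a filesystem object. It assumes fs is any object with an fsspec-style open(path, mode) method, such as gcsfs.GCSFileSystem; the bucket path in the usage comment is a placeholder, not a real location. The file is opened in binary mode so that seek() and tell() work in plain byte offsets.

```python
from typing import Tuple


def read_line_from_csv(fs, path: str, position: int) -> Tuple[int, str]:
    """Read one line starting at byte offset `position`.

    `fs` is assumed to expose an fsspec-style open(path, mode),
    e.g. gcsfs.GCSFileSystem. Returns (new_position, line).
    """
    with fs.open(path, "rb") as f:
        f.seek(position)
        # Binary mode returns bytes, so decode and drop the line ending
        keyword = f.readline().decode("utf-8").rstrip("\r\n")
        return f.tell(), keyword


# Usage against a real bucket (not run here; project and path are placeholders):
# import gcsfs
# fs = gcsfs.GCSFileSystem(project="my-google-project")
# position, keyword = read_line_from_csv(fs, "my-bucket/keywords.csv", 0)
```

The returned position can be stored and passed back in on the next call to continue from the following line.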
Answered By: Emil Gi

Another way other than @emil-gi’s suggestions would be to use the method mentioned here

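# Download the contents of this blob as a bytes object
# (newer versions of the library call this download_as_bytes();
# download_as_string() is kept as a deprecated alias)
blob.download_as_string()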

Where blob is the object associated with your CSV in your GCS bucket.
If you need to create the connection to the blob first (I don’t know what you do in other parts of the code), use the docs
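
Once the contents are in memory as bytes, one way to get seek(), readline() and tell() back is to wrap them in io.BytesIO. The sketch below assumes the whole file fits in memory; read_line_at is a hypothetical helper name, and the sample bytes stand in for what the download call would return.

```python
import io
from typing import Tuple


def read_line_at(data: bytes, position: int) -> Tuple[int, str]:
    """Return (new_position, line) for the line starting at byte offset `position`."""
    buffer = io.BytesIO(data)  # in-memory, seekable view over the downloaded bytes
    buffer.seek(position)
    line = buffer.readline()
    return buffer.tell(), line.decode("utf-8").rstrip("\r\n")


# Stand-in for the bytes returned by the blob download (made-up sample data)
sample = b"keyword\npython\ngcs\n"
position, keyword = read_line_at(sample, 0)
```

This downloads the full object on every call, so it is probably only reasonable for small files.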

Answered By: Marco Massetti

You can use Google Cloud Storage fileio.
For instance:

from google.cloud import storage

# bucket_name, file_path and position are placeholders you must define
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(file_path)  # e.g. folder/filename.csv

# Instantiate a BlobReader
blob_reader = storage.fileio.BlobReader(blob)

# Get the current position in your file
print(blob_reader.tell())

# Read line by line
print(blob_reader.readline().decode('utf-8'))  # read and print row 1
print(blob_reader.readline().decode('utf-8'))  # read and print row 2

# Read a chunk of X bytes
print(blob_reader.read(1000).decode('utf-8'))  # read the next 1000 bytes

# Seek to a specific position
blob_reader.seek(position)
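
Building on the calls above, here is a rough sketch of resuming from a saved offset and walking the remaining lines. iter_lines_from and open_blob_reader are hypothetical helper names, not part of the library; the generator works on any seekable binary reader, so the demo uses io.BytesIO as a local stand-in and runs without credentials.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def iter_lines_from(reader: BinaryIO, position: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (next_position, line) pairs starting at byte offset `position`."""
    reader.seek(position)
    while True:
        raw = reader.readline()
        if not raw:  # empty bytes means end of file
            break
        yield reader.tell(), raw.decode("utf-8").rstrip("\r\n")


def open_blob_reader(bucket_name: str, blob_name: str):
    """Hypothetical helper: open a GCS object as a seekable binary reader."""
    # Imported lazily so the rest of the sketch runs without google-cloud-storage
    from google.cloud import storage
    from google.cloud.storage.fileio import BlobReader

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    return BlobReader(blob)


# Local stand-in for a BlobReader (made-up sample data)
demo = io.BytesIO(b"keyword\npython\ngcs\n")
results = list(iter_lines_from(demo, position=8))
```

With a real bucket, the reader returned by open_blob_reader would be passed in place of demo, and the last yielded position is the value to save for the next run.
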
Answered By: Clément Cardi