How to check if file exists in Google Cloud Storage?

Question:

I have a script where I want to check if a file exists in a bucket and if it doesn’t then create one.

I tried using os.path.exists(file_path) where file_path = "/gs/testbucket", but I got a file not found error.

I know that I can use the files.listdir() API function to list all the files located at a path and then check if the file I want is one of them. But I was wondering whether there is another way to check whether the file exists.

Asked By: Tanvir Shaikh

Answers:

I don't think there is a function to check directly whether a file exists, given its path.
I ended up writing a function that uses the files.listdir() API to list all the files in the bucket and match them against the file name we want. It returns True if found and False otherwise.
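
With the current google-cloud-storage client, the same list-and-match idea looks roughly like this (the bucket and object names are placeholders):

from google.cloud import storage

def file_in_bucket(bucket_name, blob_name):
    # list the bucket's objects and check whether blob_name is among them;
    # the prefix narrows the listing so we don't page through the whole bucket
    client = storage.Client()
    blobs = client.list_blobs(bucket_name, prefix=blob_name)
    return any(blob.name == blob_name for blob in blobs)

print(file_in_bucket('testbucket', 'testme.txt'))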

Answered By: Tanvir Shaikh

You can use a custom function (shown below) to check whether the file exists or not:

from google.appengine.api import files  # legacy App Engine Files API

def is_file_available(filepath):
    # check if the file is available
    fileavability = 'yes'
    try:
        fp = files.open(filepath, 'r')
        fp.close()
    except Exception:
        fileavability = 'no'
    return fileavability

Use the above function in the following way:

 filepath = '/gs/test/testme.txt'
 fileavability = is_file_available(filepath)

Note: the above function may also return 'no' when the application trying to read the file has not been granted read permission.

Answered By: Amit Vikram

A slight variation on Amit's answer from a few years ago, updated for the cloudstorage API.

import cloudstorage as gcs

def GCSExists(gcs_file):
    '''
    True if the file exists; pass the complete /bucket/file path
    '''
    try:
        fp = gcs.open(gcs_file, 'r')
        fp.close()
        status = True
    except Exception:
        status = False
    return status
Answered By: Matthew Dunn

You can use the stat function to get a file's info. In practice this does a HEAD request to Google Cloud Storage instead of a GET, which is a bit less resource intensive.

import cloudstorage as gcs
from cloudstorage import errors as gcs_errors

# return the stat record if there is one, else False; a stat record is truthy
def is_file_available(filepath):
    try:
        return gcs.stat(filepath)
    except gcs_errors.NotFoundError:
        return False
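
A quick usage sketch (the '/testbucket/testme.txt' path is a placeholder '/bucket/object' path, and the returned GCSFileStat record carries metadata such as st_size and etag):

info = is_file_available('/testbucket/testme.txt')
if info:
    # a stat record is truthy, so this branch means the object exists
    print('found, size:', info.st_size)
else:
    print('not found')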
Answered By: Mark

This post is old, but you can now check whether a file exists on GCS using the Blob class. Since it took me a while to find this answer, I'm adding it here for others looking for a solution.

from google.cloud import storage

name = 'file_i_want_to_check.txt'   
storage_client = storage.Client()
bucket_name = 'my_bucket_name'
bucket = storage_client.bucket(bucket_name)
stats = storage.Blob(bucket=bucket, name=name).exists(storage_client)

Documentation is here

Hope this helps!

Edit

As per the comment by @om-prakash, if the file is in a folder, then the name should include the path to the file:

name = "folder/path_to/file_i_want_to_check.txt"
Answered By: nickthefreak

Yes! It's possible, adapted from this.

And this is my code:

def get_by_signed_url(self, object_name, bucket_name=GCLOUD_BUCKET_NAME):
    bucket = self.client_storage.bucket(bucket_name)
    blob = bucket.blob(object_name)

    # check whether the file exists
    stats = blob.exists(self.client_storage)
    if not stats:
        raise NotFound(messages.ERROR_NOT_FOUND)

    url_lifetime = self.expiration  # lifetime in seconds (e.g. 3600 for an hour)
    serving_url = blob.generate_signed_url(url_lifetime)
    return self.session.get(serving_url)
Answered By: Ardi Nusawan

The file I am searching for on Google Cloud Storage: init.sh

Full path: gs://cw-data/spark_app_code/init.sh

>>> from google.cloud import storage

>>> def is_exist(bucket_name,object):
...     client = storage.Client()
...     bucket = client.bucket(bucket_name)
...     blob = bucket.get_blob(object)
...     try:
...             return blob.exists(client)
...     except:
...             return False
...
>>> is_exist('cw-data','spark_app_code')
    False
>>> is_exist('cw-data','spark_app_code/')
    True
>>> is_exist('cw-data','init.sh')
    False
>>> is_exist('cw-data','spark_app_code/init.sh')
    True
>>> is_exist('cw-data','/init.sh')
    False
>>>

Here, files are not stored the way they are on a local filesystem; they are stored as keys (object names). So when looking up a file on Google Cloud Storage, use the full object path rather than just the filename.

Answered By: Ajit K'sagar

It's as easy as using the exists method on a blob object:

from google.cloud import storage

def blob_exists(projectname, credentials, bucket_name, filename):
    client = storage.Client(projectname, credentials=credentials)
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(filename)
    return blob.exists()
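
A minimal call might look like this (the project name, key file, and object path are placeholders; any google.auth credentials object works):

from google.oauth2 import service_account

creds = service_account.Credentials.from_service_account_file('key.json')  # hypothetical key file
print(blob_exists('my-project', creds, 'my_bucket_name', 'folder/file.txt'))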
Answered By: javinievas

If you're working with GCS files on a service like Google AI Platform, you can use TensorFlow to check whether a file exists:

import tensorflow as tf
file_exists = tf.gfile.Exists('gs://your-bucket-name/your-file.txt')
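
Note that tf.gfile is not available in TensorFlow 2.x; the equivalent call there lives under tf.io.gfile (same placeholder path as above):

import tensorflow as tf

# TensorFlow 2.x: tf.gfile.Exists moved to tf.io.gfile.exists
file_exists = tf.io.gfile.exists('gs://your-bucket-name/your-file.txt')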
Answered By: Tobias Ernst

If you are looking for a solution in NodeJS, then here it is:

var storage = require('@google-cloud/storage')();
var myBucket = storage.bucket('my-bucket');

var file = myBucket.file('my-file');

file.exists(function(err, exists) {});

// If the callback is omitted, this function returns a Promise.
file.exists().then(function(data) {
  var exists = data[0];
});

If you need more help, you can refer to this doc:
https://cloud.google.com/nodejs/docs/reference/storage/1.5.x/File#exists

Answered By: Akash Kaushik

The answer provided by @nickthefreak is correct, and so is the comment by Om Prakash. One other note is that the bucket_name should not include gs:// in front or a / at the end.

Piggybacking off @nickthefreak’s example and Om Prakash’s comment:

from google.cloud import storage

name = 'folder1/another_folder/file_i_want_to_check.txt'   

storage_client = storage.Client()
bucket_name = 'my_bucket_name'  # Do not put 'gs://my_bucket_name'
bucket = storage_client.bucket(bucket_name)
stats = storage.Blob(bucket=bucket, name=name).exists(storage_client)

stats will be a Boolean (True or False) depending on whether the file exists in the Storage Bucket.

(I don’t have enough reputation points to comment, but I wanted to save other people some time because I wasted way too much time with this).

Answered By: TalkDataToMe

Since the accepted answer on this question doesn't provide much detail, here's a modern solution using gsutil that works as that answer describes.

This becomes more effective than the other answers if you need to query your GCS files many times in your script.

import subprocess

def bucket_to_list(bucketname: str):
    '''
    Return the bucket's contents as a Python list of strings.
    We also slice off the bucket name on each line,
    in case we need to search many buckets for one file.
    '''
    result = subprocess.run(['gsutil', 'ls', '-r', bucketname + '**'],
                            shell=False, text=True, stdout=subprocess.PIPE)
    return result.stdout.replace(bucketname, "").splitlines()

Use in the following way:

# call once for each bucket to store the bucket's contents
mybucket1 = 'gs://mybucket1/'
mybucket1list = bucket_to_list(mybucket1)

# limiting the list to a bucket's "subdirectories"
mybucket2 = 'gs://mybucket2/subdir1/subdir2/'
mybucket2list = bucket_to_list(mybucket2)

# example filename list to check; we don't need to add the gs:// paths
filestocheck = ['file1.ext', 'file2.ext', 'file3.ext']

# check both buckets for files in our filelist
for file in filestocheck:
    if file in mybucket1list:
        pass  # do something if the file exists in bucket1
    elif file in mybucket2list:
        pass  # do something if the file exists in bucket2
    else:
        pass  # do something if the file doesn't exist in either bucket
Answered By: lys

from google.cloud import storage

def if_file_exists(name: str, bucket_name: str):
    storage_client = storage.Client()
    stats = storage.Blob.from_string(f"gs://{bucket_name}/{name}").exists(storage_client)
    return stats

print(if_file_exists('audios/courses/ActivityPlaying/1320210506130438.wav', GC_BUCKET_NAME), ">>>")

The name argument is the remaining path of the file (the object key).

The if_file_exists function takes two positional arguments: the first is the object key and the second is the bucket name. It returns True if the file exists, False otherwise.

Answered By: sadab khan