How to check if file exists in Google Cloud Storage?
Question:
I have a script where I want to check if a file exists in a bucket and if it doesn’t then create one.
I tried using os.path.exists(file_path)
where file_path = "/gs/testbucket"
, but I got a file not found error.
I know that I can use the files.listdir()
API function to list all the files located at a path and then check if the file I want is one of them. But I was wondering whether there is another way to check whether the file exists.
Answers:
As far as I can tell, there is no function that directly checks whether a file exists given its path.
I have created a function that uses the files.listdir()
API function to list all the files in the bucket and match them against the file name we want. It returns true if found and false if not.
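The list-and-match idea can be sketched as below. The listing here is simulated with a plain Python list so the logic is easy to see; in a real script the names would come from the storage API's listing call, and the file names are made up for illustration:

```python
def name_in_listing(listing, target_name):
    # exact-match check of the target against an iterable of object names
    return any(name == target_name for name in listing)

# simulated bucket listing; in practice these names would come from the API
listing = ['test/readme.txt', 'test/testme.txt']
print(name_in_listing(listing, 'test/testme.txt'))  # True
print(name_in_listing(listing, 'testme.txt'))       # False
```

Note this requires fetching the whole listing, so a direct per-object check (shown in the other answers) is cheaper when you only need one file.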
You can use the custom function shown below to check whether a file exists:
def is_file_available(filepath):
    # check if the file is available
    file_availability = 'yes'
    try:
        fp = files.open(filepath, 'r')
        fp.close()
    except Exception:
        file_availability = 'no'
    return file_availability
Use the above function in the following way:
filepath = '/gs/test/testme.txt'
file_availability = is_file_available(filepath)
Note: the above function may also return 'no' when the application trying to read the file has not been granted read permission.
A slight variation on Amit’s answer from a few years ago, updated for the cloudstorage API.
import cloudstorage as gcs

def GCSExists(gcs_file):
    '''
    True if the file exists; pass the complete /bucket/file path.
    '''
    try:
        fp = gcs.open(gcs_file, 'r')
        fp.close()
        status = True
    except Exception:
        status = False
    return status
You can use the stat function to get a file's info. In practice this does a HEAD request to Google Cloud Storage instead of a GET, which is a bit less resource intensive.
import cloudstorage as gcs
from cloudstorage import errors as gcs_errors

# Return the stat record if there is one, else False; a stat record is truthy.
def is_file_available(filepath):
    try:
        return gcs.stat(filepath)
    except gcs_errors.NotFoundError:
        return False
This post is old, but you can now check whether a file exists on GCP using the Blob class. It took me a while to find an answer, so I am adding it here for others looking for a solution.
from google.cloud import storage
name = 'file_i_want_to_check.txt'
storage_client = storage.Client()
bucket_name = 'my_bucket_name'
bucket = storage_client.bucket(bucket_name)
stats = storage.Blob(bucket=bucket, name=name).exists(storage_client)
Documentation is here
Hope this helps!
Edit
As per the comment by @om-prakash, if the file is in a folder, then the name should include the path to the file:
name = "folder/path_to/file_i_want_to_check.txt"
Yes, it is possible! And this is my code:
def get_by_signed_url(self, object_name, bucket_name=GCLOUD_BUCKET_NAME):
    bucket = self.client_storage.bucket(bucket_name)
    blob = bucket.blob(object_name)

    # check whether the file exists
    stats = blob.exists(self.client_storage)
    if not stats:
        raise NotFound(messages.ERROR_NOT_FOUND)

    url_lifetime = self.expiration  # lifetime in seconds, e.g. 3600 for an hour
    serving_url = blob.generate_signed_url(url_lifetime)
    return self.session.get(serving_url)
The file I am searching for on Google Cloud Storage: init.sh
Full path: gs://cw-data/spark_app_code/init.sh
>>> from google.cloud import storage
>>> def is_exist(bucket_name,object):
... client = storage.Client()
... bucket = client.bucket(bucket_name)
... blob = bucket.get_blob(object)
... try:
... return blob.exists(client)
... except:
... return False
...
>>> is_exist('cw-data','spark_app_code')
False
>>> is_exist('cw-data','spark_app_code/')
True
>>> is_exist('cw-data','init.sh')
False
>>> is_exist('cw-data','spark_app_code/init.sh')
True
>>> is_exist('cw-data','/init.sh')
False
>>>
Here, files are not stored the way they are on a local filesystem; rather, they are stored as keys. So when looking up a file on Google Cloud Storage, use the full object path (key) rather than just the filename.
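That flat-key behavior can be illustrated with a toy model of the bucket above, using a plain dict in place of the real object store (the keys mirror the transcript; the data values are made up):

```python
# Objects in a bucket form a flat key -> data mapping; there is no real directory tree.
bucket_keys = {
    'spark_app_code/': b'',                        # zero-byte "folder" placeholder object
    'spark_app_code/init.sh': b'#!/bin/sh\n',      # hypothetical file contents
}

def key_exists(key):
    # exact string match against the stored keys
    return key in bucket_keys

print(key_exists('init.sh'))                 # False: a bare filename is not a key
print(key_exists('spark_app_code/init.sh'))  # True: the full key matches
```

This is why `is_exist('cw-data', 'init.sh')` returns False above while the full path returns True.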
It’s as easy as using the exists method on a Blob object:
from google.cloud import storage

def blob_exists(projectname, credentials, bucket_name, filename):
    client = storage.Client(projectname, credentials=credentials)
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(filename)
    return blob.exists()
If you’re working with GCS files on a service like Google AI Platform, you can use TensorFlow to check whether a file exists:
import tensorflow as tf

file_exists = tf.gfile.Exists('gs://your-bucket-name/your-file.txt')
# In TensorFlow 2.x, this API moved to tf.io.gfile.exists(...)
If you are looking for a solution in NodeJS, then here it is:
var storage = require('@google-cloud/storage')();
var myBucket = storage.bucket('my-bucket');
var file = myBucket.file('my-file');
file.exists(function(err, exists) {});
// If the callback is omitted, this function returns a Promise.
file.exists().then(function(data) {
var exists = data[0];
});
If you need more help, you can refer to this doc:
https://cloud.google.com/nodejs/docs/reference/storage/1.5.x/File#exists
The answer provided by @nickthefreak is correct, and so is the comment by Om Prakash. One other note is that the bucket_name should not include gs://
in front or a /
at the end.
Piggybacking off @nickthefreak’s example and Om Prakash’s comment:
from google.cloud import storage
name = 'folder1/another_folder/file_i_want_to_check.txt'
storage_client = storage.Client()
bucket_name = 'my_bucket_name' # Do not put 'gs://my_bucket_name'
bucket = storage_client.bucket(bucket_name)
stats = storage.Blob(bucket=bucket, name=name).exists(storage_client)
stats will be a Boolean (True or False) depending on whether the file exists in the Storage Bucket.
(I don’t have enough reputation points to comment, but I wanted to save other people some time because I wasted way too much time with this).
Since the accepted answer on this question didn’t provide much detail, here’s a modern solution using gsutil that works as that answer describes.
This is more efficient than the other approaches if you need to query your GCS files many times in a script.
import subprocess

def bucket_to_list(bucketname: str):
    '''
    Return the bucket's contents as a Python list of strings.
    We also slice off the bucket name on each line,
    in case we need to search many buckets for one file.
    '''
    result = subprocess.run(
        ['gsutil', 'ls', '-r', bucketname + '**'],
        shell=False, text=True, stdout=subprocess.PIPE)
    return result.stdout.replace(bucketname, "").splitlines()
Use it in the following way:
# call once for each bucket to store the bucket contents
mybucket1 = 'gs://mybucket1/'
mybucket1list = bucket_to_list(mybucket1)

# limiting the list to a bucket's "subdirectories"
mybucket2 = 'gs://mybucket2/subdir1/subdir2/'
mybucket2list = bucket_to_list(mybucket2)

# example filename list to check; we don't need to add the gs:// paths
filestocheck = ['file1.ext', 'file2.ext', 'file3.ext']

# check both buckets for files in our file list
for file in filestocheck:
    if file in mybucket1list:
        pass  # do something if file exists in bucket1
    elif file in mybucket2list:
        pass  # do something if file exists in bucket2
    else:
        pass  # do something if file doesn't exist in either bucket
from google.cloud import storage

def if_file_exists(name: str, bucket_name: str):
    storage_client = storage.Client()
    stats = storage.Blob.from_string(f"gs://{bucket_name}/{name}").exists(storage_client)
    return stats

print(if_file_exists('audios/courses/ActivityPlaying/1320210506130438.wav', GC_BUCKET_NAME))
The name argument is the remaining path of the file within the bucket (the object key).
The if_file_exists function takes two positional args, first the object key and second the bucket name, and returns True if the file exists, else False.