Saving Pillow Images from PDF to Google Cloud Server

Question:

I am working on a Django web app that takes in PDF files and performs image processing on each page. Given a PDF, I need to save each page to my Google Cloud Storage bucket. I am using pdf2image’s convert_from_path() to generate a list of Pillow images, one per page. Now I want to save these images to Google Cloud Storage, but I can’t figure out how.

I have successfully saved these Pillow images locally, but I do not know how to do the same in the cloud.

fullURL = file.pdf.url
client = storage.Client()
bucket = client.get_bucket('name-of-my-bucket')
blob = bucket.blob(file.pdf.name[:-4] + '/')
blob.upload_from_string('', content_type='application/x-www-form-urlencoded;charset=UTF-8')
pages = convert_from_path(fullURL, 400)
for i,page in enumerate(pages):
    blob = bucket.blob(file.pdf.name[:-4] + '/' + str(i) + '.jpg')
    blob.upload_from_string('', content_type='image/jpeg')
    outfile = file.pdf.name[:-4] + '/' + str(i) + '.jpg'
    page.save(outfile)
    of = open(outfile, 'rb')
    blob.upload_from_file(of)
Asked By: kninjaboi


Answers:

Since you have saved the files locally, they are available in the local directory where the web app is running.

You can simply iterate through the files in that directory and upload them to Google Cloud Storage one by one.
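The upload keeps the original file name and only swaps the directory prefix. That mapping can be factored into a small helper; `blob_name_for` is an illustrative name, not part of the client library:

```python
import os

def blob_name_for(bucket_directory, local_path):
    # Map a local file path to its destination object name in the bucket:
    # keep only the base file name and prepend the bucket "directory" prefix.
    filename = os.path.basename(local_path)
    return bucket_directory.rstrip('/') + '/' + filename

blob_name_for("uploaded/files/", "local/dir/0.jpg")  # -> 'uploaded/files/0.jpg'
```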

Here is a sample code:

You will need this library:

google-cloud-storage

Python code:

#Libraries
import os
from google.cloud import storage

#Public variable declarations:
bucket_name = "[BUCKET_NAME]"
local_directory = "local/directory/of/the/files/for/uploading/"
bucket_directory = "uploaded/files/" #Where the files will be uploaded in the bucket

#Upload file from source to destination
def upload_blob(source_file_name, destination_blob_name):
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

#Iterate through all files in that directory and upload one by one using the same filename
def upload_files():
    for filename in os.listdir(local_directory):
        upload_blob(local_directory + filename, bucket_directory + filename)
    return "File uploaded!"

#Call this function in your code:
upload_files()

NOTE: I have tested the code in a Google App Engine web app and it worked for me. Take the idea of how it works and adapt it to your needs. I hope that was helpful.

Answered By: Andrei Cusnir

Start off by not using Blobstore. Google is trying to retire it and move people to Cloud Storage. First, set up Cloud Storage:

https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/setting-up-cloud-storage

I use webapp2 rather than Django, but I’m sure you can adapt this. I also don’t work with Pillow images directly, so you’ll have to open the image you’re going to upload. Then do something like this (this assumes you’re trying to POST the data):

import os
import cloudstorage as gcs
import io
import StringIO
from google.appengine.api import app_identity

Before get and post, in its own section:

def create_file(self, filename, Dacontents):
    write_retry_params = gcs.RetryParams(backoff_factor=1.1)
    gcs_file = gcs.open(filename,
                        'w',
                        content_type='image/jpeg',
                        options={'x-goog-meta-foo': 'foo',
                                 'x-goog-meta-bar': 'bar'},
                        retry_params=write_retry_params)
    gcs_file.write(Dacontents)
    gcs_file.close()

In get, in your HTML:

<form action="/(whatever your url is)" method="post" enctype="multipart/form-data">
    <input type="file" name="orders"/>
    <input type="submit"/>
</form>

In Post

orders = self.request.POST.get('orders')  # this is for webapp2

bucket_name = os.environ.get('BUCKET_NAME', app_identity.get_default_gcs_bucket_name())
bucket = '/' + bucket_name
OpenOrders = orders.file.read()
if OpenOrders:
    filename = bucket + '/whateverYouWantToCallIt'
    self.create_file(filename, OpenOrders)
Answered By: Brandon Wegner

You don’t need to save the image locally; you can write the image directly to a GCS bucket, as described below:

import io
from PIL import Image
from google.cloud import storage
from pdf2image import convert_from_bytes

storage_client = storage.Client()

def convert_pil_image_to_byte_array(img):
    img_byte_array = io.BytesIO()
    img.save(img_byte_array, format='JPEG', subsampling=0, quality=100)
    img_byte_array = img_byte_array.getvalue()
    return img_byte_array

def write_to_gcs_bucket(bucket_name, source_prefix, target_prefix):
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.get_blob(source_prefix)
    contents = blob.download_as_string()
    images = convert_from_bytes(contents, first_page=5)
    for i, image in enumerate(images):
        object_byte = convert_pil_image_to_byte_array(image)
        file_name = 'slide' + str(i) + '.jpg'
        blob = bucket.blob(target_prefix + file_name)
        blob.upload_from_string(object_byte, content_type='image/jpeg')
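The in-memory JPEG conversion can be exercised locally without touching GCS. A quick check with a blank Pillow image (`pil_image_to_jpeg_bytes` mirrors the conversion above; the assertion just confirms the bytes start with the JPEG SOI marker):

```python
import io
from PIL import Image

def pil_image_to_jpeg_bytes(img):
    # Same idea as convert_pil_image_to_byte_array: render the image
    # into an in-memory buffer instead of a file on disk.
    buf = io.BytesIO()
    img.save(buf, format='JPEG', subsampling=0, quality=100)
    return buf.getvalue()

page = Image.new('RGB', (100, 100), 'white')
data = pil_image_to_jpeg_bytes(page)
assert data[:2] == b'\xff\xd8'  # JPEG streams begin with the SOI marker
```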
    
Answered By: Shashank Tripathi