How to convert a PDF from base64 string to a file?
Question:
I have a PDF as a base64 string and I need to write it to file using Python.
I tried this:
import base64
base64String = "data:application/pdf;base64,JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."
with open('temp.pdf', 'wb') as theFile:
    theFile.write(base64.b64decode(base64String))
But it didn’t create a valid PDF file.
What am I missing?
Answers:
As far as I understand, base64.b64decode only accepts a base64-encoded string, and it looks like your string has a header in front of it that is not base64-encoded.
I would remove “data:application/pdf;base64,”
Check out the docs here: https://docs.python.org/2/library/base64.html
When I’ve used it in the past, I have passed only the encoded string.
Does writing it using the codecs.decode function work?
Also, as Mark stated, try removing the data:application/pdf;base64, portion of the string, since that part is not meant to be decoded:
import codecs
base64String = "JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."
with open("test.pdf", "wb") as f:
    # In Python 3 the "base64" codec expects bytes, so encode the str first.
    f.write(codecs.decode(base64String.encode("ascii"), "base64"))
This is not just base64-encoded data; it is a data URI:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs
There is another post on Stack Overflow asking how to parse such strings in Python:
How to parse data-uri in python?
The gist of it is to remove the header (everything up to and including the first comma):
    theFile.write(base64.b64decode(base64String.partition(",")[2]))
NOTE: I use partition(",")[2] instead of split(",")[1] because it won’t throw an exception if the string has no comma or nothing follows the comma; in that case it yields an empty string (empty data).
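The header-stripping step can be sketched a little more defensively. This is only an illustration: the helper name parse_data_uri and the sample payload are made up here, and the sketch assumes the URI uses the ";base64" form shown in the question.

```python
import base64


def parse_data_uri(uri):
    """Split a base64 data URI into (media_type, decoded_bytes).

    Hypothetical helper: handles only the "data:<type>;base64,<payload>" form.
    """
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:"):
        raise ValueError("not a data URI")
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data URIs are handled here")
    media_type = header[len("data:"):-len(";base64")]
    # validate=True rejects any character outside the base64 alphabet.
    return media_type, base64.b64decode(payload, validate=True)


# Tiny stand-in payload; a real PDF would be much longer.
sample = "data:application/pdf;base64," + base64.b64encode(b"%PDF-1.4\n").decode("ascii")
media_type, pdf_bytes = parse_data_uri(sample)
```

Returning the media type alongside the bytes makes it possible to check that the payload really claims to be application/pdf before writing it.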
Extending @Jebby’s answer to use the base64 module (I had the same issue as @SmartManoj):
import base64
base64String = "JVBERi0xLjQKJeHp69MKMSAwIG9iago8PC9Qcm9kdWNlciAoU2tpYS9..."
with open("test.pdf", "wb") as f:
    f.write(base64.b64decode(base64String))
Here is my solution:
from base64 import b64decode
def base64_to_pdf(file):
    file_bytes = b64decode(file, validate=True)
    if file_bytes[0:4] != b"%PDF":
        raise ValueError("Missing the PDF file signature")
    with open("file.pdf", "wb") as f:
        return f.write(file_bytes)
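A possible variant of the function above, as a rough sketch: it takes the output path as a parameter and also accepts the data-URI form from the question. The names save_base64_pdf and out_path and the stand-in payload are assumptions for illustration, not part of the original answer.

```python
import base64
import os
import tempfile


def save_base64_pdf(data, out_path):
    """Decode base64 (optionally wrapped in a data URI) and write it to out_path."""
    if data.startswith("data:"):
        # Drop the "data:...;base64," header; only the part after the
        # first comma is base64.
        data = data.partition(",")[2]
    pdf_bytes = base64.b64decode(data, validate=True)
    if not pdf_bytes.startswith(b"%PDF"):
        raise ValueError("Missing the PDF file signature")
    with open(out_path, "wb") as f:
        # write() returns the number of bytes written.
        return f.write(pdf_bytes)


# Demo with a stand-in payload written to a temporary directory.
encoded = base64.b64encode(b"%PDF-1.4\n%%EOF\n").decode("ascii")
out_path = os.path.join(tempfile.mkdtemp(), "demo.pdf")
written = save_base64_pdf("data:application/pdf;base64," + encoded, out_path)
```

With validate=True, passing the data URI without stripping the header would raise binascii.Error instead of producing a corrupt file, which makes the original mistake much easier to spot.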