How do I unzip base 64 encoded data inside JSON using Python?

Question:

I’m using Selenium Grid and need to download files as part of my automated tests. The problem is that the response is not the file contents itself. Instead, the contents are zipped first and then wrapped in JSON.

Here’s the response I’m trying to unzip:

b'{\n  "filename": "test.txt",\n  "contents": "UEsDBBQACAgIAFV1TlYAAAAAAAAAAAAAAAAIAAAAdGVzdC50eHQDAFBLBwgAAAAAAgAAAAAAAABQSwECFAAUAAgICABVdU5WAAAAAAIAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAdGVzdC50eHRQSwUGAAAAAAEAAQA2AAAAOAAAAAAA"\n}'

The data above should contain an empty file called test.txt.

According to the documentation, the contents are a zipped folder encoded in Base64. I want to unzip this string and read the contents of the test.txt file inside it, but I’m not sure how to unzip a Base64-encoded byte string.

I’m using Python so if anyone knows how to unzip the contents and read the test.txt file within that would be extremely helpful.

Here’s the documentation on Selenium Grid for Downloading files:
https://www.selenium.dev/documentation/grid/configuration/cli_options/#important-information-when-dowloading-a-file

Asked By: Ryan Nygard


Answers:

You will need to call b64decode on the contents after the JSON has been parsed, and then store the result in either a temporary file or a BytesIO object so that ZipFile can read it:

import json
from base64 import b64decode
from io import BytesIO
from pathlib import Path
from zipfile import ZipFile

data = b'{\n  "filename": "test.txt",\n  "contents": "UEsDBBQACAgIAFV1TlYAAAAAAAAAAAAAAAAIAAAAdGVzdC50eHQDAFBLBwgAAAAAAgAAAAAAAABQSwECFAAUAAgICABVdU5WAAAAAAIAAAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAdGVzdC50eHRQSwUGAAAAAAEAAQA2AAAAOAAAAAAA"\n}'

contents = json.loads(data)['contents']
bio = BytesIO(b64decode(contents))
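
For the temporary-file alternative mentioned above, a minimal sketch might look like the following. The helper name list_members_via_tempfile and the stand-in sample response are hypothetical, built here only so the snippet runs without a Grid node; they are not part of Selenium's API.

```python
import json
import tempfile
from base64 import b64decode, b64encode
from io import BytesIO
from zipfile import ZipFile


def list_members_via_tempfile(response: bytes) -> list:
    """Decode the zip into a temporary file and return its member names."""
    payload = json.loads(response)
    with tempfile.TemporaryFile() as tmp:
        tmp.write(b64decode(payload["contents"]))
        tmp.seek(0)  # rewind so ZipFile reads from the start
        with ZipFile(tmp) as zip_file:
            return zip_file.namelist()


# Stand-in response (an assumption): a zip holding one file, Base64-encoded
# and wrapped in JSON the same way as the response in the question.
buffer = BytesIO()
with ZipFile(buffer, "w") as zip_file:
    zip_file.writestr("test.txt", "hello")
sample = json.dumps(
    {"filename": "test.txt", "contents": b64encode(buffer.getvalue()).decode("ascii")}
).encode("utf-8")

print(list_members_via_tempfile(sample))
```

A temporary file is mainly worth considering when the archive is too large to hold comfortably in memory; for small downloads BytesIO is simpler.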

We can then see the metadata about the files stored within that zip file (this step is not required):

>>> with ZipFile(bio) as zip_file:
...     zip_file.infolist()
...
[<ZipInfo filename='test.txt' compress_type=deflate file_size=0 compress_size=2>]

To extract all files, you can use the extractall method:

>>> path = Path('temp/extracted')

>>> path.mkdir(parents=True, exist_ok=True)  # also creates 'temp' if it is missing

>>> with ZipFile(bio) as zip_file:
...     zip_file.extractall(path)
...

>>> for p in path.iterdir():
...     print(p.as_posix())
...
temp/extracted/test.txt

It is also possible to extract individual files with the extract method, or to read a single file straight into memory with the read method, if desired.
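
Since the question asks about reading test.txt itself, here is a rough sketch of reading one member in memory without writing anything to disk. The function name read_member and the stand-in sample response are hypothetical and exist only to make the example self-contained.

```python
import json
from base64 import b64decode, b64encode
from io import BytesIO
from zipfile import ZipFile


def read_member(response: bytes, member: str) -> bytes:
    """Return the raw bytes of one file inside the Base64-encoded zip."""
    payload = json.loads(response)
    archive = BytesIO(b64decode(payload["contents"]))
    with ZipFile(archive) as zip_file:
        return zip_file.read(member)


# Stand-in response (an assumption) so the sketch runs without a Grid node.
buffer = BytesIO()
with ZipFile(buffer, "w") as zip_file:
    zip_file.writestr("test.txt", "hello")
sample = json.dumps(
    {"filename": "test.txt", "contents": b64encode(buffer.getvalue()).decode("ascii")}
).encode("utf-8")

print(read_member(sample, "test.txt"))  # b'hello'
```

The result is bytes, so call .decode() on it with the appropriate encoding if you need text. With the response from the question, the same call should return an empty bytes object, since test.txt is empty.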

Answered By: dskrypa