Python Selenium: How to retrieve base64 images after clicking a button

Question:

I am attempting to retrieve base64 captcha images with Python Selenium.

The problem is that I can only retrieve the HTML as it is one step before the images arrive.

My steps are as follows:

# load packages

from selenium import webdriver
from selenium.webdriver import EdgeOptions
from selenium.webdriver.edge.service import Service
from webdriver_manager.microsoft import EdgeChromiumDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = EdgeOptions()
options.add_argument("--headless")
options.add_argument("--disable-gpu")
driver = webdriver.Edge(service=Service(EdgeChromiumDriverManager().install()), options=options)

# access website
driver.get("https://boards.4channel.org/o/")
# open post
driver.execute_script("document.getElementsByClassName('mobilePostFormToggle mobile hidden button')[0].click()")
# click this ID
driver.execute_script("document.getElementById('t-load').click()")

# This will get the initial html - before javascript
html1 = driver.page_source

# This will get the html after on-load javascript
html2 = driver.execute_script("return document.documentElement.innerHTML;")

# check for the loading placeholder
'Loading' in html1  # True
'Loading' in html2  # True

# check for base64 images
'data:image/png;base64' in html1  # False
'data:image/png;base64' in html2  # False

The HTML object that I think is of interest is as follows:

<button id="t-load" type="button" data-board="o" data-tid="0" style="font-size: 11px; padding: 0px; width: 90px; box-sizing: border-box; margin: 0px 6px 0px 0px; vertical-align: middle; height: 18px;">Get Captcha</button>
Asked By: John Stud

||

Answers:

@John Stud, I ran your code in my environment with headless mode removed so I could see what was happening. My observations:

mobilePostFormToggle mobile hidden button

The element above is hidden, so the click never reached the "[Start a New Thread]" link. I replaced it with the lines below, which click that link directly:

start_a_new_session = driver.find_element(By.XPATH, "//div[@id='togglePostFormLink']/a[text()='Start a New Thread']")
driver.execute_script("arguments[0].click();", start_a_new_session)

You set a 60-second wait before clicking the captcha button; I think 2-3 seconds is enough. After the click, however, the captcha takes a while to load. I waited 40 seconds, but in my browser the image still failed to load and I got the error shown in the screenshot, so I cannot confirm that this works end to end. Let me know if this helps resolve your problem:

[Screenshot of the error; image not available]
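
As a rough sketch of replacing fixed waits with polling, the helper below repeatedly reads a style string until a base64 data URI shows up or a timeout expires. The name wait_for_inline_image and its parameters are hypothetical, and the Selenium call in the trailing comment assumes the captcha is rendered into the inline style of the element with id "t-bg", which has not been confirmed here.

```python
import time


def wait_for_inline_image(read_style, timeout=40.0, poll=0.5):
    """Poll read_style() until its result contains a base64 data URI.

    read_style is any zero-argument callable returning a style string
    (hypothetical helper, not part of Selenium).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            style = read_style() or ""
        except Exception:
            # e.g. the element is not in the DOM yet; treat as "not ready"
            style = ""
        if ";base64," in style:
            return style
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no base64 image within {timeout} seconds")
        time.sleep(poll)


# Assumed usage with Selenium (the element id "t-bg" is an assumption):
# style = wait_for_inline_image(
#     lambda: driver.find_element(By.ID, "t-bg").get_attribute("style")
# )
```

Polling returns as soon as the image is present, so a generous timeout costs nothing when the captcha loads quickly.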

Answered By: ketanvj

Reading through the question and the OP's comments, the crux of the issue is not retrieving, manipulating, or decoding the base64 captcha image. (The captcha is in fact composed of two base64 images of different sizes, so reproducing exactly what is displayed would mean retrieving, decoding, and combining both.) The real goal is simply to get the captcha image as seen on the screen. Here is a solution to that problem:

import base64
import time as t

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import undetected_chromedriver as uc


options = uc.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument('--disable-notifications')
options.add_argument("--window-size=1280,720")
options.add_argument('--ignore-certificate-errors')
options.add_argument('--allow-running-insecure-content')
# options.add_argument('--headless')

browser = uc.Chrome(options=options)

wait = WebDriverWait(browser, 20)
url = 'https://boards.4channel.org/o/'
browser.get(url) 

# open the post form, then request the captcha
wait.until(EC.element_to_be_clickable((By.PARTIAL_LINK_TEXT, 'Start a New Thread'))).click()
t.sleep(1)
wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@id="t-load"]'))).click()

# take a screenshot of the rendered captcha element
captcha_img_background = wait.until(EC.element_to_be_clickable((By.XPATH, '//div[@id="t-bg"]')))
captcha_img_background.screenshot('full_captcha_image.png')
print('got captcha!')

# the background image is embedded in the element's inline style as a base64 data URI
style = captcha_img_background.get_attribute('style')
b64img_background = style.split('url("data:image/png;base64,')[1].split('");')[0]
bgimgdata = base64.b64decode(b64img_background)
with open('bg_image.png', 'wb') as f:
    f.write(bgimgdata)
print('also saved the base64 image as bg_image.png')

This saves the captcha image as displayed on screen, including both background and foreground, so it can be used further, for example as ML training data.

EDIT: updated code, to exemplify decoding and saving a base64 image as well (saving the background image, which can be scrolled horizontally with the slider).
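
A somewhat more tolerant way to pull the payload out of a style attribute than chained split calls is a regular expression. The sketch below assumes the style contains one or more url(data:image/...;base64,...) entries; the function name extract_inline_images is hypothetical, and it has only been reasoned through against a style string shaped like the one above, not every variant the site may emit.

```python
import base64
import re

# Matches url("data:image/<fmt>;base64,<payload>") with optional quotes.
_DATA_URI = re.compile(
    r"""url\(\s*["']?data:image/(?P<fmt>[A-Za-z0-9.+-]+);base64,"""
    r"""(?P<data>[A-Za-z0-9+/=]+)["']?\s*\)"""
)


def extract_inline_images(style):
    """Return a list of (format, decoded_bytes) for each data URI in style."""
    return [
        (m.group("fmt"), base64.b64decode(m.group("data")))
        for m in _DATA_URI.finditer(style)
    ]
```

With the element located as in the code above, extract_inline_images(element.get_attribute('style')) should yield pairs whose bytes can be written to disk with open(..., 'wb').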

I ended up using undetected chromedriver after Firefox & Chrome failed to load the captcha images.

Undetected chromedriver docs: https://github.com/ultrafunkamsterdam/undetected-chromedriver

Selenium docs: https://www.selenium.dev/documentation/

Answered By: platipus_on_fire