GoogleCaptcha roadblock in website scraper

Question:

I am currently working on a scraper for aniworld.to.
My goal is to enter an anime name and have all of its episodes downloaded.
I have everything working except one thing…
The website has a Watch button. That button redirects you to https://aniworld.to/redirect/SOMETHING, and that page has a captcha, which means the link is not in the HTML…
Is there a way to bypass this and get the link in Python? Or a way to display the captcha so I can solve it myself?
Solving it by hand would be fine, because the captcha only appears very rarely.
The only thing I need from that page is the redirect link. It looks like this:
https://vidoza.net/embed-something.html
My very very wip code is here if it helps: https://github.com/wolfswolke/aniworld_scraper

Asked By: ZKWolf

||

Answers:

Mitchdu showed me how to do it.
If anyone else needs help, here is my code: https://github.com/wolfswolke/aniworld_scraper/blob/main/src/logic/captcha.py

import os
from threading import Thread

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager


def open_captcha_window(full_url):
    """Open the redirect page in a small Chrome window and return the URL it redirects to."""
    # Optional uBlock extension, expected in ./extensions/ublock
    path_to_ublock = os.path.join(os.getcwd(), 'extensions', 'ublock')
    options = webdriver.ChromeOptions()
    options.add_argument("app=" + full_url)  # app mode: window without browser UI
    options.add_argument("window-size=423,705")
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    if os.path.exists(path_to_ublock):
        options.add_argument('load-extension=' + path_to_ublock)

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get(full_url)

    # Poll every 0.3 s, for up to 100 s, until the captcha is solved and the URL changes
    wait = WebDriverWait(driver, 100, 0.3)
    wait.until(lambda redirect: redirect.current_url != full_url)

    new_page = driver.current_url
    # Shut the browser down in a background thread so the caller is not blocked
    Thread(target=threaded_driver_close, args=(driver,)).start()
    return new_page


def threaded_driver_close(driver):
    # quit() ends the whole browser session; close() only closes the current window
    driver.quit()
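
One possible refinement, not part of the original answer: the wait condition above fires as soon as current_url differs from full_url in any way, which may also happen if the captcha page changes its own query string or fragment. A minimal sketch of a stricter check, assuming the final link always lives on a different host than the redirect page (as in the vidoza.net example from the question), could look like this. The helper name has_left_host is hypothetical.

```python
from urllib.parse import urlsplit


def has_left_host(current_url: str, start_url: str) -> bool:
    """Return True once current_url is on a different host than start_url."""
    current_host = urlsplit(current_url).hostname
    start_host = urlsplit(start_url).hostname
    # URLs without a hostname (for example "about:blank") count as
    # "not redirected yet" rather than as a successful redirect.
    return current_host is not None and current_host != start_host


# Hypothetical use inside open_captcha_window, replacing the lambda:
#     wait.until(lambda d: has_left_host(d.current_url, full_url))
```

Whether this is needed depends on how the redirect page behaves; if the URL never changes until the captcha is solved, the simpler comparison is enough.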
Answered By: ZKWolf