Web scraping Access denied | Cloudflare to restrict access

Question:

I’m trying to fetch data from the www.cclonline.com website using a Python script.
This is the code:

import requests
from requests_html import HTML

source = requests.get('https://www.cclonline.com/category/409/PC-Components/Graphics-Cards/')
html = HTML(html=source.text)
print(source.status_code)
print(html.text)

This is the error I get:

403
Access denied | www.cclonline.com used Cloudflare to restrict access
Please enable cookies.
Error 1020
Ray ID: 64c0c2f1ccb5d781 • 2021-05-08 06:51:46 UTC
Access denied
What happened?
This website is using a security service to protect itself from online attacks.

How can I solve this problem? Thanks.

Asked By: dfcsdf

||

Answers:

The site’s robots.txt does not explicitly say that bots are disallowed, but you need to make your request look like it’s coming from an actual browser.
Now to solve the issue at hand. The response says you need to have cookies enabled, so you can address that by driving a real browser with Selenium. Selenium is a browser automation tool: it controls Google Chrome or another browser of your choice through a driver, optionally in headless mode, so the request carries everything a browser has to offer, including cookies and JavaScript. The server is then much more likely to treat the request as coming from an actual browser and return a normal response.
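As a rough illustration of the Selenium approach, here is a minimal sketch. It assumes Selenium 4 and a local Chrome install; the helper names (build_chrome_args, is_access_denied, fetch_rendered_html) and the window size are made up for this example, and there is no guarantee that Cloudflare will let a headless browser through.

```python
def build_chrome_args(headless=True):
    """Return Chrome command-line flags (hypothetical helper for this sketch)."""
    args = ["--window-size=1280,800"]
    if headless:
        # "--headless=new" selects Chrome's newer headless mode (Chrome 109+).
        args.append("--headless=new")
    return args


def is_access_denied(html):
    """Heuristic: does the page look like the Cloudflare block page from the question?"""
    return "Access denied" in html and "Cloudflare" in html


def fetch_rendered_html(url, headless=True):
    """Load url in Chrome via Selenium and return the rendered page source."""
    # Imported here so the helpers above work even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_args(headless):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

You could call fetch_rendered_html('https://www.cclonline.com/category/409/PC-Components/Graphics-Cards/') and check the result with is_access_denied before parsing it. If the headless run is still blocked, trying headless=False is a cheap experiment, since some sites treat headless browsers differently.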

Learn more about how to use Selenium for scraping here.

Also remember to adjust your crawl rate accordingly: pause after each request and rotate user agents often.
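The pausing and user-agent rotation could look roughly like the sketch below. The crawl function, its parameters, the delay range, and the user-agent strings are all illustrative assumptions, not values taken from the site; the actual HTTP call is passed in as fetch so the loop works with requests, Selenium, or anything else.

```python
import itertools
import random
import time

# Illustrative user-agent strings only; substitute current ones for real use.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def crawl(urls, fetch, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Fetch each URL with a rotating User-Agent, pausing between requests.

    fetch is any callable taking (url, headers) and returning a response.
    """
    agents = itertools.cycle(USER_AGENTS)
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Random pause so requests are not sent at a fixed, rapid rhythm.
            sleep(random.uniform(min_delay, max_delay))
        headers = {"User-Agent": next(agents)}
        results.append(fetch(url, headers))
    return results
```

With requests, fetch could be something like lambda url, headers: requests.get(url, headers=headers, timeout=10). Keep in mind that changing the User-Agent header alone may not get past a Cloudflare 1020 block, since that error comes from a firewall rule on the site.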

Answered By: exilour

There’s no silver bullet for solving Cloudflare challenges. In my projects I’ve tried the solutions proposed on this website, using Playwright with different options: https://substack.thewebscraping.club/p/cloudflare-how-to-scrape