urllib.error.HTTPError: HTTP Error 403: Forbidden with urllib.request

Question:

I am trying to read an image from a URL and download it to my machine with Python. I followed the example in this blog post https://www.geeksforgeeks.org/how-to-open-an-image-from-the-url-in-pil/, which uses https://media.geeksforgeeks.org/wp-content/uploads/20210318103632/gfg-300x300.png, and that works fine. However, when I try my own example it just doesn't seem to work; I've tried the HTTP version as well and it still gives me the 403 error. Does anyone know what the cause could be?

import urllib.request

urllib.request.urlretrieve(
    "http://image.prntscr.com/image/ynfpUXgaRmGPwj5YdZJmaw.png",
    "gfg.png")

Output:

urllib.error.HTTPError: HTTP Error 403: Forbidden

Asked By: patricebailey1998

Answers:

The server at prntscr.com is actively rejecting your request. There are many possible reasons for that. Some sites check the caller's User-Agent header before deciding whether to serve the request. In my case, I first used httpie to test whether the server would let a non-browser client download the image at all, and it did. So I then set a made-up User-Agent header to see whether the problem was simply the missing user agent.

import urllib.request

# Install a global opener that sends a custom User-Agent header on every
# request made through urllib, then download the image as before.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'MyApp/1.0')]
urllib.request.install_opener(opener)
urllib.request.urlretrieve(
    "http://image.prntscr.com/image/ynfpUXgaRmGPwj5YdZJmaw.png",
    "gfg.png")

It worked! I don't know exactly what logic the server applies; for instance, a plain Mozilla/5.0 value did not work for me. You won't always run into this issue (most sites are fairly lax about what they accept as long as you are reasonable), but when you do, try playing with the User-Agent header. If nothing else works, try sending the same user agent string your browser uses.
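
For example, here is a minimal sketch of attaching a browser-style User-Agent to a single request via urllib.request.Request, rather than installing a global opener. The header value below is just a placeholder (copy whatever your own browser reports); whether a given string is accepted is entirely up to the server.

import urllib.request

# Sketch: attach a browser-like User-Agent to one request.
# The header value is a placeholder and may or may not satisfy this server.
url = "http://image.prntscr.com/image/ynfpUXgaRmGPwj5YdZJmaw.png"
req = urllib.request.Request(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/120.0 Safari/537.36"
    },
)
with urllib.request.urlopen(req) as resp, open("gfg.png", "wb") as out:
    out.write(resp.read())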

Answered By: wombat

I had the same problem, and in my case it was due to an expired URL. When I read the response body I found the message "URL signature expired", which you won't see unless you actually inspect the error response.

This means some URLs simply expire, usually for security purposes. Fetch the URL again and update it in your script. If there is no new URL for the content you're trying to scrape, then unfortunately you can't scrape it.
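
If you want to check the error body yourself, a small sketch (reusing the URL from the question) is to catch the HTTPError and print what the server sent back:

import urllib.error
import urllib.request

# Sketch: catch the 403 and print the error response body; servers sometimes
# include a hint there, such as "URL signature expired".
try:
    urllib.request.urlretrieve(
        "http://image.prntscr.com/image/ynfpUXgaRmGPwj5YdZJmaw.png",
        "gfg.png")
except urllib.error.HTTPError as err:
    print(err.code)                                      # e.g. 403
    print(err.read().decode("utf-8", errors="replace"))  # error body from the server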

Answered By: Toakley