How can I extract the Azure IP ranges JSON file programmatically with Python?

Question:

I want to download the ipranges.json (which is updated weekly) from https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519
I have this Python code, which keeps running forever.

import wget
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
response = wget.download(URL, "ips.json")
print(response)

How can I download the JSON file in Python?

Asked By: Rajasekhar M


Answers:

Because https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 is a confirmation page that triggers the download via JavaScript, your code downloads the page itself, not the file.

If you inspect the downloaded file, you will see it contains the HTML source of the confirmation page rather than JSON.

The real download link changes over time, so we should scrape it from the page rather than hard-coding it.

For convenience, I will not use wget. Two libraries are used here: requests to fetch the page and download the file, and BeautifulSoup to parse the HTML.

# pip install requests
# pip install bs4
import requests
from bs4 import BeautifulSoup

# request page
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
page = requests.get(URL)

# parse HTML to get the real link
soup = BeautifulSoup(page.content, "html.parser")
link = soup.find('a', {'data-bi-containername':'download retry'})['href']

# download
file_download = requests.get(link)

# save as azure_ips.json
with open("azure_ips.json", "wb") as f:
    f.write(file_download.content)
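
Once the file is saved, it can be parsed with the standard json module. As a minimal sketch, assuming the Service Tags schema Microsoft currently publishes (a top-level "values" list whose entries have a "name" and a "properties" object with "addressPrefixes"), the prefixes for a given service tag can be collected like this; the sample data below is illustrative only:

```python
import json

def prefixes_for_tag(data, tag_name):
    """Return all addressPrefixes for the service tag named tag_name.

    Assumes the published shape:
    {"values": [{"name": ..., "properties": {"addressPrefixes": [...]}}]}
    """
    prefixes = []
    for value in data.get("values", []):
        if value.get("name") == tag_name:
            prefixes.extend(value.get("properties", {}).get("addressPrefixes", []))
    return prefixes

# In practice you would load the downloaded file:
#   with open("azure_ips.json") as f:
#       data = json.load(f)
# Tiny hand-written sample in the same shape (values are made up):
sample = {
    "changeNumber": 1,
    "cloud": "Public",
    "values": [
        {"name": "AzureCloud",
         "properties": {"addressPrefixes": ["13.64.0.0/16", "2603:1000::/47"]}},
    ],
}

print(prefixes_for_tag(sample, "AzureCloud"))  # ['13.64.0.0/16', '2603:1000::/47']
```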
Answered By: Tấn Nguyên

Great solution. I added custom headers because the URL returned an error saying I was using an automation user-agent string. Here is how it looks now (note the extra imports, since this snippet stands on its own):

import logging
import requests

# request page with modified headers
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
page = requests.get(URL, headers=headers)
logging.info(page.content)
Answered By: Hugo Nova