How can I extract Azure IP ranges json file programatically through python?
Question:
I want to download the ipranges.json
(which is updated weekly) from https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519
I have this python code which keeps running forever.
import wget
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
response = wget.download(URL, "ips.json")
print(response)
How can I download the JSON file in Python?
Answers:
Because https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519
is the link which automatically trigger javascript to download, therefore you just download the page, not the file
If you check downloaded file, the source will look like this
We realize the file will change after a while, so we have to scrape it in generic way
For convenience, I will not use wget, 2 libraries here are requests
to request page and download file, beaufitulsoup
to parse html
# pip install requests
# pip install bs4
import requests
from bs4 import BeautifulSoup
# request page
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
page = requests.get(URL)
# parse HTML to get the real link
soup = BeautifulSoup(page.content, "html.parser")
link = soup.find('a', {'data-bi-containername':'download retry'})['href']
# download
file_download = requests.get(link)
# save in azure_ips.json
open("azure_ips.json", "wb").write(file_download.content)
Great solution, I added custom headers because I was getting back an error from the URL saying that I was using an automation user-agent string. Here is how it looks now:
# request page with modified headers
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
page = requests.get(URL, headers=headers)
logging.info(page.content)
I want to download the ipranges.json
(which is updated weekly) from https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519
I have this python code which keeps running forever.
import wget
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
response = wget.download(URL, "ips.json")
print(response)
How can I download the JSON file in Python?
Because https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519
is the link which automatically trigger javascript to download, therefore you just download the page, not the file
If you check downloaded file, the source will look like this
We realize the file will change after a while, so we have to scrape it in generic way
For convenience, I will not use wget, 2 libraries here are requests
to request page and download file, beaufitulsoup
to parse html
# pip install requests
# pip install bs4
import requests
from bs4 import BeautifulSoup
# request page
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
page = requests.get(URL)
# parse HTML to get the real link
soup = BeautifulSoup(page.content, "html.parser")
link = soup.find('a', {'data-bi-containername':'download retry'})['href']
# download
file_download = requests.get(link)
# save in azure_ips.json
open("azure_ips.json", "wb").write(file_download.content)
Great solution, I added custom headers because I was getting back an error from the URL saying that I was using an automation user-agent string. Here is how it looks now:
# request page with modified headers
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
page = requests.get(URL, headers=headers)
logging.info(page.content)