Is it possible to download the photos from the weathercams on the FAA's website?
Question:
I am having difficulties scraping data off of this website (https://weathercams.faa.gov/cameras/state/US). I have relatively minimal experience scraping data with Python, so please excuse me if this is trivial, but whenever I attempt to use Selenium (shown below) or BeautifulSoup, everything returns ‘NONE’. The code shown below is my attempt to click on one of the airports listed.
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Edge()
driver.get('https://weathercams.faa.gov/cameras/state/US')
# find_element (singular) returns one element; find_elements returns a list, which has no .click()
airport = driver.find_element(By.XPATH, '/html/body/div/div/div[2]/div[2]/div/div[4]/div[2]/a')
airport.click()
My intention is to loop through the airports and then save each weather cam photo. Any advice or assistance would be appreciated.
Answers:
Yes – although this is one of those cases where it’s easier to reverse-engineer the API and use the requests library, rather than try to control a browser.
First, you need to figure out the siteId associated with each airport. If you open the page in Firefox and look at the Network tab of the developer tools, you can find the API call it makes to get those site IDs. It's https://api.weathercams.faa.gov/sites.
You can then copy the request as cURL and convert it to Python requests code.
import requests
import pandas as pd

def fetch_sites():
    headers = {
        'User-Agent': 'Camera-Fetcher https://stackoverflow.com/q/75203189',
        'Accept': '*/*',
        'Referer': 'https://weathercams.faa.gov/',
    }
    response = requests.get('https://api.weathercams.faa.gov/sites', headers=headers)
    response.raise_for_status()
    data = pd.json_normalize(response.json()['payload'])
    keep_cols = [
        'siteId', 'siteName', 'siteIdentifier', 'icao', 'latitude', 'longitude',
        'elevation', 'magVariation', 'siteInMaintenance', 'siteActive', 'thirdParty',
        'validated', 'country', 'state', 'wxTable', 'operatedBy', 'attribution'
    ]
    return data[keep_cols]
This will return a Pandas dataframe with a bunch of information about each airport. We only really need the siteId, but the other information will help if we need to fetch only cameras in a specific state, or want to save an airport code with each image.
I’m going to pick the last airport in the site list, Silver West Airport, and use its code (585) to demonstrate the next part.
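If you only want cameras in one state, the dataframe from `fetch_sites()` can be narrowed with ordinary pandas boolean indexing before fetching anything else. This is a minimal sketch: `select_sites` is a hypothetical helper, it assumes the `state` column holds two-letter codes and that `siteActive` is truthy for running sites, and the sample rows are invented rather than real API data. It also shows grabbing the last site's ID, the same way Silver West Airport was chosen.

```python
from typing import Optional

import pandas as pd


def select_sites(sites: pd.DataFrame, state: Optional[str] = None,
                 active_only: bool = True) -> pd.DataFrame:
    """Filter the frame returned by fetch_sites() (hypothetical helper)."""
    mask = pd.Series(True, index=sites.index)
    if state is not None:
        # Assumes 'state' holds two-letter codes such as 'AK'.
        mask &= sites['state'] == state
    if active_only:
        # Assumes 'siteActive' is truthy for sites that are currently running.
        mask &= sites['siteActive'].astype(bool)
    return sites[mask]


# Invented rows using a few of the column names kept by fetch_sites().
sample_sites = pd.DataFrame({
    'siteId': [101, 202, 303],
    'siteName': ['Example A', 'Example B', 'Example C'],
    'state': ['AK', 'CO', 'AK'],
    'siteActive': [True, True, False],
})

print(select_sites(sample_sites, state='AK')['siteId'].tolist())  # [101]

# Picking the last site in the list.
last_site_id = sample_sites['siteId'].iloc[-1]
print(last_site_id)  # 303
```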
Next, find the API for getting camera image data for a site. I looked in the Network tab again. It's https://api.weathercams.faa.gov/summary?siteId=xxx.
I used that to write this function.
def fetch_site_cameras(site_id):
    headers = {
        'User-Agent': 'Camera-Fetcher https://stackoverflow.com/q/75203189',
        'Accept': '*/*',
        'Referer': 'https://weathercams.faa.gov/',
    }
    params = {
        'siteId': str(site_id),
        'related': 'true',
    }
    response = requests.get('https://api.weathercams.faa.gov/summary', params=params, headers=headers)
    response.raise_for_status()
    site_json = response.json()
    data = []
    for camera in site_json['payload']['site']['cameras']:
        for image in camera.get('currentImages', []):
            camera_dict = {
                'direction': camera['cameraDirection'],
                **image
            }
            data.append(camera_dict)
    data = pd.DataFrame(data).drop(columns='imageFilename')
    return data
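The nested loop above flattens `payload → site → cameras → currentImages` into one row per image. Pulling that step out into a pure function makes the assumed JSON shape explicit and lets you check it without calling the API. This is a sketch: `cameras_to_frame` is a hypothetical name, the key names come from the code and output above, and the sample payload is invented. Unlike the original, it passes `errors='ignore'` to `drop`, so a site with no images yields an empty frame instead of a `KeyError`.

```python
import pandas as pd


def cameras_to_frame(site_json: dict) -> pd.DataFrame:
    """Flatten a /summary response into one row per image (hypothetical helper)."""
    rows = []
    for camera in site_json['payload']['site']['cameras']:
        # Cameras without a currentImages key contribute no rows.
        for image in camera.get('currentImages', []):
            rows.append({'direction': camera['cameraDirection'], **image})
    frame = pd.DataFrame(rows)
    # errors='ignore' avoids a KeyError when there were no images at all.
    return frame.drop(columns='imageFilename', errors='ignore')


# Invented payload that mimics the structure the loop above walks through.
sample_summary = {
    'payload': {
        'site': {
            'cameras': [
                {
                    'cameraDirection': 'North',
                    'currentImages': [
                        {'cameraId': 1, 'imageFilename': 'a.jpg',
                         'imageUri': 'https://example.invalid/a.jpg',
                         'imageDatetime': '2023-01-01T00:10:00Z'},
                        {'cameraId': 1, 'imageFilename': 'b.jpg',
                         'imageUri': 'https://example.invalid/b.jpg',
                         'imageDatetime': '2023-01-01T00:00:00Z'},
                    ],
                },
                {'cameraDirection': 'South'},  # no currentImages key
            ]
        }
    }
}

frame = cameras_to_frame(sample_summary)
print(frame.columns.tolist())  # ['direction', 'cameraId', 'imageUri', 'imageDatetime']
print(len(frame))  # 2
```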
Calling this with a site ID (158 here) produces the following dataframe.
print(fetch_site_cameras(158))
direction cameraId imageUri
0 North 10479 https://weathercams.faa.gov/wxcam/wxdata/158/2...
1 North 10479 https://weathercams.faa.gov/wxcam/wxdata/158/2...
2 North 10479 https://weathercams.faa.gov/wxcam/wxdata/158/2...
3 North 10479 https://weathercams.faa.gov/wxcam/wxdata/158/2...
4 North 10479 https://weathercams.faa.gov/wxcam/wxdata/158/2...
.. ... ... ...
103 South 10480 https://weathercams.faa.gov/wxcam/wxdata/158/2...
104 South 10480 https://weathercams.faa.gov/wxcam/wxdata/158/2...
105 South 10480 https://weathercams.faa.gov/wxcam/wxdata/158/2...
106 South 10480 https://weathercams.faa.gov/wxcam/wxdata/158/2...
107 South 10480 https://weathercams.faa.gov/wxcam/wxdata/158/2...
imageDatetime
0 2023-01-22T21:56:31.273Z
1 2023-01-22T21:46:42.831Z
2 2023-01-22T21:36:25.861Z
3 2023-01-22T21:26:27.947Z
4 2023-01-22T21:16:29.951Z
.. ...
103 2023-01-22T17:06:27.575Z
104 2023-01-22T16:56:28.609Z
105 2023-01-22T16:46:28.264Z
106 2023-01-22T16:36:42.719Z
107 2023-01-22T16:23:24.466Z
[108 rows x 4 columns]
Each row's imageUri is a URI from which you can download one image. In the output above, the rows for each camera run from newest to oldest, so the first row for a camera is its most recent image. You can then use requests to download the image.
For site 585, I'm just going to arbitrarily pick the first row in its list, which comes from the SW camera, and download that image.
image_uri = fetch_site_cameras(585).loc[0, 'imageUri']
image = requests.get(image_uri)
with open('image.jpg', 'wb') as f:
    f.write(image.content)
… and that downloads the image to image.jpg. I can open it and see the end result of our hard work:
Well, I hope you like looking at empty fields. You’re going to see a lot of them.
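To get to what the question actually asked for (loop through the airports and save each camera's photo), the pieces can be combined. The sketch below is one possible approach, not a tested scraper for this API: `latest_per_camera`, `image_path` and `save_latest_images` are hypothetical names, the file naming scheme is made up, and it assumes `imageDatetime` is an ISO 8601 UTC string as in the output above. It keeps only the newest image per `cameraId`, skips sites whose lookup fails, and pauses between sites.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List

import pandas as pd


def latest_per_camera(images: pd.DataFrame) -> pd.DataFrame:
    """Keep only the newest image for each cameraId, based on imageDatetime."""
    stamped = images.assign(ts=pd.to_datetime(images['imageDatetime'], utc=True))
    newest = stamped.sort_values('ts', ascending=False).drop_duplicates('cameraId')
    return newest.drop(columns='ts')


def image_path(out_dir: Path, site_id, row: pd.Series) -> Path:
    """Hypothetical naming scheme: <siteId>_<direction>_<UTC timestamp>.jpg"""
    stamp = pd.Timestamp(row['imageDatetime']).strftime('%Y%m%dT%H%M%SZ')
    direction = str(row['direction']).replace('/', '-')
    return out_dir / f'{site_id}_{direction}_{stamp}.jpg'


def save_latest_images(site_ids: Iterable, fetch_cameras: Callable,
                       http_get: Callable, out_dir: str = 'wxcams',
                       delay: float = 1.0) -> List[Path]:
    """Save the newest image from every camera at each site; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for site_id in site_ids:
        try:
            images = fetch_cameras(site_id)
        except Exception as exc:  # e.g. an HTTP error, or a site with no images
            print(f'skipping site {site_id}: {exc}')
            continue
        if images.empty:
            continue
        for _, row in latest_per_camera(images).iterrows():
            response = http_get(row['imageUri'], timeout=30)
            response.raise_for_status()
            path = image_path(out, site_id, row)
            path.write_bytes(response.content)
            saved.append(path)
        time.sleep(delay)  # pause between sites rather than hammering the API
    return saved
```

With the answer's functions in scope, something like `save_latest_images(fetch_sites()['siteId'], fetch_site_cameras, requests.get)` would walk every site. Passing the fetch and HTTP functions in as arguments is just a design choice that lets the loop be exercised with stand-ins instead of the live API.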