Retrieving data from the Air Quality Index (AQI) website through the API only returns a small number of stations

Question:

I’m working on a personal project and I’m trying to retrieve air quality data from the https://aqicn.org website using their API.

I’m using the following code, which I copied and adapted for the city of Bucharest:

import os

import pandas as pd
import folium
import requests

# GET data from AQI website through the API

base_url = "https://api.waqi.info"
path_to_file = os.path.expanduser("~/path")  # open() does not expand "~" on its own

# Got token from:- https://aqicn.org/data-platform/token/#/
with open(path_to_file) as f:
    key = f.readline().strip()  # strip the trailing newline, otherwise it ends up in the URL

# (lat, lng) -> bottom left, (lat, lng) -> top right
latlngbox = "44.300264,25.920181,44.566991,26.297836"  # For Bucharest
trail_url = f"/map/bounds/?token={key}&latlng={latlngbox}"

my_data = pd.read_json(base_url + trail_url) # Joined parts of URL
print('columns->', my_data.columns) #2 cols ‘status’ and ‘data’ JSON

# Build a DataFrame from the JSON response (one row per station)
all_rows = []
for each_row in my_data['data']:
    all_rows.append([each_row['station']['name'],
                     each_row['lat'],
                     each_row['lon'],
                     each_row['aqi']])
df = pd.DataFrame(all_rows, columns=['station_name', 'lat', 'lon', 'aqi'])

# Cleaned the DataFrame
df['aqi'] = pd.to_numeric(df.aqi, errors='coerce') # Invalid parsing to NaN
# Remove NaN entries in col
df1 = df.dropna(subset = ['aqi'])
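A minimal sketch of building the same `/map/bounds/` request with `urllib.parse.urlencode`. It strips the trailing newline that a token read with `readlines()` carries, and it rejects a box whose first value is not a valid latitude, assuming the `latlng` value is "lat1,lng1,lat2,lng2" (bottom-left first) as the comment in the code above describes. The helper name `build_bounds_url` and its validation rules are illustrative assumptions, not part of the WAQI API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.waqi.info"


def build_bounds_url(token, lat1, lng1, lat2, lng2):
    """Build a /map/bounds/ URL (hypothetical helper, not part of any library).

    Assumes latlng is "lat1,lng1,lat2,lng2" with the bottom-left corner
    first, as the comment in the question's code describes.
    """
    token = token.strip()  # a token read with readlines() ends with "\n"
    for lat in (lat1, lat2):
        if not -90 <= lat <= 90:
            raise ValueError(f"latitude {lat} is out of range; are lat/lng swapped?")
    for lng in (lng1, lng2):
        if not -180 <= lng <= 180:
            raise ValueError(f"longitude {lng} is out of range")
    if lat1 > lat2:
        raise ValueError("expected the bottom-left corner first, then the top-right")
    box = ",".join(str(v) for v in (lat1, lng1, lat2, lng2))
    # keep the commas readable instead of percent-encoding them as %2C
    query = urlencode({"token": token, "latlng": box}, safe=",")
    return f"{BASE_URL}/map/bounds/?{query}"


url = build_bounds_url("demo-token\n", 44.300264, 25.920181, 44.566991, 26.297836)
print(url)
```

The resulting string can be passed to `pd.read_json` exactly like `base_url + trail_url`; "demo-token" is a placeholder for your own token.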

Unfortunately, it only retrieves 4 stations, whereas many more are shown on the actual site. The only limitation I saw in the API documentation was "1,000 (one thousand) requests per second", so why can’t I get more of them?

Also, I’ve tried to modify the lat-long values and managed to get more stations, but they were outside the city I was interested in.
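If a wider box returns extra stations from outside Bucharest, one option is to query the wider area and then keep only the rows whose coordinates fall inside the original city box. A sketch with pandas, using hypothetical sample rows in place of a real API response; `filter_to_box` is an illustrative helper name:

```python
import pandas as pd


def filter_to_box(df, lat_min, lng_min, lat_max, lng_max):
    """Keep rows whose lat/lon fall inside the box (bounds inclusive)."""
    inside = df["lat"].between(lat_min, lat_max) & df["lon"].between(lng_min, lng_max)
    return df[inside]


# Hypothetical stations: two inside the Bucharest box, one north of it.
stations = pd.DataFrame({
    "station_name": ["Station A", "Station B", "Station C"],
    "lat": [44.43, 44.50, 44.90],
    "lon": [26.10, 26.00, 26.10],
    "aqi": [42, 57, 61],
})

# Same box as latlngbox in the question: bottom-left, then top-right.
bucharest = filter_to_box(stations, 44.300264, 25.920181, 44.566991, 26.297836)
print(bucharest["station_name"].tolist())
```

This only trims the result after the fact; it does not change how many stations the API returns for a given box.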

Here is a view of the actual perimeter I’ve used in the embedded code.

If you have any suggestions as to how I can solve this issue, I’d be very happy to read your thoughts. Thank you!

Answers:

Try using the waqi API through aqicn. It’s not exactly a clean API, but I found it works quite well:

import pandas as pd
import folium
from folium.plugins import HeatMap
url1 = 'https://api.waqi.info'
# Get token from:- https://aqicn.org/data-platform/token/#/
token = 'XXX'
box = '113.805332,22.148942,114.434299,22.561716' # polygon around HongKong via bboxfinder.com
url2=f'/map/bounds/?latlng={box}&token={token}'
my_data = pd.read_json(url1 + url2) 

all_rows = []
for each_row in my_data['data']:
    all_rows.append([each_row['station']['name'], each_row['lat'],
                     each_row['lon'], each_row['aqi']])
# Build the DataFrame once, after the loop has collected every row
df = pd.DataFrame(all_rows, columns=['station_name', 'lat', 'lon', 'aqi'])
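The Hong Kong box above appears to list longitude first: 113.8 and 114.4 are longitudes near Hong Kong, while the map centre used later is `[22.396428, 114.109497]` in [lat, lon] order. The question's code comment, by contrast, describes the `latlng` value as lat,lng pairs. If you copy a box from a tool that outputs lon,lat, a small helper like the hypothetical `lonlat_to_latlng` below can reorder it; which order the endpoint actually expects is worth confirming in the WAQI documentation.

```python
def lonlat_to_latlng(box):
    """Reorder 'lon1,lat1,lon2,lat2' into 'lat1,lon1,lat2,lon2' (hypothetical helper)."""
    lon1, lat1, lon2, lat2 = (part.strip() for part in box.split(","))
    return ",".join([lat1, lon1, lat2, lon2])


hk_box = "113.805332,22.148942,114.434299,22.561716"  # box from the answer above
print(lonlat_to_latlng(hk_box))
```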

From there it’s easy to plot, for example as a heat map:

df['aqi'] = pd.to_numeric(df.aqi,errors='coerce')
print('with NaN->', df.shape)


df1 = df.dropna(subset = ['aqi'])

df2 = df1[['lat', 'lon', 'aqi']]
init_loc = [22.396428, 114.109497]
max_aqi = int(df1['aqi'].max())
print('max_aqi->', max_aqi)
m = folium.Map(location = init_loc, zoom_start = 5)

heat_aqi = HeatMap(df2, min_opacity=0.1, max_val=max_aqi,
                   radius=60, blur=20, max_zoom=2)
m.add_child(heat_aqi)
m

Or with one coloured marker per station:

centre_point = [22.396428, 114.109497]
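# Stamen tiles have since moved to Stadia Maps and may fail to load; tiles='OpenStreetMap' is a fallback
m2 = folium.Map(location=centre_point, tiles='Stamen Terrain', zoom_start=6)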
for idx, row in df1.iterrows():
    lat = row['lat']
    lon = row['lon']
    station = row['station_name'] + ' AQI=' + str(row['aqi'])
    station_aqi = row['aqi']
    if station_aqi > 300:
        pop_color = 'red'
    elif station_aqi > 200:
        pop_color = 'orange'
    else:
        pop_color = 'green'
    # Add a marker for every station, not only the green ones
    folium.Marker(location=[lat, lon],
                  popup=station,
                  icon=folium.Icon(color=pop_color)).add_to(m2)
m2
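The three colours above only split the scale at 200 and 300. If you want finer categories, the US EPA AQI scale uses upper bounds of 50, 100, 150, 200 and 300. A sketch that maps a numeric AQI to the EPA label plus a marker colour; the colour choices are my own, picked from names that `folium.Icon` accepts:

```python
import bisect

# US EPA AQI category upper bounds (inclusive); anything above 300 is Hazardous.
UPPER_BOUNDS = [50, 100, 150, 200, 300]
CATEGORIES = [
    ("Good", "green"),
    ("Moderate", "beige"),
    ("Unhealthy for Sensitive Groups", "orange"),
    ("Unhealthy", "red"),
    ("Very Unhealthy", "purple"),
    ("Hazardous", "darkred"),
]


def aqi_category(aqi):
    """Return (EPA label, marker colour) for a numeric AQI value."""
    # bisect_left puts a value equal to a bound in the lower category
    index = bisect.bisect_left(UPPER_BOUNDS, aqi)
    return CATEGORIES[index]


for value in (42, 100, 101, 250, 420):
    print(value, aqi_category(value))
```

Inside the marker loop, `pop_color = aqi_category(station_aqi)[1]` could replace the if/elif chain.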

Checking for stations within HK returns 19:

df[df['station_name'].str.contains('HongKong')]
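`str.contains` is case-sensitive by default and matches the literal text, so a station labelled "Hong Kong" (with a space) would not match 'HongKong', and a missing name yields NaN rather than False. A slightly more forgiving filter, sketched on hypothetical station names:

```python
import pandas as pd

# Hypothetical station names standing in for the API response.
df = pd.DataFrame({"station_name": [
    "Central/Western, HongKong",
    "Tung Chung, Hong Kong",
    "Shenzhen",
    None,
]})

# Match "HongKong" or "Hong Kong" in any case; treat missing names as no match.
mask = df["station_name"].str.contains(r"hong\s?kong", case=False, na=False, regex=True)
print(mask.sum())
print(df.loc[mask, "station_name"].tolist())
```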
Answered By: Daniel Morgan

Use the Ambee Air Quality API to retrieve air quality data with Python and pandas.

Answered By: Richard Morrison