Collecting places using Python and Google Places API

Question:

I want to collect the places around my city, Pekanbaru (lat/long 0.507068, 101.447777), and convert them into a dataset with place_name, place_id, lat, lng, and type columns.

Below is the script that I tried.

import json
import urllib.request as url_req
import time
import pandas as pd

NATAL_CENTER = (0.507068,101.447777)
API_KEY = 'API'
API_NEARBY_SEARCH_URL = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json'
RADIUS = 30000
PLACES_TYPES = [('airport', 1), ('bank', 2)] ## TESTING

# PLACES_TYPES = [('airport', 1), ('bank', 2), ('bar', 3), ('beauty_salon', 3), ('book_store', 1), ('cafe', 1), ('church', 3), ('doctor', 3), ('dentist', 2), ('gym', 3), ('hair_care', 3), ('hospital', 2), ('pharmacy', 3), ('pet_store', 1), ('night_club', 2), ('movie_theater', 1), ('school', 3), ('shopping_mall', 1), ('supermarket', 3), ('store', 3)]

def request_api(url):
    response = url_req.urlopen(url)
    json_raw = response.read()
    json_data = json.loads(json_raw)
    return json_data

def get_places(types, pages):
    location = str(NATAL_CENTER[0]) + "," + str(NATAL_CENTER[1])
    mounted_url = ('%s'
        '?location=%s'
        '&radius=%s'
        '&type=%s'
        '&key=%s') % (API_NEARBY_SEARCH_URL, location, RADIUS, types, API_KEY)

    results = []
    next_page_token = None

    if pages is None: pages = 1

    for num_page in range(pages):
        if num_page == 0:
            api_response = request_api(mounted_url)
            results = results + api_response['results']
        else:
            page_url = ('%s'
                '?key=%s'
                '&pagetoken=%s') % (API_NEARBY_SEARCH_URL, API_KEY, next_page_token)
            api_response = request_api(str(page_url))
            results += api_response['results']

        if 'next_page_token' in api_response:
            next_page_token = api_response['next_page_token']
        else: break

        time.sleep(1)
    return results

def parse_place_to_list(place, type_name):
    # Using name, place_id, lat, lng, rating, type_name
    return [
        place['name'],
        place['place_id'],
        place['geometry']['location']['lat'],
        place['geometry']['location']['lng'],
        type_name       
    ]

def mount_dataset():
    data = []

    for place_type in PLACES_TYPES:
        type_name = place_type[0]
        type_pages = place_type[1]

        print("Getting into " + type_name)

        result = get_places(type_name, type_pages)
        result_parsed = list(map(lambda x: parse_place_to_list(x, type_name), result))
        data += result_parsed

    dataframe = pd.DataFrame(data, columns=['place_name', 'place_id', 'lat', 'lng', 'type'])
    dataframe.to_csv('places.csv')

mount_dataset()

But the script returns an empty DataFrame.
How can I solve this and get the right dataset?

Asked By: ebuzz168

||

Answers:

I am afraid that scraping and storing this data is prohibited by the Google Maps Platform Terms of Service.

Have a look at the Terms of Service before proceeding with the implementation. Paragraph 3.2.4, ‘Restrictions Against Misusing the Services’, reads:

(a) No Scraping. Customer will not extract, export, or otherwise scrape Google Maps Content for use outside the Services. For example, Customer will not: (i) pre-fetch, index, store, reshare, or rehost Google Maps Content outside the services; (ii) bulk download Google Maps tiles, Street View images, geocodes, directions, distance matrix results, roads information, places information, elevation values, and time zone details; (iii) copy and save business names, addresses, or user reviews; or (iv) use Google Maps Content with text-to-speech services.

source: https://cloud.google.com/maps-platform/terms/#3-license

Sorry to be the bearer of bad news.

Answered By: xomena

You can scrape places from Google Maps using the Google Maps Place Results API from SerpApi. It is a paid API with a free plan that handles blocks and parsing on its backend. However, your task is more complex than it sounds: extracting all possible places within a specific region's bounds (Pekanbaru in this case) requires knowing the city's latitude and longitude boundaries and writing a parser that does not go beyond them. This answer shows a basic starting-point example.
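
As a minimal sketch of such a boundary check, you could keep only results whose coordinates fall inside a rough bounding box around Pekanbaru. The corner coordinates below are assumptions for illustration, not official city limits:

```python
# Rough, assumed bounding box around Pekanbaru -- NOT official city limits
PEKANBARU_BOUNDS = {
    'south': 0.40, 'north': 0.62,    # latitude range (assumed)
    'west': 101.33, 'east': 101.55,  # longitude range (assumed)
}

def inside_bounds(lat, lng, b=PEKANBARU_BOUNDS):
    """Return True if (lat, lng) falls inside the bounding box."""
    return b['south'] <= lat <= b['north'] and b['west'] <= lng <= b['east']

# e.g. the airport coordinates from the example output further down
print(inside_bounds(0.4649292, 101.4482987))  # -> True
```

Results failing this check would simply be skipped before being stored.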

In order to find and scrape information about a specific place, you need to set the necessary search parameters, such as data:

params = {
  #...
  "data": "!4m5!3m4!1s0x89c259ac80ded951:0x9eca4e0c0fe102ea!8m2!3d40.753695799999996!4d-73.988096"
  #...
}                         
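
Since the data string has a fixed shape, a small helper (hypothetical, not part of SerpApi's library) can assemble it from a data_id and coordinates, reproducing the value shown above:

```python
def build_data_param(data_id: str, lat: str, lng: str) -> str:
    # Assemble the `data` search parameter in the fixed format
    # !4m5!3m4!1s<data_id>!8m2!3d<lat>!4d<lng>
    return f"!4m5!3m4!1s{data_id}!8m2!3d{lat}!4d{lng}"

# Reproduces the example `data` value above
print(build_data_param('0x89c259ac80ded951:0x9eca4e0c0fe102ea',
                       '40.753695799999996', '-73.988096'))
```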

In turn, to find the data of all places, you can use the Google Maps Local Results API:

local_data_results = [
    {
        'data_id': str(result['data_id']),
        'latitude': str(result['gps_coordinates']['latitude']),
        'longitude': str(result['gps_coordinates']['longitude'])
    }
    for result in results['local_results']
]

The full code with pagination is available in the online IDE.

from serpapi import GoogleSearch
import json

# find the `data` of all places:
params = {
    'api_key': "...",                       # https://serpapi.com/manage-api-key
    'engine': 'google_maps',                # SerpApi search engine 
    'q': 'pekanbaru, airport',              # query
    'll': '@0.5139625,101.3711349,12z',     # GPS coordinates, Pekanbaru City, Indonesia
    'type': 'search',                       # list of results for the query
    'hl': 'en',                             # language
    'start': 0,                             # pagination
}

search = GoogleSearch(params)               # where data extraction happens on the backend
results = search.get_dict()                 # JSON -> Python dict

local_data_results = [
    {
        'data_id': str(result['data_id']),
        'latitude': str(result['gps_coordinates']['latitude']),
        'longitude': str(result['gps_coordinates']['longitude'])
    }
    for result in results['local_results']
]

place_results = []

# find information about specific places (airports)
for result in local_data_results:
    data = '!4m5!3m4!1s' + result['data_id'] + '!8m2!3d' + result['latitude'] + '!4d' + result['longitude']
    
    params = {
        'api_key': "...",                   # https://serpapi.com/manage-api-key
        'engine': 'google_maps',            # SerpApi search engine
        'type': 'place',                    # a single place result
        'data': data                        # place data string built above
    }
    
    search = GoogleSearch(params)
    results = search.get_dict()

    place_results.append(results['place_results'])

print(json.dumps(place_results, indent=2, ensure_ascii=False))

Example output:

[
  {
    "position": 1,
    "title": "Sultan Syarif Kasim II International Airport",
    "place_id": "ChIJmVfyaNGv1TERkw2jndzdL-s",
    "data_id": "0x31d5afd168f25799:0xeb2fdddc9da30d93",
    "data_cid": "16947007862425718163",
    "reviews_link": "https://serpapi.com/search.json?data_id=0x31d5afd168f25799%3A0xeb2fdddc9da30d93&engine=google_maps_reviews&hl=en",
    "photos_link": "https://serpapi.com/search.json?data_id=0x31d5afd168f25799%3A0xeb2fdddc9da30d93&engine=google_maps_photos&hl=en",
    "gps_coordinates": {
      "latitude": 0.4649292,
      "longitude": 101.4482987
    },
    "place_id_search": "https://serpapi.com/search.json?engine=google_maps&google_domain=google.com&hl=en&place_id=ChIJmVfyaNGv1TERkw2jndzdL-s",
    "rating": 4.4,
    "reviews": 8050,
    "unclaimed_listing": true,
    "type": "International airport",
    "address": "FC7X+X8F, Maharatu, Marpoyan Damai, Pekanbaru City, Riau 28288, Indonesia",
    "phone": "+62 761 674694",
    "website": "https://sultansyarifkasim2-airport.co.id/",
    "user_review": "\"Domestic and international Airport in pekanbaru.\"",
    "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNLozScQ5sqX4z-4rpEvsZCesopUaKZexy-2-MG=w80-h106-k-no"
  },
  other results ...
]
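
To get back to the dataset the question asks for (place_name, place_id, lat, lng, type), the collected place_results can be flattened into a pandas DataFrame. A minimal sketch, using one record shaped like the example output above as stand-in data:

```python
import pandas as pd

# Stand-in data: one record shaped like the example output above;
# in practice this would be the `place_results` list collected earlier
place_results = [
    {
        'title': 'Sultan Syarif Kasim II International Airport',
        'place_id': 'ChIJmVfyaNGv1TERkw2jndzdL-s',
        'gps_coordinates': {'latitude': 0.4649292, 'longitude': 101.4482987},
        'type': 'International airport',
    },
]

rows = [
    {
        'place_name': p['title'],
        'place_id': p['place_id'],
        'lat': p['gps_coordinates']['latitude'],
        'lng': p['gps_coordinates']['longitude'],
        'type': p['type'],
    }
    for p in place_results
]

df = pd.DataFrame(rows, columns=['place_name', 'place_id', 'lat', 'lng', 'type'])
df.to_csv('places.csv', index=False)
```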

Also, if you need more explanation of the code, you can read the How to Scrape Google Maps Place Results with SerpApi blog post.

Disclaimer: I work for SerpApi.

Answered By: Denis Skopa