Get more than 10 results from the Google Custom Search API

Question:

I am trying to use the Google Custom Search API. What I want to do is retrieve the first 20 results. I tried changing num=10 in the URL to 20, but that returns a 400 error. How can I fix this, or request the second page of results? (Note: my engine is configured to search the entire web.)

Here is the code I am using

import requests,json
url="https://www.googleapis.com/customsearch/v1?q=SmartyKat+Catnip+Cat+Toys&cx=012572433248785697579%3A1mazi7ctlvm&num=10&fields=items(link%2Cpagemap%2Ctitle)&key={YOUR_API_KEY}"
res=requests.get(url)
di=json.loads(res.text)
Asked By: user11322408


Answers:

Unfortunately, it is not possible to receive more than 10 results per request from the Google Custom Search API. If you do want more results, you can make multiple calls, increasing the start parameter by 10 each time.

See this link: https://developers.google.com/custom-search/v1/using_rest#query-params
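For illustration, here is a minimal sketch of that approach using requests (the API key is a placeholder; the cx value is taken from the question):

import requests

API_KEY = "{YOUR_API_KEY}"                # placeholder: your Google API key
CX = "012572433248785697579:1mazi7ctlvm"  # search engine ID from the question

items = []
for start in (1, 11):  # two calls of 10 results each -> first 20 results
    res = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"q": "SmartyKat Catnip Cat Toys", "cx": CX,
                "key": API_KEY, "num": 10, "start": start},
    )
    items.extend(res.json().get("items", []))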

Answered By: mSkou

The information in the accepted answer https://stackoverflow.com/a/55866268/42346 is accurate.

Below is a Python function I wrote as an extension of the function in the 4th step of this answer https://stackoverflow.com/a/37084643/42346 to return up to 100 results from the Google Custom Search API. It increases the start parameter by 10 for each API call and works out the number of calls to make automatically. For example, if you request 25 results the function will make 3 API calls of: 10 results, 10 results, and 5 results.

Background information:
How to set up a Google Custom Search engine: https://stackoverflow.com/a/37084643/42346
How to specify that it search the entire web: https://stackoverflow.com/a/11206266/42346

from googleapiclient.discovery import build
from pprint import pprint as pp
import math

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key)

    num_search_results = kwargs['num']
    if num_search_results > 100:
        raise NotImplementedError('Google Custom Search API supports a max of 100 results')
    elif num_search_results > 10:
        kwargs['num'] = 10  # this cannot be > 10 in the API call
        calls_to_make = math.ceil(num_search_results / 10)
    else:
        calls_to_make = 1

    kwargs['start'] = start_item = 1
    items_to_return = []
    while calls_to_make > 0:
        res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
        items_to_return.extend(res.get('items', []))  # guard against pages with no items
        calls_to_make -= 1
        start_item += 10
        kwargs['start'] = start_item
        # on the final call, request only the leftover number of results
        leftover = num_search_results - start_item + 1
        if 0 < leftover < 10:
            kwargs['num'] = leftover

    return items_to_return

And here’s an example of how you’d call that:

NUM_RESULTS = 25
MY_SEARCH = 'why do cats chase their own tails'
MY_API_KEY = 'Google API key'
MY_CSE_ID = 'Custom Search Engine ID'

results = google_search(MY_SEARCH, MY_API_KEY, MY_CSE_ID, num=NUM_RESULTS)

for result in results:
    pp(result)
Answered By: mechanical_meat

You can extract data from Google Search without using the API; the BeautifulSoup web-scraping library is enough. Keep in mind that you need to handle CAPTCHAs and IP rate limits, which can be done with rotating proxies and user agents.

You can search for elements on a page using CSS selectors.

To find CSS selectors you can use the SelectorGadget Chrome extension, which lets you click the desired element in your browser and returns the corresponding CSS selector (it does not always work perfectly if the website is rendered via JavaScript).
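
As a minimal illustration of how such a selector is then used with BeautifulSoup (the HTML snippet here is a hypothetical stand-in; the class names mirror those in the full code below):

from bs4 import BeautifulSoup

html = '<div class="tF2Cxc"><h3 class="DKV0Md">Example title</h3></div>'
soup = BeautifulSoup(html, "lxml")

# select_one() returns the first element matching the CSS selector, or None
print(soup.select_one(".tF2Cxc .DKV0Md").text)  # -> Example title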

It is also possible to dynamically extract results from every available page using non-token-based pagination, which will step through all of them no matter how many pages there are.

You can add several conditions for exiting the loop, for example a page limit, or stopping when there is no "next page" button:

if page_num == page_limit:                    # exit by page limit
    break
if soup.select_one(".d6cvqb a[id=pnnext]"):   # "next page" button exists: advance
    params["start"] += 10
else:                                         # no "next page" button: exit
    break

Full code with pagination:

from bs4 import BeautifulSoup
import requests, json, lxml  # lxml is the parser backend for BeautifulSoup

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": "SmartyKat Catnip Cat Toys",    # query
    "hl": "en",                          # interface language
    "gl": "uk",                          # country of the search; uk -> United Kingdom
    "start": 0,                          # results offset; 0 = first page
    # "num": 100                         # maximum number of results to return per page
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}

page_limit = 5
page_num = 0

data = []

while True:
    page_num += 1
    print(f"page: {page_num}")

    html = requests.get("https://www.google.com/search", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(html.text, "lxml")

    for result in soup.select(".tF2Cxc"):
        title = result.select_one(".DKV0Md").text
        try:
            snippet = result.select_one(".lEBKkf span").text
        except AttributeError:  # select_one() returned None: no snippet for this result
            snippet = None
        links = result.select_one(".yuRUbf a")["href"]

        data.append({
            "title": title,
            "snippet": snippet,
            "links": links
        })

    if page_num == page_limit:                   # exit by page limit
        break
    if soup.select_one(".d6cvqb a[id=pnnext]"):  # "next page" button exists: advance
        params["start"] += 10
    else:                                        # no "next page" button: exit
        break

print(json.dumps(data, indent=2, ensure_ascii=False))

Example output:

[
  {
    "title": "SmartyKat Catnip Chase Cat Toy - I Love My Pets",
    "snippet": "Catnip Chase™ compressed catnip toy Play SmartyKat offers a variety of toys to meet a cat's need for hunting, exercise, excitement, interaction, ...",
    "links": "https://www.ilovemypets.ph/index.php?route=product/product&product_id=1670"
  },
  {
    "title": "Kitties & Their Humans - Facebook",
    "snippet": "5 IN STOCK* SmartyKat Catnip Cat Toys Brand: SmartyKat Style: Madcap Mania™ Refillable Assorted Mice Catnip Cat Toy Style: Mice (Random Selection)...",
    "links": "https://m.facebook.com/2674028906242223/"
  },
  ... other results
]

Alternatively, you can use a third-party API such as the Google Search Engine Results API from SerpApi. It is a paid API with a free plan.

The difference is that it bypasses blocks (including CAPTCHA) from Google, so there is no need to create and maintain a parser.

Example SerpApi code with pagination:

from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json

params = {
    "api_key": "...",                  # serpapi key from https://serpapi.com/manage-api-key
    "engine": "google",                # serpapi parser engine
    "q": "SmartyKat Catnip Cat Toys",  # search query
    "num": "100"                       # number of results per page (100 in this case)
    # other search parameters: https://serpapi.com/search-api#api-parameters
}

search = GoogleSearch(params)          # where data extraction happens

organic_results_data = []
page_num = 0

while True:
    results = search.get_dict()        # JSON -> Python dictionary

    page_num += 1

    for result in results["organic_results"]:
        organic_results_data.append({
            "title": result.get("title"),
            "snippet": result.get("snippet"),
            "link": result.get("link")
        })

    # follow serpapi's pagination: feed the next_link query params back into the search
    if "next_link" in results.get("serpapi_pagination", {}):
        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next_link")).query)))
    else:
        break

print(json.dumps(organic_results_data, indent=2, ensure_ascii=False))

Output: the same as in the bs4 solution.

Answered By: Denis Skopa