Using python requests with google search

Question:

I’m new to Python.
In PyCharm I wrote this code:

import requests
from bs4 import BeautifulSoup

response = requests.get(f"https://www.google.com/search?q=fitness+wear")
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

Instead of getting the HTML of the search results, I get the HTML of a different page (the screenshot is not reproduced here).

I use the same code within a script on pythonanywhere.com and it works perfectly. I’ve tried lots of the solutions I found but the result is always the same, so now I’m stuck with it.

Asked By: Neuran

||

Answers:

I think this should work:

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    url = "https://www.google.com/search?q=fitness+wear"
    headers = {
        # The header value must not repeat the header name
        "referer": "https://www.google.com/",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
    }
    # First request: lets the session store any cookies it receives
    s.post(url, headers=headers)
    # Second request: reuses whatever cookies the session now holds
    response = s.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

It uses a requests session and a POST request to create any initial cookies (I’m not fully sure about this), and then lets you scrape.
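To see what a session actually carries from one request to the next, the request can be prepared without sending it. This is only a sketch: the cookie name and value are hypothetical stand-ins for whatever a real response would have stored, and nothing here contacts Google.

```python
import requests

SEARCH_URL = "https://www.google.com/search"

session = requests.Session()

# Headers set on the session are merged into every request made through it.
session.headers.update({
    "Referer": "https://www.google.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
})

# Hypothetical cookie, standing in for one stored from an earlier response.
session.cookies.set("demo_cookie", "1")

# Build the request but do not send it; this only shows what would go out.
request = requests.Request("GET", SEARCH_URL, params={"q": "fitness wear"})
prepared = session.prepare_request(request)

print(prepared.url)
print(prepared.headers["Referer"])
print(prepared.headers["Cookie"])
```

Calling session.send(prepared) would then perform the request with exactly these headers and cookies.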

Answered By: Dimitriy Kruglikov

If you open a private window in your browser and go to google.com, you should see the same pop-up prompting you to give your consent. This happens because no session cookies are sent with the request.

You have different options to tackle this.
One would be sending the cookies you can observe on the website with the request directly like so:

import requests
cookies = {"CONSENT":"YES+shp.gws-20210330-0-RC1.de+FX+412", ...}

resp = requests.get("https://www.google.com/search?q=fitness+wear", cookies=cookies)

The solution by @Dimitriy Kruglikov is a lot cleaner, though, and using a session is a good way of keeping persistent state with the website.
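A quick way to check that a cookie passed this way really ends up in the outgoing request is to prepare the request and inspect its Cookie header. This is a sketch only: the CONSENT value is the one quoted above and may well be stale, and no request is sent.

```python
import requests

# Value copied from the snippet above; it may no longer be accepted.
cookies = {"CONSENT": "YES+shp.gws-20210330-0-RC1.de+FX+412"}

request = requests.Request(
    "GET",
    "https://www.google.com/search",
    params={"q": "fitness wear"},
    cookies=cookies,
)
prepared = request.prepare()

# The dict is serialized into a single Cookie header.
print(prepared.headers["Cookie"])
```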

Answered By: lightstack

Google doesn’t block you; you can still extract data from the HTML.

Using cookies isn’t very convenient, and using a session with both POST and GET requests generates more traffic.

You can remove this popup with either the decompose() or the extract() BS4 method:

  • annoying_popup.decompose() will completely destroy it and its contents. Documentation.

  • annoying_popup.extract() will remove it from the tree and return it, leaving two parse trees: one rooted at the BeautifulSoup object you used to parse the document, and one rooted at the tag that was extracted. Documentation.

After that, you can scrape everything you need, just as you could without removing it.
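The difference between the two methods can be sketched on a small document. The HTML and the id "consent-popup" are made up for illustration; the real page uses different markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page with the consent popup.
HTML = """
<html><body>
  <div id="consent-popup"><p>Before you continue</p></div>
  <div id="results"><a href="https://example.com">Example result</a></div>
</body></html>
"""

# decompose(): the popup is destroyed and nothing is returned.
soup_a = BeautifulSoup(HTML, "html.parser")
soup_a.find("div", id="consent-popup").decompose()

# extract(): the popup is detached from the tree and handed back.
soup_b = BeautifulSoup(HTML, "html.parser")
removed = soup_b.find("div", id="consent-popup").extract()

# Either way, the rest of the document can be scraped as usual.
links = [a["href"] for a in soup_a.find_all("a", href=True)]
print(links)
print(removed.get_text(strip=True))
```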

See this Organic Results extraction I did recently. It scrapes title, summary, and link from Google Search Results.


Alternatively, you can use Google Search Engine Results API from SerpApi. Check out the Playground.

Code and example in online IDE:

from serpapi import GoogleSearch
import os

params = {
  "engine": "google",
  "q": "fus ro dah",
  "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results['organic_results']:
  print(f"Title: {result['title']}\nSnippet: {result['snippet']}\nLink: {result['link']}\n")

Output:

Title: Skyrim - FUS RO DAH (Dovahkiin) HD - YouTube
Snippet: I looked around for a fan made track that included Fus Ro Dah, but the ones that I found were pretty bad - some ...
Link: https://www.youtube.com/watch?v=JblD-FN3tgs

Title: Unrelenting Force (Skyrim) | Elder Scrolls | Fandom
Snippet: If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: "Fus Rah Do" instead of the proper "Fus Ro Dah." ...
Link: https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)

Title: Fus Ro Dah | Know Your Meme
Snippet: Origin. "Fus Ro Dah" are the words for the "unrelenting force" thu'um shout in the game Elder Scrolls V: Skyrim. After reaching the first town of ...
Link: https://knowyourmeme.com/memes/fus-ro-dah

Title: Fus ro dah - Urban Dictionary
Snippet: 1. A dragon shout used in The Elder Scrolls V: Skyrim. 2.An international term for oral sex given by a female. ex.1. The Dragonborn yelled "Fus ...
Link: https://www.urbandictionary.com/define.php?term=Fus%20ro%20dah

Part of JSON:

"organic_results": [
  {
    "position": 1,
    "title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
    "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
    "displayed_link": "https://elderscrolls.fandom.com › wiki › Unrelenting_F...",
    "snippet": "If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: \"Fus Rah Do\" instead of the proper \"Fus Ro Dah.\" ...",
    "sitelinks": {
      "inline": [
        {
          "title": "Location",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Location"
        },
        {
          "title": "Effect",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Effect"
        },
        {
          "title": "Usage",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Usage"
        },
        {
          "title": "Word Wall",
          "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Word_Wall"
        }
      ]
    },
    "cached_page_link": "https://webcache.googleusercontent.com/search?q=cache:K3LEBjvPps0J:https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)+&cd=17&hl=en&ct=clnk&gl=us"
  }
]
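A structure like the one above can be walked defensively with dict.get, so that an entry lacking a key does not raise KeyError. This is a sketch on a trimmed copy of the JSON shown here; that some entries may lack "snippet" or "sitelinks" is an assumption, and summarize is a hypothetical helper name.

```python
import json

# Trimmed copy of the JSON above, with "snippet" left out on purpose.
sample = json.loads("""
{
  "organic_results": [
    {
      "position": 1,
      "title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
      "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
      "sitelinks": {
        "inline": [
          {"title": "Location",
           "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Location"}
        ]
      }
    }
  ]
}
""")


def summarize(results):
    """Flatten organic results into plain dicts, tolerating missing keys."""
    rows = []
    for result in results.get("organic_results", []):
        inline = result.get("sitelinks", {}).get("inline", [])
        rows.append({
            "title": result.get("title", ""),
            "link": result.get("link", ""),
            "snippet": result.get("snippet", ""),
            "sitelinks": [item["title"] for item in inline],
        })
    return rows


rows = summarize(sample)
print(rows[0]["title"], rows[0]["sitelinks"])
```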

Disclaimer, I work for SerpApi.

Answered By: Dmitriy Zub

I have a very basic script here that does approximately what you seem to be aiming for, just with a very basic GUI. It asks for search terms, displays the results, and lets you open a result by clicking it, which displays the body of that page:

from requests import get as request
from random import randint
from bs4 import BeautifulSoup
from tkinter import Tk, Label, simpledialog, Toplevel, Text
from urllib.parse import quote_plus
from tkinterhtml import HtmlFrame
win = Tk()

search_terms = simpledialog.askstring("Input", "Search Terms")
encoded_search = quote_plus(search_terms)
url = f"https://www.google.nl/search?q={encoded_search}&start=0&num=10"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
        

line = request(url, timeout=10, headers=headers,cookies={'CONSENT': 'YES+cb.20210328-17-p0.en-GB+FX+{}'.format(randint(100, 999))}).text


soup = BeautifulSoup(line, "html.parser")
output = []
for result in soup.find_all('a',href = True): 
    if "www." in result.text and "›" in result.text:
        output.append({"text": result.text,"url":result["href"].split("url=")[1].split("&ved=")[0]})

for item in output:
    l = Label(win,text=item["text"] + "\n" + item["url"])
    l.pack(side="top")
    l.bind('<Button-1>', lambda event, url=item["url"]: open(event, url))

def open(event,url):
    line = request(url, timeout=10, headers=headers,cookies={'CONSENT': 'YES+cb.20210328-17-p0.en-GB+FX+{}'.format(randint(100, 999))}).text
    soup = BeautifulSoup(line, "html.parser")
    body = soup.find("body")
    
    top = Toplevel()
    frame = HtmlFrame(top, horizontal_scrollbar="auto")
    frame.place(relwidth=1,relheight=1)
    frame.set_content(body)

win.mainloop()
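In the script above, result["href"].split("url=")[1] raises IndexError for any link that has no "url=" part. A more forgiving variant, sketched here with the standard library, reads the url query parameter and returns None when it is absent. The helper name and the sample hrefs are hypothetical, and unlike the split version this one also percent-decodes the target.

```python
from urllib.parse import parse_qs, urlparse


def extract_target(href):
    """Return the decoded 'url' query parameter of a link, or None if absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get("url")
    return values[0] if values else None


# Hypothetical hrefs shaped like the ones the script expects.
print(extract_target("/url?sa=t&url=https%3A%2F%2Fexample.com%2Fpage&ved=abc"))
print(extract_target("/search?q=fitness+wear"))
```

Links for which the helper returns None can simply be skipped in the loop.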

My question, though, if anybody reads this, is the following:
What is gained by using the requests library for a Google search?
For instance, do I still get cookies? Does Google still somehow recognize that it was my computer that made the search? In other words, can they still build a profile of me?

Answered By: Willem van Houten