Python: quoted URL is not converted correctly on the website using requests

Question

I’m trying to scrape some German sentences from Glosbe.com. The requested URL contains some non-ASCII (UTF-8) characters. The website doesn’t decode the percent-encoded characters back to UTF-8 after the request is made. The requested URL should look like this:

https://glosbe.com/de/hu/abkühlen

But the website does not decode the requested URL, and the word it actually searches for is this:

https://glosbe.com/de/hu/abk%C3%BChlen/
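
For reference, the two forms above spell the same word: `urllib.parse.quote` percent-encodes the UTF-8 bytes of `ü` as `%C3%BC`, and `unquote` reverses that. A minimal sketch using only the standard library (the variable names are illustrative, not from the original code):

```python
from urllib.parse import quote, unquote

word = "abkühlen"

# quote() percent-encodes each byte of the UTF-8 encoding of non-ASCII characters.
encoded = quote(word)
print(encoded)          # abk%C3%BChlen

# unquote() decodes the percent-escapes back to the original text.
decoded = unquote(encoded)
print(decoded)          # abkühlen
print(decoded == word)  # True
```

So quoting the phrase is, in principle, a valid way to send it; a server would normally be expected to decode it on its side.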

The code used:

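import urllib.parse

import requests
from bs4 import BeautifulSoup

def beautifulSoapPrepare(sourceLang, destLang, phrase):
    headers = {
        'User-Agent': 'My User Agent 1.0',
        'From': '[email protected]'  # This is another valid field
    }
    # The phrase is percent-encoded and followed by a trailing "/"
    url = "https://glosbe.com/" + sourceLang + "/" + destLang + "/" + urllib.parse.quote(phrase) + "/"
    # Note: the second positional argument of requests.get is `params`, so "lxml" is sent as a query string
    r = requests.get(url, "lxml", headers=headers)
    soup = BeautifulSoup(r.content, features="lxml")
    return soup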

The picture here shows the problem.
[Image: The problem in picture]

Could you please help me solve this issue? I want the website to search for the German word abkühlen and not for abk%C3%BChlen.

Solution:
The problem was in the URL. Once I deleted the slash at the end of the URL, it worked.

Before:

url="https://glosbe.com/"+sourceLang+"/"+destLang+"/"+urllib.parse.quote(phrase)+"/"

After:

url="https://glosbe.com/"+sourceLang+"/"+destLang+"/"+urllib.parse.quote(phrase)
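
The fix above can be sketched as a small helper that builds the URL without a trailing slash. The function name `build_glosbe_url` and the constant `BASE_URL` are hypothetical (they do not appear in the original code); the quoting itself is the same `urllib.parse.quote` call:

```python
from urllib.parse import quote

BASE_URL = "https://glosbe.com"  # hypothetical constant, for illustration only


def build_glosbe_url(source_lang: str, dest_lang: str, phrase: str) -> str:
    """Build a lookup URL with a percent-encoded phrase and no trailing slash."""
    encoded_phrase = quote(phrase)
    # Joining with "/" puts a slash between the parts but none after the phrase.
    return "/".join([BASE_URL, source_lang, dest_lang, encoded_phrase])


print(build_glosbe_url("de", "hu", "abkühlen"))
# https://glosbe.com/de/hu/abk%C3%BChlen
```

Keeping the URL construction in one place makes it harder to reintroduce the trailing slash by accident.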

Asked By: mrad2


Answer 1

Given that your ultimate goal is to obtain the translation(s) of the particular word you’re looking for, the following code will give you just that (you can later wrap it in a class or a function, whatever you want):

import requests
from bs4 import BeautifulSoup as bs

url = 'https://glosbe.com/de/hu/'

word = 'abkühlen'

headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
r = requests.get(url + word, headers=headers)

soup = bs(r.text, 'html.parser')
translations = soup.select('h3.translation')
for t in translations:
    print(t.get_text(strip=True))

The result printed in terminal:

lehűl
hűtés
lehűt
hűvös
hűtés
előhűtés

Requests documentation can be found at https://requests.readthedocs.io/en/latest/

Also, BeautifulSoup docs are at: https://beautiful-soup-4.readthedocs.io/en/latest/index.html

Answered By: Barry the Platipus
