BeautifulSoup4 returns HTML without the data in it

Question:

As an example I have code like this:

import requests
from bs4 import BeautifulSoup

def get_data(url):
    r = requests.get(url).text
    soup = BeautifulSoup(r, 'html.parser')
    word = soup.find(class_='mdl-cell mdl-cell--11-col')
    print(word)


get_data('http://savodxon.uz/izoh?sher')

I don’t know why, but when I print word, the element has no text.

Like this:

<h2 class="mdl-cell mdl-cell--11-col" id="definition_l_title"></h2>

But it should look like this:

<h2 id="definition_l_title" class="mdl-cell mdl-cell--11-col">acha</h2>
Asked By: Ryadovoy Zadrot


Answers:

The data you see on the page is loaded via JavaScript from an external URL, so BeautifulSoup cannot see it. To load the data, you can use the requests module:

import requests

api_url = "https://savodxon.uz/api/get_definition"
data = requests.post(api_url, data={"word": "sher"}).json()
print(data)

Prints:

{
    "core": "",
    "definition": [
        {
            "meanings": [
                {
                    "examples": [
                        {
                            "takenFrom": "Maqol",
                            "text": "Ovchining zoʻri sher otadi, Dehqonning zoʻri yer ochadi.",
                        },
                        {
                            "takenFrom": "Maqol",
                            "text": "Oʻzingni er bilsang, oʻzgani sher bil.",
                        },
                        {
                            "takenFrom": "Ertaklar",
                            "text": "Bular [uch ogʻayni botirlar] tushgan toʻqayning narigi tomonida bir sherning makoni bor edi.",
                        },
                    ],
                    "reference": "",
                    "tags": "",
                    "text": "Mushuksimonlar oilasiga mansub, kalta va sargʻish yungli (erkaklari esa qalin yolli) yirik sutemizuvchi yirtqich hayvon; arslon.",
                },
                {
                    "examples": [
                        {
                            "takenFrom": "I. Rahim, Ixlos",
                            "text": "Bu hujjatni butun rayonga tarqatmoqchimiz, sher, obroʻying oshib, choʻqqiga koʻtarilayotganingni bilasanmi?",
                        },
                        {
                            "takenFrom": "A. Qodiriy, Oʻtgan kunlar",
                            "text": "— Balli, sher, xatni qoʻlingizdan kim oldi? — Bir chol.",
                        },
                        {
                            "takenFrom": "Yusuf va Ahmad",
                            "text": "Yoppa yov-lik otga mining, sherlarim.",
                        },
                        {
                            "takenFrom": "Bahrom va Gulandom",
                            "text": "Figʻon qilgan bunda sherlar, Yoʻlbars, qoplon, bunda erlar",
                        },
                    ],
                    "reference": "",
                    "tags": "koʻchma",
                    "text": "Shaxsni sherga nisbatlab ataydi (“azamat“, “botir“ polvon maʼnosida).",
                },
            ],
            "phrases": [
                {
                    "meanings": [
                        {
                            "examples": [
                                {
                                    "takenFrom": "Gazetadan",
                                    "text": "Ichkilikning zoʻridan sher boʻlib ketgan Yazturdi endi koʻcha harakati qoidasini unutib qoʻygan edi.",
                                },
                                {
                                    "takenFrom": "H. Tursunqulov, Hayotim qissasi",
                                    "text": "Balli, azamat, bugun jang vaqtida sher boʻlib ketding.",
                                },
                            ],
                            "reference": "",
                            "tags": "ayn.",
                            "text": "Sherlanmoq.",
                        }
                    ],
                    "tags": "",
                    "text": "Sher boʻlmoq",
                }
            ],
            "tags": "",
        }
    ],
    "isDerivative": False,
    "tailStructure": "",
    "type": "ot",
    "wordExists": True,
}
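The response nests the word's own senses under "meanings" and fixed expressions (here "Sher boʻlmoq") under "phrases", each with its own "meanings". A minimal sketch that walks both levels, assuming every response has the shape printed above; iter_meanings is a hypothetical helper name and sample is a trimmed copy of that output, not a live response:

```python
def iter_meanings(data):
    """Yield (phrase, meaning_text, examples) for every meaning in a get_definition response.

    phrase is None for the word's own meanings and the phrase text
    for meanings nested under "phrases" (assumed shape, taken from the output above).
    """
    for item in data.get("definition", []):
        for meaning in item.get("meanings", []):
            yield None, meaning["text"], meaning.get("examples", [])
        for phrase in item.get("phrases", []):
            for meaning in phrase.get("meanings", []):
                yield phrase["text"], meaning["text"], meaning.get("examples", [])


# Trimmed copy of the printed response (most examples removed).
sample = {
    "definition": [
        {
            "meanings": [
                {
                    "examples": [
                        {"takenFrom": "Maqol", "text": "Oʻzingni er bilsang, oʻzgani sher bil."},
                    ],
                    "reference": "",
                    "tags": "",
                    "text": "Mushuksimonlar oilasiga mansub, kalta va sargʻish yungli (erkaklari esa qalin yolli) yirik sutemizuvchi yirtqich hayvon; arslon.",
                },
            ],
            "phrases": [
                {
                    "meanings": [
                        {
                            "examples": [
                                {"takenFrom": "H. Tursunqulov, Hayotim qissasi",
                                 "text": "Balli, azamat, bugun jang vaqtida sher boʻlib ketding."},
                            ],
                            "reference": "",
                            "tags": "ayn.",
                            "text": "Sherlanmoq.",
                        }
                    ],
                    "tags": "",
                    "text": "Sher boʻlmoq",
                }
            ],
            "tags": "",
        }
    ],
    "wordExists": True,
}

for phrase, text, examples in iter_meanings(sample):
    prefix = f"[{phrase}] " if phrase else ""
    print(prefix + text)
    for ex in examples:
        print("  -", ex["text"], f"({ex['takenFrom']})")
```

With a live request you would pass the dict returned by requests.post(api_url, data={"word": "sher"}).json() instead of sample.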

EDIT: To get the list of matching words:

import requests

api_url = "https://savodxon.uz/api/search"
d = {"keyword": "sher", "names": "[object HTMLInputElement]"}
data = requests.post(api_url, data=d).json()
print(data)

Prints:

{
    "success": True,
    "matchFound": True,
    "suggestions": [
        "sher",
        "sherboz",
        "sherdil",
        "sherik",
        "sherikchilik",
        "sheriklashmoq",
        "sheriklik",
        "sherlanmoq",
        "sherobodlik",
        "sherolgʻin",
        "sheroz",
        "sheroza",
        "sherqadamlik",
        "shershikorlik",
        "sherst",
    ],
}
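The search response carries two flags next to the list. A small sketch of a guard around them; suggestions_from is a hypothetical helper, and the assumption that a failed or empty search sets success or matchFound to false is not confirmed by the output above, which only shows the True case:

```python
def suggestions_from(data):
    """Return the suggestion list, or [] when the search failed or found nothing.

    Assumes (not confirmed) that the API reports failure or no match via
    the "success" and "matchFound" flags seen in the response above.
    """
    if not data.get("success") or not data.get("matchFound"):
        return []
    return list(data.get("suggestions", []))


# Usage with the live endpoint would be:
#   data = requests.post(api_url, data=d).json()
#   words = suggestions_from(data)
print(suggestions_from({"success": True, "matchFound": True, "suggestions": ["sher", "sherik"]}))
```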
Answered By: Andrej Kesely

You have a common problem with modern pages: this page uses JavaScript to add/update elements, but BeautifulSoup/lxml and requests/urllib can’t run JavaScript.
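One way to confirm this without a browser is to inspect the raw HTML that requests receives: if the target element is present but empty, its text is filled in later by JavaScript. A minimal sketch using the standard library's html.parser (swapped in here so the check needs no third-party package); TextById and element_text_by_id are hypothetical names, and the two sample strings are the h2 tags from the question:

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text inside the first element whose id attribute equals target_id.

    Simplification: assumes the element contains no unclosed void tags such as <br>.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # > 0 while the parser is inside the target element
        self.found = False   # becomes True once the target's start tag is seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1                      # nested tag inside the target
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def element_text_by_id(raw_html, target_id):
    """Return the element's stripped text, or None if no element has that id."""
    parser = TextById(target_id)
    parser.feed(raw_html)
    parser.close()
    if not parser.found:
        return None
    return "".join(parser.parts).strip()


empty_h2 = '<h2 class="mdl-cell mdl-cell--11-col" id="definition_l_title"></h2>'
filled_h2 = '<h2 id="definition_l_title" class="mdl-cell mdl-cell--11-col">acha</h2>'
print(repr(element_text_by_id(empty_h2, "definition_l_title")))
print(repr(element_text_by_id(filled_h2, "definition_l_title")))
```

Against the live page you would pass requests.get(url).text as raw_html; an empty string (rather than None) means the element exists in the static HTML but its content is added by JavaScript.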

You may need Selenium to control a real web browser, which can run JS. Or you can manually use DevTools in Firefox/Chrome (the Network tab) to see whether JavaScript reads data from some URL, and then try to use that URL with requests. JS usually receives JSON, which can easily be converted to a Python dictionary (no BeautifulSoup needed). You can also check whether the page has a (free) API for programmers.
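To illustrate the JSON point: a JSON body decodes straight into Python objects with the standard json module, and requests' response.json() performs the same kind of decoding on the response body. The raw_body string below is a shortened, hypothetical body modelled on the search endpoint's output. JSON's true becomes Python's True, which is why the printed results in this thread show True/False:

```python
import json

# Shortened, hypothetical body modelled on the search endpoint's response.
raw_body = '{"success": true, "matchFound": true, "suggestions": ["sher", "sherboz", "sherdil"]}'

data = json.loads(raw_body)   # JSON object -> dict, true -> True, array -> list

print(type(data).__name__)
print(data["matchFound"])
print(data["suggestions"][1])
```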


Using DevTools, I found that it reads data from other URLs (using POST):

http://savodxon.uz/api/search

http://savodxon.uz/api/get_definition

and they return results as JSON data, so BeautifulSoup isn’t needed.

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0',
    'X-Requested-With': 'XMLHttpRequest',
}

# ---- suggestions ---

url = 'http://savodxon.uz/api/search'

payload = {
    'keyword': 'sher',
    'names': '[object HTMLInputElement]',
}

response = requests.post(url, data=payload, headers=headers)
data = response.json()
#print(data)

# ---

print('--- suggestions ---')
for word in data['suggestions']:
    print('-', word)

# --- definitions ---

url = 'http://savodxon.uz/api/get_definition'

payload = {
    'word': 'sher',
}

response = requests.post(url, data=payload, headers=headers)
data = response.json()
#print(data.keys())

print('--- definitions ---')

for item in data['definition']:
    for meaning in item['meanings']:
        print(meaning['text'])
        for example in meaning['examples']:
            print('-', example['text'], f"({example['takenFrom']})")

Result:

--- suggestions ---
- sher
- sherboz
- sherdil
- sherik
- sherikchilik
- sheriklashmoq
- sheriklik
- sherlanmoq
- sherobodlik
- sherolgʻin
- sheroz
- sheroza
- sherqadamlik
- shershikorlik
- sherst
--- definitions ---
Mushuksimonlar oilasiga mansub, kalta va sargʻish yungli (erkaklari esa qalin yolli) yirik sutemizuvchi yirtqich hayvon; arslon.
- Ovchining zoʻri sher otadi, Dehqonning zoʻri yer ochadi. (Maqol)
- Oʻzingni er bilsang, oʻzgani sher bil. (Maqol)
- Bular [uch ogʻayni botirlar] tushgan toʻqayning narigi tomonida bir sherning makoni bor edi. (Ertaklar)
Shaxsni sherga nisbatlab ataydi (“azamat“, “botir“ polvon maʼnosida).
- Bu hujjatni butun rayonga tarqatmoqchimiz, sher, obroʻying oshib, choʻqqiga koʻtarilayotganingni bilasanmi? (I. Rahim, Ixlos)
- — Balli, sher, xatni qoʻlingizdan kim oldi? — Bir chol. (A. Qodiriy, Oʻtgan kunlar)
- Yoppa yov-lik otga mining, sherlarim. (Yusuf va Ahmad)
- Figʻon qilgan bunda sherlar, Yoʻlbars, qoplon, bunda erlar (Bahrom va Gulandom)
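The same form-encoded POST requests can also be sent with only the standard library, for environments where requests is not installed. This is a swapped-in technique (urllib instead of the requests calls used in this answer), and build_post/post_json are hypothetical helper names; building the Request object needs no network, while calling post_json does:

```python
import json
import urllib.parse
import urllib.request


def build_post(url, payload, headers=None):
    """Build a form-encoded POST request, similar to requests.post(url, data=payload)."""
    body = urllib.parse.urlencode(payload).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers=headers or {}, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def post_json(url, payload, headers=None, timeout=10):
    """Send the request and decode the JSON body into Python objects (needs network)."""
    with urllib.request.urlopen(build_post(url, payload, headers), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_post("http://savodxon.uz/api/get_definition", {"word": "sher"})
print(req.get_method())
print(req.data)
# data = post_json("http://savodxon.uz/api/get_definition", {"word": "sher"})
```

post_json(url, payload, headers) plays the role of requests.post(url, data=payload, headers=headers).json() in the code above.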

BTW:

You can also run it without the headers.

Here is an example video (without sound) showing how to use DevTools:

How to use DevTools in Firefox to find JSON data in EpicGames.com – YouTube

Answered By: furas

Answering the question you posted in the comments: no, you cannot get all the words stored in the database, because as the owner of this site I will not allow it to happen 🙂

Answered By: Javlon Juraev