How to read a specific table from a given url?

Question

I am new to Python and am trying to download GDP per capita data by country. I am trying to read the data from this website: https://worldpopulationreview.com/countries/by-gdp

When I tried to read the data, I got a "No tables found" error.
I can see that the data is in r.text, but pandas cannot read that table.
How can I solve this problem and read the data?

MWE

import pandas as pd
import requests

url = "https://worldpopulationreview.com/countries/by-gdp"

r = requests.get(url)
raw_html = r.text  # I can see the data is here, but pd.read_html says no tables found
df_list = pd.read_html(raw_html)
print(len(df_list))

Asked By: dallascow

Answer 1

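The data is embedded as JSON in <script id="__NEXT_DATA__" type="application/json"> and is turned into a table only by JavaScript in the browser, so the HTML that requests receives contains no <table> element for pd.read_html to find. You have to adjust your script to parse that JSON instead: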

pd.json_normalize(
    json.loads(
        BeautifulSoup(
            requests.get(url).text, 'html.parser'
        ).select_one('#__NEXT_DATA__').text)['props']['pageProps']['data']
)
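
To make the mechanism easier to see without hitting the network, here is a minimal offline sketch using only the standard library. The SAMPLE_HTML string, the NextDataParser class and the extract_next_data helper are made up for illustration (they are not part of the site or of any library), and the two sample records only mimic the shape of the real payload.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Start collecting once the script tag with the right id opens
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the parsed JSON payload, or raise ValueError if the tag is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no __NEXT_DATA__ script found")
    return json.loads("".join(parser.chunks))


# Assumed stand-in for r.text: the records sit in JSON, there is no table markup
SAMPLE_HTML = """
<html><body>
<div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"data": [
  {"country": "United States", "continent": "North America"},
  {"country": "China", "continent": "Asia"}
]}}}
</script>
</body></html>
"""

records = extract_next_data(SAMPLE_HTML)["props"]["pageProps"]["data"]
print("<table" in SAMPLE_HTML)          # nothing for pd.read_html to find
print([r["country"] for r in records])  # the data is in the JSON instead
```

The same records list can be passed to pd.json_normalize to get a DataFrame.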

Example

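import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://worldpopulationreview.com/countries/by-gdp"

# Parse the page and load the JSON payload embedded in the __NEXT_DATA__ script tag
soup = BeautifulSoup(requests.get(url).text, "html.parser")
payload = json.loads(soup.select_one("#__NEXT_DATA__").text)

# The records live under props -> pageProps -> data
df = pd.json_normalize(payload["props"]["pageProps"]["data"])
df[['continent', 'country', 'pop', 'imfGDP', 'unGDP', 'gdpPerCapita']]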

Output

	continent	country	pop	imfGDP	unGDP	gdpPerCapita
0	North America	United States	338290	2.08938e+13	18624475000000	61762.9
1	Asia	China	1.42589e+06	1.48626e+13	11218281029298	10423.4
…	…	…	…	…	…	…
210	Asia	Syria	22125.2	0	22163075121	1001.71
211	North America	Turks and Caicos Islands	45.703	0	917550492	20076.4
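
The key path props -> pageProps -> data is specific to this page and could be different on other pages or change over time. If it does, one possible way to locate the records is to walk the parsed JSON and list every place that holds a list of dicts. This is only a sketch: find_record_lists is a hypothetical helper, and the sample dict below is a small stand-in for the real payload (the buildId value is a placeholder).

```python
def find_record_lists(node, path=()):
    """Yield (path, length) for every non-empty list whose items are all dicts."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_record_lists(value, path + (key,))
    elif isinstance(node, list):
        if node and all(isinstance(item, dict) for item in node):
            yield path, len(node)
        # Keep descending in case records contain nested record lists
        for index, item in enumerate(node):
            yield from find_record_lists(item, path + (index,))


# Assumed stand-in for json.loads(...) of the __NEXT_DATA__ script content
sample = {
    "props": {
        "pageProps": {
            "data": [
                {"country": "United States"},
                {"country": "China"},
            ]
        }
    },
    "buildId": "placeholder",
}

found = list(find_record_lists(sample))
for path, length in found:
    print(path, length)
```

Each reported path is a candidate to feed into pd.json_normalize, and the longest list is usually the table you are after.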

Answered By: HedgeHog
