How to read a specific table from a given URL?
Question:
I am new to Python and am trying to download GDP per capita data by country from this website: https://worldpopulationreview.com/countries/by-gdp
I tried to read the data with pandas, but I get a "no tables found" error.
I can see the data is in r.text, but somehow pandas cannot read that table.
How can I solve the problem and read the data?
MWE
import pandas as pd
import requests
url = "https://worldpopulationreview.com/countries/by-gdp"
r = requests.get(url)
raw_html = r.text # I can see the data is here, but pd.read_html says no tables found
df_list = pd.read_html(raw_html)
print(len(df_list))
Answers:
The data is embedded as JSON in <script id="__NEXT_DATA__" type="application/json"> and is only rendered into a table by the browser, which is why pd.read_html finds no tables in the raw HTML. Parse the JSON from that script tag instead:
pd.json_normalize(
    json.loads(
        BeautifulSoup(requests.get(url).text, 'html.parser')
        .select_one('#__NEXT_DATA__').text
    )['props']['pageProps']['data']
)
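To see why pd.read_html comes back empty, it can help to inspect the HTML without any third-party packages. The following is only a minimal sketch using the standard library: `NextDataParser`, `extract_next_data`, and the `sample_html` string (with its made-up "Exampleland" row) are hypothetical names and data for illustration, not part of the site or of pandas. It counts the <table> tags and pulls the JSON out of the `__NEXT_DATA__` script tag.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collect the text inside <script id="__NEXT_DATA__"> and count <table> tags."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while inside the target script tag
        self.chunks = []       # pieces of the script body
        self.table_count = 0   # number of <table> tags seen in the raw HTML

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_count += 1
        elif tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return (parsed JSON payload, number of <table> tags) for an HTML string."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no <script id='__NEXT_DATA__'> element found")
    return json.loads("".join(parser.chunks)), parser.table_count


# Hypothetical stand-in for the real page: no <table>, data only in the script tag.
sample_html = """
<html><body>
<div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"data": [{"country": "Exampleland", "gdpPerCapita": 123.4}]}}}
</script>
</body></html>
"""

payload, table_count = extract_next_data(sample_html)
rows = payload["props"]["pageProps"]["data"]
print(table_count)         # no <table> tags in the raw HTML
print(rows[0]["country"])  # the data is still reachable through the JSON
```

With the real page you would pass `requests.get(url).text` instead of `sample_html`, and the resulting `rows` list can be handed to `pd.json_normalize` as in the answer.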
Example
import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://worldpopulationreview.com/countries/by-gdp"

# The table data lives as JSON inside <script id="__NEXT_DATA__">
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
payload = json.loads(soup.select_one('#__NEXT_DATA__').text)

# Flatten the list of country records into a DataFrame
df = pd.json_normalize(payload['props']['pageProps']['data'])
df[['continent', 'country', 'pop', 'imfGDP', 'unGDP', 'gdpPerCapita']]
Output
| | continent | country | pop | imfGDP | unGDP | gdpPerCapita |
|---|---|---|---|---|---|---|
| 0 | North America | United States | 338290 | 2.08938e+13 | 18624475000000 | 61762.9 |
| 1 | Asia | China | 1.42589e+06 | 1.48626e+13 | 11218281029298 | 10423.4 |
| … | … | … | … | … | … | … |
| 210 | Asia | Syria | 22125.2 | 0 | 22163075121 | 1001.71 |
| 211 | North America | Turks and Caicos Islands | 45.703 | 0 | 917550492 | 20076.4 |
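The key path ['props']['pageProps']['data'] is specific to how this page lays out its JSON, and a bare KeyError gives little to go on if that layout ever differs. As a rough sketch only, a small helper (here called `dig`, a hypothetical name, shown with a made-up `sample_payload`) can walk the path and report which keys are actually available at the point of failure:

```python
def dig(payload, path):
    """Follow a sequence of keys into nested dicts, failing with a readable message."""
    current = payload
    for depth, key in enumerate(path):
        if not isinstance(current, dict) or key not in current:
            # Show what is available at this level to make debugging easier
            available = sorted(current) if isinstance(current, dict) else type(current).__name__
            walked = " -> ".join(path[:depth]) or "<root>"
            raise KeyError(f"{key!r} not found under {walked}; available: {available}")
        current = current[key]
    return current


# Hypothetical payload with the same nesting as the answer assumes
sample_payload = {"props": {"pageProps": {"data": [{"country": "Exampleland", "pop": 1.0}]}}}

rows = dig(sample_payload, ["props", "pageProps", "data"])
print(len(rows))

try:
    dig(sample_payload, ["props", "pageProps", "rows"])
except KeyError as err:
    print(err)  # names the missing key and lists the keys that do exist
```

The list returned by `dig` can be passed to `pd.json_normalize` in place of the chained indexing.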