Extracting Tables in Python from a website
Question:
I want to extract the table on this website: https://www.wikirating.com/list-of-countries-by-credit-rating/
When I try this code, I only get the first two lines of the website. What am I doing wrong, or how can I specify that I want to extract the table?
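import requests
import pandas as pd
url = 'https://www.wikirating.com/list-of-countries-by-credit-rating/'
html = requests.get(url).content
# read_html returns a list of DataFrames, one per <table> it finds
df_list = pd.read_html(html)
# Assumption: take the first table; the original never defined df
df = df_list[0]
print(df)
df.to_csv('my data.csv')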
Answers:
I recommend using BeautifulSoup. Here’s something to get you started:
import requests
from bs4 import BeautifulSoup
url = 'https://www.wikirating.com/list-of-countries-by-credit-rating/'
html = requests.get(url).content
soup = BeautifulSoup(html, 'html.parser')
# Find all tables on the page
tables = soup.find_all('table')
# Loop through each table
for table in tables:
    # Find all rows in the table
    rows = table.find_all('tr')
    # Loop through each row and print its first four cells
    for row in rows:
        cells = row.find_all('td')
        # Skip rows with fewer than four <td> cells (e.g. header rows)
        if len(cells) >= 4:
            print(cells[0].text, cells[1].text, cells[2].text, cells[3].text)
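Since the goal in the question is a CSV file rather than printed rows, the same loop can collect the cells into a pandas DataFrame. This is only a rough sketch: `table_to_dataframe` is a helper name I made up, and `SAMPLE_HTML` is invented placeholder data standing in for the downloaded page so the snippet runs offline. I have not checked how the real page lays out its table, so the index of the table you want may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up stand-in for the page source. In practice this would be
# requests.get(url).text for the URL in the question.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Agency A</th><th>Agency B</th><th>Agency C</th></tr>
  <tr><td>Examplia</td><td>AA</td><td>Aa2</td><td>AA-</td></tr>
  <tr><td>Samplestan</td><td>BBB</td><td>Baa2</td><td>BBB+</td></tr>
</table>
"""


def table_to_dataframe(html, index=0):
    """Hypothetical helper: return the index-th <table> in html as a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[index]
    header = []
    records = []
    for row in table.find_all("tr"):
        # Header cells are <th>, data cells are <td>
        ths = [cell.get_text(strip=True) for cell in row.find_all("th")]
        tds = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if ths and not tds:
            header = ths
        elif tds:
            records.append(tds)
    # Assumes every data row has as many cells as the header row
    return pd.DataFrame(records, columns=header or None)


df = table_to_dataframe(SAMPLE_HTML)
print(df)

# With no path, to_csv returns the CSV text. Pass a filename to write a file.
csv_text = df.to_csv(index=False)
```

If the page contains several tables, printing `len(soup.find_all('table'))` first and trying different `index` values is a quick way to find the one you are after.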