Scraping one table from a Wikipedia page with multiple tables
Question:
I am trying to scrape a table from this Wikipedia page. There are 5 different tables on the page, and my target is the first one shown. It has little to identify it; the only attribute it carries is
class="wikitable sortable jquery-tablesorter"
which the other tables share as well. Some sources say I should select the table by id, but this table has no id.
This is how I currently scrape it:
My_table = soup.find('table',{'class':'wikitable sortable'})
Question
How do I select only that table when it has no id?
Answers:
Get all the tables, store them in a list, and take the table at index [0]. This way you can extract the first table without needing an id.
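A minimal sketch of this indexing approach, narrowed to tables that carry the shared classes so that unrelated tables do not shift the index. The HTML string is a made-up stand-in for the real page, and the selector leaves out jquery-tablesorter on the assumption that this class is added by JavaScript in the browser and is absent from the downloaded HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: one unrelated table, then two data tables
html = """
<table class="infobox"><tr><td>sidebar</td></tr></table>
<table class="wikitable sortable"><tr><th>Provinsi</th></tr><tr><td>Aceh</td></tr></table>
<table class="wikitable sortable"><tr><th>Tahun</th></tr><tr><td>2010</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# The CSS selector matches every table that has both classes, in document order
wikitables = soup.select("table.wikitable.sortable")

# Index 0 is the first matching table, even though it is the second <table> overall
first_table = wikitables[0]
first_cell = first_table.find("td").get_text(strip=True)
print(len(wikitables), first_cell)
```

Filtering by class before indexing keeps the index stable if the page gains or loses tables that are not wikitables.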
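You can select the target table using soup.find_all('table')[1]. Note that find_all returns the tables in document order, so index 1 is the second table element in the page source and index 0 is the first.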
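from bs4 import BeautifulSoup
import requests

url = "https://id.wikipedia.org/wiki/Demografi_Indonesia#Jumlah_penduduk_menurut_provinsi"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# find_all returns every <table> in document order; pick one by position
table = soup.find_all('table')[1]
rows = table.find_all('tr')

row_list = []
for tr in rows:
    # Collect the text of each data cell in this row
    td = tr.find_all('td')
    row = [i.text for i in td]
    row_list.append(row)

# Skip the first row, which holds <th> header cells and so yields an empty list
print(row_list[1:])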
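Another option that does not depend on a table's position is to start from the section anchor named in the URL fragment and take the first table that follows it. The sketch below assumes the page contains an element whose id equals that fragment and that the target table comes right after it; the HTML and its cell values are invented placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; headings carry ids, tables do not
html = """
<h2 id="Sejarah">Sejarah</h2>
<table class="wikitable sortable"><tr><th>Tahun</th></tr><tr><td>1971</td></tr></table>
<h2 id="Jumlah_penduduk_menurut_provinsi">Jumlah penduduk menurut provinsi</h2>
<p>Intro text.</p>
<table class="wikitable sortable">
<tr><th>Provinsi</th><th>Populasi</th></tr>
<tr><td>Aceh</td><td>100</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the element that the URL fragment points to
anchor = soup.find(id="Jumlah_penduduk_menurut_provinsi")

# find_next walks forward in document order and returns the first <table> after it
table = anchor.find_next("table")

# Read header and data cells alike, one list per row
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]
print(rows)
```

On a live page, anchor may be None if the id is missing, so it is worth checking before calling find_next.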
This is how I would do it:
import pandas as pd
url = 'https://id.wikipedia.org/wiki/Demografi_Indonesia#Jumlah_penduduk_menurut_provinsi'
df_list = pd.read_html(url)
df = df_list[1]
print(df)
import pandas as pd
url = 'https://id.wikipedia.org/wiki/Demografi_Indonesia#Jumlah_penduduk_menurut_provinsi'
# read_html returns a list of DataFrames, one per <table> found on the page
tables = pd.read_html(url)
dd = tables[1]
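With pandas, the positional index can also be avoided by passing the match argument to read_html, which keeps only tables containing the given text. This is a sketch on an invented HTML snippet; the header text "Provinsi" is an assumption about what the target table contains, and read_html needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the page; cell values are placeholders
html = """
<table class="wikitable sortable">
<tr><th>Tahun</th><th>Populasi</th></tr>
<tr><td>1971</td><td>1</td></tr>
</table>
<table class="wikitable sortable">
<tr><th>Provinsi</th><th>Populasi</th></tr>
<tr><td>Aceh</td><td>100</td></tr>
</table>
"""

# Only tables whose text matches "Provinsi" are returned
tables = pd.read_html(StringIO(html), match="Provinsi")
df = tables[0]
print(df)
```

The same match argument works when a URL is passed instead of a StringIO object.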