Pagination not showing up in parsed content (BeautifulSoup)
Question:
I am new to python programming and I have a problem with pagination while using beautiful soup. all the parsed content show up except the pagination contents. image of content not showing up I have highlighted the lines which does not show up.
Website link.
from bs4 import BeautifulSoup
import requests
import time
import pandas as pd
from lxml import html
url = "https://www.yellowpages.lk/Medical.php"
result = requests.get(url)
time.sleep(5)
doc = BeautifulSoup(result.content, "lxml")
time.sleep(5)
Table = doc.find('table',{'id':'MedicalFacility'}).find('tbody').find_all('tr')
Page = doc.select('.col-lg-10')
C_List = []
D_List = []
N_List = []
A_List = []
T_List = []
W_List = []
V_List = []
M_List = []
print(doc.prettify())
print(Page)
while True:
for i in range(0,25):
Sort = Table[i]
Category = Sort.find_all('td')[0].get_text().strip()
C_List.insert(i,Category)
District = Sort.find_all('td')[1].get_text().strip()
D_List.insert(i,District)
Name = Sort.find_all('td')[2].get_text().strip()
N_List.insert(i,Name)
Address = Sort.find_all('td')[3].get_text().strip()
A_List.insert(i,Address)
Telephone = Sort.find_all('td')[4].get_text().strip()
T_List.insert(i,Telephone)
Whatsapp = Sort.find_all('td')[5].get_text().strip()
W_List.insert(i,Whatsapp)
Viber = Sort.find_all('td')[6].get_text().strip()
V_List.insert(i,Viber)
MoH_Division = Sort.find_all('td')[7].get_text().strip()
M_List.insert(i,MoH_Division)
I tried using .find() with class and .select(‘.class’) to see if the pagination contents show up so far nothing has worked
Answers:
The pagination is more or less superfluous in that page: the data is loaded anyway, and Javascript is generating pagination just for display purposes: Requests will get full data anyway.
Here is one way of getting that information in full:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}
url = 'https://www.yellowpages.lk/Medical.php'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
table = soup.select_one('table[id="MedicalFacility"]')
df = pd.read_html(str(table))[0]
print(df)
Result in terminal:
Category District Name Address Telephone WhatsApp Viber MoH Division
0 Pharmacy Gampaha A & B Pharmacy 171 Negambo Road Veyangoda 0778081515 9.477808e+10 9.477808e+10 Aththanagalla
1 Pharmacy Trincomalee A A Pharmacy 350 Main Street Kanthale 0755576998 9.475558e+10 9.475558e+10 Kanthale
2 Pharmacy Colombo A Baur & Co Pvt Ltd 55 Grandpass Rd Col 14 0768200100 9.476820e+10 9.476820e+10 CMC
3 Pharmacy Colombo A Colombo Pharmacy Ug 93 97 Peoples Park Colombo 11 0773771446 9.477377e+10 NaN CMC
4 Pharmacy Trincomalee A R Pharmacy Main Street Kinniya-3 0771413838 9.477500e+10 9.477500e+10 Kinniya
... ... ... ... ... ... ... ... ...
1968 Pharmacy Ampara Zam Zam Pharmacy Main Street Akkaraipattu 0672277698 9.477756e+10 9.477756e+10 Akkaraipattu
1969 Pharmacy Batticaloa Zattra Pharmacy Jummah Mosque Rd Oddamawadi-1 0766689060 9.476669e+10 NaN Oddamavady
1970 Pharmacy Puttalam Zeenath Pharmacy Norochcholei 0728431622 NaN NaN Kalpitiya
1971 Pharmacy Puttalam Zidha Pharmacy Norochcholei 0773271222 NaN NaN Kalpitiya
1972 Pharmacy Gampaha Zoomcare Pharmacy & Grocery 182/B/1 Rathdoluwa Seeduwa 0768378112 NaN NaN Seeduwa
1973 rows × 8 columns
See pandas documentation here. Also BeautifulSoup documentation, and lastly, Requests documentation.
If you are using pandas, all you need is just a couple of lines of code to put the entire table into a dataframe.
All you need is pandas.read_html() function as follows:
Code:
import pandas as pd
df = pd.read_html("https://www.yellowpages.lk/Medical.php")[0]
print(df)
Output:
I am new to python programming and I have a problem with pagination while using beautiful soup. all the parsed content show up except the pagination contents. image of content not showing up I have highlighted the lines which does not show up.
Website link.
from bs4 import BeautifulSoup
import requests
import time
import pandas as pd
from lxml import html
url = "https://www.yellowpages.lk/Medical.php"
result = requests.get(url)
time.sleep(5)
doc = BeautifulSoup(result.content, "lxml")
time.sleep(5)
Table = doc.find('table',{'id':'MedicalFacility'}).find('tbody').find_all('tr')
Page = doc.select('.col-lg-10')
C_List = []
D_List = []
N_List = []
A_List = []
T_List = []
W_List = []
V_List = []
M_List = []
print(doc.prettify())
print(Page)
while True:
for i in range(0,25):
Sort = Table[i]
Category = Sort.find_all('td')[0].get_text().strip()
C_List.insert(i,Category)
District = Sort.find_all('td')[1].get_text().strip()
D_List.insert(i,District)
Name = Sort.find_all('td')[2].get_text().strip()
N_List.insert(i,Name)
Address = Sort.find_all('td')[3].get_text().strip()
A_List.insert(i,Address)
Telephone = Sort.find_all('td')[4].get_text().strip()
T_List.insert(i,Telephone)
Whatsapp = Sort.find_all('td')[5].get_text().strip()
W_List.insert(i,Whatsapp)
Viber = Sort.find_all('td')[6].get_text().strip()
V_List.insert(i,Viber)
MoH_Division = Sort.find_all('td')[7].get_text().strip()
M_List.insert(i,MoH_Division)
I tried using .find() with class and .select(‘.class’) to see if the pagination contents show up so far nothing has worked
The pagination is more or less superfluous in that page: the data is loaded anyway, and Javascript is generating pagination just for display purposes: Requests will get full data anyway.
Here is one way of getting that information in full:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}
url = 'https://www.yellowpages.lk/Medical.php'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
table = soup.select_one('table[id="MedicalFacility"]')
df = pd.read_html(str(table))[0]
print(df)
Result in terminal:
Category District Name Address Telephone WhatsApp Viber MoH Division
0 Pharmacy Gampaha A & B Pharmacy 171 Negambo Road Veyangoda 0778081515 9.477808e+10 9.477808e+10 Aththanagalla
1 Pharmacy Trincomalee A A Pharmacy 350 Main Street Kanthale 0755576998 9.475558e+10 9.475558e+10 Kanthale
2 Pharmacy Colombo A Baur & Co Pvt Ltd 55 Grandpass Rd Col 14 0768200100 9.476820e+10 9.476820e+10 CMC
3 Pharmacy Colombo A Colombo Pharmacy Ug 93 97 Peoples Park Colombo 11 0773771446 9.477377e+10 NaN CMC
4 Pharmacy Trincomalee A R Pharmacy Main Street Kinniya-3 0771413838 9.477500e+10 9.477500e+10 Kinniya
... ... ... ... ... ... ... ... ...
1968 Pharmacy Ampara Zam Zam Pharmacy Main Street Akkaraipattu 0672277698 9.477756e+10 9.477756e+10 Akkaraipattu
1969 Pharmacy Batticaloa Zattra Pharmacy Jummah Mosque Rd Oddamawadi-1 0766689060 9.476669e+10 NaN Oddamavady
1970 Pharmacy Puttalam Zeenath Pharmacy Norochcholei 0728431622 NaN NaN Kalpitiya
1971 Pharmacy Puttalam Zidha Pharmacy Norochcholei 0773271222 NaN NaN Kalpitiya
1972 Pharmacy Gampaha Zoomcare Pharmacy & Grocery 182/B/1 Rathdoluwa Seeduwa 0768378112 NaN NaN Seeduwa
1973 rows × 8 columns
See pandas documentation here. Also BeautifulSoup documentation, and lastly, Requests documentation.
If you are using pandas, all you need is just a couple of lines of code to put the entire table into a dataframe.
All you need is pandas.read_html() function as follows:
Code:
import pandas as pd
df = pd.read_html("https://www.yellowpages.lk/Medical.php")[0]
print(df)