Not able to print data from tables using Selenium Python
Question:
This is the element that I am trying to find:
<tbody class="searchable ng-scope" ng-repeat="ut in vm.unitList | filter: {leaseLength: (vm.weekOption.value || '')}">…</tbody>
**Xpath - //*[@id="no-more-tables"]/tbody**
And this is my code:
driver.get(url)
property_name = driver.title
print('Property =====',property_name)
rooms = driver.find_elements(By.XPATH, '//*[@id="no-more-tables"]/tbody')
print(len(rooms))
The length of rooms comes out as 0 even though I gave the correct XPath.
It should be 5.
Answers:
The data on that page is loaded dynamically by JavaScript after the HTML has loaded. The following code skips the complexity of Selenium and gets you the data you're after:
import requests
import pandas as pd
s = requests.Session()
data = {"route":"unitlist","command":"","data":'{"list":{"filter":{"propertyNoFilter":"PR0170000","dateFilter":"03/09/2022"}}}'}
r = s.get('https://www.hellostudent.co.uk/student-accommodation/stoke/caledonia-mills/')
r = s.post('https://rooms.hellostudent.co.uk/DynamicsNav/Call', data=data)
# print(r.json()['list']['property'])
df = pd.DataFrame(r.json()['list']['property'])
print(df)
Result:
   unitType     unitSubType                 unitDescription  noOfUnits  mainPropertyNo  startDate   endDate  leaseLength  pricePerWeek  biannualAvailable  termPaymentAvailable  features
0     SHAPT  SHAPT-Q4-B4-ES  Silver 4-Bed Apartment Ensuite          0       PR0170000   03/09/22  25/08/23           51         89.00              false                  true      None
1     SHAPT  SHAPT-Q4-B3-ES  Silver 3-Bed Apartment Ensuite         12       PR0170000   03/09/22  25/08/23           51         93.00              false                  true      None
2     SHAPT  SHAPT-Q4-B2-ES  Silver 2-Bed Apartment Ensuite          7       PR0170000   03/09/22  25/08/23           51        115.00              false                  true      None
3    STUDIO       STUDIO-Q4                   Silver Studio          1       PR0170000   03/09/22  25/08/23           51        153.00              false                  true      None
4    STUDIO       STUDIO-Q3                     Gold Studio          0       PR0170000   03/09/22  25/08/23           51        169.00              false                  true      None
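The data field in that form payload is itself a JSON string nested inside the form fields, which is easy to mis-quote by hand. A minimal sketch that builds it with json.dumps instead; `build_unitlist_payload` is a hypothetical helper name, and whether the endpoint accepts property numbers or dates other than the ones shown above is an assumption:

```python
import json


def build_unitlist_payload(property_no, date_filter):
    """Return the form fields for the unit-list call.

    The filter is serialized to a compact JSON string, matching the
    hand-written payload in the request above.
    """
    inner = {"list": {"filter": {"propertyNoFilter": property_no,
                                 "dateFilter": date_filter}}}
    return {
        "route": "unitlist",
        "command": "",
        # separators=(",", ":") drops the spaces json.dumps adds by default
        "data": json.dumps(inner, separators=(",", ":")),
    }


payload = build_unitlist_payload("PR0170000", "03/09/2022")
print(payload["data"])
```

The resulting dict can be passed as data=payload to the s.post(...) call.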
If you want to do it with Selenium, bear in mind that the data is inside an iframe, so you have to switch to it before looking for the table:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
webdriver_service = Service("chromedriver/chromedriver")  # path to where you saved the chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
url = 'https://www.hellostudent.co.uk/student-accommodation/stoke/caledonia-mills/'
browser.get(url)
iframe = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.XPATH, "//iframe[@class='panel__frame']")))
browser.switch_to.frame(iframe)
t.sleep(5)
rooms_table = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "table[id='no-more-tables']")))
df = pd.read_html(str(rooms_table.get_attribute('outerHTML')))
print(df[0])
browser.quit()
This prints the following DataFrame:
                        ROOM TYPE  PRICE PER WEEK/PER PERSON  WEEKS  START DATE                   AVAILABILITY
0  Silver 4-Bed Apartment Ensuite                        NaN    NaN         NaN                       Sold Out
1  Silver 3-Bed Apartment Ensuite                        £93   51.0    03/09/22           Available - Book Now
2  Silver 2-Bed Apartment Ensuite                       £115   51.0    03/09/22           Available - Book Now
3                   Silver Studio                       £153   51.0    03/09/22  Last few remaining - book now
4                     Gold Studio                        NaN    NaN         NaN                       Sold Out
It would make more sense to scrape the URL that the iframe loads directly: https://rooms.hellostudent.co.uk/#/RoomAvailability/caledonia-mills