Unable to scrape kosis.kr even with selenium

Question:

I am trying to scrape data from the link below, but I cannot get the HTML elements. I am using Selenium with Python. When I run print(driver.page_source), it prints just a bunch of JavaScript, like what you get when scraping a JavaScript-driven website with BeautifulSoup. I waited longer for the whole page to render, but the Selenium driver still cannot see the rendered HTML elements. How do I scrape it?

https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1JH20151&vw_cd=MT_ETITLE&list_id=J1_10&scrId=&language=en&seqNo=&lang_mode=en&obj_var_id=&itm_id=&conn_path=MT_ETITLE&path=%252Feng%252FstatisticsList%252FstatisticsListIndex.do

I am trying to scrape kosis.kr, but Selenium's driver.page_source is giving nothing.

Asked By: Banglar Bachelor

||

Answers:

Simply wait until the loading is finished. For example, wait until the following jQuery expression evaluates to true:

$("#Loading").is(":visible") == false;

Python code example:

    from selenium.webdriver.support.ui import WebDriverWait

    # True once jQuery is loaded and the "#Loading" overlay is hidden.
    script = 'return typeof $ !== "undefined" && !$("#Loading").is(":visible");'
    # Poll until the script returns true; raises TimeoutException after 30 s.
    WebDriverWait(driver, 30).until(lambda d: d.execute_script(script))
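
The polling idea can also be spelled out in plain Python with an explicit timeout, so the loop cannot spin forever if the overlay never goes away. The following is only a sketch under assumptions: the helper name `wait_for_loading_overlay`, the 30-second default, and the 0.5-second poll interval are illustrative choices, and the JavaScript check is adapted from the one in this answer (it assumes the page loads jQuery and uses an element with id "Loading").

```python
import time

# JavaScript check adapted from the answer: true only when jQuery is
# loaded and the "#Loading" overlay is no longer visible.
LOADING_DONE_JS = (
    'return typeof $ !== "undefined" && !$("#Loading").is(":visible");'
)


def wait_for_loading_overlay(driver, timeout=30.0, poll=0.5):
    """Poll the page until the loading overlay is hidden.

    `driver` is any object with an `execute_script(str)` method, such as a
    Selenium WebDriver. Raises TimeoutError if the overlay never disappears.
    (Hypothetical helper; name and defaults are assumptions.)
    """
    deadline = time.monotonic() + timeout
    while True:
        # A truthy result means the page reports that loading has finished.
        if driver.execute_script(LOADING_DONE_JS):
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page still loading after {timeout} s")
        time.sleep(poll)
```

Calling `wait_for_loading_overlay(driver)` right after `driver.get(...)` would block until the check passes or the timeout expires.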
Answered By: wilpeers

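The data you are interested in is located in nested iframes on that page. driver.page_source only returns the document Selenium is currently focused on, which is why you see mostly JavaScript at the top level. Switch into each frame first, then read the table: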

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://kosis.kr/statHtml/statHtml.do?orgId=101&tblId=DT_1JH20151&vw_cd=MT_ETITLE&list_id=J1_10&scrId=&language=en&seqNo=&lang_mode=en&obj_var_id=&itm_id=&conn_path=MT_ETITLE&path=%252Feng%252FstatisticsList%252FstatisticsListIndex.do"

with webdriver.Chrome() as driver:
    driver.get(link)
    # Switch into the outer iframe, then into the inner iframe nested inside it.
    WebDriverWait(driver,20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#iframe_rightMenu")))
    WebDriverWait(driver,20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#iframe_centerMenu1")))
    # Wait for the table rows, then print the text of every header and data cell.
    for item in WebDriverWait(driver,20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"table[id='mainTable'] tr"))):
        data = [i.text for i in item.find_elements(By.CSS_SELECTOR,'th,td')]
        print(data)
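
The loop above only prints each row. If you want to keep the table, one option is to collect the `data` lists and write them to a CSV file with the standard library. This is a minimal sketch, not part of the original answer: the helper name `save_rows` and the output filename are hypothetical, and it assumes each row is a list of strings like the ones printed above.

```python
import csv


def save_rows(rows, path):
    """Write scraped table rows to a CSV file, skipping rows with no text.

    `rows` is an iterable of lists of strings, like the `data` lists built
    in the scraping loop. Returns the number of rows written.
    (Hypothetical helper; the name is an assumption.)
    """
    written = 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in rows:
            if not any(cell.strip() for cell in row):
                continue  # e.g. a <tr> whose cells are all empty
            writer.writerow(row)
            written += 1
    return written
```

To use it, append each `data` list to a `rows` list inside the loop instead of printing it, then call something like `save_rows(rows, "kosis_table.csv")` after the loop.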
Answered By: robots.txt