Scraping PDFs from this website

Question:

I am trying to scrape this website with Python 2.7:

http://www.motogp.com/en/Results+Statistics/

I want to scrape the main PDF, the one that exists for many categories (Event) and appears next to the blue "MotoGP Race Classification 2017" text.

After that, I want to do the same for other years as well. So far I have:

import re
from bs4 import BeautifulSoup
from urllib.request import urlopen
url = "http://www.motogp.com/en/Results+Statistics/"
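r = urlopen(url).read()  # raw bytes of the page
soup = BeautifulSoup(r, "html.parser")  # name the parser explicitly
type(soup)

# Escape the dot so that only a literal ".pdf" matches
match = re.search(rb'"(.*?\.pdf)"', r)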
pdf_url="http://resources.motogp.com/files/results/2017/ARG/MotoGP/RAC/Classification" + match.group(1).decode('utf8')

The links look like this:

http://resources.motogp.com/files/results/2017/AME/MotoGP/RAC/Classification.pdf?v1_ef0b514c

So I also need to capture the part that follows the "?" character. The main problem is how to switch from event to event so that I can collect all the links in this format.
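
For reference, one way to keep the part after "?" is a pattern that allows an optional "?..." suffix after ".pdf". This is only a minimal sketch: the names PDF_LINK and find_pdf_links and the sample HTML are made up for illustration, and it only finds links that are present in the HTML string it is given.

```python
import re

# Matches a double-quoted PDF URL plus an optional "?..." version suffix.
# The dot before "pdf" is escaped so it only matches a literal ".".
PDF_LINK = re.compile(r'"([^"]+?\.pdf(?:\?[^"]*)?)"')


def find_pdf_links(html):
    """Return every double-quoted PDF link found in an HTML string."""
    return PDF_LINK.findall(html)


# Hypothetical snippet shaped like the link quoted in the question.
sample = (
    '<a class="padleft5" href="http://resources.motogp.com/files/results/'
    '2017/AME/MotoGP/RAC/Classification.pdf?v1_ef0b514c">PDF</a>'
)
print(find_pdf_links(sample))
```

The result already includes the "?v1_..." suffix, so nothing has to be concatenated by hand.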

Asked By: Gotey

Answers:

According to the description you have provided above, this is how you can get those PDF links:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("http://www.motogp.com/en/Results+Statistics/")

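# Click each option of the element with id "event" in turn
for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#event option"))):
    item.click()
    # Grab the PDF link currently present on the page
    elem = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "padleft5")))
    print(elem.get_attribute("href"))
    # Wait until that link is detached from the DOM before continuing
    wait.until(EC.staleness_of(elem))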

driver.quit()

Partial output:

http://resources.motogp.com/files/results/2017/VAL/MotoGP/RAC/worldstanding.pdf?v1_8dbea75c
http://resources.motogp.com/files/results/2017/QAT/MotoGP/RAC/Classification.pdf?v1_f6564614
http://resources.motogp.com/files/results/2017/ARG/MotoGP/RAC/Classification.pdf?v1_9107e18d
http://resources.motogp.com/files/results/2017/AME/MotoGP/RAC/Classification.pdf?v1_ef0b514c
http://resources.motogp.com/files/results/2017/SPA/MotoGP/RAC/Classification.pdf?v1_ba33b120
Answered By: SIM
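
Once the links are collected in a list instead of being printed, saving the files could look roughly like the sketch below. It assumes Python 3 and that every link follows the files/results/<year>/<event>/<category>/<session>/<name>.pdf layout seen in the output above. The helper names local_name and download_all and the "pdfs" folder are hypothetical, not part of the answer.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def local_name(pdf_url):
    """Build a file name such as 2017_AME_MotoGP_RAC_Classification.pdf."""
    path = urlsplit(pdf_url).path  # drops the "?v1_..." query part
    parts = path.strip("/").split("/")
    # Assumed layout: files/results/<year>/<event>/<category>/<session>/<name>.pdf
    return "_".join(parts[-5:])


def download_all(pdf_urls, dest_dir="pdfs"):
    """Save each URL into dest_dir and return the paths written."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in pdf_urls:
        target = os.path.join(dest_dir, local_name(url))
        urlretrieve(url, target)  # performs the actual HTTP download
        saved.append(target)
    return saved


# Only the naming step runs here; download_all needs network access.
print(local_name(
    "http://resources.motogp.com/files/results/2017/AME/MotoGP/RAC/"
    "Classification.pdf?v1_ef0b514c"
))
```

Putting year and event into the file name keeps the PDFs from overwriting each other, since most of them are called Classification.pdf.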