Scraping img titles, but not all

Question

As an amateur, I have been working on a small coding project for fun. I want to scrape quite a lot of data, and with help from StackOverflow I now have a script that works well. However, I am still missing one big step: I want to find the titles of certain images on the webpage. I can already gather all the other data I need (the parts marked in red in my screenshot). All that is missing is the titles of the six images, three per team for two teams.

The image titles are not defined by a ‘class’, which makes it hard for me to find them. I tried using

for KTA in soup('img'):
    KTAclass = KTA.get('title')

This works, but it also returns a lot of None values in addition to the titles I’m looking for.
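
One way to drop the None entries is to keep only the tags whose title attribute is actually set. This is a minimal sketch, assuming each tag supports .get() the way BeautifulSoup tags do. The helper name titles_only is hypothetical, and plain dicts stand in for the img tags so the snippet runs without a network request:

```python
def titles_only(tags):
    """Return the title of every tag that has one, skipping the rest."""
    # tag.get("title") returns None when the attribute is missing,
    # so the condition filters those tags out.
    return [tag.get("title") for tag in tags if tag.get("title")]


# Stand-ins for soup("img"): dicts expose the same .get() interface.
sample_tags = [
    {"src": "logo.png"},                      # no title, skipped
    {"src": "a.png", "title": "roublard"},
    {"src": "b.png", "title": "huppermage"},
    {"src": "banner.png"},                    # no title, skipped
]

print(titles_only(sample_tags))
```

On a real page, titles_only(soup("img")) should behave the same way, and BeautifulSoup can also do the filtering itself with soup.find_all("img", title=True). Either approach still returns every titled image on the page, so a narrower selector may be needed if other images carry titles too.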

My current script looks like this:

import requests
from bs4 import BeautifulSoup


def analyze(i):
    url = f"https://ktarena.com/fr/207-dofus-world-cup/match/{i}/1"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")

    names = [a.text for a in soup.select(".name a")]
    points = [p.text for p in soup.select(".result .points")]
    # A bare string passed as attrs is treated as a CSS class; class_ says so explicitly.
    arena = soup.find("span", class_="name").text

    print(*zip(names, points), arena)
    

for i in range(46270, 46273):  
    analyze(i)

Can anyone help me out here? Ideally, I would like to add the three image titles per team to the zipped output that currently contains the team name and points.

Cheers!

Asked By: Joost

Answer 1

I’m not completely sure I understand you correctly, but you can get the titles of the six images like this:

image_titles = [elem.find("img").get("title") for elem in soup.find_all("div", {"class": "class"})]

which gives you:

['roublard', 'huppermage', 'ecaflip', 'steamer', 'feca', 'sacrieur']

for this example page
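
To attach these titles to the team names and points from the question, the flat list can be split into groups of three and zipped with the other lists. This is a minimal sketch, assuming the titles arrive in page order (the first team's three images, then the second team's). The helper name chunk is hypothetical, and the sample values are taken from the match shown in this thread:

```python
def chunk(items, size):
    """Split a list into consecutive sublists of the given size."""
    return [items[i:i + size] for i in range(0, len(items), size)]


names = ["Thunder", "Tweaps"]
points = ["0 pts", "60 pts"]
image_titles = ["roublard", "huppermage", "ecaflip", "steamer", "feca", "sacrieur"]

# Three images per team, so each chunk lines up with one name/points pair.
teams = list(zip(names, points, chunk(image_titles, 3)))

for team in teams:
    print(team)
```

In the analyze function from the question, names, points, and image_titles would come from the soup instead of being hard-coded.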

If you have any questions or I misunderstood anything, please ask 🙂

Answered By: bitflip

Answer 2

This should do it. I’ve adjusted the selectors so that they grab only the three image titles for each team:

import requests
from bs4 import BeautifulSoup

def analyze(i):
    url = f"https://ktarena.com/fr/207-dofus-world-cup/match/{i}/1"
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    arena = soup.find("span", class_="name").text

    # First team: name, points, and the titles of its three class images.
    title = soup.select_one("[class='team'] .name a").text
    point = soup.select(".result .points")[0].text
    image_titles = ', '.join(img['title'] for img in soup.select("[class='team']:nth-of-type(1) [class^='class'] > img"))

    # Second team: the same selectors, taking the second match.
    title_ano = soup.select("[class='team'] .name a")[1].text
    point_ano = soup.select(".result .points")[1].text
    image_titles_ano = ', '.join(img['title'] for img in soup.select("[class='team']:nth-of-type(2) [class^='class'] > img"))

    print((title, point, image_titles), (title_ano, point_ano, image_titles_ano), arena)

for i in range(46270, 46274):  
    analyze(i)

Prints:

('Thunder', '0 pts', 'roublard, huppermage, ecaflip') ('Tweaps', '60 pts', 'steamer, feca, sacrieur') A10
('Shadow Zoo', '0 pts', 'feca, osamodas, ouginak') ('UndisClosed', '60 pts', 'eniripsa, sram, pandawa') A10
('Laugh Tale', '0 pts', 'osamodas, ecaflip, iop') ('FromTheAbyss', '60 pts', 'roublard, steamer, huppermage') A10
('Motamawa', '0 pts', 'osamodas, iop, pandawa') ('Espoo', '60 pts', 'roublard, ecaflip, sacrieur') A10

Answered By: robots.txt
