Scrapy returns only 10 of 54 divs when web scraping

Question:

I have been trying to scrape this page for editorial data with Scrapy.

In the Editorial Board Members section there are 54 editors inside 54 div tags, but when
I try to scrape the data I get only 10 items from 10 div tags.

len(response.css("#moreGeneralEditors>div"))

returns 10. Here is the code snippet for getting the data:

import scrapy


class MdpjournalSpider(scrapy.Spider):
    name = 'try'
    start_urls = ["https://www.mdpi.com/journal/agrochemicals/editors"]

    def parse(self, response):
        outer_divs = response.css("div.middle-column__main.ul-spaced div.content__container>div")

        for inner_divs in outer_divs:
            if inner_divs.css("#moreGeneralEditors")!=[]:
                divs = inner_divs.css("#moreGeneralEditors>div")

                for inner_div in divs:
                    if inner_div.css("div.editor-div__content.img-exists")!=[]:
                        editor = inner_div.css("div.editor-div__content.img-exists:nth-child(2) b::text").get()
                        role = "editor"

                        yield {"editor":editor,"role":role}

                    elif inner_div.css("div.editor-div__content")!=[]:
                        editor = inner_div.css("div.editor-div__content:nth-child(1) b::text").get()
                        role = "editor"

                        yield {"editor":editor,"role":role}

Editors with and without an image are in different classes. I am only concerned with the Editorial Board Members section.
The editor data for every journal has this problem. Here is the link to the list of all journals.

Asked By: Harikrishnan V

||

Answers:

You are getting only 10 items because the remaining 44 items are loaded dynamically from an external source via an API. So you have to request the API URL instead.
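The question's spider targets a different journal (agrochemicals) than the endpoint used in this answer (allergies), so a small helper for building the URL may be handy. This is only a sketch: the function name editors_ajax_url is hypothetical, the URL pattern is inferred from the single endpoint in this answer, and the term and board values are copied from it without knowing what they mean or whether they are the same for every journal. Check the request in your browser's network tab for the journal you need.

```python
from urllib.parse import urlencode


def editors_ajax_url(journal_slug, term=10, board=3):
    """Build the editors AJAX URL for a journal.

    Assumption: other journals follow the same pattern as the
    'allergies' endpoint; term and board are copied from that URL.
    """
    query = urlencode({"term": term, "board": board})
    return f"https://www.mdpi.com/journal/{journal_slug}/editors/ajax?{query}"


print(editors_ajax_url("allergies"))
print(editors_ajax_url("agrochemicals"))
```

The returned string can be passed as the url argument of scrapy.Request in start_requests, one request per journal.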

Example:

import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        # AJAX endpoint that returns the editors missing from the initial page
        api_url = 'https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3'
        headers = {
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
            # Marks the request as an XMLHttpRequest, as the browser does
            'x-requested-with': 'XMLHttpRequest',
        }
        yield scrapy.Request(url=api_url, method='GET', callback=self.parse, headers=headers)

    def parse(self, response):
        # Editors without a photo and with a photo use different class attributes
        members = response.xpath('//*[@class="editor-div__content "][1]/b') + response.xpath('//*[@class="editor-div__content img-exists"][1]/b')
        for member in members:
            yield {
                "editor": member.xpath('.//text()').get()
            }

Output:

{'editor': ' Dr. Pasquale Comberiati'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Audrey DunnGalvin'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Monica Greco'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Inkyu Hwang'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Inaki Izquierdo'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Gisèle Kanny'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Chang Kim'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Rosario Linacero'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Soheila J. Maleki'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Giuseppe Murdaca'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Kazuyuki Nakagome'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Eleonora Nucera'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Franziska Roth-Walter'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Youn Young Shim'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Carina Gabriela Uasuf'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Joana Costa'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Magdalena Czarnecka-Operacz'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Danilo Di Bona'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Araceli Díaz -Perales'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Maria Gasset'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Elena Gimenez-Arnau'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Houman Goudarzi'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Lars Hellman'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Christiane Hilger'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Russell Hopp'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Mats W. Johansson'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Marat V. Khodoun'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Uday Kishore'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Rebecca Knibb'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Heung-Man Lee'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Isabel Mafra'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Mario Malerba'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Arduino A. Mangoni'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Nobuaki Miyahara'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Linda Monaci'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Tatsuya Moriyama'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Maria Pino-Yanes'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Daniel P. Potaczek'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Antonietta Rossi'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Ann-Marie Malby Schoos'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Gregory Seumois'}
2022-08-23 00:46:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Dr. Cenk Suphioglu'}
2022-08-23 00:46:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Junji Yodoi'}
2022-08-23 00:46:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.mdpi.com/journal/allergies/editors/ajax?term=10&board=3>
{'editor': ' Prof. Dr. Gianvincenzo Zuccotti'}
2022-08-23 00:46:22 [scrapy.core.engine] INFO: Closing spider (finished)
2022-08-23 00:46:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 369,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 11876,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 1.539094,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 8, 22, 18, 46, 22, 26301),
 'httpcompression/response_bytes': 59114,
 'httpcompression/response_count': 1,
 'item_scraped_count': 44,
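
Every name in the output above starts with a space, because the text node is returned exactly as it appears in the HTML. If you want clean values, you could normalize the string before yielding it. A minimal sketch, where clean_name is a hypothetical helper and not part of Scrapy:

```python
def clean_name(raw):
    """Trim both ends and collapse runs of whitespace into single spaces.

    None is passed through unchanged, since .get() returns None
    when the selector matches nothing.
    """
    if raw is None:
        return None
    return " ".join(raw.split())


print(clean_name(" Dr. Pasquale Comberiati"))
```

In the spider this would be used as "editor": clean_name(member.xpath('.//text()').get()).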
Answered By: F.Hoque