Web scraping social media followers, but the list in the 100s of thousands. Selenium runs out of memory

Question:

So I’ve been using Selenium in Chrome to go to a social media profile and scrape the usernames of its followers. However, the list is in the hundreds of thousands and the page only loads a limited number at a time. My solution was to have Selenium scroll down endlessly, scraping usernames with `driver.find_elements` as it goes, but after a few hundred usernames Chrome crashes with the error "Ran out of memory".

Am I even capable of getting that entire list?

Is Selenium even the right tool to use or should I use Scrapy? Maybe both?

I’m at a loss on how to move forward from here.

Here’s my code, just in case:

from easygui import *
import time 
from selenium import webdriver 
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

choice = ccbox("Run the test?", "", ("Run it", "I'm not ready yet"))
if not choice:
    quit()

driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
time.sleep(60)  # wait to give me time to manually log in and
                # navigate to the followers list

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        driver.execute_script("window.scrollTo(0, 1080);")
        time.sleep(1)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)
    last_height = new_height
Asked By: javery1337


Answers:

I figured it out! Every "follower" has its own element in the page, and endless scrolling kept accumulating all of those elements in the DOM until Chrome hit its memory limit. I solved it by scraping the currently visible usernames, then deleting the already-processed elements with JavaScript after scrolling a certain amount, and repeating until I reached the bottom 🙂
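A minimal sketch of that scrape-then-prune loop (the `StubDriver` class and the CSS selectors in the comments are hypothetical, standing in for a real `webdriver.Chrome` instance and whatever selectors match your site's follower list): collect the usernames currently loaded, remove those nodes with JavaScript so the DOM stays small, then scroll again.

```python
class StubDriver:
    """Stand-in for a real Selenium driver; simulates a follower
    list that lazy-loads 3 new entries per scroll."""

    def __init__(self, total):
        self.total = total
        self.loaded = 0   # entries currently in the simulated DOM
        self.offset = 0   # entries already scraped and pruned

    def scroll(self):
        # Real code: driver.execute_script(
        #     "window.scrollTo(0, document.body.scrollHeight);")
        self.loaded = min(self.loaded + 3, self.total - self.offset)

    def find_elements(self):
        # Real code: [el.text for el in driver.find_elements(
        #     By.CSS_SELECTOR, "li.follower a")]   # hypothetical selector
        return [f"user{self.offset + i}" for i in range(self.loaded)]

    def prune(self):
        # Real code: driver.execute_script(
        #     "document.querySelectorAll('li.follower')"
        #     ".forEach(el => el.remove());")      # hypothetical selector
        self.offset += self.loaded
        self.loaded = 0


def scrape_all(driver):
    """Scroll, scrape the visible batch, prune it, repeat until
    no new entries load. Memory use stays bounded by batch size."""
    usernames = []
    while True:
        driver.scroll()
        batch = driver.find_elements()
        if not batch:
            break                 # nothing new loaded: reached the bottom
        usernames.extend(batch)
        driver.prune()            # drop scraped nodes to cap memory
    return usernames


names = scrape_all(StubDriver(total=10))
print(len(names), names[0], names[-1])  # → 10 user0 user9
```

With a real driver you would also keep a short `time.sleep` between the scroll and the scrape so the next batch has time to load, and write each batch to a file as you go rather than holding the whole list in Python memory.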

Answered By: javery1337