How do I add selenium & chromedriver to an AWS Lambda function?

Question:

I am trying to host a web-scraping function on AWS Lambda and am running into WebDriver errors from Selenium. Could someone show me how to add the chromedriver.exe file and how to get the path to work in an AWS Lambda function? This is the portion of my function that uses Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.service import Service
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine

url = 'https://covid19criticalcare.com/pharmacies/'

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 5)

I tried creating a Lambda layer with the chromedriver.exe file.
I followed this guide (https://dev.to/awscommunity-asean/creating-an-api-that-runs-selenium-via-aws-lambda-3ck3) but I couldn’t add the headless Chromium because its file size pushed me over my function limit (my pandas and numpy dependency layers have taken up most of my space).
I tried driver = webdriver.Chrome(with a path variable) with several different paths, but I wasn’t sure what the beginning of the path should be since it runs in a Lambda function.

Asked By: K_Tech

Answers:

I’ve been struggling to add Selenium to AWS Lambda for the last couple of days. I have a web-scraping function (it uses Selenium and the Google API) which extracts data from a website and writes the output to a Google spreadsheet. Let me explain what I did step by step and how I finally succeeded, so you don’t have to struggle with it as much as I did:

1- I tried to add Selenium as a layer, as described here: https://www.youtube.com/watch?v=jWqbYiHudt8. I succeeded in adding Selenium, but the deployment package ended up over 250 MB, the unzipped size limit for a function plus its layers (Lambda quotas are discussed here: How to increase the maximum size of the AWS lambda deployment package (RequestEntityTooLargeException)?), so it did not work.

2- To get around the deployment package size limit, a good option is to deploy the function as a container image (10 GB image size limit). Here is a good explanation of deploying as a container image: https://cloudbytes.dev/snippets/run-selenium-in-aws-lambda-for-ui-testing#using-the-github-repository-directly . I tried it but was not able to deploy as described because of missing or wrong webdrivers (the shell script seems to be wrong).

3- Finally, I was able to publish my Selenium function as a Docker image, as described here: https://github.com/umihico/docker-selenium-lambda.
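
For reference, a handler running inside such a container image might look roughly like the sketch below. This is not copied from the repository above: the two binary paths are assumptions (check where your own Dockerfile puts Chrome and chromedriver), `build_chrome_args` and `handler` are illustrative names, and Selenium 4 is assumed. Note that Lambda runs Linux, so the Linux chromedriver binary is needed rather than chromedriver.exe, and only /tmp is writable.

```python
import os

# Assumed locations inside the container image; adjust to match your Dockerfile.
CHROME_BINARY = os.environ.get("CHROME_BINARY", "/opt/chrome/chrome")
CHROMEDRIVER = os.environ.get("CHROMEDRIVER", "/opt/chromedriver")


def build_chrome_args(tmp_dir="/tmp"):
    """Return Chrome flags that are commonly needed inside Lambda."""
    return [
        "--headless=new",            # older Chrome builds use plain "--headless"
        "--no-sandbox",              # Chrome's sandbox is not usable inside Lambda
        "--disable-gpu",
        "--disable-dev-shm-usage",   # /dev/shm is very small in Lambda
        "--single-process",
        "--window-size=1280,1696",   # use instead of maximize_window() when headless
        f"--user-data-dir={tmp_dir}/chrome-user-data",  # only /tmp is writable
    ]


def handler(event=None, context=None):
    # Imported here so the module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for arg in build_chrome_args():
        options.add_argument(arg)

    # Point Selenium at the bundled driver instead of downloading one at runtime.
    driver = webdriver.Chrome(service=Service(CHROMEDRIVER), options=options)
    try:
        driver.get("https://covid19criticalcare.com/pharmacies/")
        return {"title": driver.title}
    finally:
        driver.quit()
```

With this layout the driver path is fixed at build time, so webdriver_manager (which downloads a driver on first use) is not needed in the function.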

There are lots of discussions about which versions work together. The most important point with Selenium is that you have to be careful about matching the package, browser, and driver versions when deploying to AWS Lambda.
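
One way to catch a mismatch early is to compare the major versions that the two binaries report. The sketch below assumes both binaries accept `--version` and print something like "Google Chrome 114.0.5735.90" and "ChromeDriver 114.0.5735.90 (...)"; the function names are illustrative, and the paths you pass in depend on your image.

```python
import re
import subprocess


def major_version(version_output):
    """Extract the major version from text such as 'ChromeDriver 114.0.5735.90 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """True when Chrome and chromedriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def check_binaries(chrome_path, driver_path):
    """Run both binaries with --version and compare their major versions."""
    outputs = []
    for path in (chrome_path, driver_path):
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout)
    return versions_match(outputs[0], outputs[1])
```

Running `check_binaries` as a step in the Docker build, or once at cold start, makes a version mismatch fail loudly instead of surfacing later as an opaque WebDriver error.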

Answered By: Utku Can

I created a guide for building a serverless architecture on AWS using SAM. The example I used is a web scraper that uses Selenium to scrape a website, writes the data to a CSV file, and stores it in an S3 bucket.
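
The CSV-to-S3 step can be sketched as follows. This is not taken from the guide: `rows_to_csv` and `upload_csv` are hypothetical helper names, the bucket and key are whatever you configure, and boto3 is assumed to be installed in the function's environment.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text in memory (no local file needed)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def upload_csv(rows, fieldnames, bucket, key):
    """Upload the scraped rows to S3 as a CSV object."""
    import boto3  # imported lazily; assumed to be available in the function

    body = rows_to_csv(rows, fieldnames).encode("utf-8")
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=body, ContentType="text/csv"
    )
```

Building the CSV in memory avoids writing to disk, which matters because /tmp is the only writable path in Lambda. The function's execution role also needs permission to put objects into the target bucket.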

In case someone finds it useful –
https://medium.com/@karthiks3000/aws-serverless-architecture-with-sam-part-1-7d22203c10bd

The post that deals with adding selenium to a lambda is here –
https://medium.com/@karthiks3000/aws-serverless-architecture-with-sam-part-4-688873f5742

Answered By: karthiks3000