Why did a playwright-python app fail when run in Docker? Headless=False?

Question:

I have a small application that uses FastAPI and Playwright to scrape data and send it back to the client.
The program is working properly when I’m running it locally, but when I try to run it as a Docker image it fails with the following error:

Looks like you launched a headed browser without having a XServer running.
Set either 'headless: true' or use 'xvfb-run <your-playwright-app>' before running Playwright. 

Obviously I tried running it with Headless=True, but then the code fails with this error:

net::ERR_EMPTY_RESPONSE at https://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true
logs
navigating to "https://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true", 
waiting until "load"

I also tried to run it locally with Headless=True and it failed with a "Timeout 30000ms exceeded" error.

This is the function I’m using to return the page html:

    def extract_html(self):
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto('https://book.flygofirst.com/Flight/Select?inl={}&CHD={}&s=True&o1={}&d1={}&ADT={}&dd1={}&gl=0&glo=0&cc=INR&mon=true'.format(self.infants,  self.children , self.origin,  self.destination,  self.adults, self.date))
            html = page.inner_html('#sectionBody')
            return html

and this is my Dockerfile:

FROM python:3.9-slim

COPY ../../requirements/dev.txt ./

RUN python3 -m ensurepip
RUN pip install -r dev.txt
RUN playwright install 
RUN playwright install-deps 

ENV PYTHONPATH "${PYTHONPATH}:/app/"
WORKDIR /code/src

COPY ./src /app

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]

Hope someone could figure out what I’m doing wrong.

Asked By: Daniel Avigdor

||

Answers:

After investigating and trying several things, it looks like the problem is the browser's user agent in headless mode. For some reason the page does not like the default user agent. Try setting one explicitly:

def extract_html(self):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36')
        page.goto('http://book.flygofirst.com/Flight/Select?inl=0&CHD=0&s=True&o1=BOM&d1=BLR&ADT=1&dd1=2022-12-10&gl=0&glo=0&cc=INR&mon=true')
        html = page.inner_html('#sectionBody')
        return html
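
A variation on the same idea, as a rough sketch: instead of hard-coding a user agent, read the one the headless browser reports and drop its headless marker. This assumes the bundled headless Chromium reports `HeadlessChrome` in its user agent (newer headless modes may not) and that the site rejects requests because of that token. The names `strip_headless_marker` and `extract_html` (as a standalone function) are made up for illustration.

```python
def strip_headless_marker(user_agent: str) -> str:
    """Return the UA with Chromium's 'HeadlessChrome' token replaced by 'Chrome'."""
    return user_agent.replace("HeadlessChrome", "Chrome")


def extract_html(url: str, selector: str = "#sectionBody") -> str:
    # Imported here so the helper above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            # Ask the headless browser what user agent it reports by default.
            probe = browser.new_page()
            default_ua = probe.evaluate("navigator.userAgent")
            probe.close()

            # Open the real page with the headless marker removed (assumed fix).
            page = browser.new_page(user_agent=strip_headless_marker(default_ua))
            page.goto(url)
            return page.inner_html(selector)
        finally:
            # Always release the browser, even if navigation fails.
            browser.close()
```

This keeps the Chrome version in the user agent in sync with the installed browser, at the cost of opening one extra blank page per call.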
Answered By: Jaky Ruby

Locally it works because the GUI components needed to open a browser (especially with headless=False) are almost certainly already installed.
In a Docker environment, however, additional steps are required, so I resolved it this way:

Dockerfile:

# replace {latest_stable_version} with the current stable release; in my case `30.0`
FROM mcr.microsoft.com/playwright/python:v1.{latest_stable_version}-focal

RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y xvfb
RUN apt-get install -qqy x11-apps

# chromium dependencies
RUN apt-get install -y libnss3 \
                       libxss1 \
                       libasound2 \
                       fonts-noto-color-emoji

# additional actions related to your project

# exactly this kind of magic command :)
ENTRYPOINT ["/bin/sh", "-c", "/usr/bin/xvfb-run -a $@", ""]

docker-compose.yml:

  service_name:
    build: . 
    init: true
    command: # command depending on a project
    environment:
      - DISPLAY=:0
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix

Hope it will help

Answered By: KravAn