Extract first page of pdf file using pdfminer library of python3

Question

I want to get the data from only the first page of a PDF file.

I have used pdfminer and get the data for the whole PDF file in the output, but I only want to fetch the data from the first page. What should I do?

My code is given below.

from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer, LTChar,LTLine,LAParams
import os
path=r'/home/user/Desktop/abc.pdf'

Extract_Data=[]

for page_layout in extract_pages(path):
    print(page_layout)
    for element in page_layout:
        if isinstance(element, LTTextContainer):
            for text_line in element:
                for character in text_line:
                    if isinstance(character, LTChar):
                        Font_size=character.size
            Extract_Data.append([Font_size,(element.get_text())])

I guess my problem is with page_layout.
How do I get only the first page's data?

Asked By: V J

Answer 1

extract_pages has an optional argument which can do that:

def extract_pages(pdf_file, password='', page_numbers=None, maxpages=0,
                  caching=True, laparams=None):
    """Extract and yield LTPage objects
    :param pdf_file: Either a file path or a file-like object for the PDF file
        to be worked on.
    :param password: For encrypted PDFs, the password to decrypt.
    :param page_numbers: List of zero-indexed page numbers to extract.
    :param maxpages: The maximum number of pages to parse

Source: https://github.com/pdfminer/pdfminer.six/blob/22f90521b823ac5a22785d1439a64c7bdf2c2c6d/pdfminer/high_level.py#L126

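So looping over extract_pages(path, page_numbers=[0], maxpages=1) should yield only the first page, if I understand correctly. Note that extract_pages is a generator (it yields LTPage objects), so it cannot be indexed with [0]; use a for loop or next(...) to get the page.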
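As a rough sketch of how this could fit the code from the question: the helper names first_or_none and first_page_text_with_font_size below are made up for illustration and are not part of pdfminer. The pdfminer imports sit inside the function only so that the small helper can be tried without the library installed. As in the question, the recorded font size is simply that of the last character seen in each text element.

```python
def first_or_none(iterable):
    """Return the first item of any iterable (including a generator), or None."""
    return next(iter(iterable), None)


def first_page_text_with_font_size(path):
    """Collect [font_size, text] pairs from the first page of the PDF at `path`.

    Hypothetical helper; assumes pdfminer.six is installed.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    # page_numbers is zero-indexed, so [0] selects the first page;
    # maxpages=1 stops parsing once one page has been produced.
    pages = extract_pages(path, page_numbers=[0], maxpages=1)
    page_layout = first_or_none(pages)
    if page_layout is None:  # no pages were produced
        return []

    extracted = []
    for element in page_layout:
        if not isinstance(element, LTTextContainer):
            continue
        font_size = None  # stays None if the element has no LTChar
        for text_line in element:
            if not isinstance(text_line, LTTextLine):
                continue
            for character in text_line:
                if isinstance(character, LTChar):
                    font_size = character.size  # last character wins
        extracted.append([font_size, element.get_text()])
    return extracted
```

Calling first_page_text_with_font_size(path) with the path from the question should then return a list like Extract_Data, but limited to the first page.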

Answered By: Stephan Pieterse
