pypdf2

How to use a variable instead of a constant in the following statement in python with PyPDF2

Question: object = PyPDF2.PdfFileReader(r"C:PROYECTOSPruebasPDFArchivosPDF326581098.pdf") If the path is stored in a variable instead, how should the statement be written? path = "C:PROYECTOSPruebasPDFArchivosPDF326581098.pdf" object = PyPDF2.PdfFileReader(r….) Asked by: Panyvino Answers: You should be able to use path = r"C:PROYECTOSPruebasPDFArchivosPDF326581098.pdf" object = …

Total answers: 1

PyPDF2.errors.PdfReadError: PDF starts with '♣▬', but '%PDF-' expected

Question: I have a folder containing a lot of sub-folders, with PDF files inside. It’s a real mess to find information in these files, so I’m making a program to parse these folders and files, searching for a keyword in the PDF files, and returning the names …

Total answers: 2
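
The error in the title above says the file does not begin with the '%PDF-' marker. A minimal sketch of a pre-check that could be run on each file before handing it to PyPDF2, assuming the goal is simply to skip files that are not really PDFs; the helper name looks_like_pdf is hypothetical and not part of PyPDF2:

```python
PDF_MAGIC = b"%PDF-"  # marker named in the PdfReadError message


def looks_like_pdf(path):
    """Return True if the file at `path` begins with the '%PDF-' marker.

    Hypothetical helper (not a PyPDF2 function): it only inspects the first
    few bytes and does not validate the rest of the file.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(PDF_MAGIC))
    return head == PDF_MAGIC
```

Files for which the check returns False could be logged and skipped rather than passed to the reader. Note that a file can pass this check and still be damaged further in.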

Loop through folder and subfolders and merge pdf

Question: I tried to create a script to loop through a parent folder and its subfolders and merge all of the PDFs into one. Below is the code I wrote so far, but I don’t know how to combine them into one script. Reference: Merge PDF files The first …

Total answers: 1
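
The folder-walking half of the question above can be sketched with the standard library alone. This is an illustration under assumptions, not the asker's script: the function name find_pdfs is hypothetical, and sorting by full path is just one possible merge order.

```python
import os


def find_pdfs(root):
    """Return the sorted full paths of every .pdf file under `root`.

    os.walk visits `root` and all of its subfolders; the extension check is
    case-insensitive so 'report.PDF' is included as well.
    """
    found = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".pdf"):
                found.append(os.path.join(folder, name))
    return sorted(found)
```

Each path in the returned list could then be appended to a merger object one at a time, in the style of the PdfFileMerger loop quoted elsewhere on this page.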

PyPDF2 : extract table of contents/outlines and their page number

Question: I am trying to extract the TOC/outlines from PDFs, along with their page numbers, using Python (PyPDF2). I am aware of reader.outlines, but it does not return the correct page number. PDF example: https://www.annualreports.com/HostedData/AnnualReportArchive/l/NASDAQ_LOGM_2018.pdf and the output of reader.outlines is: [{'/Title': '2018 Highlights', …

Total answers: 3

How do I extract text in the right order from PDF using PyPDF2?

Question: I am currently doing a project to extract the contents of a PDF. The code runs smoothly and I am able to extract the text, but the extracted text is not in the right order. The code extracts the text in …

Total answers: 2

How can I interchangeably use glob.glob("*PDF") and os.listdir("./directory")?

Question: I am trying to merge PDF files inside a folder. I tried running the code from the same directory and it worked; however, when I copied the code to a different location and specified the directory path of the PDF files, the merging process is not happening …

Total answers: 1
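
One common cause of this symptom, which may or may not be the asker's since the excerpt is truncated, is that os.listdir returns bare file names while glob.glob returns paths that include the directory part of the pattern. A sketch of two hypothetical helpers that make the two calls return the same list:

```python
import glob
import os


def pdfs_via_listdir(directory):
    """os.listdir yields bare names, so join each one back onto `directory`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".pdf")
    )


def pdfs_via_glob(directory):
    """glob.glob keeps the directory prefix that is part of the pattern."""
    return sorted(glob.glob(os.path.join(directory, "*.pdf")))
```

Both helpers match the lowercase .pdf extension only, and the glob version assumes the directory name contains no wildcard characters. Without the os.path.join step, names from os.listdir only resolve when the script runs from inside that directory.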

Error occurred while using PyPDF2 PdfFileMerger in Python

Question: I have been creating a Python program using PyPDF2 to merge multiple PDF files. Here is the code: import os from PyPDF2 import PdfFileMerger source_dir = os.getcwd() merger = PdfFileMerger() for item in os.listdir(source_dir): if item.endswith('pdf'): merger.append(item) merger.write('completed_file.pdf') merger.close() While running the code I encountered the …

Total answers: 1

How to add watermark in all pages of PDF files with python?

Question: I’m trying to add a watermark to every page of my PDF file. My PDF file has 58 pages, but the output file contains only the last page. This is my code: from PyPDF2 import PdfFileReader, PdfFileWriter watermark_pdf = PdfFileReader("watermark.pdf") watermark_page …

Total answers: 5

How to extract text from pdf in Python 3.7

Question: I am trying to extract text from a PDF file using Python. My main goal is to create a program that reads a bank statement and extracts its text to update an Excel file, so that monthly spending is easy to record. Right now I …

Total answers: 10

How to check if PDF is scanned image or contains text

Question: I have a large number of files; some are scanned images saved as PDFs and some are PDFs with full or partial text. Is there a way to check these files to ensure that we are only processing files which are scanned images and not …

Total answers: 12