Python: TypeError: expected str, bytes or os.PathLike object, not PdfFileReader

Question:

I have the following code. This is just a starting point: later on I’d like to replace the static “Hello world” text with items from a CSV file that I read, looping through every item in the CSV.
I want the watermark on every page.

# importing the required modules
import PyPDF2
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

def add_watermark(wmFile, pageObj):
    # opening watermark pdf file
    wmFileObj = open(wmFile, 'rb')

    # creating pdf reader object of watermark pdf file
    pdfReader = PyPDF2.PdfFileReader(wmFileObj)

    # merging watermark pdf's first page with passed page object.
    pageObj.mergePage(pdfReader.getPage(0))

    # closing the watermark pdf file object
    wmFileObj.close()

    # returning watermarked page object
    return pageObj


def main():
    import PyPDF2
    import io
    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import letter
    # watermark pdf file name
    packet = io.BytesIO()
    # Create a new PDF with Reportlab
    can = canvas.Canvas(packet, pagesize=letter)
    can.setFont('Helvetica-Bold',18)
    can.drawString(10, 100, "Hello world")
    can.showPage()
    can.save()

    # Move to the beginning of the StringIO buffer
    packet.seek(0)
    mywatermark = PyPDF2.PdfFileReader(packet)

    # original pdf file name
    origFileName = 'Module1.pdf'

    # new pdf file name
    newFileName = 'watermarked_example.pdf'

    # creating pdf File object of original pdf
    pdfFileObj = open(origFileName, 'rb')

    # creating a pdf Reader object
    pdfReader = PyPDF2.PdfFileReader(pdfFileObj)

    # creating a pdf writer object for new pdf
    pdfWriter = PyPDF2.PdfFileWriter()

    # adding watermark to each page
    for page in range(pdfReader.numPages):
        # creating watermarked page object
        wmpageObj = add_watermark(mywatermark, pdfReader.getPage(page))

        # adding watermarked page object to pdf writer
        pdfWriter.addPage(wmpageObj)

    # new pdf file object
    newFile = open(newFileName, 'wb')

    # writing watermarked pages to new file
    pdfWriter.write(newFile)

    # closing the original pdf file object
    pdfFileObj.close()
    # closing the new pdf file object
    newFile.close()


if __name__ == "__main__":
    main()

The error I get is:

Traceback (most recent call last):
  File "watermark.py", line 101, in <module>
    main()
  File "watermark.py", line 83, in main
    wmpageObj = add_watermark(mywatermark, pdfReader.getPage(page))
  File "watermark.py", line 32, in add_watermark
    wmFileObj = open(wmFile, 'rb')
TypeError: expected str, bytes or os.PathLike object, not PdfFileReader

I believe I get the point: it’s expecting a string, bytes or a file path, but I never write the watermark to a file; it only exists as an “object” in memory.
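The same TypeError can be reproduced without any PDF library: the built-in open() accepts a path (str, bytes or an os.PathLike object) or an integer file descriptor, and rejects any other object before touching the disk. In this minimal sketch, FakeReader is a hypothetical stand-in for the PdfFileReader that main() passes to add_watermark():

```python
class FakeReader:
    """Hypothetical stand-in for a PdfFileReader: an object that is not a path."""


def add_watermark(wmFile):
    # Same first step as the question's add_watermark(): wmFile is treated as a path
    return open(wmFile, 'rb')


try:
    add_watermark(FakeReader())
except TypeError as exc:
    message = str(exc)

# Same wording as the traceback, only the type name differs:
print(message)  # expected str, bytes or os.PathLike object, not FakeReader
```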

I tried a couple of things, but whatever I try actually makes things worse 🙁

Can someone help out? I’m pretty sure it’s just a small thing, as I’m good at overlooking the obvious.

Any help is appreciated.

Thanks

Asked By: f0rd42


Answers:

I’ll leave the advice and the remaining imperfections for the end; here’s how to fix this piece of code:

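1) Set the variable ‘packet’ to a PDF filename in the same directory as the script. canvas.Canvas accepts a filename as well as a file object, so can.save() writes the “Hello world” watermark to that file (overwriting it if it already exists):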

packet = 'my_watermark.pdf'

2) Delete the rewind of the buffer and the reader built from it. ‘packet’ is now just a filename, so both lines are unnecessary (and a str has no seek() method anyway):

packet.seek(0)                               # delete this
mywatermark = PyPDF2.PdfFileReader(packet)   # delete this too

3) Pass ‘packet’ instead of ‘mywatermark’ to add_watermark in the for loop:

wmpageObj = add_watermark(packet, pdfReader.getPage(page))

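4) In add_watermark, delete the lines that open and close the file, and build the PdfFileReader directly from the ‘wmFile’ parameter. PdfFileReader accepts a filename and reads the file itself, so you no longer close a file that PyPDF2 may still need when pdfWriter.write() runs: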

wmFileObj = open(wmFile, 'rb')                # delete this
pdfReader = PyPDF2.PdfFileReader(wmFile)      # keep, but pass wmFile instead of wmFileObj
pageObj.mergePage(pdfReader.getPage(0))       # keep
wmFileObj.close()                             # delete this
return pageObj                                # keep

Also, your main function repeats the imports that are already at the top of the file; remove them from main. And do read some documentation: PyPDF2’s documentation shows how to merge pages (that’s the module’s specialty), though it is a bit terse, while ReportLab’s User Guide is very thorough but straightforward. Always try to understand the meaning behind your code, too.

Answered By: Scavenger