How to convert a PDF file into plain text and save it as a new file on the desktop?

Question:

I want to automatically convert PDF files into text and then save that output as a file on my desktop.

Example:

- Text converted from the PDF: "HELLO WORLD"
- A .txt file saved on the desktop containing "HELLO WORLD"

I have tried:

fp = open('/Users/zain/Desktop', 'pdf_summary')
fp.write(text)

I expected this to save a file on the desktop containing text, the variable I used to hold the converted text.

Full Code:

from PyPDF2 import PdfReader

reader = PdfReader("/Users/zain/Desktop/Week2_POL305_Manfieldetal.pdf")
text = ""
for page in reader.pages:
    text += page.extract_text() + "\n"

print(text)

fp = open('/Users/zain/Desktop', 'pdf_summary')
fp.write(text)

Asked By: zainalisaqib


Answers:

This works for me.

from PyPDF2 import PdfReader

# path to the PDF file
reader = PdfReader(r'C:\Users\zain\Desktop\Week2_POL305_Manfieldetal.pdf')

text = ""

for page in reader.pages:
    text += page.extract_text() + '\n'

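# path of the new file on the desktop
# the .txt extension is optional; you can drop it or use another file type
# 'a' appends if the file already exists; use 'w' to overwrite it instead
with open(r'C:\Users\zain\Desktop\pdf_summary.txt', 'a', encoding='utf-8') as fp:
    fp.write(text)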
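Two things go wrong in the question's open('/Users/zain/Desktop', 'pdf_summary'): open() takes the file path as its first argument and the mode (such as 'w' or 'a') as its second, so 'pdf_summary' is rejected with a ValueError as an invalid mode, and the path names the Desktop folder itself rather than a file inside it. The code above also hard-codes one user's Windows path. Below is a minimal sketch that builds the path with pathlib instead. save_to_desktop is a hypothetical helper, and it assumes the desktop is the folder Desktop directly under the home directory, which is the usual layout but not guaranteed (a desktop redirected to OneDrive, for example, lives elsewhere).

```python
import tempfile
from pathlib import Path
from typing import Optional


def save_to_desktop(text: str, filename: str = "pdf_summary.txt",
                    desktop: Optional[Path] = None) -> Path:
    """Write text to filename inside the desktop folder and return the path.

    Assumes the desktop is ~/Desktop unless another folder is passed in.
    """
    if desktop is None:
        desktop = Path.home() / "Desktop"  # assumption: standard desktop location
    target = desktop / filename
    # Full file path first, mode second; "w" creates or overwrites the file
    with open(target, "w", encoding="utf-8") as fp:
        fp.write(text)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:  # stand-in for the desktop
        # The call from the question puts the file name in the mode slot
        try:
            open(tmp, "pdf_summary")
        except ValueError as err:
            print("rejected:", err)

        path = save_to_desktop("HELLO WORLD", desktop=Path(tmp))
        print(path.name, path.read_text(encoding="utf-8"))
```

With the loop from the question, calling save_to_desktop(text) once text has been built would write pdf_summary.txt to the desktop. Unlike the 'a' mode used above, 'w' replaces the file on each run instead of appending to it.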
Answered By: johnkhigginson

A PDF may contain all sorts of things, not only text, so you have to extract the text explicitly if that is what you want.

With the PyMuPDF package you could do it this way:

import fitz  # import pymupdf
import pathlib

doc=fitz.open("input.pdf")
text = "\n".join([page.get_text() for page in doc])
pathlib.Path("input.txt").write_bytes(text.encode())  # encode as UTF-8 so non-ASCII text survives
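The join-and-encode step is plain Python, so it can be tried without a PDF. In this sketch a list of sample strings stands in for the page.get_text() results, and pages_to_file is a hypothetical helper. Because str.encode() defaults to UTF-8, the bytes on disk decode back to exactly the joined text, accented characters included. Writing bytes also avoids relying on the default encoding of a text-mode open(), which depends on the platform.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def pages_to_file(page_texts: Iterable[str], target: Path) -> str:
    """Join per-page text with newlines, save it as UTF-8 bytes, return it."""
    text = "\n".join(page_texts)
    target.write_bytes(text.encode())  # str.encode() defaults to UTF-8
    return text


if __name__ == "__main__":
    # Sample strings standing in for page.get_text() results
    pages = ["HELLO WORLD", "Grüße aus dem Café", "naïve résumé"]

    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "input.txt"
        written = pages_to_file(pages, out)
        # Decoding the saved bytes as UTF-8 reproduces the joined text
        print(out.read_bytes().decode("utf-8") == written)  # True
```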
Answered By: Jorj McKie