Maintained alternatives to PyPDF2
Question:
I’m using the PyPDF2 library for extracting text, images, page widths and heights, annotations, and other attributes from PDF documents. However, the library has many bugs and issues and seems not to have been maintained for a long time. (edit: PyPDF2 is maintained again)
- Is there a more active fork that is being maintained and developed?
- Is there a good alternative?
From what I know, reportlab is more suitable for creating brand new PDFs (or maybe I’m just not experienced enough with reportlab).
Answers:
PyMuPDF is a Python binding for MuPDF, a lightweight PDF and XPS viewer. Because MuPDF supports not only PDF but also XPS, OpenXPS, CBZ, CBR, FB2, and EPUB formats, so does PyMuPDF. PyMuPDF is hosted on GitHub. We are also registered on PyPI.
Its performance figures also look very promising. The following three sections deal with different aspects of performance:
- document parsing
- text extraction
- image rendering
Update: pypdf (PyPI) is maintained again, and I am the maintainer (of pypdf and PyPDF2) 🙂 I’ve just released a new version with several bugfixes.
Looking at the top PyPI packages, PyPDF2 is also the most used one (pypdf==3.1.0 is almost the same as PyPDF2==3.0.0; the community just needs a bit of time to switch to pypdf).
Three potential alternatives which are maintained (just like pypdf):
- pymupdf: uses mupdf (only for open source projects, due to the mupdf license)
- pikepdf: uses qpdf
- pdfminer.six: a pure Python project
I would not use: