Why are methods of the and attributes of PdfFileReader in mixedCase

Question:

I have been using python to work with PDF’s and have realised that methods and attributes of PdfFileReader class are in mixedCase. Such as:

getNumPages()

I though they were supposed to be writter in lower case, why has this format not been updated?

Asked By: Despereaux

||

Answers:

PyPDF is a fairly old library. In the early days of Python, these matters were a lot less settled than they are today. (Even today, I bet there is some disagreement). There were inconsistencies even within Python’s standard library. (I would not be surprised if there still are; I have not checked).

As to why it has not been updated, it is of course impossible to know without reading the minds of the developers. However, breaking all existing uses for the sole purpose of conforming to PEP8, which the maintainers may not even agree with (see ekhumoro’s comment), may seem a bit excessive.

Answered By: Ture Pålsson

I am the maintainer of PyPDF2 and pypdf since April 2022.

Just like Ture PĂ„lsson explained: pypdf is old.

As to why it has not been updated, it is of course impossible to know without reading the minds of the developers. However, breaking all existing uses for the sole purpose of conforming to PEP8, which the maintainers may not even agree with (see ekhumoro’s comment), may seem a bit excessive.

I have decided that we need an update. The old versions are still there – if people make a version upgrade, they will need to watch out for the change of class names / method names.

I wrote a migration guide, added everything in the changelog and introduced a deprecation process.

We are currently at the stage where we still have helpful error messages, but the old code will not work with new versions of pypdf / PyPDF2 any longer. If people want to keep using the old version, they need to pin that version.

Why did I decide to make the upgrade?

  1. Bad names caused bad code: I’ve seen so many examples where people first get the total number of pages, then make range(page_count) and then reader.getPage(i). Instead, they could just do for page in reader.pages. Another example is that people open file handles all the time and pass the file handle to PdfFileReader. Instead, they should just pass the path to it. Or the byte stream, if they have it. Calling it PdfFileReader also makes it counter-intuitive that it can act on streams as well.
  2. Defaults: PdfReader and PdfWriter now have strict=False by default. The PdfFileReader / PdfFileWriter had strict=True. People were stumbling over this all the time
  3. Better mental models: reader.pages[i] seems way more intuitive than reader.getPage(i). It also allows people to discover more naturally that you can iterate over it / get the length of it. Just like a list. Most developers will probably just assume this is a property which is a list. That means they can suddenly apply all their knowledge of what you can do with a list.
  4. Code Readability: black helps a lot to make new code bases look familiar. Having the (now) commonly used snake_case naming scheme instead of camelCase also helps.
Answered By: Martin Thoma
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.