Why are the methods and attributes of PdfFileReader in mixedCase?
Question:
I have been using Python to work with PDFs and have noticed that the methods and attributes of the PdfFileReader class are in mixedCase, such as:
getNumPages()
I thought they were supposed to be written in lower case. Why has this format not been updated?
Answers:
PyPDF is a fairly old library. In the early days of Python, naming conventions were a lot less settled than they are today. (Even today, I bet there is some disagreement.) There were inconsistencies even within Python’s standard library. (I would not be surprised if there still are; I have not checked.)
As to why it has not been updated, it is of course impossible to know without reading the minds of the developers. However, breaking all existing uses for the sole purpose of conforming to PEP8, which the maintainers may not even agree with (see ekhumoro’s comment), may seem a bit excessive.
I have been the maintainer of PyPDF2 and pypdf since April 2022.
Just like Ture Pålsson explained: pypdf is old.
"As to why it has not been updated, it is of course impossible to know without reading the minds of the developers. However, breaking all existing uses for the sole purpose of conforming to PEP8, which the maintainers may not even agree with (see ekhumoro’s comment), may seem a bit excessive."
I have decided that we need an update. The old versions are still there – if people make a version upgrade, they will need to watch out for the change of class names / method names.
I wrote a migration guide, added everything in the changelog and introduced a deprecation process.
We are currently at the stage where we still have helpful error messages, but the old code will no longer work with new versions of pypdf / PyPDF2. If people want to keep using the old version, they need to pin that version.
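To make the "helpful error messages" stage concrete, here is a minimal sketch of how a removed mixedCase method can still point users at its replacement. It is not the pypdf implementation: DemoReader and DeprecationError are hypothetical names made up for this illustration, and only the old/new method pairing (getPage vs. reader.pages, getNumPages vs. the length of reader.pages) comes from the discussion above.

```python
class DeprecationError(Exception):
    """Raised when a removed API name is called (hypothetical name)."""


class DemoReader:
    """Hypothetical stand-in for a PDF reader; NOT the pypdf implementation."""

    def __init__(self, pages):
        self._pages = list(pages)

    @property
    def pages(self):
        # New-style, list-like access to the pages.
        return self._pages

    def getPage(self, index):
        # The old mixedCase name survives only to tell users what to call instead.
        raise DeprecationError(
            "getPage is deprecated and was removed. "
            "Use reader.pages[index] instead."
        )

    def getNumPages(self):
        raise DeprecationError(
            "getNumPages is deprecated and was removed. "
            "Use len(reader.pages) instead."
        )


reader = DemoReader(["page one", "page two"])

try:
    reader.getPage(0)  # old code path
except DeprecationError as exc:
    message = str(exc)

print(message)
print(len(reader.pages))  # new code path
```

The point of the design is that old code fails loudly with a pointer to the new name, rather than with a bare AttributeError that leaves the user guessing.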
Why did I decide to make the upgrade?
- Bad names caused bad code: I’ve seen so many examples where people first get the total number of pages, then make range(page_count), and then call reader.getPage(i). Instead, they could simply iterate with for page in reader.pages. Another example is that people open file handles all the time and pass the file handle to PdfFileReader. Instead, they should just pass the path to it. Or the byte stream, if they have it. Calling it PdfFileReader also makes it counter-intuitive that it can act on streams as well.
- Defaults: PdfReader and PdfWriter now have strict=False by default. The PdfFileReader / PdfFileWriter had strict=True. People were stumbling over this all the time.
- Better mental models: reader.pages[i] seems way more intuitive than reader.getPage(i). It also allows people to discover more naturally that you can iterate over it / get the length of it. Just like a list. Most developers will probably just assume this is a property which is a list. That means they can suddenly apply all their knowledge of what you can do with a list.
- Code Readability: black helps a lot to make new code bases look familiar. Having the (now) commonly used snake_case naming scheme instead of camelCase also helps.
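The difference between the two habits described in the list can be sketched without any PDF at all. StandInReader below is a hypothetical class whose pages attribute is a plain Python list; it stands in for a real reader object so the example runs with the standard library only. Whether every list operation shown here carries over to the real library is an assumption this sketch does not verify.

```python
from dataclasses import dataclass, field


@dataclass
class StandInReader:
    """Hypothetical reader whose `pages` is a plain list (not pypdf)."""

    pages: list = field(default_factory=list)


reader = StandInReader(pages=["cover", "contents", "chapter 1"])

# Habit encouraged by the old names: count, build a range, fetch by index.
page_count = len(reader.pages)
by_index = []
for i in range(page_count):
    by_index.append(reader.pages[i])

# With a list-like property, direct iteration does the same job.
direct = [page for page in reader.pages]

# Ordinary list knowledge applies to the stand-in as well.
last_page = reader.pages[-1]
first_two = reader.pages[:2]

print(by_index == direct)
print(last_page, first_two)
```

Both loops produce the same pages; the second simply drops the counter and the index lookup.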