Python design mistakes

Question:

A while ago, when I was learning JavaScript, I studied JavaScript: The Good Parts, and I particularly enjoyed the chapters on the bad and the ugly parts. Of course, I did not agree with everything, as summing up the design defects of a programming language is to a certain extent subjective – although, for instance, I guess everyone would agree that the keyword with was a mistake in JavaScript. Nevertheless, I find it useful to read such reviews: even if one does not agree, there is a lot to learn.

Is there a blog entry or some book describing design mistakes for Python? For instance I guess some people would count the lack of tail call optimization a mistake; there may be other issues (or non-issues) which are worth learning about.

Asked By: Andrea

Answers:

Is there a blog entry or some book describing design mistakes for Python?

Yes.

It’s called the Py3K list of backwards-incompatible changes.

Start here: http://docs.python.org/release/3.0.1/whatsnew/3.0.html

Read all the Python 3.x release notes for additional details on the mistakes in Python 2.

Answered By: S.Lott

My biggest peeve with Python – and one which was not really addressed in the move to 3.x – is the lack of proper naming conventions in the standard library.

Why, for example, does the datetime module contain a class itself called datetime? (To say nothing of why we have separate datetime and time modules, but also a datetime.time class!) Why is datetime.datetime in lower case, but decimal.Decimal is upper case? And please, tell me why we have that terrible mess under the xml namespace: xml.sax, but xml.etree.ElementTree – what is going on there?

Answered By: Daniel Roseman

You asked for a link or other source, but there really isn’t one. The information is spread over many different places. What really constitutes a design mistake, and do you count just syntactic and semantic issues in the language definition, or do you include pragmatic things like platform and standard library issues and specific implementation issues? You could say that Python’s dynamism is a design mistake from a performance perspective, because it makes it hard to make a straightforward efficient implementation, and it makes it hard (I didn’t say completely impossible) to make an IDE with code completion, refactoring, and other nice things. At the same time, you could argue for the pros of dynamic languages.

Maybe one approach to start thinking about this is to look at the language changes from Python 2.x to 3.x. Some people would of course argue that print being a function is inconvenient, while others think it’s an improvement. Overall, there are not that many changes, and most of them are quite small and subtle. For example, map() and filter() return iterators instead of lists, range() behaves like xrange() used to, and dict methods like dict.keys() return views instead of lists. Then there are some changes related to integers, and one of the big changes is binary/string data handling. It’s now text and data, and text is always Unicode. There are several syntactic changes, but they are more about consistency than revamping the whole language.
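To make a few of those changes concrete, here is a small sketch of the Python 3 behaviour. The variable names are made up for illustration; only the builtins themselves come from the language.

```python
# map() returns a one-shot iterator in Python 3, not a list.
squares = map(lambda x: x * x, [1, 2, 3])
first_pass = list(squares)    # consumes the iterator
second_pass = list(squares)   # nothing is left the second time

# range() is lazy, like xrange() was in Python 2.
big = range(10**9)            # no billion-element list is built
big_len = len(big)

# dict.keys() is a live view of the dictionary, not a snapshot.
d = {"a": 1}
keys = d.keys()
d["b"] = 2                    # the existing view sees the new key
keys_after = sorted(keys)

# Text is Unicode (str); binary data is bytes.
text = "caf\u00e9"            # four characters
data = text.encode("utf-8")   # five bytes, because the last character needs two
```

Wrapping the result in list() gives back the old list behaviour where it is really needed.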

From this perspective, it appears that Python has been pretty well designed on the language (syntax and semantics) level since at least 2.x. You can always argue about indentation-based block syntax, but we all know that doesn’t lead anywhere… 😉

Another approach is to look at what alternative Python implementations are trying to address. Most of them address performance in some way, some address platform issues, and some add or make changes to the language itself to more efficiently solve certain kinds of tasks. Unladen Swallow wants to make Python significantly faster by optimizing the runtime byte-compilation and execution stages. Stackless adds functionality for efficient, heavily threaded applications by adding constructs like microthreads and tasklets, channels to allow bidirectional tasklet communication, scheduling to run tasklets cooperatively or preemptively, and serialisation to suspend and resume tasklet execution. Jython allows using Python on the Java platform and IronPython on the .NET platform. Cython is a Python dialect which allows calling C functions and declaring C types, allowing the compiler to generate efficient C code from Cython code. Shed Skin brings implicit static typing into Python and generates C++ for standalone programs or extension modules. PyPy implements Python in a subset of Python, and changes some implementation details like adding garbage collection instead of reference counting. The purpose is to allow Python language and implementation development to become more efficient due to the higher-level language. PyV8 bridges Python and JavaScript through the V8 JavaScript engine – you could say it’s solving a platform issue. Psyco is a special kind of JIT that dynamically generates special versions of the running code for the data that is currently being handled, which can give speedups for your Python code without having to write optimised C modules.

Of these, something can be said about the current state of Python by looking at PEP 3146, which outlines how Unladen Swallow would be merged into CPython. This PEP is accepted and is thus the Python developers’ judgement of what is the most feasible direction to take at the moment. Note that it addresses performance, not the language per se.

So really I would say that Python’s main design problems are in the performance domain – but these are basically the same challenges that any dynamic language has to face, and the Python family of languages and implementations are trying to address the issues. As for outright design mistakes like the ones listed in JavaScript: The Good Parts, I think the meaning of “mistake” needs to be more explicitly defined, but you may want to check out the following for thoughts and opinions:

Answered By: Fabian Fagerholm

Things that frequently surprise inexperienced developers are candidate mistakes. Here is one, default arguments:

http://www.deadlybloodyserious.com/2008/05/default-argument-blunders/
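
In case the link goes away, here is a minimal sketch of the surprise. The function names append_to and append_to_safe are invented for this example and are not taken from the linked article.

```python
def append_to(item, bucket=[]):
    # The default list is created once, when the def statement runs,
    # and is then shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_to_safe(item, bucket=None):
    # Common idiom: use None as the default and build a fresh list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_to(1)
second = append_to(2)         # same list object as `first`, now holding both items

safe_first = append_to_safe(1)
safe_second = append_to_safe(2)   # independent lists
```

The None-default idiom is the usual workaround; the shared default is occasionally used on purpose as a cheap cache, which is probably why the behaviour has survived.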

Answered By: artsrc

A personal language peeve of mine is name binding for lambdas / local functions:

fns = []
for i in range(10):
    fns.append(lambda: i)

for fn in fns:
    print(fn()) # !!! always 9 - not what I'd naively expect

IMO, I’d much prefer looking up the names referenced in a lambda at declaration time. I understand the reasons for why it works the way it does, but still…

You currently have to work around it by binding i into a new name whose value doesn’t change, using a function closure.
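
A sketch of that closure workaround, assuming a small helper (make_getter is a name made up for this example):

```python
def make_getter(value):
    # `value` is a fresh local variable on every call, so each returned
    # lambda closes over its own binding instead of the shared loop variable.
    return lambda: value


fns = []
for i in range(10):
    fns.append(make_getter(i))

results = [fn() for fn in fns]   # each function returns the i it was created with
```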

Answered By: Tom Whittock

Yeah, it’s strange but I guess that’s what you get for having mutable variables.

I think the reason is that the “i” refers to a box which has a mutable value and the “for” loop will change that value over time, so reading the box value later gets you the only value there is left.
I don’t know how one would fix that short of making it a functional programming language without mutable variables (at least without unchecked mutable variables).

The workaround I use is creating a new variable with a default value (default values being evaluated at DEFINITION time in Python, which is annoying at other times) which causes copying of the value to the new box:

fns = []
for i in range(10):
    fns.append(lambda j=i: j)

for fn in fns:
    print(fn()) # works

Answered By: Danny Milosavljevic

I find it surprising that nobody mentioned the global interpreter lock.

Answered By: Andrei Alexandrescu

This is more of a minor problem with the language, rather than a fundamental mistake, but: Property overriding. If you override a property (using getters and setters), there is no easy way of getting the parent class’ property.
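
For what it is worth, here is a sketch of one way to reach the parent’s property in Python 3. Base, Child and value are invented names. Reading works through super(), but as far as I know assigning through super() does not, so the setter goes through the property object’s fset instead.

```python
class Base:
    def __init__(self):
        self._value = 0

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new


class Child(Base):
    @property
    def value(self):
        # Reading the parent's property through super() works.
        return super().value * 2

    @value.setter
    def value(self, new):
        # Writing has to call the parent's setter function explicitly.
        Base.value.fset(self, new + 1)


c = Child()
c.value = 4          # stored as 5 by the overridden setter
doubled = c.value    # the overridden getter doubles the stored value
```

It works, but having to name the parent class and call fset by hand is exactly the kind of awkwardness meant above.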

Answered By: bgw

One of the things I find most annoying in Python is using writelines() and readlines() on a file. readlines() not only returns a list of lines, but each line still has the \n character at the end, so you always end up doing something like this to strip them:

lines = [l.replace("\n", "").replace("\r", "") for l in f.readlines()]

And when you want to use writelines() to write lines to a file, you have to add \n at the end of every line in the list before you write them, sort of like this:

f.writelines([l + "\n" for l in lines])

writelines() and readlines() should take care of end-of-line characters in an OS-independent way, so you don’t have to deal with it yourself.

You should just be able to go:

lines = f.readlines()

and it should return a list of lines, without \n or \r characters at the end of the lines.

Likewise, you should just be able to go:

f.writelines(lines)

That should write a list of lines to a file using the operating system’s preferred end-of-line characters; you shouldn’t need to add them to the list yourself first.
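
For comparison, here is a sketch of how this is usually handled in Python 3. It assumes files opened in the default text mode, where "\n" is translated to the platform’s line ending on write and back to "\n" on read; the file name demo.txt is arbitrary.

```python
import os
import tempfile

lines = ["first", "second", "third"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # writelines() still does not add line endings, so append "\n" here;
    # text mode converts it to the platform's line ending.
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)

    # read().splitlines() returns the lines without their line endings.
    with open(path) as f:
        via_splitlines = f.read().splitlines()

    # Iterating over the file and stripping "\n" gives the same result.
    with open(path) as f:
        via_rstrip = [line.rstrip("\n") for line in f]
```

So the stripping is still manual, but splitlines() at least makes it a single call.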

Answered By: Rob van der Linde

My biggest dislike is range(), because it doesn’t do what you’d expect, e.g.:

>>> for i in range(1,10): print i,
1 2 3 4 5 6 7 8 9

A naive user coming from another language would expect 10 to be printed as well.
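
For reference, a small sketch of what the half-open convention looks like in practice (written for Python 3; the names are just for illustration):

```python
# To include the upper bound, add one to the stop value.
inclusive = list(range(1, 10 + 1))

# The length of range(a, b) is simply b - a.
half_open = range(1, 10)
count = len(half_open)

# range(len(seq)) yields exactly the valid indices of seq.
items = ["a", "b", "c", "d"]
indices = list(range(len(items)))

# Adjacent ranges tile without overlap or gaps.
left = list(range(0, 2))
right = list(range(2, 4))
```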

Answered By: Phil Hunt

I think there’s a lot of weird stuff in Python in the way it handles builtins/constants. Like the following:

True = "hello"
False = "hello"
print True == False

That prints True

def sorted(x):
  print "Haha, pwned"

sorted([4, 3, 2, 1])

Lolwut? sorted is a builtin global function. The worst example in practice is list, which people tend to use as a convenient name for a local variable and end up clobbering the global builtin.
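
In Python 3, True and False became keywords, so the first example is a SyntaxError there, but builtin functions can still be shadowed. A sketch of the list problem and one way around it (the function names are invented for this example):

```python
import builtins


def shadowing_demo():
    list = [3, 1, 2]            # local name now hides the builtin type
    try:
        list("abc")             # tries to call the list object itself
        return "no error"
    except TypeError:
        return "TypeError"


def recovered_demo():
    list = [3, 1, 2]            # still shadowed locally
    # The original is always reachable through the builtins module.
    return builtins.list("abc")


shadowed_result = shadowing_demo()
recovered_result = recovered_demo()
```

The simpler fix is of course to pick another name, such as items or values.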

Answered By: Clueless

You asked for links; I wrote a document on that topic some time ago: http://segfaulthunter.github.com/articles/biggestsurprise/

Answered By: Florian Mayer