Whenever we talk about dynamic languages like Python, speed is one of the top issues. To solve this, they say PyPy is 6.3 times faster.
If PyPy can solve these great challenges, what are its weaknesses that are preventing wider adoption? That is to say, what’s preventing someone like me, a typical Python developer, from switching to PyPy right now?
Because PyPy is not 100% compatible, takes 8 GB of RAM to compile, is a moving target, and is highly experimental, whereas CPython is stable, has been the default target for module builders for two decades (including C extensions that don’t work on PyPy), and is already widely deployed.
PyPy will likely never be the reference implementation, but it is a good tool to have.
The second question is easier to answer: you basically can use PyPy as a drop-in replacement if all your code is pure Python. However, many widely used libraries (including some of the standard library) are written in C and compiled as Python extensions. Some of these can be made to work with PyPy, some can’t. PyPy provides the same “forward-facing” tool as Python — that is, it is Python — but its innards are different, so tools that interface with those innards won’t work.
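A quick way to check which interpreter a script is running on, and whether a given module can be located there, is sketched below. The helper names interpreter_name and has_module are my own placeholders for illustration. Note that finding a module only shows it is installed, not that it imports and works correctly.

```python
import importlib.util
import platform


def interpreter_name():
    """Return the implementation name, e.g. 'CPython' or 'PyPy'."""
    return platform.python_implementation()


def has_module(name):
    """Hypothetical helper: True if a top-level module `name` can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Running on", interpreter_name())
    # 'json' ships with the standard library, so it should be found everywhere.
    print("json available:", has_module("json"))
```

Running the same check for each C-extension dependency under both interpreters gives a first impression of how far from "drop-in" a project really is.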
As for the first question, I imagine it is a sort of Catch-22: PyPy has been evolving rapidly in an effort to improve speed and enhance interoperability with other code. This has made it more experimental than official.
I think it’s possible that if PyPy gets into a stable state, it may start getting more widely used. I also think it would be great for Python to move away from its C underpinnings. But it won’t happen for a while. PyPy hasn’t yet reached the critical mass where it is almost useful enough on its own to do everything you’d want, which would motivate people to fill in the gaps.
NOTE: PyPy is more mature and better supported now than it was in 2013, when this question was asked. Avoid drawing conclusions from out-of-date information.
Those are the main reasons that affect me, I’d say.
I did a small benchmark on this topic. While many of the other posters have made good points about compatibility, my experience has been that PyPy isn’t that much faster for just moving around bits. For many uses of Python, it really only exists to translate bits between two or more services. For example, not many web applications are performing CPU intensive analysis of datasets. Instead, they take some bytes from a client, store them in some sort of database, and later return them to other clients. Sometimes the format of the data is changed.
The BDFL and the CPython developers are a remarkably intelligent group of people and have managed to make CPython perform excellently in such a scenario. Here’s a shameless blog plug: http://www.hydrogen18.com/blog/unpickling-buffers.html . I’m using Stackless, which is derived from CPython and retains the full C module interface. I didn’t find any advantage to using PyPy in that case.
That site does not claim PyPy is 6.3 times faster than CPython. To quote:
The geometric average of all benchmarks is 0.16 or 6.3 times faster than CPython
This is a very different statement to the blanket statement you made, and when you understand the difference, you’ll understand at least one set of reasons why you can’t just say “use PyPy”. It might sound like I’m nit-picking, but understanding why these two statements are totally different is vital.
To break that down:
The statement they make only applies to the benchmarks they’ve used. It says absolutely nothing about your program (unless your program is exactly the same as one of their benchmarks).
The statement is about an average of a group of benchmarks. There is no claim that running PyPy will give a 6.3 times improvement even for the programs they have tested.
There is no claim that PyPy will even run all the programs that CPython runs at all, let alone faster.
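To make the point about averages concrete, here is a small sketch. The ratios are made-up numbers, not PyPy's published results. They only show that a geometric mean well below 1.0 can coexist with individual benchmarks where PyPy is slower. The headline "times faster" figure is simply the reciprocal of the mean ratio.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical normalized run times (PyPy time / CPython time).
# A value below 1.0 means PyPy was faster on that benchmark.
ratios = [0.02, 0.1, 0.5, 1.5]

mean_ratio = geometric_mean(ratios)
print("geometric mean ratio:", round(mean_ratio, 3))
print("headline speedup:", round(1 / mean_ratio, 1))
print("benchmarks where PyPy is slower:", [r for r in ratios if r > 1.0])
```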
CPython has reference counting and garbage collection; PyPy has garbage collection only. So objects tend to be deleted earlier and __del__ is called in a more predictable way in CPython. Some software relies on this behavior and is therefore not ready to migrate to PyPy.
Some other software works with both, but uses less memory with CPython, because unused objects are freed earlier. (I don’t have any measurements to indicate how significant this is and what other implementation details affect the memory use.)
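The difference can be illustrated with a small sketch. Resource is a hypothetical class of mine, not taken from any library. The first half relies on the finalizer running as soon as the last reference goes away, which is what CPython's reference counting typically does; the second half uses a with-block, which releases the resource at a known point on any implementation.

```python
import gc


class Resource:
    """Hypothetical resource that records when it has been released."""

    def __init__(self, log):
        self.log = log
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            self.log.append("closed")

    def __del__(self):
        # Runs when the object is finalized; the timing is implementation-specific.
        self.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


log = []

# Fragile: assumes the finalizer runs as soon as the name is deleted.
r = Resource(log)
del r
gc.collect()  # without reference counting, the finalizer may not run before a collection

# Portable: the with-block closes the resource deterministically.
with Resource(log) as res:
    pass
```

Code written in the second style does not care which garbage collection strategy the interpreter uses.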
For a lot of projects, there is actually 0% difference between the different Pythons in terms of speed. That is, those projects that are dominated by engineering time and where all Pythons have the same amount of library support.
Q: If PyPy can solve these great challenges (speed, memory consumption, parallelism) in comparison to CPython, what are its weaknesses that are preventing wider adoption?
A: First, there is little evidence that the PyPy team can solve the speed problem in general. Long-term evidence shows that PyPy runs certain Python code slower than CPython, and this drawback seems to be rooted very deeply in PyPy.
Secondly, the current version of PyPy consumes much more memory than CPython in a rather large set of cases. So PyPy didn’t solve the memory consumption problem yet.
Whether PyPy solves the mentioned great challenges and will in general be faster, less memory hungry, and more friendly to parallelism than CPython is an open question that cannot be solved in the short term. Some people are betting that PyPy will never be able to offer a general solution enabling it to dominate CPython 2.7 and 3.3 in all cases.
If PyPy succeeds in being better than CPython in general, which is questionable, the main weakness affecting its wider adoption will be its compatibility with CPython. There are also issues such as the fact that CPython runs on a wider range of CPUs and OSes, but these are much less important than PyPy’s performance and CPython-compatibility goals.
Q: Why can’t I do drop in replacement of CPython with PyPy now?
A: PyPy isn’t 100% compatible with CPython because it isn’t simulating CPython under the hood. Some programs may still depend on CPython-specific features that are absent in PyPy, such as C bindings, C implementations of Python objects and methods, or the incremental nature of CPython’s garbage collector.
To make this simple: PyPy provides the speed that CPython lacks but sacrifices compatibility. Most people, however, choose Python for its flexibility and its “batteries-included” nature (high compatibility), not for its speed (though speed is still welcome).
I’ve found examples where PyPy is slower than CPython.
But: Only on Windows.
C:\Users\User>python -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 294 msec per loop
C:\Users\User>pypy -m timeit -n10 -s"from sympy import isprime" "isprime(2**521-1);isprime(2**1279-1)"
10 loops, best of 3: 1.33 sec per loop
So, if you are considering PyPy, forget Windows.
On Linux, you can achieve awesome accelerations.
Example (list all primes between 1 and 1,000,000):
from sympy import sieve
primes = list(sieve.primerange(1, 10**6))
This runs 10(!) times faster on PyPy than on CPython.
But not on Windows. There it is only 3x as fast.
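If you want to try a comparison like this without installing sympy, here is a self-contained sketch. It swaps in a plain Sieve of Eratosthenes of my own (primes_below is not sympy's implementation), so the timings will not match the numbers above. The idea is just to run the same file under each interpreter and compare.

```python
import timeit


def primes_below(limit):
    """Sieve of Eratosthenes: return all primes p with 2 <= p < limit."""
    if limit < 3:
        return []
    is_prime = bytearray([1]) * limit
    is_prime[0] = is_prime[1] = 0
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Cross off multiples of n, starting at n*n.
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = 0
    return [n for n in range(limit) if is_prime[n]]


if __name__ == "__main__":
    # Run this file under each interpreter and compare the printed times.
    seconds = timeit.timeit(lambda: primes_below(10**6), number=3)
    print("3 runs took %.3f s" % seconds)
```

Tight pure-Python loops like this are the kind of code where a JIT tends to help most, so this is a favorable case rather than a typical one.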
PyPy has had Python 3 support for a while, but according to this HackerNoon post by Anthony Shaw from April 2nd, 2018, PyPy3 is still several times slower than PyPy (Python 2).
For many scientific calculations, particularly matrix calculations, numpy is a better choice (see FAQ: Should I install numpy or numpypy?).
For Project Euler problems, I make frequent use of PyPy, and for simple numerical calculations, from __future__ import division is often sufficient for my purposes. Python 3 support is still being worked on as of 2018, with your best bet being 64-bit Linux. Windows PyPy3.5 v6.0, the latest as of December 2018, is in beta.
To cite the Zen of Python:
For example, Python 3.8 introduced the "=" specifier for f-strings, as in f"{x=}".
There might be other features in Python 3.8+ which are more important to you. PyPy does not support Python 3.8+ at the moment.
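For anyone who hasn’t used it, here is a tiny sketch of that f-string feature next to a spelling that also works on older interpreters. It needs Python 3.8 or newer to run at all, since the "=" form is a syntax error before that.

```python
import sys

x = 42
# The "=" specifier prints the expression text together with its value.
debug_text = f"{x=}"
# Equivalent spelling that also works on interpreters older than 3.8.
portable_text = "x={!r}".format(x)

print(debug_text, portable_text, sys.version_info[:2])
```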
Shameless self-advertisement: Killer Features by Python version – if you want to know more things you miss by using older Python versions