Python statistics package: difference between statsmodels and scipy.stats

Question:

I need some advice on selecting a statistics package for Python. I’ve done quite a bit of searching, but I’m not sure I have everything right, specifically regarding the differences between statsmodels and scipy.stats.

One thing I know is that packages in the scikits namespace are specific “branches” of scipy, and what used to be scikits.statsmodels is now called statsmodels. On the other hand, there is also scipy.stats. What are the differences between the two, and which one is the statistics package for Python?

Thanks.

–EDIT–

I changed the title because some answers are not really related to the question, and I suppose that’s because the title was not clear enough.

Asked By: herrfz


Answers:

I think THE statistics package is numpy/scipy. It also works great if you want to plot your data using matplotlib.
However, as far as I know, matplotlib doesn’t work with Python 3.x yet.

Answered By: user2015601

Statsmodels has scipy.stats as a dependency. Scipy.stats has all of the probability distributions and some statistical tests. It’s more like library code in the vein of numpy and scipy. Statsmodels, on the other hand, provides statistical models with a formula framework similar to R, and it works with pandas DataFrames. There are also statistical tests, plotting, and plenty of helper functions in statsmodels. Really, it depends on what you need, but you definitely don’t have to choose one. They have different aims and strengths.
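To make the split concrete, here is a minimal sketch, assuming reasonably recent numpy, pandas, scipy and statsmodels are installed; the data are simulated and the variable names are illustrative only. The first half uses scipy.stats for a probability distribution and a standalone two-sample t-test; the second half uses statsmodels’ formula interface (`statsmodels.formula.api.ols`) to fit a linear regression directly on a pandas DataFrame.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# --- scipy.stats: distributions and standalone tests ---
standard_normal = stats.norm(loc=0.0, scale=1.0)
upper_tail = standard_normal.sf(1.96)  # survival function: P(Z > 1.96)

# Two simulated samples with different means, compared with a t-test
a = rng.normal(0.0, 1.0, size=50)
b = rng.normal(0.5, 1.0, size=50)
t_stat, p_value = stats.ttest_ind(a, b)

# --- statsmodels: a model specified with an R-style formula on a DataFrame ---
x = rng.uniform(0.0, 10.0, size=100)
df = pd.DataFrame({"x": x, "y": 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=100)})
model = smf.ols("y ~ x", data=df).fit()

print(f"P(Z > 1.96) = {upper_tail:.4f}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print(model.params)  # estimated "Intercept" and "x" coefficients
```

The formula string `"y ~ x"` is where statsmodels’ R heritage shows; scipy.stats has no formula interface, so the two packages complement rather than replace each other.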

Answered By: jseabold

I try to use pandas/statsmodels/scipy for my work on a day-to-day basis, but sometimes those packages come up a bit short (LOESS, anybody?). The problem with the RPy module is (last I checked, at least) that it wants a specific version of R that isn’t current: my R installation is 2.16 (I think) and RPy wanted 2.14. So either you have to maintain two parallel installations of R, or you have to downgrade. (If you don’t have R installed yet, you can just install the version RPy expects and use RPy.)

So when I need something that isn’t in pandas/statsmodels/scipy, I write R scripts and run them with the subprocess module. This lets me interact with R as little as possible (I really don’t like programming in it), but I can still leverage everything R has that the Python packages don’t.
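A minimal sketch of that subprocess workflow, assuming R’s `Rscript` command-line front end is on your PATH. The helper names (`build_rscript_command`, `run_r_script`) and the script `fit_loess.R` with its CSV arguments are hypothetical, made up for illustration; the R script would read its arguments with `commandArgs(trailingOnly = TRUE)`.

```python
import shutil
import subprocess
from pathlib import Path


def build_rscript_command(script_path, *args, rscript="Rscript"):
    """Return the argv list that runs an R script non-interactively (hypothetical helper)."""
    # Every argument is converted to a string; R receives them as a character vector.
    return [rscript, str(script_path), *(str(a) for a in args)]


def run_r_script(script_path, *args, rscript="Rscript"):
    """Run an R script in a subprocess and return what it printed to stdout (hypothetical helper)."""
    if shutil.which(rscript) is None:
        raise FileNotFoundError(f"{rscript!r} was not found on PATH")
    completed = subprocess.run(
        build_rscript_command(script_path, *args, rscript=rscript),
        capture_output=True,  # collect stdout/stderr instead of echoing them
        text=True,            # decode output bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout


if __name__ == "__main__":
    # Hypothetical usage: fit_loess.R reads input.csv, fits loess() and writes smoothed.csv.
    script = Path("fit_loess.R")
    if script.exists():
        print(run_r_script(script, "input.csv", "smoothed.csv"))
```

Passing data through files (e.g. CSVs written with pandas and read back after R finishes) keeps the two languages decoupled, which is the point of avoiding RPy’s version pinning.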

The lesson is that there is never one solution to any problem: you have to assemble a whole bunch of parts that are useful to you (and maybe write some of your own), in a way that you understand, to solve problems. (R aficionados will disagree, of course!)

Answered By: BenDundee