Calculate Standard Deviation on a List of Lists that includes strings in some columns

Question:

I am trying to get the standard deviation of a list of lists, but not for all ‘columns’ of the list: only the middle columns are numbers, and the outer columns are strings. So I would skip the string columns.

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

Expected result:

params = [std.dev(2, 6, 5),
          std.dev(3, 7, 6),
          std.dev(6, 8, 8),
          std.dev(7, 2, 1)]

Note: I am not evaluating the standard deviations here because the actual values are not relevant to the question; the expressions just show what would be computed.
I tried using zip(*param_data) but cannot figure out how to zip only columns 1-4.

Asked By: chepox


Answers:

Use a list comprehension to get the corresponding elements, and then use zip.

from statistics import stdev

param_data = [["a", 2, 3, 6, 7, "b"],["c", 6, 7, 8, 2, "d"], ["e", 5, 6, 8, 1, "f"]]

elements = [x[1:5] for x in param_data]
print([stdev(x) for x in zip(*elements)])

This assumes that the numbers are always in that 1:5 slice; if that’s not the case, we’ll need more information.
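
If the position of the numeric block differs between datasets, the slice bounds can be turned into parameters. This is a minimal sketch, assuming the numeric columns are still contiguous; the helper name column_stdevs and its start/stop parameters are illustrative and not part of the answer above.

```python
from statistics import stdev

def column_stdevs(rows, start, stop):
    """Sample standard deviation of each column in the slice start:stop."""
    # Keep only the (assumed numeric) slice of every row.
    numeric_part = [row[start:stop] for row in rows]
    # zip(*...) transposes rows into columns.
    return [stdev(column) for column in zip(*numeric_part)]

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

params = column_stdevs(param_data, 1, 5)
print(params)  # one value per numeric column, four in total
```

With one leading and one trailing string column, as here, column_stdevs(param_data, 1, -1) selects the same columns.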

Answered By: Rahul

Here is a solution with stricter error handling, which raises an error if a single column mixes numbers and non-numbers:

from statistics import stdev
from numbers import Number

def resilient_stdevs(columns):
    cols = list(zip(*columns))
    for c in cols:
        if isinstance(c[0], Number):
            if not all(isinstance(x, Number) for x in c):
                raise TypeError("Cannot compute stdev of mixed types in a single column: got " + str(c))
        if not isinstance(c[0], Number):
            if not all(not isinstance(x, Number) for x in c):
                raise TypeError("Cannot compute stdev of mixed types in a single column: got " + str(c))
    return [stdev(xs) for xs in cols if isinstance(xs[0], (int, float))]

param_data = [["a", 2, "s", 6, 7, "b"],["c", 6, 4, 8, 2, "d"], ["e", 5, 6, 8, 1, "f"]]

print(resilient_stdevs(param_data))

This fails (as it should) with a clear error message, because the third column mixes the string "s" with numbers:

line 12, in resilient_stdevs
    raise TypeError("Cannot compute stdev of mixed types in a single column: got " + str(c))
TypeError: Cannot compute stdev of mixed types in a single column: got ('s', 4, 6)

You can use list(zip(*param_data)) to transpose the data and isinstance to check the types. This works wherever the string columns are, even in the middle, and however many of them there are:

from statistics import stdev

param_data = [["a", 2, 3, 6, 7, "b"],["c", 6, 7, 8, 2, "d"], ["e", 5, 6, 8, 1, "f"]]

cols = list(zip(*param_data))

params = [stdev(xs) for xs in cols if isinstance(xs[0], (int, float))]

print(params)

Output:

[2.0816659994661326, 2.0816659994661326, 1.1547005383792515, 3.2145502536643185]
Answered By: Caridorc

Try it with try/except, which skips every column that stdev cannot handle:

from statistics import stdev

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

params = []
for column in zip(*param_data):
    try:
        params.append(stdev(column))
    except TypeError:
        pass

print(params)
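
One consequence of the try/except approach is worth illustrating: any column on which stdev raises TypeError is skipped, so a numeric column containing a stray string is dropped silently rather than reported. The sketch below reuses the mixed-type data from the resilient_stdevs example; the skipped list is a hypothetical addition, used here only to make the dropped columns visible.

```python
from statistics import stdev

# Same data as the resilient_stdevs example: column 2 mixes "s" with numbers.
mixed_data = [["a", 2, "s", 6, 7, "b"],
              ["c", 6, 4, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

params = []
skipped = []  # hypothetical addition: indices of columns that raised TypeError
for index, column in enumerate(zip(*mixed_data)):
    try:
        params.append(stdev(column))
    except TypeError:
        skipped.append(index)

print(len(params))  # 3, not 4: the mixed column is missing from the result
print(skipped)      # [0, 2, 5]
```

If a silently dropped column would be a problem, the stricter check in resilient_stdevs may be the better fit.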
Answered By: Kelly Bundy