Calculate standard deviation on a list of lists that includes strings in some columns
Question:
I am trying to get the standard deviation of each column in a list of lists, but only for the columns that contain numbers (the middle columns). The string columns should be skipped.
param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]
Expected result:
params = [std.dev(2, 6, 5),
std.dev(3, 7, 6),
std.dev(6, 8, 8),
std.dev(7, 2, 1)]
Note: I have not evaluated the standard deviations because the actual values are not relevant to the question; the expressions only show what would be computed.
I tried using zip(*param_data) but cannot figure out how to only zip columns 1-4.
Answers:
Use a list comprehension to get the corresponding elements, and then use zip.
from statistics import stdev
param_data = [["a", 2, 3, 6, 7, "b"],["c", 6, 7, 8, 2, "d"], ["e", 5, 6, 8, 1, "f"]]
elements = [x[1:5] for x in param_data]
print([stdev(x) for x in zip(*elements)])
This assumes that the numbers are always in that 1:5 slice; if that’s not the case, we’ll need more information.
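If the numeric columns sit in a different contiguous range, the same idea can take the slice bounds as arguments. This is only a minimal sketch; the helper name slice_stdevs and its start/stop parameters are made up for illustration and are not part of the answer above.

```python
from statistics import stdev

def slice_stdevs(rows, start, stop):
    """Sample standard deviation of each column in the range start:stop."""
    # Keep only the requested columns from every row.
    selected = [row[start:stop] for row in rows]
    # zip(*selected) transposes the rows into columns.
    return [stdev(column) for column in zip(*selected)]

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

result = slice_stdevs(param_data, 1, 5)
print(result)
```

Calling slice_stdevs(param_data, 1, 5) reproduces the 1:5 slice used above; other bounds work as long as every value in the selected columns is numeric.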
Here is a solution with better error handling:
from statistics import stdev
from numbers import Number

def resilient_stdevs(columns):
    cols = list(zip(*columns))  # transpose rows into columns
    for c in cols:
        if isinstance(c[0], Number):
            if not all(isinstance(x, Number) for x in c):
                raise TypeError("Cannot compute stdev of mixed types in a single column: got " + str(c))
        if not isinstance(c[0], Number):
            if not all(not isinstance(x, Number) for x in c):
                raise TypeError("Cannot compute stdev of mixed types in a single column: got " + str(c))
    return [stdev(xs) for xs in cols if isinstance(xs[0], (int, float))]

param_data = [["a", 2, "s", 6, 7, "b"],["c", 6, 4, 8, 2, "d"], ["e", 5, 6, 8, 1, "f"]]
print(resilient_stdevs(param_data))
This crashes (as it should) with a clear error message:
line 12, in resilient_stdevs
raise TypeError("Cannot compute stdev of mixed types in a single column: got " + str(c))
TypeError: Cannot compute stdev of mixed types in a single column: got ('s', 4, 6)
You can use list(zip(*param_data)) to transpose and isinstance to check the types. This works wherever the string columns are, even in the middle, or if you have more of them:
from statistics import stdev
param_data = [["a", 2, 3, 6, 7, "b"],["c", 6, 7, 8, 2, "d"], ["e", 5, 6, 8, 1, "f"]]
cols = list(zip(*param_data))
params = [stdev(xs) for xs in cols if isinstance(xs[0], (int, float))]
print(params)
Output:
[2.0816659994661326, 2.0816659994661326, 1.1547005383792515, 3.2145502536643185]
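Note that the isinstance check above looks only at the first value of each column. A variation that inspects every value might look like the sketch below. It assumes that a column with mixed types should simply be skipped and that bool values should not count as numbers; the function name numeric_stdevs is hypothetical.

```python
from statistics import stdev

def is_number(value):
    # bool is a subclass of int, so exclude it explicitly (an assumption).
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def numeric_stdevs(rows):
    """Return the stdev of every column whose values are all numbers."""
    results = []
    for column in zip(*rows):
        if all(is_number(x) for x in column):
            results.append(stdev(column))
    return results

# Column 2 mixes a string with numbers, so it is skipped.
mixed_data = [["a", 2, "s", 6, 7, "b"],
              ["c", 6, 4, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

mixed_result = numeric_stdevs(mixed_data)
print(mixed_result)
```

Whether skipping a mixed column silently is acceptable depends on the data; raising an error instead makes bad input visible.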
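Try this, which attempts stdev on every column and skips the ones that raise TypeError:
from statistics import stdev
param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]
params = []
for column in zip(*param_data):
    try:
        params.append(stdev(column))
    except TypeError:
        # Column contains strings, so skip it.
        pass
print(params)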
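One edge case for the try/except approach: stdev needs at least two values and raises statistics.StatisticsError otherwise, so a table with a single row is not covered by catching TypeError alone. A minimal sketch that catches both, assuming such columns should be skipped as well (the name safe_stdevs is made up for illustration):

```python
from statistics import StatisticsError, stdev

def safe_stdevs(rows):
    """Return stdev for every column where it can be computed."""
    results = []
    for column in zip(*rows):
        try:
            results.append(stdev(column))
        except (TypeError, StatisticsError):
            # TypeError: the column holds non-numeric values.
            # StatisticsError: the column has fewer than two values.
            continue
    return results

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

full_result = safe_stdevs(param_data)
single_row_result = safe_stdevs([["a", 2, "b"]])
print(full_result)
print(single_row_result)
```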