Slicing numpy arrays in a for loop

Question

I have a multidimensional numpy array of elasticities, with one dimension being "age groups" (below/above 18 years) and the other "income groups" (low/high income).

I would like to create a table with the mean elasticities for each combination of subgroups using a for loop.

My code is as follows:

import numpy as np

elasticity = np.random.rand(2,92)
print(elasticity.shape)

income = ['i0','i1']
age_gr= [':18','18:']

table = {}
for i in range(len(age_gr)):
    for j in range(len(income)):
        key = age_gr[i]+"_"+income[j]
        table[key] = np.mean(elasticity[age_gr[i],j])
print(table)

My problem is that "age_gr[i]" gives me the error "IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices". In reality I have many more age groups, so I can’t do this manually.

I would like to have something like this as a result:

with … representing the mean of elasticities for the sub-group.

Asked By: Stata_user

Answer 1

The exact expected output is unclear, but you certainly cannot use the string ':18' and expect it to behave like the slice :18 (that is, slice(None, 18)).
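To illustrate the difference, here is a minimal sketch. The array `a` is made up for the example and is not part of the question. Inside brackets, `:18` is shorthand for a `slice` object, while the string `':18'` is just text and triggers the IndexError from the question.

```python
import numpy as np

a = np.arange(92)  # hypothetical 1-D array, for illustration only

# Inside brackets, a[:18] is shorthand for a[slice(None, 18)]
same = np.array_equal(a[:18], a[slice(None, 18)])
print(same)

# A string is not a slice, so NumPy rejects it as an index
try:
    a[':18']
    raised = False
except IndexError:
    raised = True
print(raised)
```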

You can use a function to convert the string into a slice object:

import numpy as np

elasticity = np.random.rand(2,92)
print(elasticity.shape)

income = ['i0','i1']
age_gr= [':18','18:']

def str_to_slice(s):
    # ':18' -> slice(None, 18), '18:' -> slice(18, None)
    return slice(*(int(x) if x.isdigit() else None for x in s.split(':')))

table = {}
for i in range(len(age_gr)):
    for j in range(len(income)):
        key = age_gr[i]+"_"+income[j]
        table[key] = np.mean(elasticity[str_to_slice(age_gr[i]), j])
print(table)

Output:

{':18_i0': 0.19273470668594983,
 ':18_i1': 0.484071263606304,
 '18:_i0': nan,
 '18:_i1': nan}
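The `nan` values appear because the slices are applied to the first axis, which has length 2: `elasticity[18:, j]` selects nothing, and the mean of an empty array is `nan` (NumPy also emits a RuntimeWarning). Below is a sketch of one possible fix. It assumes that the 92 columns are the ages and the 2 rows are the income groups, which the question does not state, so swap the indices if your layout differs.

```python
import numpy as np

rng = np.random.default_rng(0)
elasticity = rng.random((2, 92))  # assumed layout: (income group, age)

income = ['i0', 'i1']
age_gr = [':18', '18:']

def str_to_slice(s):
    # ':18' -> slice(None, 18), '18:' -> slice(18, None)
    return slice(*(int(x) if x.isdigit() else None for x in s.split(':')))

table = {}
for age in age_gr:
    for j, inc in enumerate(income):
        # Income index on the first axis, age slice on the second
        table[age + "_" + inc] = elasticity[j, str_to_slice(age)].mean()
print(table)
```

With this ordering every slice selects at least one element, so no entry is `nan`.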

Answered By: mozway

Answer 2

The error you’re seeing is because you’re trying to use a string value ("18:") as an index for the numpy array. Instead, you should use the corresponding integer indices for the age groups.

An example could be:

age_gr_idx = {'<18': 0, '18+': 1}

Then, in your loop, you can use this mapping to get the correct integer index for each age group:

import numpy as np

elasticity = np.random.rand(2, 92)
print(elasticity.shape)

income = ['i0', 'i1']
age_gr = ['<18', '18+']
age_gr_idx = {'<18': 0, '18+': 1}

table = {}
for i in range(len(age_gr)):
    for j in range(len(income)):
        key = age_gr[i] + "_" + income[j]
        table[key] = np.mean(elasticity[age_gr_idx[age_gr[i]], j])
print(table)

Of course this is not the only way to accomplish this kind of result, but I think that is quite close to your solution.
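One such variation, as a sketch: if each age group spans a range of positions rather than a single index, the same dictionary idea works with `slice` objects as values. The layout (rows are income groups, columns are ages) and the group boundaries below are assumptions for illustration, not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
elasticity = rng.random((2, 92))  # assumed layout: (income group, age)

income = ['i0', 'i1']
# Hypothetical group boundaries; add more entries for more age groups
age_gr_idx = {'<18': slice(None, 18), '18+': slice(18, None)}

# One mean per (age group, income group) combination
table = {
    f"{age}_{inc}": elasticity[j, sl].mean()
    for age, sl in age_gr_idx.items()
    for j, inc in enumerate(income)
}
print(table)
```

Storing the slices directly avoids parsing strings, and adding an age group only requires one more dictionary entry.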

Answered By: Benny
