pop index out of range error in Python when modifying a copy of lists and dictionaries while iterating through the original dictionary

Question:

I have two Python lists. One list, called area = [1500, 500, 1500, 2000, 2500, 2000], stores the areas of homes. Another list, called price = [30000, 10000, 20000, 40000, 50000, 45000], stores the corresponding prices of these homes. I created a dictionary called priceDict, whose keys are the unique areas of homes and whose values are lists of the prices of homes of the same area. I created another dictionary, called priceIndexDict, whose keys are the unique areas of homes and whose values are lists of indices of the prices of homes of the same area as shown in the price list.
I wrote the following Python code to create the two dictionaries.

area = [1200, 1200, 1200, 2000]
price = [15000, 11000, 17000, 25000]

priceDict = {}
priceIndexDict = {}
zipList = list(zip(area, price))
zipListIndex = 0

for k, v in zipList:
    priceDict.setdefault(k, []).append(v)
    priceIndexDict.setdefault(k, []).append(zipListIndex)
    zipListIndex += 1

print(priceDict)
print(priceIndexDict)

After I correctly got priceDict = {1200: [15000, 11000, 17000], 2000: [25000]} and priceIndexDict = {1200: [0, 1, 2], 2000: [3]}, I would like to create a new list newPrice and a new dictionary newPriceDict with the same contents as price and priceDict, but with the price outliers among homes of the same size removed. I would also like a new list newArea with the same contents as area, but with the areas corresponding to those price outliers removed.

The following steps were used to determine the outliers:

  • Select the home to test.
  • Create a list of prices of other homes of the same size. It will be called compList in the examples.
  • If there are no other homes of the same size, the house being tested is not an outlier.
  • Otherwise:
    • Calculate the mean price, P[m], and the standard deviation, σ, of the prices in compList.
    • If |price[i] - P[m]| > 3 * σ, the house is an outlier.
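
As a rough illustration of the steps above, the test for a single home could be sketched as follows. The helper name is_outlier is hypothetical, and statistics.mean and statistics.stdev (the sample standard deviation) stand in for the mean() and stdev() functions that are not shown. Treating a comparison list with fewer than two prices as "not an outlier" is an assumption, made because the sample standard deviation is undefined for a single value.

```python
from statistics import mean, stdev


def is_outlier(prices, i):
    """Return True if prices[i] is an outlier relative to the other prices."""
    # compList holds the prices of the other homes of the same size.
    compList = prices[:i] + prices[i + 1:]
    # Assumption: with fewer than two comparison prices the sample standard
    # deviation is undefined, so the home is treated as not an outlier.
    if len(compList) < 2:
        return False
    return abs(prices[i] - mean(compList)) > 3 * stdev(compList)


samePrices = [15000, 11000, 17000]
flags = [is_outlier(samePrices, i) for i in range(len(samePrices))]
print(flags)  # [False, True, False]
```

Because the helper builds compList by slicing, it never modifies the list it is given.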

I wrote the following Python code. Note that function definitions for the mean(), variance() and stdev() functions are not shown here.

newArea = area.copy()
newPrice = price.copy()
newPriceDict = priceDict.copy()

for (houseArea, housePrice) in priceDict.items():
    if len(housePrice) == 1:
        continue
    for i in range(len(housePrice)):
        compList = housePrice.copy()
        compList.pop(i)
        if abs(housePrice[i] - mean(compList)) > 3 * stdev(compList):
            newArea.pop(priceIndexDict[houseArea][i])
            newPrice.pop(priceIndexDict[houseArea][i])
            newPriceDict[houseArea].pop(i)

print(newArea)
print(newPrice)
print(newPriceDict)

However, when I executed the code, the IDE displayed the following error:

line 44, in <module>
    compList.pop(i)
IndexError: pop index out of range

How can I fix this error?

Asked By: YuanLinTech


Answers:

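newPriceDict[houseArea].pop(i) also modifies housePrice. priceDict.copy() makes a shallow copy, so the lists stored as values in newPriceDict are the very same list objects as those in priceDict. Once an element has been popped, housePrice is shorter than the range(len(housePrice)) that was computed at the start of the inner loop, so a later compList.pop(i) uses an index that no longer exists.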
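
The sharing can be seen in isolation with a small sketch that reuses the dictionary from the question. The names sameDict and sameList are only illustrative.

```python
priceDict = {1200: [15000, 11000, 17000], 2000: [25000]}
newPriceDict = priceDict.copy()

# The copy is a new dict object, but each value is the same list object.
sameDict = newPriceDict is priceDict              # False
sameList = newPriceDict[1200] is priceDict[1200]  # True
print(sameDict, sameList)

# Popping through the copy therefore shrinks the original list as well.
newPriceDict[1200].pop(1)
print(priceDict[1200])  # [15000, 17000]
```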

Use copy.deepcopy() to make a deep copy.

from copy import deepcopy
from statistics import mean, stdev

area = [1200, 1200, 1200, 2000]
price = [15000, 11000, 17000, 25000]

priceDict = {}
priceIndexDict = {}
zipList = list(zip(area, price))
zipListIndex = 0

for k, v in zipList:
    priceDict.setdefault(k, []).append(v)
    priceIndexDict.setdefault(k, []).append(zipListIndex)
    zipListIndex += 1

print(priceDict)
print(priceIndexDict)

newArea = area.copy()
newPrice = price.copy()
newPriceDict = deepcopy(priceDict)

for (houseArea, housePrice) in priceDict.items():
    if len(housePrice) == 1:
        continue
    for i in range(len(housePrice)):
        compList = housePrice.copy()
        compList.pop(i)
        if abs(housePrice[i] - mean(compList)) > 3 * stdev(compList):
            newArea.pop(priceIndexDict[houseArea][i])
            newPrice.pop(priceIndexDict[houseArea][i])
            newPriceDict[houseArea].pop(i)

print(newArea)
print(newPrice)
print(newPriceDict)
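
One caveat: popping by index still shifts the positions of later elements, so if more than one outlier were found, the indices stored in priceIndexDict would no longer line up with newArea and newPrice. A possible variation, sketched here under the assumption that statistics.mean and statistics.stdev are acceptable and that comparison lists with fewer than two prices count as "not an outlier", is to record the outlier positions first and build the new lists afterwards. The name outlierIndices is hypothetical.

```python
from statistics import mean, stdev

area = [1200, 1200, 1200, 2000]
price = [15000, 11000, 17000, 25000]

# Group the positions of homes by area.
priceIndexDict = {}
for index, houseArea in enumerate(area):
    priceIndexDict.setdefault(houseArea, []).append(index)

# First pass: record the positions of outliers without modifying anything.
outlierIndices = set()
for houseArea, indices in priceIndexDict.items():
    for i in indices:
        compList = [price[j] for j in indices if j != i]
        # Assumption: statistics.stdev needs at least two values, so smaller
        # comparison lists are treated as "not an outlier".
        if len(compList) < 2:
            continue
        if abs(price[i] - mean(compList)) > 3 * stdev(compList):
            outlierIndices.add(i)

# Second pass: build the filtered results from the untouched originals.
newArea = [a for i, a in enumerate(area) if i not in outlierIndices]
newPrice = [p for i, p in enumerate(price) if i not in outlierIndices]
newPriceDict = {}
for a, p in zip(newArea, newPrice):
    newPriceDict.setdefault(a, []).append(p)

print(newArea)       # [1200, 1200, 2000]
print(newPrice)      # [15000, 17000, 25000]
print(newPriceDict)  # {1200: [15000, 17000], 2000: [25000]}
```

Since nothing is removed while the loops run, neither a shallow nor a deep copy is needed in this version.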
Answered By: Barmar