KeyError when attempting to access columns of a dataframe

Question

I’m trying to write a function that removes unwanted columns from a dataframe, based on values from a list, and then splits the remaining columns into two dataframes by moving one column into a separate dataframe.

Unwanted columns are removed in the first for loop: if the Tkinter variable variabletype[i] equals 1, the corresponding column is dropped from the table. Each time a column is dropped, the indices of the following columns decrease by 1, so to avoid skipping columns I use the count variable, which tracks the current position of column i in the shrinking dataframe. If a column is not dropped, the i-th value of variabletype is appended to a local list, usedvartypes, which is used in the second for loop.
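The shifting is expected: drop returns a new dataframe whose remaining columns are renumbered from position 0, so every later column moves one position to the left. A minimal sketch of the same counter idea, assuming the Tkinter values have already been read with .get() into a plain list of ints (the frame, the list types and its codes are hypothetical):

```python
import pandas as pd

# Hypothetical data: column names and type codes are made up for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})
types = [2, 1, 0, 1]  # 0 = y, 1 = unwanted, 2 = x (one code per column)

result = df
count = 0  # current position of the i-th original column in the shrinking frame
used = []
for t in types:
    if t == 1:
        # Dropping shifts every later column one position left, so count is
        # not advanced (equivalent to the question's decrement-then-increment).
        result = result.drop(result.columns[count], axis=1)
    else:
        used.append(t)
        count += 1

print(list(result.columns))  # ['a', 'c']
print(used)                  # [2, 0]
```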

The first loop works fine, but the second one keeps raising the same error. It is supposed to iterate over the remaining columns using the length of usedvartypes, and when the i-th element of usedvartypes equals 0, copy the i-th column into a new dataframe and remove it from the original one. However, every time I run it, I get a KeyError at index i. I don’t understand why. Am I accessing the pandas dataframe the wrong way?

def createFinalDataframe():
    global data
    global finaldata_x
    global finaldata_y
    global variabletype    #each value represents a single column in the dataframe; equal to 0 (y) 1(unwanted) or 2(x)

    finaldata_x = data
    count = 0
    usedvartypes=[]


    for i in range(len(variabletype)):
        if (variabletype[i].get() == 1):
            finaldata_x = finaldata_x.drop(finaldata_x.columns[count], axis=1)
            count = count - 1
        else:
            usedvartypes.append(variabletype[i].get())
        count = count + 1


    for i in range(len(usedvartypes)):
        if (usedvartypes[i]==0):
            finaldata_y = []
            print(finaldata_x[i])  # KeyError raised here
            finaldata_y= finaldata_x[i].copy()
            finaldata_x = finaldata_x.drop(finaldata_x.columns[i], axis=1)
            break
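The KeyError can be reproduced in isolation. A minimal sketch with a hypothetical frame whose columns have string labels: df[i] looks i up among the column labels, so an integer that is not a label raises KeyError, while df.iloc[:, i] selects by position.

```python
import pandas as pd

# Hypothetical frame with string column labels.
df = pd.DataFrame({"height": [1.7, 1.8], "weight": [65, 80]})

raised = False
try:
    df[1]  # label lookup: there is no column labelled 1
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

second = df.iloc[:, 1]  # positional lookup: the second column
print(second.name)      # weight
```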

Asked By: cocobolodesk

Answer 1

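Use iloc here. Indexing with finaldata_x[i] looks up the column whose label is i, and unless the columns are literally labelled 0, 1, 2, ..., no such label exists, hence the KeyError. iloc selects by integer position instead. Change print(finaldata_x[i]) to print(finaldata_x.iloc[:, i]), and make the same change on the line that copies the column into finaldata_y.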
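The same positional access can also be written in two other ways. As a hedged sketch on a hypothetical frame (assuming unique column labels): translate the position into a label with df.columns[i] and index by that label, or use DataFrame.pop, which returns the column and removes it from the frame in place, replacing the separate copy() and drop() calls.

```python
import pandas as pd

# Hypothetical frame; assumes column labels are unique.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
i = 1  # position of the target column

by_label = df[df.columns[i]]   # position -> label -> column
by_position = df.iloc[:, i]
assert by_label.equals(by_position)

y = df.pop(df.columns[i])      # returns column "b" and removes it from df
print(y.name, list(df.columns))  # b ['a', 'c']
```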

Updated logic:

def createFinalDataframe():
    global data, finaldata_x, finaldata_y, variabletype

    finaldata_x = data
    count = 0          # position of column i in the shrinking frame
    usedvartypes = []

    # First loop: drop unwanted columns (type 1), remember the types of the rest.
    for i in range(len(variabletype)):
        if variabletype[i].get() == 1:
            finaldata_x = finaldata_x.drop(finaldata_x.columns[count], axis=1)
            count = count - 1
        else:
            usedvartypes.append(variabletype[i].get())
        count = count + 1

    # Second loop: move the first y column (type 0) into finaldata_y.
    for i in range(len(usedvartypes)):
        if usedvartypes[i] == 0:
            # iloc selects by position; finaldata_x[i] would look up a column labelled i.
            print(finaldata_x.iloc[:, i])
            finaldata_y = finaldata_x.iloc[:, i].copy()
            finaldata_x = finaldata_x.drop(finaldata_x.columns[i], axis=1)
            break
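The counter bookkeeping can be avoided altogether by selecting columns with boolean masks built from the type codes. A minimal sketch, not the answerer's method, assuming the Tkinter values have been read into a plain list of ints with one code per column, and that the caller uses return values instead of globals (the function name split_dataframe is hypothetical):

```python
import numpy as np
import pandas as pd


def split_dataframe(data, type_codes):
    """Split data into (x, y) using one type code per column.

    Codes follow the question: 0 = y, 1 = unwanted, 2 = x.
    """
    codes = np.asarray(type_codes)
    if codes.shape != (data.shape[1],):
        raise ValueError("expected exactly one type code per column")
    x = data.loc[:, codes == 2]  # all x columns, original order kept
    y = data.loc[:, codes == 0]  # all y columns, as a DataFrame
    return x, y


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})
x, y = split_dataframe(df, [2, 1, 0, 2])
print(list(x.columns), list(y.columns))  # ['a', 'd'] ['c']
```

Unlike the loop above, this keeps every column coded 0 rather than stopping at the first one; if exactly one y column is expected, y.iloc[:, 0] returns it as a Series, like finaldata_y. From the GUI, the codes could be gathered with [v.get() for v in variabletype].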

Answered By: Mike – SMT
