Why does a column remain in a DataFrame's index even after it is dropped?

Question:

Consider the following piece of code:

>>> import pandas
>>> data = pandas.DataFrame({ 'user': [1, 5, 3, 10], 'week': [1, 1, 3, 4], 'value1': [5, 4, 3, 2], 'value2': [1, 1, 1, 2] })
>>> data = data.pivot_table(index='user', columns='week', fill_value=0)
>>> data['target'] = [True, True, False, True]
>>> data
     value1       value2       target
week      1  3  4      1  3  4
user
1         5  0  0      1  0  0   True
3         0  3  0      0  1  0   True
5         4  0  0      1  0  0  False
10        0  0  2      0  0  2   True

Now if I call this:

>>> 'target' in data.columns
True

It returns True as expected. However, why does this return True as well?

>>> 'target' in data.drop('target', axis=1).columns
True

How can I drop a column from the table so it’s no longer in the index and the above statement returns False?

Asked By: martindzejky

Answers:

As of now (pandas 0.19.2), a MultiIndex retains every label it has ever used in its levels. Dropping a column doesn't remove its label from the MultiIndex, so the label is still referenced there. There is a long GitHub issue discussing this behavior.
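A minimal sketch to see where the dropped label lives, assuming a pandas version recent enough to run the question's code: `get_level_values(0)` reflects only the columns that still exist, while `levels[0]` is the MultiIndex's stored list of level labels, which `drop` does not prune.

```python
import pandas as pd

# Rebuild the question's frame: pivoting produces MultiIndex columns (value, week)
data = pd.DataFrame({'user': [1, 5, 3, 10], 'week': [1, 1, 3, 4],
                     'value1': [5, 4, 3, 2], 'value2': [1, 1, 1, 2]})
data = data.pivot_table(index='user', columns='week', fill_value=0)
data['target'] = [True, True, False, True]  # stored as the column ('target', '')

dropped = data.drop('target', axis=1)

# No remaining column has 'target' as its first-level label ...
print('target' in dropped.columns.get_level_values(0))  # False
# ... but the label is still stored in the first level of the MultiIndex
print('target' in dropped.columns.levels[0])  # True
```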

Thus, you have to work around the issue and make assumptions. If you are sure the labels you're checking are on a specific index level (level 0 in your example), one way is to do this:

>>> 'target' in data.drop('target', axis=1).columns.get_level_values(0)
False

If it can be any level, you can flatten all the column tuples with tolist() and look up the label in the resulting list (get_values() also worked in older pandas, but it was deprecated in 0.25 and removed in 1.0):

>>> import itertools as it
>>> list(it.chain.from_iterable(data.drop('target', axis=1).columns.tolist()))
['value1', 1, 'value1', 3, 'value1', 4, 'value2', 1, 'value2', 3, 'value2', 4]
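As an alternative to flattening with `itertools`, here is a sketch that checks each level in turn with `get_level_values`. The helper `label_in_any_level` is a hypothetical name introduced for illustration; `MultiIndex.nlevels` and `get_level_values` are standard pandas.

```python
import pandas as pd

def label_in_any_level(columns: pd.MultiIndex, label) -> bool:
    """Return True if `label` appears at any level of a column that still exists."""
    # get_level_values(i) lists the level-i label of every existing column,
    # so labels left behind in `levels` by a drop are not counted.
    return any(label in columns.get_level_values(i) for i in range(columns.nlevels))

# Same frame as in the question
data = pd.DataFrame({'user': [1, 5, 3, 10], 'week': [1, 1, 3, 4],
                     'value1': [5, 4, 3, 2], 'value2': [1, 1, 1, 2]})
data = data.pivot_table(index='user', columns='week', fill_value=0)
data['target'] = [True, True, False, True]
dropped = data.drop('target', axis=1)

print(label_in_any_level(dropped.columns, 'target'))  # False
print(label_in_any_level(dropped.columns, 3))         # True (week 3 still exists)
```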
Answered By: Zeugma

I’m posting @Jeff’s comment as a new answer.

data = data.drop('target', axis=1)
# Rebuild the column MultiIndex so its levels keep only labels still in use
data.columns = data.columns.remove_unused_levels()
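The full sequence, sketched end to end with the question's data. This assumes pandas 0.20 or later, where `MultiIndex.remove_unused_levels` was introduced; after pruning, the label is gone from the stored levels as well as from the columns.

```python
import pandas as pd

data = pd.DataFrame({'user': [1, 5, 3, 10], 'week': [1, 1, 3, 4],
                     'value1': [5, 4, 3, 2], 'value2': [1, 1, 1, 2]})
data = data.pivot_table(index='user', columns='week', fill_value=0)
data['target'] = [True, True, False, True]

data = data.drop('target', axis=1)
# drop() alone leaves 'target' in the stored level labels
print('target' in data.columns.levels[0])  # True

# Rebuild the MultiIndex so each level keeps only labels some column uses
data.columns = data.columns.remove_unused_levels()
print('target' in data.columns.levels[0])  # False
print('target' in data.columns)            # False
```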
Answered By: Ilya