python pandas: case insensitive drop column
Question:
I have a df and I want to drop a column by label but in a case insensitive way. Note: I don’t want to change anything in my df so I’d like to avoid ‘str.lower’.
Here's my df:
print(df)
Name  UnweightedBase  Base     q6a1    q6a2    q6a3    q6a4    q6a5   q6a6  eSubTotal
Name
Base            1006  1006  100,00%  96,81%  96,81%  96,81%  96,81%  3,19%    490,44%
q6_6              31    32  100,00%       -       -       -       -      -          -
q6_3            1006  1006   43,44%  26,08%  13,73%   9,22%   4,34%  3,19%    100,00%
q6_4            1006  1006   31,78%  31,71%  20,09%  10,37%   2,87%  3,19%    100,00%
Is there any magic I can apply to the code below?
df.drop(['unWeightedbase', 'Q6A1'],1)
Answers:
I think what you can do is create a function to perform the case-insensitive search for you:
In [90]:
import numpy as np
import pandas as pd

# create a noddy df
df = pd.DataFrame({'UnweightedBase': np.arange(5)})
print(df.columns)

# create a list of the column names
col_list = list(df)

# define our function to perform the case-insensitive search
def find_col_name(name):
    try:
        # this uses a generator to find the index if it matches; next() raises StopIteration if not found
        return col_list[next(i for i, v in enumerate(col_list) if v.lower() == name.lower())]
    except StopIteration:
        # no match: return an empty label (df.drop will raise KeyError on it unless errors='ignore' is passed)
        return ''

# drop returns a new DataFrame; df itself is left unchanged
df.drop(columns=find_col_name('unweightedbase'))

Index(['UnweightedBase'], dtype='object')
Out[90]:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4]
My search code is adapted from this SO answer: find the index of a string ignoring cases
A similar option to EdChum’s answer would be to define a general function that performs a case-insensitive search for a group of strings, and use that function to find the names of the columns to drop.
import pandas as pd


def find_case_insensitive(strings, search_for):
    """Find strings by searching for case-insensitive matches."""
    lowercase_search = [s.lower() for s in search_for]
    return [val for val in strings if val.lower() in lowercase_search]


df = pd.DataFrame(
    {
        "UnweightedBase": [1006, 31, 1006, 1006],
        "q6a1": [100.0, 100.0, 43.44, 31.78],
    }
)

empty_df = df.drop(
    columns=find_case_insensitive(df.columns, ["unWeightedbase", "Q6A1"])
)
(The function above will ignore any searched strings that didn’t have a match; depending on how the function will be used, a different behavior may be better.)
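One possible "different behavior" is to fail loudly when a searched name has no match. The sketch below is only an illustration of that idea, not part of the original answer; the name find_case_insensitive_strict is hypothetical, and it works on any iterable of strings, including df.columns.

```python
def find_case_insensitive_strict(strings, search_for):
    """Like find_case_insensitive, but raise KeyError for unmatched names.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Map each lowercased string to every original spelling that produced it.
    by_lower = {}
    for val in strings:
        by_lower.setdefault(val.lower(), []).append(val)

    # Report every searched name that has no case-insensitive match.
    missing = [s for s in search_for if s.lower() not in by_lower]
    if missing:
        raise KeyError(f"no case-insensitive match for: {missing}")

    # Collect matches in the order the search terms were given, without repeats.
    found = []
    for s in search_for:
        for val in by_lower[s.lower()]:
            if val not in found:
                found.append(val)
    return found
```

The returned list can be passed to df.drop(columns=...) in the same way as the result of find_case_insensitive. Whether raising is preferable to silently skipping depends on how the caller wants typos in column names to be handled.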
You could also define a helper function for dropping DataFrame columns using case-insensitive names.
def drop_columns_case_insensitive(df, cols):
    """Drop columns from a DataFrame using case-insensitive column names."""
    return df.drop(columns=find_case_insensitive(df.columns, cols))


empty_df = drop_columns_case_insensitive(df, ["unWeightedbase", "Q6A1"])
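On the concern in the question about not changing the df: lowercasing the labels only for the comparison does not modify them. As a rough sketch of that alternative (the sample data and the variable names to_drop, keep_mask and result are assumptions made for illustration), a boolean mask over the columns keeps the original labels intact:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "UnweightedBase": [1006, 31, 1006, 1006],
        "Base": [1006, 32, 1006, 1006],
        "q6a1": [100.0, 100.0, 43.44, 31.78],
    }
)

to_drop = ["unWeightedbase", "Q6A1"]
lowered = {name.lower() for name in to_drop}

# df.columns.str.lower() returns a new Index; df.columns itself is untouched.
keep_mask = ~df.columns.str.lower().isin(lowered)

# Boolean column selection returns a new DataFrame with the original labels.
result = df.loc[:, keep_mask]
```

Like the helper functions above, this silently ignores names that match no column.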