Pandas column access w/ column names containing spaces
Question:
If I import or create a pandas column that contains no spaces, I can access it as such:
from pandas import DataFrame
df1 = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
'data1': range(7)})
df1.data1
which would return that series for me. If, however, that column has a space in its name, it isn’t accessible via that method:
from pandas import DataFrame
df2 = DataFrame({'key': ['a','b','d'],
'data 2': range(3)})
df2.data 2 # <--- not the droid I'm looking for.
I know I can access it using .xs():
df2.xs('data 2', axis=1)
There’s got to be another way. I’ve googled it like mad and can’t think of any other way to google it. I’ve read all 96 entries here on SO that contain "column" and "string" and "pandas" and could find no previous answer. Is this the only way, or is there something better?
Answers:
I think the default way is to use the bracket method instead of the dot notation.
import pandas as pd
df1 = pd.DataFrame({
'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
'dat a1': range(7)
})
df1['dat a1']
The other methods, like exposing it as an attribute, are more for convenience: attribute access only works when the column name is a valid Python identifier.
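As a quick check (a minimal sketch reusing the `df2` frame from the question), bracket notation and `.loc` label selection both reach a column whose name contains a space, and they return the same values as the `.xs()` call:

```python
import pandas as pd

df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data 2': range(3)})

# Bracket notation accepts any column label, spaces included.
by_bracket = df2['data 2']

# .loc with a full row slice selects the same column by label.
by_loc = df2.loc[:, 'data 2']

# The .xs() call from the question returns the same Series.
by_xs = df2.xs('data 2', axis=1)

print(by_bracket.tolist())  # [0, 1, 2]
```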
Old post, but may be interesting: an idea (which is destructive, but does the job if you want it quick and dirty) is to rename the columns, replacing spaces with underscores:
df1.columns = [c.replace(' ', '_') for c in df1.columns]
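If you would rather keep the original frame intact, `rename()` with a function applies the same underscore replacement but returns a new DataFrame. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a'], 'dat a1': range(3)})

# rename() returns a new DataFrame; df1 itself keeps its original labels.
renamed = df1.rename(columns=lambda c: c.replace(' ', '_'))

# Attribute access now works on the renamed copy.
print(renamed.dat_a1.tolist())  # [0, 1, 2]
print(list(df1.columns))        # ['key', 'dat a1']
```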
While the accepted answer works for column specification when using dictionaries or []-selection, it does not generalise to other situations where one needs to refer to columns, such as the assign method:
> df.assign("data 2" = lambda x: x.sum(axis=1))
SyntaxError: keyword can't be an expression
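The error comes from Python's grammar rather than from pandas: keyword-argument names must be valid identifiers, and a name containing a space is not one. A small sketch that shows this without pandas at all, by compiling (not running) the failing call:

```python
# Keyword-argument names must be valid Python identifiers; a space rules that out.
print('data2'.isidentifier())   # True
print('data 2'.isidentifier())  # False

# The parser rejects the call before any pandas code could run,
# so merely compiling the source raises SyntaxError.
source = 'df.assign("data 2" = lambda x: x.sum(axis=1))'
try:
    compile(source, '<example>', 'eval')
    rejected = False
except SyntaxError:
    rejected = True

print(rejected)  # True
```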
If you want to apply filtering, that's also possible with column names containing spaces, e.g. filtering out NULL values and empty strings:
df_package[(df_package['Country_Region Code'].notnull()) &
(df_package['Country_Region Code'] != '')]
as I figured out thanks to Rutger Kassies' answer.
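A self-contained sketch of the same filter on a small made-up frame (the data is hypothetical). Combining the two conditions with `&` keeps only rows that are neither missing nor empty; with `|`, every row passes at least one of the two tests, so nothing is removed:

```python
import pandas as pd

# Hypothetical data: one missing value and one empty string.
df_package = pd.DataFrame({'Country_Region Code': ['US', None, '', 'DE']})
col = df_package['Country_Region Code']

# Keep rows that are not null AND not the empty string.
cleaned = df_package[col.notnull() & (col != '')]
print(cleaned['Country_Region Code'].tolist())  # ['US', 'DE']

# With | a row only needs to pass one test, so all four rows survive.
either = df_package[col.notnull() | (col != '')]
print(len(either))  # 4
```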
If you want to pass column names containing spaces to a pandas method like assign, you can unpack a dictionary into keyword arguments:
df.assign(**{'space column': (lambda x: x['space column2'])})
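A runnable version of that trick on a small hypothetical frame. Unpacking a dict with `**` lets the string keys, spaces and all, arrive as keyword arguments, which `assign()` accepts through its `**kwargs` parameter:

```python
import pandas as pd

# Hypothetical frame with a spaced column name.
df = pd.DataFrame({'space column2': [1, 2, 3]})

# The dict keys become keyword-argument names, bypassing the identifier rule
# that makes `space column=...` a SyntaxError.
out = df.assign(**{'space column': lambda x: x['space column2'] * 2})
print(out['space column'].tolist())  # [2, 4, 6]
```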
You can do it with df['Column Name']