Proper way to access a column of a pandas dataframe
Question:
For example, I have a dataframe like this:
         Date          Open          High           Low         Close
0  2009-08-25  20246.789063  20476.250000  20143.509766  20435.240234
      Adj Close      Volume
0  20435.240234  1531430000
Using attribute or explicit naming both give me the same output:
sum(data.Date==data['Date']) == data.shape[0]
True
However, I cannot access columns whose names contain whitespace, like 'Adj Close', with df.columnname, but I can with df['columnname'].
Is using df['columnname'] strictly better than using df.columnname?
Answers:
Using . as a column accessor is a convenience, and it has many limitations beyond spaces in the name. For example, if a column has the same name as an existing DataFrame attribute or method, the . accessor returns that attribute or method instead of the column. A non-exhaustive list of such names is mean, sum, index, values, to_dict, etc. You also cannot reference columns with numeric headers via the . accessor.
So, yes, ['col'] is strictly better than .col because it is more consistent and reliable.
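The limitations described in this answer can be sketched with a small example. The frame below is hypothetical: its column names (a name with a space, a name that clashes with the DataFrame.sum method, and a numeric header) are assumptions chosen to trigger each case, not data from the question.

```python
import pandas as pd

# Hypothetical frame: column names chosen to trigger each limitation.
df = pd.DataFrame({
    "Date": ["2009-08-25"],
    "Adj Close": [20435.240234],  # name contains a space
    "sum": [1.0],                 # clashes with the DataFrame.sum method
    0: [42],                      # numeric header
})

# For a plain identifier-like name, both access styles return the same column.
same = df.Date.equals(df["Date"])

# For a clashing name, attribute access returns the method, not the column,
# while bracket access still returns the column as a Series.
attr_is_method = callable(df.sum)
col_is_series = isinstance(df["sum"], pd.Series)

# "df.Adj Close" and "df.0" are Python syntax errors, so columns with
# spaces or numeric headers are reachable only with brackets.
adj_close = df["Adj Close"].iloc[0]
numeric = df[0].iloc[0]
```

Bracket access behaves the same way for all four columns, which is the consistency argument made above.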