Python pandas: why do I have to write the DataFrame name twice when selecting rows, as in frame[frame['col1'].notna()]?
Question:
I have more experience with SQL than with Python, and I am now starting to use Python more. I've read the pandas comparison with SQL. Groupby is clear to me: groupby('colname').
However, when selecting rows, why do we need to write the name of the frame twice, as in frame[frame['col1'].notna()]? I could not find a reason via web search.
Answers:
Just summarizing helpful comments:
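This is called boolean masking (or boolean indexing) and is a way to select subsets of your data. It is a convention of NumPy and pandas (which is built on NumPy) rather than of Python itself. The expression inside the brackets, frame['col1'].notna(), is evaluated first and on its own: it builds a Series of True/False values, one per row. Unlike a SQL WHERE clause, it has no implicit "current table", so it must name the DataFrame explicitly. The outer frame[...] then keeps only the rows where that Series is True. The pandas mask() method accepts the same kind of boolean condition, but it replaces the values where the condition is True with NaN instead of dropping rows, so on its own it does not give exactly the same result.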
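A minimal sketch of the two steps described above, assuming a small hypothetical DataFrame named `frame` with a float column `col1`. The `.loc` callable form at the end is a standard pandas alternative that the original comments did not mention; it avoids repeating the name, which is convenient in method chains.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; 'col1' has one missing value.
frame = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": ["a", "b", "c"]})

# Step 1: build the boolean mask. This expression is evaluated on its own,
# so it has to name the DataFrame explicitly.
mask = frame["col1"].notna()
print(mask.tolist())  # [True, False, True]

# Step 2: index the DataFrame with the mask; only rows where it is True remain.
# Roughly the SQL: SELECT * FROM frame WHERE col1 IS NOT NULL
selected = frame[mask]
print(selected)

# Alternative: pass a callable to .loc. pandas calls it with the DataFrame
# itself, so the variable name is written only once.
selected_via_callable = frame.loc[lambda df: df["col1"].notna()]
print(selected_via_callable.equals(selected))  # True
```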
Just to add, nowadays you can use the query() method to get a somewhat more natural, SQL-like syntax; see e.g. Querying for NaN and other names in Pandas.
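A minimal sketch of the query() form, assuming the same hypothetical `frame` as in the question. Passing `engine="python"` is a choice made for this sketch (it avoids depending on the optional numexpr package), not something the original answer specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with one missing value in 'col1'.
frame = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": ["a", "b", "c"]})

# query() resolves 'col1' against the DataFrame's columns, so the frame
# name appears only once, much like a SQL WHERE clause.
via_query = frame.query("col1.notna()", engine="python")

# Same rows as the boolean-indexing version from the question.
via_mask = frame[frame["col1"].notna()]
print(via_query.equals(via_mask))  # True
```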