Why does my Pandas DataFrame not display new order using `sort_values`?
Question:
New to Pandas, so maybe I’m missing a big idea?
I have a Pandas DataFrame of register transactions with a shape of roughly (500, 4):
Time datetime64[ns]
Net Total float64
Tax float64
Total Due float64
I’m working through my code in a Python 3 Jupyter notebook, and I can’t get any column to sort. Working through the different code examples for sorting, I’m not seeing the output reorder when I inspect the df. So I’ve reduced the problem to trying to order by just one column:
df.sort_values(by='Time')
# OR
df.sort_values(['Total Due'])
# OR
df.sort_values(['Time'], ascending=True)
No matter which column name or which ascending value I use, the displayed results never change order.
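A minimal reproduction of the symptom, assuming a small made-up frame that reuses the column names from the question (the values are invented for illustration). sort_values hands back a new, sorted DataFrame and leaves df itself in its original order, so any later print(df) or df.head() shows the unsorted rows.

```python
import pandas as pd

# Made-up sample data using the column names from the question.
df = pd.DataFrame({
    "Time": pd.to_datetime(["2017-01-03 10:00", "2017-01-01 09:30", "2017-01-02 14:15"]),
    "Net Total": [20.0, 5.0, 12.5],
    "Tax": [1.6, 0.4, 1.0],
    "Total Due": [21.6, 5.4, 13.5],
})

# The sorted frame is returned; nothing is stored back into df.
result = df.sort_values(by="Time")

print(result["Time"].is_monotonic_increasing)  # True: the returned copy is sorted
print(list(df.index))      # [0, 1, 2]: df still has its original row order
print(list(result.index))  # [1, 2, 0]: rows reordered by Time
```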
Thinking it could be a Jupyter thing, I’ve previewed the results using print(df), df.head(), and HTML(df.to_html()) (the last one is specific to Jupyter notebooks). I’ve also rerun the whole notebook, from the CSV import through this code. I’m also new to Python 3 (coming from 2.7), so I sometimes get stuck on that, but I don’t see how it’s relevant here.
Another post has a similar problem, Python pandas dataframe sort_values does not work. In that instance, the column being sorted was of type string. But as you can see, all of the columns here are unambiguously sortable.
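For contrast with the string case in the other post, here is a sketch of why a string column can look unsorted, using made-up values: strings compare character by character, so "10.5" sorts before "9.0". Converting with pd.to_numeric (assuming every value parses as a number) gives a float64 column that sorts numerically, which is the situation the dtypes listed above already describe.

```python
import pandas as pd

# Hypothetical frame where amounts were read in as strings.
df = pd.DataFrame({"Total Due": ["9.0", "10.5", "100.0"]})

# Lexicographic (string) order, not numeric order.
lexical = df.sort_values("Total Due")["Total Due"].tolist()
print(lexical)  # ['10.5', '100.0', '9.0']

# Convert to a numeric dtype, then sort numerically.
df["Total Due"] = pd.to_numeric(df["Total Due"])
numeric = df.sort_values("Total Due")["Total Due"].tolist()
print(numeric)  # [9.0, 10.5, 100.0]
```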
Why does my Pandas DataFrame not display the new order using sort_values?
Answers:
df.sort_values(['Total Due']) returns a new, sorted DataFrame, but it doesn’t sort df in place.
So do it explicitly:
df = df.sort_values(['Total Due'])
or
df.sort_values(['Total Due'], inplace=True)
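Both fixes side by side, on a tiny made-up frame. One detail worth checking: with inplace=True the method returns None, so combining the two styles, as in df = df.sort_values(..., inplace=True), would leave df set to None.

```python
import pandas as pd

# Option 1: rebind the name to the sorted copy that sort_values returns.
df = pd.DataFrame({"Total Due": [21.6, 5.4, 13.5]})
df = df.sort_values(["Total Due"])
print(df["Total Due"].tolist())  # [5.4, 13.5, 21.6]

# Option 2: sort the existing object in place.
df2 = pd.DataFrame({"Total Due": [21.6, 5.4, 13.5]})
returned = df2.sort_values(["Total Due"], inplace=True)
print(df2["Total Due"].tolist())  # [5.4, 13.5, 21.6]
print(returned)  # None: inplace=True returns nothing, so don't assign its result
```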
My problem, FYI, was that I wasn’t returning the resulting DataFrame from my method, so the DataFrame I was inspecting in PyCharm was never updated. Putting the DataFrame’s name after the return keyword fixed the issue.
Edit:
I had a bare return at the end of my method instead of return df, which the debugger must have noticed, because df wasn’t being updated in spite of my explicit, in-place sort.
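One way this pitfall can show up, sketched with hypothetical helper names (sort_broken, sort_fixed) and made-up data: if a helper rebinds its local df to the sorted copy and ends with a bare return, the caller gets None back and its own frame keeps the original order. Returning df and assigning the result at the call site fixes it. Note that if the helper instead sorts the passed-in frame with inplace=True, the caller’s object is modified directly, so a missing return df would mainly bite when the caller writes something like df = helper(df) and ends up with None.

```python
import pandas as pd

def sort_broken(df):
    # Hypothetical helper: rebinds only the local name, then discards it.
    df = df.sort_values(["Total Due"])
    return

def sort_fixed(df):
    # Hypothetical helper: hands the sorted copy back to the caller.
    df = df.sort_values(["Total Due"])
    return df

transactions = pd.DataFrame({"Total Due": [21.6, 5.4, 13.5]})

out = sort_broken(transactions)
unchanged = transactions["Total Due"].tolist()
print(out)        # None
print(unchanged)  # [21.6, 5.4, 13.5]: caller's frame was not sorted

transactions = sort_fixed(transactions)
print(transactions["Total Due"].tolist())  # [5.4, 13.5, 21.6]
```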
The syntax below returns a sorted DataFrame, but it won’t update the DataFrame itself:
df.sort_values(['column name'])
So assign the returned DataFrame back to the same variable, as shown below:
df = df.sort_values(['column name'])