Check whether a non-index column is sorted in Pandas

Question:

Is there a way to test whether a dataframe is sorted by a given column that’s not an index (i.e. is there an equivalent to is_monotonic() for non-index columns) without calling a sort all over again, and without converting a column into an index?

Asked By: nick_eu


Answers:

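You can use NumPy's diff and check that no consecutive difference is negative:

import numpy as np

def is_df_sorted(df, colname):
    # >= 0 accepts repeated values; use > 0 to require strictly increasing values
    return (np.diff(df[colname]) >= 0).all()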
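As a quick illustration of how the comparison operator changes the result, here is a minimal sketch. The helper names is_non_decreasing and is_strictly_increasing and the sample frame are made up for this example; only np.diff and the comparison come from the answer above.

```python
import numpy as np
import pandas as pd

def is_non_decreasing(df, colname):
    # Ties allowed: every consecutive difference must be >= 0.
    return bool((np.diff(df[colname].to_numpy()) >= 0).all())

def is_strictly_increasing(df, colname):
    # Ties rejected: every consecutive difference must be > 0.
    return bool((np.diff(df[colname].to_numpy()) > 0).all())

# Hypothetical sample data: column A contains a repeated value.
df = pd.DataFrame({"A": [1, 2, 2], "B": [2, 3, 1]})

print(is_non_decreasing(df, "A"))       # True
print(is_strictly_increasing(df, "A"))  # False
print(is_non_decreasing(df, "B"))       # False
```

Note that np.diff computes every difference before .all() runs, so this always scans the whole column, and it only works for dtypes that support subtraction.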

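A more direct approach (the one you mention, although you would rather avoid it) is to convert the column to an index and use its monotonicity property. is_monotonic was an alias for is_monotonic_increasing and was removed in pandas 2.0, so the longer name is the safer choice:

import pandas as pd

def is_df_sorted(df, colname):
    # True when the values are non-decreasing (ties allowed)
    return pd.Index(df[colname]).is_monotonic_increasing

Answered By: shx2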

There are a handful of functions in pd.algos which might be of use. They’re all undocumented implementation details, so they might change from release to release:

>>> pd.algos.is[TAB]
pd.algos.is_lexsorted          pd.algos.is_monotonic_float64  pd.algos.is_monotonic_object
pd.algos.is_monotonic_bool     pd.algos.is_monotonic_int32
pd.algos.is_monotonic_float32  pd.algos.is_monotonic_int64    

The is_monotonic_* functions take an array of the specified dtype and a “timelike” boolean that should be False for most use cases. (Pandas sets it to True for a case involving times represented as integers.) The return value is a tuple whose first element represents whether the array is monotonically non-decreasing, and whose second element represents whether the array is monotonically non-increasing. Other tuple elements are version-dependent:

>>> df = pd.DataFrame({"A": [1,2,2], "B": [2,3,1]})
>>> pd.algos.is_monotonic_int64(df.A.values, False)[0]
True
>>> pd.algos.is_monotonic_int64(df.B.values, False)[0]
False

All these functions assume a specific input dtype, even is_lexsorted, which assumes the input is a list of int64 arrays. Pass it the wrong dtype, and it gets really confused:

In [32]: pandas.algos.is_lexsorted([np.array([-2, -1], dtype=np.int64)])
Out[32]: True
In [33]: pandas.algos.is_lexsorted([np.array([-2, -1], dtype=float)])
Out[33]: False
In [34]: pandas.algos.is_lexsorted([np.array([-1, -2, 0], dtype=float)])
Out[34]: True

I’m not entirely sure why Series don’t already have some kind of short-circuiting is_sorted. There might be something which makes it trickier than it seems.
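For reference, a short-circuiting check can be sketched in pure Python. This is only an illustration of the idea, not anything pandas provides: the name is_sorted_short_circuit is hypothetical, and the Python-level loop will usually be much slower than a vectorized check unless the column is unsorted near the start.

```python
from itertools import islice

import pandas as pd

def is_sorted_short_circuit(values):
    # Compare each element with its successor; all() stops at the
    # first pair that is out of order, so later elements are never read.
    return all(a <= b for a, b in zip(values, islice(values, 1, None)))

# Hypothetical sample data.
df = pd.DataFrame({"A": [1, 2, 2], "B": [2, 3, 1]})

print(is_sorted_short_circuit(df["A"]))  # True
print(is_sorted_short_circuit(df["B"]))  # False
```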

Answered By: DSM

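Since 0.19.0, Series has the properties pandas.Series.is_monotonic_increasing, pandas.Series.is_monotonic_decreasing, and pandas.Series.is_monotonic, so the check works directly on a column without building an index. (is_monotonic was an alias for is_monotonic_increasing and was removed in pandas 2.0.)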
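A minimal usage sketch on a DataFrame column follows. The sample frame is made up for illustration, and combining the property with is_unique to get a strict check is my own suggestion rather than part of this answer.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [1, 2, 2], "B": [2, 3, 1], "C": [3, 2, 2]})

# Both properties are non-strict: repeated values are allowed.
print(df["A"].is_monotonic_increasing)  # True
print(df["B"].is_monotonic_increasing)  # False
print(df["C"].is_monotonic_decreasing)  # True

# For a strictly increasing check, also require unique values.
strictly_increasing = df["A"].is_monotonic_increasing and df["A"].is_unique
print(strictly_increasing)  # False
```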

Answered By: Konstantin