Python Pandas Select Index where index is larger than x

Question:

Say I have a DataFrame df with date as index and some values. How can I select the rows where the date is larger than some value x?

I know I can convert the index to a column and then select with df[df['date'] > x], but is that slower than doing the operation on the index?

Asked By: user3092887

||

Answers:

Here is an example of selecting rows from a DataFrame by comparing its index directly:

from numpy.random import randn
from pandas import DataFrame
from datetime import timedelta as td
import dateutil.parser

d = dateutil.parser.parse("2014-01-01")
df = DataFrame(randn(6,2), columns=list('AB'), index=[d + td(days=x) for x in range(1,7)])

In [1]: df
Out[1]:
                   A         B
2014-01-02 -1.172285  1.706200
2014-01-03  0.039511 -0.320798
2014-01-04 -0.192179 -0.539397
2014-01-05 -0.475917 -0.280055
2014-01-06  0.163376  1.124602
2014-01-07 -2.477812  0.656750

In [2]: df[df.index > dateutil.parser.parse("2014-01-04")]
Out[2]:
                   A         B
2014-01-05 -0.475917 -0.280055
2014-01-06  0.163376  1.124602
2014-01-07 -2.477812  0.656750
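
The same boolean-mask idea can be sketched with pd.Timestamp instead of dateutil. This is only an illustration under assumptions: the column values and the variable names x, after_x and from_x are made up here, not taken from the answer above.

```python
import pandas as pd

# Hypothetical data: six consecutive days, mirroring the index used above.
df = pd.DataFrame(
    {"A": range(6), "B": range(10, 16)},
    index=pd.date_range("2014-01-02", periods=6, freq="D"),
)

x = pd.Timestamp("2014-01-04")

# Strictly greater than x: the row for 2014-01-04 itself is excluded.
after_x = df[df.index > x]

# Greater than or equal to x: the row for 2014-01-04 is included.
from_x = df[df.index >= x]
```

The comparison df.index > x produces a boolean array, so it works whether or not the index is sorted.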
Answered By: Datageek

The existing answer is correct. However, if we are selecting based on the index, label-based slicing with .loc would be faster:

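import pandas as pd

# Set the index (assumes df has a 'date' column holding datetimes)
df = df.set_index(df['date'])

# Select observations between two datetimes (both endpoints are included)
df.loc[pd.Timestamp('2002-1-1 01:00:00'):pd.Timestamp('2002-1-1 04:00:00')]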
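
For the original question (everything later than x), the slice can be left open-ended. The sketch below assumes a sorted DatetimeIndex and uses made-up data and names (value, x, from_x, after_x). Label slices in .loc include the start label, so a strict "larger than x" still needs a mask.

```python
import pandas as pd

# Hypothetical data with a sorted daily DatetimeIndex.
df = pd.DataFrame(
    {"value": range(6)},
    index=pd.date_range("2002-01-01", periods=6, freq="D"),
)

x = pd.Timestamp("2002-01-03")

# Open-ended label slice: every row from x onward, x included.
from_x = df.loc[x:]

# Strictly later than x: fall back to a boolean mask on the index.
after_x = df.loc[df.index > x]
```

If the index is not sorted, calling df.sort_index() first is the safer route before slicing by label.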
Answered By: ntg

Alternatively, you can use query:

In [14]: df = pd.DataFrame(
    ...:     {'alpha': list('ABCDE'), 'num': range(5)},
    ...:     index=pd.date_range('2022-06-30', '2022-07-04'),
    ...: )

In [15]: df
Out[15]: 
           alpha  num
2022-06-30     A    0
2022-07-01     B    1
2022-07-02     C    2
2022-07-03     D    3
2022-07-04     E    4

In [16]: df.query('index >= "2022-07-02"')
Out[16]: 
           alpha  num
2022-07-02     C    2
2022-07-03     D    3
2022-07-04     E    4
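
When the threshold lives in a variable rather than a string literal, query can reference it with the @ prefix. A minimal sketch, reusing the frame from above; the names x and result are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"alpha": list("ABCDE"), "num": range(5)},
    index=pd.date_range("2022-06-30", "2022-07-04"),
)

x = pd.Timestamp("2022-07-02")

# '@x' refers to the Python variable x; 'index' refers to the unnamed index.
# Strict comparison, so the row for 2022-07-02 is excluded.
result = df.query("index > @x")
```

Here 'index' resolves to the DataFrame index because the index has no name and no column is called index.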
Answered By: rachwa