How to search for a specific date in a concatenated time-series DataFrame when the same date repeats several times


I downloaded historical price data for the ^GSPC share market index (S&P 500) and several other global indices. The date is set as the index.

Selecting values in rows when date is set to index works as expected with .loc.

# S&P500 DataFrame = spx_df

Open            1.116560e+03
High            1.133870e+03
Low             1.116560e+03
Close           1.132990e+03
Volume          3.991400e+09
Dividends       0.000000e+00
Stock Splits    0.000000e+00
Name: 2010-01-04 00:00:00-05:00, dtype: float64

I then concatenated several global stock market indices into a single DataFrame for further use. In effect, any date in the range appears five times when the historical data for five stock indices are stacked into one time series.

markets = pd.concat(ticker_list, axis = 0)

I want to reference a single date in the concatenated df and assign the result to a variable. I would prefer that the date not be a datetime object, because I would like to pass it to .loc inside a function. How does concatenation affect accessing rows by date when the same date repeats several times in the index?

This is what I attempted so far:

# markets = concatenated DataFrame 
Reference_date = markets.loc['2010-01-04'] 
# KeyError: '2010-01-04'

Reference_date = markets.loc[markets.Date == '2010-01-04']
# This doesn't work because Date is not an attribute of the DataFrame
Asked By: maikoh



Since you have set date as index you should be able to do:
Reference_date = markets.loc[markets.index == '2010-01-04']
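A minimal sketch of the idea, assuming a timezone-naive DatetimeIndex. The two small frames and their prices are made up for illustration (ftse_df is a hypothetical second index, not from the question):

```python
import pandas as pd

# Hypothetical sample data standing in for two downloaded indices.
dates = pd.to_datetime(["2010-01-04", "2010-01-05"])
spx_df = pd.DataFrame({"Open": [1116.56, 1132.66], "Close": [1132.99, 1136.52]}, index=dates)
ftse_df = pd.DataFrame({"Open": [5412.90, 5500.30], "Close": [5500.30, 5522.50]}, index=dates)

# Stacking the frames row-wise makes every date appear once per source frame.
markets = pd.concat([spx_df, ftse_df], axis=0)

# Comparing the index with a date string yields a boolean mask,
# and .loc keeps every row where the mask is True.
mask = markets.index == "2010-01-04"
reference_rows = markets.loc[mask]
print(reference_rows)
```

The result is a DataFrame with one row per concatenated frame, so the repeated date is not a problem for this kind of lookup.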

Answered By: Casper Knudsen

To access a specific date in the concatenated DataFrame, you can use boolean indexing instead of a label lookup with .loc. This returns a DataFrame containing every row whose index equals the reference date:

reference_date = markets[markets.index == '2010-01-04']

You can also use the query() method to search for a specific date:

reference_date = markets.query('index == "2010-01-04"')

Keep in mind that the resulting variable reference_date is still a DataFrame and contains all rows that match the reference date across all the concatenated DataFrames. If you want to extract only specific columns, you can use the column name like this:

reference_date_Open = markets.query('index == "2010-01-04"')["Open"]
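Since the question asks for something usable inside a function with a plain date string, here is one possible sketch. It uses a boolean mask with .loc rather than query(). The helper name rows_for_date and the sample prices are hypothetical, and a timezone-naive index is assumed:

```python
import pandas as pd


def rows_for_date(df, date_str, column=None):
    """Return every row of df whose index equals date_str (hypothetical helper)."""
    mask = df.index == date_str  # the string is compared against the DatetimeIndex
    if column is None:
        return df.loc[mask]
    return df.loc[mask, column]


# Made-up prices for two indices sharing the same trading dates.
dates = pd.to_datetime(["2010-01-04", "2010-01-05"])
markets = pd.concat(
    [
        pd.DataFrame({"Open": [1116.56, 1132.66]}, index=dates),
        pd.DataFrame({"Open": [5412.90, 5500.30]}, index=dates),
    ],
    axis=0,
)

all_rows = rows_for_date(markets, "2010-01-04")
open_only = rows_for_date(markets, "2010-01-04", column="Open")
print(open_only)
```

Passing a column name returns a Series with one value per concatenated frame. Leaving it out returns the full matching rows.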
Answered By: rebvar ebra