Python: API request nested dictionaries to dataframe with datetime indexed values

Question:

I run a query in Python to get hourly price data from an API, using the requests.get function:

result = (requests.get(url_prices, headers=headers, params={'SpotKey':'1','Fields':'hours','FromDate':'2016-05-05','ToDate':'2016-12-05','Currency':'eur','SortType':'ascending'}).json())

where ‘SpotKey’ identifies the item I want to retrieve from the API; in this example ‘1’ is the hourly price timeseries (the other parameters are self-explanatory).

The result from the query is:

{'SpotKey': '1',
 'SpotName': 'APX',
 'Denomination': 'eur/mwh',
 'Elements': [{'Date': '2016-05-05T00:00:00.0000000',
   'TimeSpans': [{'TimeSpan': '00:00-01:00', 'Value': 23.69},
    {'TimeSpan': '01:00-02:00', 'Value': 21.86},
    {'TimeSpan': '02:00-03:00', 'Value': 21.26},
    {'TimeSpan': '03:00-04:00', 'Value': 20.26},
    {'TimeSpan': '04:00-05:00', 'Value': 19.79},
    {'TimeSpan': '05:00-06:00', 'Value': 19.79},
...
    {'TimeSpan': '19:00-20:00', 'Value': 57.52},
    {'TimeSpan': '20:00-21:00', 'Value': 49.4},
    {'TimeSpan': '21:00-22:00', 'Value': 42.23},
    {'TimeSpan': '22:00-23:00', 'Value': 34.99},
    {'TimeSpan': '23:00-24:00', 'Value': 33.51}]}]}

where ‘Elements’ is the relevant list containing the timeseries, structured as nested dictionaries of ‘Date’ keys and ‘TimeSpans’ keys.

Each ‘TimeSpans’ key contains a list of nested dictionaries, one per hour of the day, each with a ‘TimeSpan’ key for the hour and a ‘Value’ key for the price.

I would like to transform it to a dataframe like:

Datetime           eur/mwh
2016-05-05 00:00:00 23.69
2016-05-05 01:00:00 21.86
2016-05-05 02:00:00 21.26
2016-05-05 03:00:00 20.26
2016-05-05 04:00:00 19.79
... ...
2016-12-05 19:00:00 57.52
2016-12-05 20:00:00 49.40
2016-12-05 21:00:00 42.23
2016-12-05 22:00:00 34.99
2016-12-05 23:00:00 33.51

For the time being, I have managed to do this with:

df = pd.concat([pd.DataFrame(x) for x in result['Elements']])
df['Date'] = pd.to_datetime(df['Date'] + ' ' + [x['TimeSpan'][:5] for x in df['TimeSpans']], errors='coerce')
df[result['Denomination']] = [x['Value'] for x in df['TimeSpans']]
df = df.set_index(df['Date'], drop=True).drop(columns=['Date','TimeSpans'])
df = df[~df.index.isnull()]

I did it this way because, on daylight-saving-time changes, the ‘TimeSpan’ hourly values are replaced with a ‘dts’ string, which causes date-parsing errors when creating the datetime index.
Since I will request data very frequently and potentially at different granularities (e.g. half-hourly), is there a better, quicker, or more standard way to shape so many nested dictionaries into a dataframe in the format I am looking for, one that also avoids the date-parsing error on daylight-saving-time changes?

thank you in advance, cheers.

Asked By: Luca91

||

Answers:

You did not give examples of the "dts" values, so I cannot verify this. But in principle, treating the Date as a timestamp and the TimeSpan as a timedelta should let you both ignore granularity changes and potentially add extra "dts" parsing.

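import pandas as pd

def parse_time(x):
    # regular spans look like "HH:MM-HH:MM"; keep the start time as "HH:MM:00"
    if "dst" not in x:
        return x[:5]+":00"
    return f"{int(x[:2])+1}{x[2:5]}:00"  # TODO ACTUALLY PARSE, time overflow etc

# one row per day, indexed by the raw Date string
df = pd.DataFrame(result['Elements']).set_index("Date")
# one row per TimeSpans entry, expanded into TimeSpan / Value columns
d2 = df.TimeSpans.explode().apply(pd.Series)
# day (timestamp) + start of the span (timedelta) = full datetime
d2['Datetime'] = pd.to_datetime(d2.index) + pd.to_timedelta(d2.TimeSpan.apply(parse_time))
pd.DataFrame(d2.set_index(d2.Datetime).Value).rename(columns={"Value": "eur/mwh"})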


Answered By: pwoolvett
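
A variation on the same idea (the date as a timestamp plus the start of the time span as a timedelta) can be sketched with pd.json_normalize instead of explode/apply. This is not the answerer's code, only a minimal sketch under assumptions: the sample payload is hand-made in the shape shown in the question, the 'dts' label is a guess at what a daylight-saving row looks like, and to_frame is a hypothetical helper name. Rows whose label is not of the form HH:MM-HH:MM are dropped, which mirrors what the question's own code does with errors='coerce'.

```python
import pandas as pd

# Hand-made payload in the shape shown in the question.
# The 'dts' label is an assumption; the real API may use another placeholder.
result = {
    "SpotKey": "1",
    "SpotName": "APX",
    "Denomination": "eur/mwh",
    "Elements": [
        {
            "Date": "2016-05-05T00:00:00.0000000",
            "TimeSpans": [
                {"TimeSpan": "00:00-01:00", "Value": 23.69},
                {"TimeSpan": "01:00-02:00", "Value": 21.86},
                {"TimeSpan": "dts", "Value": 21.26},
            ],
        }
    ],
}


def to_frame(result):
    """Hypothetical helper: flatten result['Elements'] into a datetime-indexed frame."""
    # One row per TimeSpans entry, with the parent Date repeated on each row.
    flat = pd.json_normalize(result["Elements"], record_path="TimeSpans", meta="Date")
    # Hours and minutes of the interval start; labels that do not match give NaN.
    parts = flat["TimeSpan"].str.extract(r"^(?P<h>\d{2}):(?P<m>\d{2})-")
    offset = (
        pd.to_timedelta(pd.to_numeric(parts["h"]), unit="h")
        + pd.to_timedelta(pd.to_numeric(parts["m"]), unit="m")
    )
    # Only the calendar day is needed from the Date string.
    day = pd.to_datetime(flat["Date"].str[:10], format="%Y-%m-%d")
    flat["Datetime"] = day + offset
    # Drop rows whose label could not be parsed (NaT), then index by Datetime.
    out = flat.dropna(subset=["Datetime"]).set_index("Datetime")[["Value"]]
    return out.rename(columns={"Value": result["Denomination"]})


df = to_frame(result)
print(df)
```

Because the offset is built from the hour and minute of the interval start, the same sketch should also cope with half-hourly labels such as '00:30-01:00'. Dropping the unparsed rows loses those prices, so a real solution would need to know how the API actually labels them.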

This should work:

import pandas as pd

cols = ['Datetime', 'eur/mwh']
chunks = []

# build one dataframe per day, then concat them together to one df
for day in result['Elements']:
    # chunk represents a day's worth of data to concat
    chunk = []
    date = pd.to_datetime(day['Date'])
    for pair in day['TimeSpans']:
        # hour offset is just the first 2 characters of TimeSpan
        # (int() raises ValueError if the label does not start with digits)
        offset = pd.DateOffset(hours=int(pair['TimeSpan'][:2]))
        value = pair['Value']
        chunk.append([date + offset, value])

    chunks.append(pd.DataFrame(chunk, columns=cols))

# concat all day-chunks to df
df = pd.concat(chunks, ignore_index=True)

The only thing I’m not 100% sure of is the pd.to_datetime() call, but if it doesn’t work you just need to pass it a format argument.
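
As a minimal sketch of the format argument mentioned above: the 'Date' string below is copied from the question, and since only the calendar day is needed, one cautious option is to parse just its first ten characters with an explicit format. The variable names are illustrative only.

```python
import pandas as pd

# 'Date' value copied from the API result shown in the question
date_str = "2016-05-05T00:00:00.0000000"

# Parse only the "YYYY-MM-DD" prefix with an explicit format
day = pd.to_datetime(date_str[:10], format="%Y-%m-%d")

# Adding an hour offset then gives the timestamp for the 01:00 slot
first_hour = day + pd.DateOffset(hours=1)
print(day, first_hour)
```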

hope it helps 🙂

Answered By: maxxel_