Pandas datetime series report values at specific intervals

Question:

Given data from print(cot_report_splice)

date        symbol     Net Positioning
2020-10-20  PA         3413
            PL         7825
2020-10-27  PA         3468
            PL        10051
2020-11-03  PA         2416
                      ...  
2022-12-06  PL        25636
2022-12-13  PA         -883
            PL        28445
2022-12-20  PA        -2627
            PL        24052

I’m attempting a dashboard that print(results)

Symbol 1W 2W 1MO 3MO 1YR
PA -2627 -883 25000 -10000
PL 24052 28445 35000 -5000

However the for iteration below gets hinged by unknown datetime string format, unable to parse: symbol, and I do not see an immediate remedy..?

cot_report = get_cot_report_fmp(start_date=START_DATE, end_date=END_DATE)

contracts_list = ['PL', 'PA']

cot_report = cot_report[cot_report['symbol'].isin(contracts_list)]

cot_report = cot_report.reset_index().set_index(['date', 'symbol']).unstack().stack()

cot_report = cot_report[['noncomm_positions_long_all', 'noncomm_positions_short_all']]

cot_report['net_positioning'] = cot_report['noncomm_positions_long_all']-cot_report['noncomm_positions_short_all']

cot_report_splice = cot_report.loc[:, 'net_positioning']

results = pd.DataFrame(columns=['symbol', '1W', '2W', '3MO', '1YR'])

for symbol in cot_report_splice['symbol'].unique():
    symbol_df = cot_report_splice[cot_report_splice['symbol'] == symbol]
    
    most_recent_date = symbol_df['date'].max()
    
    week_mask = (symbol_df['date'] >= most_recent_date - pd.Timedelta(weeks=1)) & (symbol_df['date'] <= most_recent_date)
    two_week_mask = (symbol_df['date'] >= most_recent_date - pd.Timedelta(weeks=2)) & (symbol_df['date'] <= most_recent_date)
    three_mo_mask = (symbol_df['date'] >= most_recent_date - pd.Timedelta(weeks=15)) & (symbol_df['date'] <= most_recent_date)
    year_mask = (symbol_df['date'] >= most_recent_date - pd.Timedelta(weeks=52)) & (symbol_df['date'] <= most_recent_date)
    week_row = symbol_df.loc[week_mask, 'net_positioning'].iloc[0]
    two_week_row = symbol_df.loc[two_week_mask, 'net_positioning'].iloc[-1]
    three_mo_row = symbol_df.loc[three_mo_mask, 'net_positioning'].iloc[-1]
    year_row = symbol_df.loc[year_mask, 'net_positioning'].iloc[-1]
    
    results = results.append({'symbol': contract, '1W': week_row, '2W': two_week_row, '3M': three_mo_row, '1YR': year_row}, ignore_index=True)

results.set_index('symbol', inplace=True)

display(HTML(results._repr_html_()))
Asked By: SJ1

||

Answers:

You can use the answer I gave here https://stackoverflow.com/a/75036050/10251643

If you apply this, the code after the creation of cot_report_splice, will be:

cot_report_splice = cot_report.loc[:, 'net_positioning']

columns = ['1W', '2W', '3MO', '1YR']
weeks = [1, 2, 12, 52]
filter = [most_recent_date - pd.Timedelta(weeks=w) for w in weeks]
idx_level = cot_report_splice.index.get_level_values('date')
results = cot_report_splice.loc[idx_level.isin(filter)]
results.sort_index(level=['date', 'symbol'], ascending=[False, True], inplace=True)
d = dict(zip(filter, columns))
idx = results.index.levels[0].to_series().replace(d)
results.index = results.index.set_levels(idx, level='date')
results = results.unstack(level='date')
results.columns.rename(None, inplace=True)

With this you get the information for the dashboard.

Answered By: Adolfo Cardozo