create hashmap from dictionaries python

Question:

I have dictionaries like

    {
        {'instrument_name': 'BTC-24FEB23-24000-C',
         'index_price': 23822.86,
         'direction': 'sell',
         'amount': 0.5},
        {'instrument_name': 'BTC-30JUN23-40000-C',
         'index_price': 23813.52,
         'direction': 'sell',
         'amount': 0.1},
        {'instrument_name': 'BTC-24FEB23-24000-C',
         'index_price': 23812.99,
         'direction': 'sell',
         'amount': 6.0},
        {'instrument_name': 'BTC-26MAY23-18000-P',
         'index_price': 23817.83,
         'direction': 'buy',
         'amount': 0.3}
    }

I want output like the following, grouping by date and summing the amounts into a dictionary:

    {'24FEB23': 6.5, '30JUN23': 0.1, '26MAY23': 0.3}

Basically, I want to sum the amounts per date, where the date is taken from the instrument name string:

    instrument_date = instrument_name.split('-')[1]
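For reference, this is how the two split variants behave on one name from the data above, assuming every name follows the same BTC-DATE-STRIKE-TYPE layout: with no argument, split() splits on whitespace and leaves the name in one piece, while split('-') isolates the date as the second field.

```python
# Sample instrument name taken from the question's data
name = 'BTC-24FEB23-24000-C'

# With no argument, split() splits on whitespace, so the name stays whole
print(name.split())  # ['BTC-24FEB23-24000-C']

# Splitting on '-' separates the fields; the date is the second one
parts = name.split('-')
print(parts)  # ['BTC', '24FEB23', '24000', 'C']

instrument_date = parts[1]
print(instrument_date)  # 24FEB23
```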

Is there a better way to do this than using a for loop?

Asked By: Madan


Answers:

As oskros suggested, you could use pandas for this. Note that I've changed your outer {} to [] so that it is a list of dictionaries (a set literal cannot hold dicts, because dicts are unhashable):

import pandas as pd

data = [{'instrument_name': 'BTC-24FEB23-24000-C',
         'index_price': 23822.86,
         'direction': 'sell',
         'amount': 0.5},
        {'instrument_name': 'BTC-30JUN23-40000-C',
         'index_price': 23813.52,
         'direction': 'sell',
         'amount': 0.1},
        {'instrument_name': 'BTC-24FEB23-24000-C',
         'index_price': 23812.99,
         'direction': 'sell',
         'amount': 6.0},
        {'instrument_name': 'BTC-26MAY23-18000-P',
         'index_price': 23817.83,
         'direction': 'buy',
         'amount': 0.3}]

df = pd.DataFrame(data)
# Note: this groups by the full instrument name, not just the date part
print(df.groupby('instrument_name')['amount'].sum())

Outcome:

instrument_name
BTC-24FEB23-24000-C    6.5
BTC-26MAY23-18000-P    0.3
BTC-30JUN23-40000-C    0.1

Answered By: JarroVGIT

I don’t quite understand the problem with a for loop here. If dicts is a list of your dictionaries, then

from collections import defaultdict

d = defaultdict(float)
for x in dicts:
    d[x['instrument_name'].split('-')[1]] += x['amount']

# d = {'24FEB23': 6.5, '30JUN23': 0.1, '26MAY23': 0.3}

This should be fast enough unless you are dealing with very large inputs.
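If the goal is only to avoid writing an explicit for statement, a minimal sketch with itertools.groupby and a dict comprehension is below. It assumes every name has the BTC-DATE-STRIKE-TYPE shape from the question, and date_of is a hypothetical helper introduced here. It still iterates internally and adds a sort (O(n log n)), so it is unlikely to beat the defaultdict loop on speed; keys also come out in string order, not chronological order.

```python
from itertools import groupby

# Sample data from the question, as a list of dicts
dicts = [
    {'instrument_name': 'BTC-24FEB23-24000-C', 'index_price': 23822.86, 'direction': 'sell', 'amount': 0.5},
    {'instrument_name': 'BTC-30JUN23-40000-C', 'index_price': 23813.52, 'direction': 'sell', 'amount': 0.1},
    {'instrument_name': 'BTC-24FEB23-24000-C', 'index_price': 23812.99, 'direction': 'sell', 'amount': 6.0},
    {'instrument_name': 'BTC-26MAY23-18000-P', 'index_price': 23817.83, 'direction': 'buy', 'amount': 0.3},
]

def date_of(row):
    """Hypothetical helper: return the date field of 'BTC-<DATE>-<STRIKE>-<TYPE>'."""
    return row['instrument_name'].split('-')[1]

# groupby only merges *adjacent* items, so sort by the same key first
rows = sorted(dicts, key=date_of)
totals = {date: sum(row['amount'] for row in group)
          for date, group in groupby(rows, key=date_of)}

print(totals)  # {'24FEB23': 6.5, '26MAY23': 0.3, '30JUN23': 0.1}
```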

Answered By: PedroTurik

As you suggest, you can solve this with a for loop (using a defaultdict):

dat = [{'instrument_name': 'BTC-24FEB23-24000-C',
        'index_price': 23822.86,
        'direction': 'sell',
        'amount': 0.5},
       {'instrument_name': 'BTC-30JUN23-40000-C',
        'index_price': 23813.52,
        'direction': 'sell',
        'amount': 0.1},
       {'instrument_name': 'BTC-24FEB23-24000-C',
        'index_price': 23812.99,
        'direction': 'sell',
        'amount': 6.0},
       {'instrument_name': 'BTC-26MAY23-18000-P',
        'index_price': 23817.83,
        'direction': 'buy',
        'amount': 0.3}]

Solution with loop:

from collections import defaultdict
def sum_dates(dat):
    out = defaultdict(lambda: 0)
    for dct in dat:
        out[dct['instrument_name'].split('-')[1]] += dct['amount']
    return dict(out)

>>> %timeit sum_dates(dat)
1.82 µs +/- 292 ns per loop (mean +/- std. dev. of 7 runs, 1,000,000 loops each)
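%timeit is an IPython/Jupyter magic and is not valid syntax in a plain Python script. A rough script equivalent, sketched with the standard-library timeit.repeat, is below; the loop count of 10,000 and the printed format are arbitrary choices made to resemble the %timeit line, and absolute timings will depend on the machine.

```python
import statistics
import timeit
from collections import defaultdict

# Sample data from the question, as a list of dicts
dat = [
    {'instrument_name': 'BTC-24FEB23-24000-C', 'index_price': 23822.86, 'direction': 'sell', 'amount': 0.5},
    {'instrument_name': 'BTC-30JUN23-40000-C', 'index_price': 23813.52, 'direction': 'sell', 'amount': 0.1},
    {'instrument_name': 'BTC-24FEB23-24000-C', 'index_price': 23812.99, 'direction': 'sell', 'amount': 6.0},
    {'instrument_name': 'BTC-26MAY23-18000-P', 'index_price': 23817.83, 'direction': 'buy', 'amount': 0.3},
]

def sum_dates(dat):
    # Sum 'amount' per date, the second '-'-separated field of the name
    out = defaultdict(float)
    for dct in dat:
        out[dct['instrument_name'].split('-')[1]] += dct['amount']
    return dict(out)

# Mirror %timeit's "7 runs": each run calls sum_dates `number` times
number = 10_000
runs = timeit.repeat(lambda: sum_dates(dat), number=number, repeat=7)

# Convert total seconds per run into seconds per call
per_call = [t / number for t in runs]
mean_us = statistics.mean(per_call) * 1e6
std_ns = statistics.stdev(per_call) * 1e9
print(f"{mean_us:.2f} µs +/- {std_ns:.0f} ns per loop "
      f"(mean +/- std. dev. of 7 runs, {number:,} loops each)")
```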

Solution with pandas:

import pandas as pd
df = pd.DataFrame(dat)
df['date'] = df['instrument_name'].str.split('-').str[1]

def sum_dates_pandas(df):
    return df.groupby('date')['amount'].sum().to_dict()


>>> %timeit sum_dates_pandas(df)
219 µs +/- 18.6 µs per loop (mean +/- std. dev. of 7 runs, 1,000 loops each)

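On this small input, the plain loop is the fastest by a wide margin (about 1.8 µs versus 219 µs per call). Much of that gap is pandas' fixed per-call overhead, which matters less as inputs grow; note also that the 'date' column is computed outside the timed pandas function here.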

Answered By: oskros