Create a hashmap from dictionaries in Python
Question:
I have dictionaries like this:
{
{'instrument_name': 'BTC-24FEB23-24000-C',
'index_price': 23822.86,
'direction': 'sell',
'amount': 0.5},
{
'instrument_name': 'BTC-30JUN23-40000-C',
'index_price': 23813.52,
'direction': 'sell',
'amount': 0.1},
{
'instrument_name': 'BTC-24FEB23-24000-C',
'index_price': 23812.99,
'direction': 'sell',
'amount': 6.0},
{
'instrument_name': 'BTC-26MAY23-18000-P',
'index_price': 23817.83,
'direction': 'buy',
'amount': 0.3}
}
I want output like the following, grouped by date, with the amounts summed into a dictionary:
{'24FEB23': 6.5, '30JUN23': 0.1, '26MAY23': 0.3}
Basically, I want to sum the amounts per date, where the date is extracted from the instrument name string:
instrument_date = instrument_name.split('-')[1]
Is there a better way to do this than using a for loop?
Answers:
As oskros suggested, you might use pandas for this like so. Note that I’ve changed your {} to [] so it is a list of dictionaries:
data = [{'instrument_name': 'BTC-24FEB23-24000-C',
'index_price': 23822.86,
'direction': 'sell',
'amount': 0.5},
{
'instrument_name': 'BTC-30JUN23-40000-C',
'index_price': 23813.52,
'direction': 'sell',
'amount': 0.1},
{
'instrument_name': 'BTC-24FEB23-24000-C',
'index_price': 23812.99,
'direction': 'sell',
'amount': 6.0},
{
'instrument_name': 'BTC-26MAY23-18000-P',
'index_price': 23817.83,
'direction': 'buy',
'amount': 0.3}]
import pandas as pd

df = pd.DataFrame(data)
# Sum the amounts for each distinct instrument name
print(df.groupby('instrument_name')['amount'].sum())
Outcome:
instrument_name
BTC-24FEB23-24000-C 6.5
BTC-26MAY23-18000-P 0.3
BTC-30JUN23-40000-C 0.1
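Note that the grouping above is keyed on the full instrument name rather than on the date the question asks for. A minimal sketch of one way to adapt it, assuming the date is always the second '-'-separated field of instrument_name (the names expiry and totals are illustrative and not from the answer):

```python
import pandas as pd

data = [
    {'instrument_name': 'BTC-24FEB23-24000-C', 'index_price': 23822.86,
     'direction': 'sell', 'amount': 0.5},
    {'instrument_name': 'BTC-30JUN23-40000-C', 'index_price': 23813.52,
     'direction': 'sell', 'amount': 0.1},
    {'instrument_name': 'BTC-24FEB23-24000-C', 'index_price': 23812.99,
     'direction': 'sell', 'amount': 6.0},
    {'instrument_name': 'BTC-26MAY23-18000-P', 'index_price': 23817.83,
     'direction': 'buy', 'amount': 0.3},
]

df = pd.DataFrame(data)

# Assumption: the date is the second '-'-separated field of the name.
expiry = df['instrument_name'].str.split('-').str[1]

# Group on the derived Series; sort=False keeps the dates in order of
# first appearance instead of sorting them alphabetically.
totals = df.groupby(expiry, sort=False)['amount'].sum().to_dict()
print(totals)
```

Grouping on a derived Series avoids adding a helper column to the DataFrame; whether that is preferable to an explicit column is a matter of taste.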
I don’t quite understand the problem with a for loop here. If dicts is a list of your dictionaries, then
from collections import defaultdict

d = defaultdict(float)
for x in dicts:
    d[x['instrument_name'].split('-')[1]] += x['amount']
# d = {'24FEB23': 6.5, '30JUN23': 0.1, '26MAY23': 0.3}
This should be fast enough unless you are dealing with very large inputs.
As you suggest, you can solve this using a for loop (with a defaultdict):
dat = [{'instrument_name': 'BTC-24FEB23-24000-C',
'index_price': 23822.86,
'direction': 'sell',
'amount': 0.5},
{'instrument_name': 'BTC-30JUN23-40000-C',
'index_price': 23813.52,
'direction': 'sell',
'amount': 0.1},
{'instrument_name': 'BTC-24FEB23-24000-C',
'index_price': 23812.99,
'direction': 'sell',
'amount': 6.0},
{'instrument_name': 'BTC-26MAY23-18000-P',
'index_price': 23817.83,
'direction': 'buy',
'amount': 0.3}]
Solution with loop:
from collections import defaultdict

def sum_dates(dat):
    # Accumulate the amount per date (second '-'-separated field of the name)
    out = defaultdict(lambda: 0)
    for dct in dat:
        out[dct['instrument_name'].split('-')[1]] += dct['amount']
    return dict(out)

>>> %timeit sum_dates(dat)
1.82 µs +/- 292 ns per loop (mean +/- std. dev. of 7 runs, 1,000,000 loops each)
Solution with pandas:
import pandas as pd

df = pd.DataFrame(dat)
# Extract the date (second '-'-separated field) into its own column
df['date'] = df['instrument_name'].str.split('-').str[1]

def sum_dates_pandas(df):
    return df.groupby('date')['amount'].sum().to_dict()

>>> %timeit sum_dates_pandas(df)
219 µs +/- 18.6 µs per loop (mean +/- std. dev. of 7 runs, 1,000 loops each)
It seems the first (loop) solution is the faster of the two, at least for an input this small.
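The timings above were taken on four rows, and the pandas figure leaves out the cost of building the DataFrame and the date column. To check how the two approaches compare on larger inputs, here is a rough sketch that uses the standard-library timeit module instead of the IPython %timeit magic. The helper make_rows, the row count and the repeat count are arbitrary assumptions for illustration, and no particular result is implied:

```python
import math
import random
import timeit
from collections import defaultdict

import pandas as pd


def make_rows(n, seed=0):
    """Build n synthetic rows shaped like the question's data (hypothetical helper)."""
    rng = random.Random(seed)
    dates = ['24FEB23', '30JUN23', '26MAY23']
    return [
        {'instrument_name': f'BTC-{rng.choice(dates)}-24000-C',
         'amount': rng.random()}
        for _ in range(n)
    ]


def sum_dates(rows):
    # Plain-Python accumulation, as in the loop solution.
    out = defaultdict(float)
    for row in rows:
        out[row['instrument_name'].split('-')[1]] += row['amount']
    return dict(out)


def sum_dates_pandas(rows):
    # Unlike the timing above, this includes DataFrame construction
    # and the string split that extracts the date.
    df = pd.DataFrame(rows)
    date = df['instrument_name'].str.split('-').str[1]
    return df.groupby(date)['amount'].sum().to_dict()


rows = make_rows(10_000)
loop_result = sum_dates(rows)
pandas_result = sum_dates_pandas(rows)

# Both approaches should agree up to floating-point rounding.
same = all(math.isclose(loop_result[k], pandas_result[k]) for k in loop_result)

repeats = 20
loop_time = timeit.timeit(lambda: sum_dates(rows), number=repeats)
pandas_time = timeit.timeit(lambda: sum_dates_pandas(rows), number=repeats)
print(f'loop:   {loop_time / repeats:.6f} s per call')
print(f'pandas: {pandas_time / repeats:.6f} s per call')
```

Varying the argument to make_rows shows how each approach scales; the numbers depend on the machine and on the pandas version.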