Python how to create a dictionary using the values in multiple pandas dataframe columns as tuple keys and a single column as value
Question:
I would like to create a dictionary using the values in the pandas’ data frame multiple columns as tuple keys and single-column values as the value(s). And where there are no values for a particular tuple pair, I would like to assign a generic value of say 99999. This latter part is proving to be a challenge and I wish to seek help from this forum on how to achieve this task. Thank you.
Sample extract data:
Periods(Days)
Factory
Warehouse
Sales Outlets
Products
Dist Fact-Whse
0
1
Berlin
Teltow
Magdeburg
Maracuja
19.6
1
2
Hamburg
Wismar
Lubeck
Himbeer
126.2
2
3
Berlin
Kleinmachnow
Halle
Malaga
26.9
3
4
Hamburg
Wismar
Lubeck
Waldmeister
126.2
4
5
Berlin
Kleinmachnow
Leipzig
Walnuss
26.9
Based on the above data set, the following piece of code is how I am creating my dictionary object(s) from the data frame:
F = df.Factory.drop_duplicates().to_list()
W = df.Warehouse.drop_duplicates().to_list()
dist1 = {};
for i in df.index:
key = (df.at[i, 'Factory'], df.at[i, 'Warehouse'])
value = df.at[i, 'Dist Fact-Whse']
dicT = {key : value}
dist1.update(dicT)
for f in F:
for w in W:
if (f, w) not in dist1:
dist1[(f, w)] = 9999
I get my desired or expected outcome: {('Berlin', 'Teltow'): 19.6, ('Hamburg', 'Wismar'): 126.2, ('Berlin', 'Kleinmachnow'): 26.9, ('Berlin', 'Wismar'): 9999, ('Hamburg', 'Teltow'): 9999, ('Hamburg', 'Kleinmachnow'): 9999},
but this is too elaborous, time consuming, and not efficient as I have a of other parameter similar to "dist1" to create in my entire code.
I kindly welcome a more elegant and smart solution to this issue.
Answers:
Here is a proposition with pandas.DataFrame.groupby
and itertools.product
from itertools.
d = df.groupby(["Factory","Warehouse"])["Dist Fact-Whse"].first().to_dict()
combs = list(product(df["Factory"].unique(), df["Warehouse"].unique()))
dist1 = {k:v for v,k in zip([d.get(c, 99999) for c in combs], combs)}
This looks shorter (and maybe fancier) but not sure if it’s faster. I’ll let you give it a try yourself !
Output :
print(dist1)
{('Berlin', 'Teltow'): 19.6,
('Berlin', 'Wismar'): 99999,
('Berlin', 'Kleinmachnow'): 26.9,
('Hamburg', 'Teltow'): 99999,
('Hamburg', 'Wismar'): 126.2,
('Hamburg', 'Kleinmachnow'): 99999}
I would like to create a dictionary using the values in the pandas’ data frame multiple columns as tuple keys and single-column values as the value(s). And where there are no values for a particular tuple pair, I would like to assign a generic value of say 99999. This latter part is proving to be a challenge and I wish to seek help from this forum on how to achieve this task. Thank you.
Sample extract data:
Periods(Days) | Factory | Warehouse | Sales Outlets | Products | Dist Fact-Whse | |
---|---|---|---|---|---|---|
0 | 1 | Berlin | Teltow | Magdeburg | Maracuja | 19.6 |
1 | 2 | Hamburg | Wismar | Lubeck | Himbeer | 126.2 |
2 | 3 | Berlin | Kleinmachnow | Halle | Malaga | 26.9 |
3 | 4 | Hamburg | Wismar | Lubeck | Waldmeister | 126.2 |
4 | 5 | Berlin | Kleinmachnow | Leipzig | Walnuss | 26.9 |
Based on the above data set, the following piece of code is how I am creating my dictionary object(s) from the data frame:
F = df.Factory.drop_duplicates().to_list()
W = df.Warehouse.drop_duplicates().to_list()
dist1 = {};
for i in df.index:
key = (df.at[i, 'Factory'], df.at[i, 'Warehouse'])
value = df.at[i, 'Dist Fact-Whse']
dicT = {key : value}
dist1.update(dicT)
for f in F:
for w in W:
if (f, w) not in dist1:
dist1[(f, w)] = 9999
I get my desired or expected outcome: {('Berlin', 'Teltow'): 19.6, ('Hamburg', 'Wismar'): 126.2, ('Berlin', 'Kleinmachnow'): 26.9, ('Berlin', 'Wismar'): 9999, ('Hamburg', 'Teltow'): 9999, ('Hamburg', 'Kleinmachnow'): 9999},
but this is too elaborous, time consuming, and not efficient as I have a of other parameter similar to "dist1" to create in my entire code.
I kindly welcome a more elegant and smart solution to this issue.
Here is a proposition with pandas.DataFrame.groupby
and itertools.product
from itertools.
d = df.groupby(["Factory","Warehouse"])["Dist Fact-Whse"].first().to_dict()
combs = list(product(df["Factory"].unique(), df["Warehouse"].unique()))
dist1 = {k:v for v,k in zip([d.get(c, 99999) for c in combs], combs)}
This looks shorter (and maybe fancier) but not sure if it’s faster. I’ll let you give it a try yourself !
Output :
print(dist1)
{('Berlin', 'Teltow'): 19.6,
('Berlin', 'Wismar'): 99999,
('Berlin', 'Kleinmachnow'): 26.9,
('Hamburg', 'Teltow'): 99999,
('Hamburg', 'Wismar'): 126.2,
('Hamburg', 'Kleinmachnow'): 99999}