Python: calculate with dataframe and dictionary?

Question:

I have a DataFrame/Excel sheet listing the transaction types of business processes and how often each transaction type was performed:

branch  Transaction Type  occurrences
aa      red               12
aa      green             100
bb      blue              20
cc      red               12
cc      green             100
cc      blue              20

I have a second DataFrame/Excel sheet with the processing time in seconds per transaction type:

Transaction Type  time in S
red               120
green             320
blue              60

What I need is a new column in the processes DataFrame in which the number of occurrences is multiplied by the processing time, giving the effort in seconds for each transaction type:

branch  Transaction Type  occurrences  Effort in S
aa      red               12           1440
aa      green             100          32000
bb      blue              20           1200
cc      red               12           1440
cc      green             100          32000
cc      blue              20           1200

[edit]
I was not precise enough. It is not only a simple merge of 2 DataFrames, but rather the calculation of the effort per branch…
[/edit]

As I am a beginner with only theoretical knowledge, I assume that I have to import my 2 Excel files with openpyxl and create DataFrames with pandas.
Then I need to iterate over the DataFrames, and maybe I can do this simple calculation with a function (lambda?).
Maybe it is better to create a dictionary out of the 2nd Excel sheet, since it has only 2 columns?

Any help is appreciated 🙂

Asked By: hugo999


Answers:

Use the pandas library in Python; it makes this much easier.

import pandas as pd

# Replace the placeholder paths with your own files
# (for Excel sheets, pd.read_excel can be used instead of pd.read_csv)
df1 = pd.read_csv("<PATH_TO_FILE>")
df2 = pd.read_csv("<PATH_TO_SECOND_FILE>")
# merge() returns a new DataFrame, so the result has to be assigned
final_df = df1.merge(df2, on='Transaction Type', how='left')
final_df['Effort in S'] = final_df['time in S'] * final_df['occurrences']
# In case you need to remove the 'time in S' column:
# final_df.drop('time in S', axis=1, inplace=True)
final_df.to_csv("<PATH_TO_Directory/file_name>", sep='\t', encoding='utf-8', index=False)

Edited after seeing you edited the question.
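
The question also asks whether a dictionary built from the second sheet would do. A minimal sketch of that variant, assuming the column names from the question and that every transaction type in the first table has an entry in the second (the name time_per_type is made up for illustration):

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "branch": ["aa", "aa", "bb", "cc", "cc", "cc"],
    "Transaction Type": ["red", "green", "blue", "red", "green", "blue"],
    "occurrences": [12, 100, 20, 12, 100, 20],
})
df2 = pd.DataFrame({
    "Transaction Type": ["red", "green", "blue"],
    "time in S": [120, 320, 60],
})

# Build a lookup dictionary {transaction type: seconds} from the second table
time_per_type = dict(zip(df2["Transaction Type"], df2["time in S"]))

# Look up the time for each row and multiply; no explicit loop or lambda needed
df1["Effort in S"] = df1["occurrences"] * df1["Transaction Type"].map(time_per_type)
print(df1)
```

A transaction type missing from the dictionary would give NaN in that row, so it may be worth checking for missing values before relying on the result.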

Answered By: Mathpdegeek497

import pandas as pd

df1 = pd.DataFrame({"branch":["aa","aa","bb","cc","cc","cc"], "Transaction Type": ["red","green","blue", "red","green","blue"], "occurrences":[12,100,20,12,100,20]})
df2 = pd.DataFrame({"Transaction Type": ["red","green","blue"], "time in S":[120,320,60]})
df3 = df1.merge(df2, how='inner')

df3["Effort in S"] = df3["occurrences"]*df3["time in S"]
df3 = df3.drop("time in S", axis=1).sort_values('branch')
print(df3)
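
The edit to the question mentions "the effort per branch". If that means one total per branch (an assumption, since the expected output in the question still shows one row per transaction type), a groupby on the result could look like this sketch. The table is rebuilt here so the snippet runs on its own, and the name effort_per_branch is made up for illustration:

```python
import pandas as pd

# Result table as shown in the question, rebuilt so this snippet is self-contained
df3 = pd.DataFrame({
    "branch": ["aa", "aa", "bb", "cc", "cc", "cc"],
    "Transaction Type": ["red", "green", "blue", "red", "green", "blue"],
    "occurrences": [12, 100, 20, 12, 100, 20],
    "Effort in S": [1440, 32000, 1200, 1440, 32000, 1200],
})

# Sum the effort of all transaction types within each branch
effort_per_branch = df3.groupby("branch", as_index=False)["Effort in S"].sum()
print(effort_per_branch)
```

With the sample data this gives 33440 for aa, 1200 for bb and 34640 for cc.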

Answered By: Ajeet Verma

Thank you, both suggested solutions work fine.

Answered By: hugo999