Merging pandas dataframes generated by a for loop

Question:

Let's say I have the code below:

import pandas as pd
import numpy as np
import random
import string

data = ['DF1', 'DF2', 'DF2']

for i in data :
    DF = pd.DataFrame([random.choices(string.ascii_lowercase,k=5), [10, 11, 12, 13, 14]]).T
    DF.columns = ['col1', 'col2']
    DF['i'] = i

So for each i, I have a different DF. Finally, I need to merge all of those data frames on col1 and add the numbers in col2 row-wise.

In this case, the total number of such dataframes depends on the length of the data list and is therefore variable. In R, we can use the do.call() function to merge such a varying number of data frames. Is there any way to achieve this in Python?

For example, let's say we have 3 individual tables like the ones below:

[three images of the input tables; not available in this copy]

After joining on col1, I expect the table below (sorted by col1):

[image of the expected result; not available in this copy]

Any pointers would be highly appreciated.

Asked By: Bogaso


Answers:

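You can use merge with how='cross', which cross joins the two dataframes so that every row of df is paired with every label in data (how='cross' requires pandas 1.2 or later):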

import pandas as pd
import random
import string

data = ['DF1', 'DF2', 'DF2']
df = pd.DataFrame([random.choices(string.ascii_lowercase, k=5), [10, 11, 12, 13, 14]]).T
df.columns = ['col1', 'col2']
da = pd.DataFrame(data, columns=['data'])
res = df.merge(da, how='cross')

print(res)

Output:

    col1 col2 data
0     j   10  DF1
1     j   10  DF2
2     j   10  DF2
3     d   11  DF1
4     d   11  DF2
5     d   11  DF2
6     k   12  DF1
7     k   12  DF2
8     k   12  DF2
9     e   13  DF1
10    e   13  DF2
11    e   13  DF2
12    n   14  DF1
13    n   14  DF2
14    n   14  DF2
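
If the goal is the summed table from the question rather than the cross-joined one, a merge-based route for a variable number of frames is to fold the list with functools.reduce, which plays roughly the role that Reduce/do.call would in R. The following is only a sketch and not part of the answer above: the frames list, its sample values, and the col2_0, col2_1, ... column names are made up for illustration.

```python
from functools import reduce

import pandas as pd

# Hypothetical input frames; each has the columns 'col1' and 'col2'.
frames = [
    pd.DataFrame({'col1': list('adr'), 'col2': [10, 11, 12]}),
    pd.DataFrame({'col1': list('ads'), 'col2': [10, 11, 12]}),
    pd.DataFrame({'col1': list('dcx'), 'col2': [10, 11, 12]}),
]

# Give each frame its own value column so the names do not collide in the merges.
renamed = [f.rename(columns={'col2': f'col2_{n}'}) for n, f in enumerate(frames)]

# Fold the list pairwise with an outer join on col1, keeping every key.
merged = reduce(lambda left, right: left.merge(right, on='col1', how='outer'), renamed)

# Sum the value columns row-wise; missing values (NaN) are skipped by default.
value_cols = [c for c in merged.columns if c.startswith('col2_')]
merged['col2'] = merged[value_cols].sum(axis=1).astype(int)

result = merged[['col1', 'col2']].sort_values('col1').reset_index(drop=True)
print(result)
```

Because the list is folded pairwise, the same code works for any number of frames of at least one. For many frames, stacking them and grouping on col1 is usually simpler than repeated merges.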

Answered By: eshirvana

IIUC, try:

import pandas as pd

# Three sample frames that share some col1 keys
df1 = pd.DataFrame({'col1': [*'adrtg'],
                    'col2': [10, 11, 12, 13, 14],
                    'data': ['DF1'] * 5})
df2 = pd.DataFrame({'col1': [*'adspq'],
                    'col2': [10, 11, 12, 13, 14],
                    'data': ['DF2'] * 5})
df3 = pd.DataFrame({'col1': [*'dcxyz'],
                    'col2': [10, 11, 12, 13, 14],
                    'data': ['DF3'] * 5})

# Stack all frames, then sum col2 for each distinct col1 value
pd.concat([df1, df2, df3]).groupby('col1', as_index=False)['col2'].sum()

Output:

   col1  col2
0     a    20
1     c    11
2     d    32
3     g    14
4     p    13
5     q    14
6     r    12
7     s    12
8     t    13
9     x    12
10    y    13
11    z    14
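
To connect this to the loop in the question, one possible sketch (an illustration, not the answerer's code) is to append each frame to a list inside the loop and pass that list to pd.concat, so the number of frames follows len(data). The names frames, label, and total are assumptions, the third label is written as 'DF3' here, and random.seed(0) is there only to make the sketch repeatable.

```python
import random
import string

import pandas as pd

data = ['DF1', 'DF2', 'DF3']  # labels; the list can have any length
random.seed(0)  # assumption: seeded only so the sketch is repeatable

frames = []
for label in data:
    df = pd.DataFrame({
        'col1': random.choices(string.ascii_lowercase, k=5),
        'col2': [10, 11, 12, 13, 14],
    })
    df['data'] = label
    frames.append(df)  # collect instead of overwriting the previous frame

# Stack every collected frame, then sum col2 per col1 key.
# groupby sorts the keys by default, so the result is ordered by col1.
total = (
    pd.concat(frames, ignore_index=True)
      .groupby('col1', as_index=False)['col2']
      .sum()
)
print(total)
```

Building col2 from a dict keeps it numeric. In the question's code the transposed frame ends up with object dtype, so a cast such as DF['col2'].astype(int) may be needed before summing.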
Answered By: Scott Boston