How to add two dataframes with tuples

Question

I am extracting data from a databank and storing it in a dictionary, which I then convert into a DataFrame. This leaves me with two DataFrames that I’d like to add, but every value is stored in a one-element tuple.

Both DataFrames are really big (66 rows x 8497 columns) but look something like this:

df1

	0	1	2	3
P00001	(-17.5,)	(-16.2,)	(-15.9,)	(-14.3,)
P00002	(-11.3,)	(-13.1,)	(-13.8,)	(-10.4,)
P00003	(-17.0,)	(-18.0,)	(-17.6,)	(-13.6,)
P00004	None	None	None	None

df2

	0	1	2	3
P00001	(3.3,)	(3.8,)	(5.6,)	(7.5,)
P00002	(4.2,)	(2.3,)	(1.5,)	(5.3,)
P00003	(0.0,)	(0.0,)	(0.0,)	(0.0,)
P00004	(2.8,)	(3.7,)	(4.8,)	(3.9,)

For example, I’d like to add the value at (P00001, 0) in df1, which is -17.5, to the value at (P00001, 0) in df2, which is 3.3, and so on for every cell, so that the result looks like this:

	0	1	2	3
P00001	-14.2	-12.4	-10.3	-6.8
P00002	-7.1	-10.8	-12.3	-5.1
P00003	-17.0	-18.0	-17.6	-13.6
P00004	2.8	3.7	4.8	3.9

I have tried:

df_add = df1.add(df2, fill_value=0)

tuple(np.add(df1,df2))

tuple(map(sum,zip(df1,df2)))

I also tried converting the DataFrame to a numeric type, but that didn’t work either.

df1_new = df1[:].astype(int)

df_new = df1.convert_dtypes(int)

df_new = df1.apply(pd.to_numeric, errors='ignore')

I am a beginner, please let me know if you need more information.

Asked By: snkm


Answer 1

Transforming the tuples to plain numbers (floats here, since the values have decimals) is indeed an option:

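import numpy as np

def tuple2float(x):
    # Unwrap a one-element tuple such as (-17.5,).
    # None (or an empty tuple) cannot be indexed and becomes 0.0.
    try:
        return x[0]
    except (TypeError, IndexError):
        return 0.0

# otypes=[float] keeps the output float even if the first cell is None
df1[:] = np.vectorize(tuple2float, otypes=[float])(df1)
df2[:] = np.vectorize(tuple2float, otypes=[float])(df2)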

Then add the data frames as you suggested:

df_add = df1.add(df2, fill_value=0)
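
For reference, here is a self-contained sketch of the same idea that can be run on a small sample. It is only one possible variant: the helper names `unwrap` and `unwrap_frame` are made up for illustration, the sample frames contain just the first two columns from the question, and treating `None` as 0.0 is an assumption taken from the expected output (row P00004 ends up equal to df2). Unlike the code above, it returns new float DataFrames instead of assigning in place.

```python
import pandas as pd

def unwrap(cell):
    """Return the number inside a one-element tuple, or 0.0 for None/empty cells."""
    if isinstance(cell, tuple) and cell:
        return float(cell[0])
    return 0.0  # assumption: a missing value counts as zero

def unwrap_frame(df):
    # Series.map calls unwrap on every cell of each column, including None
    return df.apply(lambda col: col.map(unwrap)).astype(float)

index = ["P00001", "P00002", "P00003", "P00004"]
df1 = pd.DataFrame(
    {0: [(-17.5,), (-11.3,), (-17.0,), None],
     1: [(-16.2,), (-13.1,), (-18.0,), None]},
    index=index,
)
df2 = pd.DataFrame(
    {0: [(3.3,), (4.2,), (0.0,), (2.8,)],
     1: [(3.8,), (2.3,), (0.0,), (3.7,)]},
    index=index,
)

# Add cell by cell; round(1) only hides floating-point noise in the sums
df_add = unwrap_frame(df1).add(unwrap_frame(df2), fill_value=0).round(1)
print(df_add)
```

If missing values should stay missing rather than count as zero, `unwrap` could return `float("nan")` instead of 0.0, in which case `fill_value=0` in `add` decides how a cell that is missing in only one frame is handled.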

Answered By: rosa b.
