How to add two dataframes with tuples
Question:
I am extracting data from a Databank and storing it in a dictionary, which I then convert into a DataFrame. I end up with two DataFrames that I’d like to add element-wise, but each value is stored in a one-element tuple.
Both DataFrames are really big (66 rows x 8497 columns) but look something like this:
df1
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P00001 | (-17.5,) | (-16.2,) | (-15.9,) | (-14.3,) |
| P00002 | (-11.3,) | (-13.1,) | (-13.8,) | (-10.4,) |
| P00003 | (-17.0,) | (-18.0,) | (-17.6,) | (-13.6,) |
| P00004 | None | None | None | None |
df2
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P00001 | (3.3,) | (3.8,) | (5.6,) | (7.5,) |
| P00002 | (4.2,) | (2.3,) | (1.5,) | (5.3,) |
| P00003 | (0.0,) | (0.0,) | (0.0,) | (0.0,) |
| P00004 | (2.8,) | (3.7,) | (4.8,) | (3.9,) |
I’d like to add the values cell by cell: for example, the value at (P00001, 0) in df1 (-17.5) plus the value at (P00001, 0) in df2 (3.3), and so on, so that the result looks like this:
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P00001 | -14.2 | -12.4 | -10.3 | -6.8 |
| P00002 | -7.1 | -10.8 | -12.3 | -5.1 |
| P00003 | -17.0 | -18.0 | -17.6 | -13.6 |
| P00004 | 2.8 | 3.7 | 4.8 | 3.9 |
I have tried:
df_add = df1.add(df2, fill_value=0)
tuple(np.add(df1,df2))
tuple(map(sum,zip(df1,df2)))
I also tried turning the dataframe into int, but that didn’t work either.
df1_new = df1[:].astype(int)
df_new = df1.convert_dtypes(int)
df_new = df1.apply(pd.to_numeric, errors='ignore')
I am a beginner, please let me know if you need more information.
Answers:
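Transforming the tuples to plain numbers is indeed an option (the values here are floats, so the helper returns floats despite its name):
import numpy as np

def tuple2int(x):
    # Return the single value inside the tuple; None (or anything
    # else that cannot be indexed) counts as 0.0.
    try:
        return x[0]
    except (TypeError, IndexError):
        return 0.0

# otypes=[float] keeps the result float even if the first cell is None.
df1[:] = np.vectorize(tuple2int, otypes=[float])(df1)
df2[:] = np.vectorize(tuple2int, otypes=[float])(df2)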
Then add the data frames as you suggested:
df_add = df1.add(df2, fill_value=0)
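As a rough end-to-end check, the sketch below rebuilds the small sample frames from the question and unwraps each cell column by column with Series.map rather than np.vectorize. The helper name unwrap and the decision to treat None as 0.0 are assumptions made for illustration; they are not part of the original answer, and the approach has only been reasoned through on this sample, not on the full 66 x 8497 data.

```python
import pandas as pd


def unwrap(cell):
    """Return the number inside a one-element tuple, or 0.0 for a missing cell.

    Hypothetical helper: treating None as 0.0 is an assumption taken from
    the expected output in the question.
    """
    if isinstance(cell, tuple) and cell:
        return float(cell[0])
    return 0.0


index = ["P00001", "P00002", "P00003", "P00004"]

# Sample data copied from the question.
df1 = pd.DataFrame(
    [
        [(-17.5,), (-16.2,), (-15.9,), (-14.3,)],
        [(-11.3,), (-13.1,), (-13.8,), (-10.4,)],
        [(-17.0,), (-18.0,), (-17.6,), (-13.6,)],
        [None, None, None, None],
    ],
    index=index,
)
df2 = pd.DataFrame(
    [
        [(3.3,), (3.8,), (5.6,), (7.5,)],
        [(4.2,), (2.3,), (1.5,), (5.3,)],
        [(0.0,), (0.0,), (0.0,), (0.0,)],
        [(2.8,), (3.7,), (4.8,), (3.9,)],
    ],
    index=index,
)

# Unwrap every cell, one column at a time, so each column becomes float64.
num1 = df1.apply(lambda col: col.map(unwrap))
num2 = df2.apply(lambda col: col.map(unwrap))

# Element-wise sum aligned on row and column labels.
result = num1.add(num2, fill_value=0)
print(result.round(1))
```

Because the unwrapped frames hold ordinary floats, the usual numeric methods (round, sum, mean) work on result afterwards.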