Append two data frames with pandas
Question:
When I try to merge two dataframes by rows doing:
bigdata = data1.append(data2)
I get the following error:
Exception: Index cannot contain duplicate values!
The index of the first data frame runs from 0 to 38 and that of the second from 0 to 48. I understand that I have to modify the index of one of the data frames before merging, but I don't know how.
Thank you.
These are the two dataframes:
data1:
meta particle ratio area type
0 2 part10 1.348 0.8365 touching
1 2 part18 1.558 0.8244 single
2 2 part2 1.893 0.894 single
3 2 part37 0.6695 1.005 single
....clip...
36 2 part23 1.051 0.8781 single
37 2 part3 80.54 0.9714 nuclei
38 2 part34 1.071 0.9337 single
data2:
meta particle ratio area type
0 3 part10 0.4756 1.025 single
1 3 part18 0.04387 1.232 dusts
2 3 part2 1.132 0.8927 single
...clip...
46 3 part46 13.71 1.001 nuclei
47 3 part3 0.7439 0.9038 single
48 3 part34 0.4349 0.9956 single
The first column is the index.
Answers:
The append function has an optional argument, ignore_index, which you should use here to join the records together, since the index isn't meaningful for your application.
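A minimal sketch of the ignore_index idea, built from the first three rows of each frame in the question. It assumes a recent pandas, where DataFrame.append was deprecated (1.4) and then removed (2.0), so it passes the same ignore_index=True flag to pd.concat instead; on older versions the commented append call does the same thing.

```python
import pandas as pd

# First three rows of each frame from the question; both indexes start at 0.
columns = ["meta", "particle", "ratio", "area", "type"]
data1 = pd.DataFrame(
    [[2, "part10", 1.348, 0.8365, "touching"],
     [2, "part18", 1.558, 0.8244, "single"],
     [2, "part2", 1.893, 0.894, "single"]],
    columns=columns,
)
data2 = pd.DataFrame(
    [[3, "part10", 0.4756, 1.025, "single"],
     [3, "part18", 0.04387, 1.232, "dusts"],
     [3, "part2", 1.132, 0.8927, "single"]],
    columns=columns,
)

# ignore_index=True discards both original indexes and numbers the
# stacked rows 0..n-1, so no label can be duplicated.
# On old pandas versions: bigdata = data1.append(data2, ignore_index=True)
bigdata = pd.concat([data1, data2], ignore_index=True)
print(bigdata)
```

The rows from data2 simply continue the numbering (labels 3, 4, 5 here), which is what you want when the original row numbers carry no meaning.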
You could first identify the rows with duplicated index labels (not duplicated values) using the groupby method, and then do a sum/mean operation on all the rows that share the same index label.
data1 = data1.groupby(data1.index).sum()
data2 = data2.groupby(data2.index).sum()
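A small sketch of what groupby(df.index) does when one frame does contain a repeated label. The frame and its values are made up for illustration, and it uses numeric columns only: sum on a text column such as particle would concatenate the strings, and mean cannot average text (depending on the pandas version it raises or drops such columns). Note that this only deduplicates labels within a single frame; it does not by itself stop the labels of data1 and data2 from colliding.

```python
import pandas as pd

# Hypothetical frame whose index repeats the label 0 (values made up).
df = pd.DataFrame(
    {"ratio": [1.0, 3.0, 5.0], "area": [0.5, 0.5, 1.0]},
    index=[0, 0, 1],
)

# Grouping by the index collapses every repeated label into one row.
summed = df.groupby(df.index).sum()     # ratio: 4.0, 5.0; area: 1.0, 1.0
averaged = df.groupby(df.index).mean()  # ratio: 2.0, 5.0; area: 0.5, 1.0
print(summed)
print(averaged)
```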
Try using pd.concat:
import pandas as pd
bigdata = pd.concat([data1, data2])
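A sketch of how pd.concat treats the overlapping labels, using the meta and ratio values from the first two rows of each frame in the question. By default it keeps the original labels, duplicates included; verify_integrity=True makes an overlap raise a ValueError instead, and keys= is one way to keep a unique index while remembering which frame each row came from. The choice among these is a judgment call that depends on whether the original row numbers matter to you.

```python
import pandas as pd

# First two rows of each frame (meta and ratio only); both use labels 0 and 1.
data1 = pd.DataFrame({"meta": [2, 2], "ratio": [1.348, 1.558]})
data2 = pd.DataFrame({"meta": [3, 3], "ratio": [0.4756, 0.04387]})

# Default: rows are stacked and the original labels are kept, duplicates included.
bigdata = pd.concat([data1, data2])
print(bigdata.index.tolist())  # [0, 1, 0, 1]

# verify_integrity=True turns overlapping labels into an error instead.
try:
    pd.concat([data1, data2], verify_integrity=True)
except ValueError as err:
    print("overlap:", err)

# keys= tags each block, giving a unique (source, row) MultiIndex.
tagged = pd.concat([data1, data2], keys=["data1", "data2"])
print(tagged.loc["data2"])
```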