pandas: how to merge columns irrespective of index

Question:

I have two dataframes with meaningless indexes but a carefully curated row order, and I want to merge them while preserving that order. For example:

>>> df1
   First
a      1
b      3

and

>>> df2
   Second
c       2
d       4

After merging, what I want to obtain is this:

>>> Desired_output
                    First  Second
AnythingAtAll           1       2     # <--- Row Names are meaningless.
SeriouslyIDontCare      3       4     # <--- But the ORDER of the rows is critical and must be preserved.

The fact that I've got row labels "a/b" and "c/d" is irrelevant; what is crucial is the order in which the rows appear. Every version of "join" I've seen requires me to reset the indexes manually, which seems really awkward, and I don't trust that it won't screw up the ordering. I thought concat would work, but I get this:

>>> pd.concat( [df1, df2] , axis = 1, ignore_index= True )
     0    1
a  1.0  NaN
b  3.0  NaN
c  NaN  2.0
d  NaN  4.0
# ^ obviously not what I want.

This happens even when I explicitly set ignore_index=True.

How do I "overrule" the indexing and force the columns to be merged with the rows kept in the exact order that I supply them?


Edit:
Note that if I assign the column from df2 directly, the results are all NaN.

>>> df1["second"]=df2["Second"]
>>> df1
   First  second
a      1     NaN
b      3     NaN

This was screwing me up, but thanks to the suggestions from jsmart and topsail, I found that you can sidestep the index alignment by assigning the column's underlying values directly:

>>> df1["second"]=df2["Second"].values
>>> df1
   First  second
a      1       2
b      3       4

^ Solution
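A minimal sketch of why this works, rebuilding the frames from the question. It uses `Series.to_numpy()`, which pandas documents as the preferred alternative to `.values`. Assigning a Series aligns on index labels, while assigning a plain array is purely positional; with no labels to align, pandas only checks that the length matches the frame, so a mismatch raises a `ValueError` instead of silently filling NaN.

```python
import pandas as pd

# Rebuild the question's frames; the row labels deliberately do not overlap
df1 = pd.DataFrame({"First": [1, 3]}, index=["a", "b"])
df2 = pd.DataFrame({"Second": [2, 4]}, index=["c", "d"])

# Assigning a Series aligns on index labels: none match, so every value is NaN
aligned = df1.copy()
aligned["Second"] = df2["Second"]
print(aligned)

# Assigning a NumPy array is purely positional: row i receives element i
positional = df1.copy()
positional["Second"] = df2["Second"].to_numpy()
print(positional)

# Without labels there is nothing to align, so a length mismatch raises
mismatch_error = None
try:
    positional["Third"] = pd.Series([2, 4, 6]).to_numpy()
except ValueError as exc:
    mismatch_error = exc
print("Length mismatch:", mismatch_error)
```

The length check is the main safety net here: because the labels are ignored, nothing else guards against pairing rows from frames of different sizes.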

Asked By: Jabber1


Answers:

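ignore_index controls whether the original labels along the concatenation axis are kept. With axis=1 that axis is the columns, so ignore_index=True replaces the column labels with 0 to n-1, which is exactly the 0, 1 header in your result. The rows are still matched by their index labels, and since "a/b" and "c/d" never overlap, every row ends up with a NaN.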
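To illustrate, a minimal sketch that rebuilds the question's frames and inspects the concat result: only the column labels change, while the row labels are unioned (an outer join on the index), giving four rows with one value each.

```python
import pandas as pd

df1 = pd.DataFrame({"First": [1, 3]}, index=["a", "b"])
df2 = pd.DataFrame({"Second": [2, 4]}, index=["c", "d"])

out = pd.concat([df1, df2], axis=1, ignore_index=True)

# Column labels were replaced by 0..n-1 along the concatenation axis
print(list(out.columns))

# Row labels were kept and unioned, so the result has four rows
print(list(out.index))

# Each row holds exactly one non-missing value
print(out.notna().sum(axis=1).tolist())
```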

You can try

out = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
print(out)

   First  Second
0      1       2
1      3       4
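Because `reset_index(drop=True)` simply relabels each frame's rows 0 to n-1 in their current order, the pairing is by position, not by sorted label. A minimal sketch with deliberately unsorted, hypothetical labels shows the original order surviving. The `set_axis` variant is an alternative not taken from this answer: it gives `df2` the labels of `df1` (the lengths must match) so the result keeps `df1`'s index instead of a fresh 0..n-1 one.

```python
import pandas as pd

# Hypothetical unsorted labels: order, not label value, should drive the pairing
df1 = pd.DataFrame({"First": [1, 3]}, index=["z", "a"])
df2 = pd.DataFrame({"Second": [2, 4]}, index=["q", "b"])

# Relabel both frames 0..n-1 in their existing order, then align on that
by_position = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)
print(by_position)

# Alternative: copy df1's labels onto df2 so the result keeps df1's index
keep_df1_labels = pd.concat([df1, df2.set_axis(df1.index)], axis=1)
print(keep_df1_labels)
```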
Answered By: Ynjxsjmh

This should also work I think:

df1["Second"] = df2["Second"].values

It keeps the index from the first dataframe, but since your desired output has labels such as "AnythingAtAll" and "SeriouslyIDontCare", I assume any index values whatsoever are acceptable.

Basically, we are just adding the values from your series as a new column to the first dataframe.

Here’s a test example similar to your described problem:

import pandas as pd

# -----------
# sample data
# -----------
df1 = pd.DataFrame(
{
    'x': ['a','b'],
    'First': [1,3],
})
df1.set_index("x", drop=True, inplace=True)
df2 = pd.DataFrame(
{
    'x': ['c','d'],
    'Second': [2, 4],
})
df2.set_index("x", drop=True, inplace=True)


# ---------------------------------------------
# Add series as a new column to first dataframe
# ---------------------------------------------
df1["Second"] = df2["Second"].values

Result is:

   First  Second
a      1       2
b      3       4
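If `df2` has several columns, the same positional trick extends to all of them. A minimal sketch using `DataFrame.assign` with a dict comprehension (an illustration, not part of this answer; the `Third` column is hypothetical) keeps `df1`'s index and adds each of `df2`'s columns by position:

```python
import pandas as pd

df1 = pd.DataFrame({"First": [1, 3]}, index=["a", "b"])
# Hypothetical second frame with more than one column
df2 = pd.DataFrame({"Second": [2, 4], "Third": [5, 6]}, index=["c", "d"])

# .to_numpy() drops df2's labels, so each column is attached by position
combined = df1.assign(**{col: df2[col].to_numpy() for col in df2.columns})
print(combined)
```

`assign` returns a new frame rather than modifying `df1` in place, which makes it easy to keep the original around for comparison.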
Answered By: topsail

The goal is to combine data based on position (not by Index). Here is one way to do it:

import pandas as pd

# create data frames df1 and df2
df1 = pd.DataFrame(data={'First': [1, 3]}, index=['a', 'b'])
df2 = pd.DataFrame(data={'Second': [2, 4]}, index=['c', 'd'])

# add a column to df1 -- add by position, not by Index
df1['Second'] = df2['Second'].values

print(df1)
   First  Second
a      1       2
b      3       4

And you could create a completely new data frame like this:

data = {'1st': df1['First'].values, '2nd': df1['Second'].values}
print(pd.DataFrame(data))

   1st  2nd
0    1    2
1    3    4
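The same idea generalises to any number of frames. A minimal sketch with a hypothetical helper, `combine_by_position`, that first checks the row counts (after `reset_index`, `concat` would otherwise NaN-pad frames of unequal length) and then joins them side by side on a fresh 0..n-1 index:

```python
import pandas as pd


def combine_by_position(*frames: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: join frames side by side by row position."""
    lengths = {len(f) for f in frames}
    if len(lengths) > 1:
        # concat would silently pad the shorter frames with NaN, so refuse instead
        raise ValueError(f"frames have different row counts: {sorted(lengths)}")
    # Drop every frame's labels so rows are paired purely by position
    return pd.concat([f.reset_index(drop=True) for f in frames], axis=1)


df1 = pd.DataFrame({"First": [1, 3]}, index=["a", "b"])
df2 = pd.DataFrame({"Second": [2, 4]}, index=["c", "d"])
result = combine_by_position(df1, df2)
print(result)
```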
Answered By: jsmart