Pandas version of rbind

Question:

In R, you can combine two dataframes by sticking the columns of one onto the bottom of the columns of the other using rbind. In pandas, how do you accomplish the same thing? It seems bizarrely difficult.

Using append results in a horrible mess including NaNs and things for reasons I don’t understand. I’m just trying to “rbind” two identical frames that look like this:

EDIT: I was creating the DataFrames in a stupid way, which was causing the issues. To all intents and purposes, append is rbind. See the answer below.

        0         1       2        3          4          5        6                    7
0   ADN.L  20130220   437.4   442.37   436.5000   441.9000  2775364  2013-02-20 18:47:42
1   ADM.L  20130220  1279.0  1300.00  1272.0000  1285.0000   967730  2013-02-20 18:47:42
2   AGK.L  20130220  1717.0  1749.00  1709.0000  1739.0000   834534  2013-02-20 18:47:43
3  AMEC.L  20130220  1030.0  1040.00  1024.0000  1035.0000  1972517  2013-02-20 18:47:43
4   AAL.L  20130220  1998.0  2014.50  1942.4999  1951.0000  3666033  2013-02-20 18:47:44
5  ANTO.L  20130220  1093.0  1097.00  1064.7899  1068.0000  2183931  2013-02-20 18:47:44
6   ARM.L  20130220   941.5   965.10   939.4250   951.5001  2994652  2013-02-20 18:47:45

But I’m getting something horrible like this:

        0         1        2        3          4         5        6                    7       0         1       2        3          4          5        6                    7
0     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADN.L  20130220   437.4   442.37   436.5000   441.9000  2775364  2013-02-20 18:47:42
1     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADM.L  20130220  1279.0  1300.00  1272.0000  1285.0000   967730  2013-02-20 18:47:42
2     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   AGK.L  20130220  1717.0  1749.00  1709.0000  1739.0000   834534  2013-02-20 18:47:43
3     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN  AMEC.L  20130220  1030.0  1040.00  1024.0000  1035.0000  1972517  2013-02-20 18:47:43
4     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   AAL.L  20130220  1998.0  2014.50  1942.4999  1951.0000  3666033  2013-02-20 18:47:44
5     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN  ANTO.L  20130220  1093.0  1097.00  1064.7899  1068.0000  2183931  2013-02-20 18:47:44
6     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ARM.L  20130220   941.5   965.10   939.4250   951.5001  2994652  2013-02-20 18:47:45
0     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADN.L  20130220   437.4   442.37   436.5000   441.9000  2775364  2013-02-20 18:47:42
1     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   ADM.L  20130220  1279.0  1300.00  1272.0000  1285.0000   967730  2013-02-20 18:47:42
2     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN   AGK.L  20130220  1717.0  1749.00  1709.0000  1739.0000   834534  2013-02-20 18:47:43
3     NaN       NaN      NaN      NaN        NaN       NaN      NaN                  NaN  

And I don’t understand why. I’m starting to miss R 🙁
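
A likely source of output shaped like this (an assumption: the question only says the frames were created "in a stupid way") is column labels that print the same but are not equal. Concatenation aligns on column labels, not positions, so every label found in only one frame becomes its own column, filled with NaN for the rows from the other frame. A minimal sketch with hypothetical integer versus string labels:

```python
import pandas as pd

# Two frames that print identically but whose column labels differ in type:
# a has integer labels 0, 1, 2; b has string labels '0', '1', '2' (hypothetical setup).
a = pd.DataFrame([["ADN.L", 20130220, 437.4]])
b = pd.DataFrame([["ADM.L", 20130220, 1279.0]], columns=["0", "1", "2"])

# No labels match, so the result has six columns, each half NaN.
mixed = pd.concat([a, b])
print(mixed.shape)  # (2, 6)

# Once the labels agree, the frames stack cleanly, like R's rbind.
b.columns = a.columns
clean = pd.concat([a, b], ignore_index=True)
print(clean.shape)  # (2, 3)
```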

Asked By: N. McA.


Answers:

Ah, this is to do with how I created the DataFrames, not with how I was combining them. The long and the short of it is, if you are creating a frame using a loop and a statement that looks like this:

Frame = Frame.append(pandas.DataFrame(data=SomeNewLineOfData))

You must ignore the index by passing ignore_index=True:

Frame = Frame.append(pandas.DataFrame(data=SomeNewLineOfData), ignore_index=True)

Otherwise you will have issues later when combining data.
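
Since append() was removed in pandas 2.0, the same loop is now usually written by collecting the one-row frames in a list and concatenating once at the end; the pandas documentation recommends this over growing a frame inside the loop, because every concat copies the data. A minimal sketch, with rows_of_data as a hypothetical stand-in for SomeNewLineOfData:

```python
import pandas as pd

# Hypothetical input: one row of data per loop iteration.
rows_of_data = [
    ["ADN.L", 20130220, 437.4],
    ["ADM.L", 20130220, 1279.0],
    ["AGK.L", 20130220, 1717.0],
]

pieces = []
for row in rows_of_data:
    # Each one-row frame gets the default index label 0.
    pieces.append(pd.DataFrame(data=[row]))

# Without ignore_index every row keeps label 0; with it the index is 0, 1, 2.
dup = pd.concat(pieces)
frame = pd.concat(pieces, ignore_index=True)
print(dup.index.tolist())    # [0, 0, 0]
print(frame.index.tolist())  # [0, 1, 2]
```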

Answered By: N. McA.

[EDIT] append() was deprecated in pandas 1.4.0 and removed in 2.0; use concat() instead. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html

This worked for me:

import numpy as np
import pandas as pd

dates = np.asarray(pd.date_range('1/1/2000', periods=8))
df1 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df2 = df1.copy()
df = pd.concat([df1, df2])  # df1.append(df2) on pandas < 2.0

Yields:

                   A         B         C         D
2000-01-01 -0.327208  0.552500  0.862529  0.493109
2000-01-02  1.039844 -2.141089 -0.781609  1.307600
2000-01-03 -0.462831  0.066505 -1.698346  1.123174
2000-01-04 -0.321971 -0.544599 -0.486099 -0.283791
2000-01-05  0.693749  0.544329 -1.606851  0.527733
2000-01-06 -2.461177 -0.339378 -0.236275  0.155569
2000-01-07 -0.597156  0.904511  0.369865  0.862504
2000-01-08 -0.958300 -0.583621 -2.068273  0.539434
2000-01-01 -0.327208  0.552500  0.862529  0.493109
2000-01-02  1.039844 -2.141089 -0.781609  1.307600
2000-01-03 -0.462831  0.066505 -1.698346  1.123174
2000-01-04 -0.321971 -0.544599 -0.486099 -0.283791
2000-01-05  0.693749  0.544329 -1.606851  0.527733
2000-01-06 -2.461177 -0.339378 -0.236275  0.155569
2000-01-07 -0.597156  0.904511  0.369865  0.862504
2000-01-08 -0.958300 -0.583621 -2.068273  0.539434

If you are not already on a recent version of pandas, I highly recommend upgrading: it is now possible to operate on DataFrames that contain duplicate index labels.
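
The stacked frame keeps both copies of each date in its index. A short sketch of what that means in practice: a lookup by a duplicated label returns one row per copy, while ignore_index=True renumbers the rows instead of keeping the dates.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("1/1/2000", periods=8)
df1 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])
df2 = df1.copy()

# Stacking keeps the original labels, so every date now appears twice.
df = pd.concat([df1, df2])
print(df.index.is_unique)  # False

# A lookup on a duplicated label returns one row per copy.
print(len(df.loc[pd.Timestamp("2000-01-01")]))  # 2

# ignore_index=True discards the dates and numbers the rows 0 to 15 instead.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist() == list(range(16)))  # True
```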

Answered By: abudis

import pandas as pd
import numpy as np

If you have a DataFrame like this:

array = np.random.randint(0, 10, size=(2, 4))
df = pd.DataFrame(array, columns=['A', 'B', 'C', 'D'],
                  index=['10aa', '20bb'])  # some arbitrary string index labels
df

      A  B  C  D
10aa  4  2  4  6
20bb  5  1  0  2

And you want to add a new row that is a list (or another iterable object):

List = [i**3 for i in range(df.shape[1])]
List
[0, 1, 8, 27]

You should convert the list to a dictionary whose keys are the DataFrame’s columns, using the zip() function:

Dict = dict(zip(df.columns, List))
Dict
{'A': 0, 'B': 1, 'C': 8, 'D': 27}

Then you can use the append() method (pandas < 2.0) to add the new dictionary as a row:

df = df.append(Dict, ignore_index=True)
df
    A   B   C   D
0   4   2   4   6
1   5   1   0   2
2   0   1   8   27

N.B. the original index labels ('10aa', '20bb') are dropped.

And yeah, it’s not as simple as rbind() in R 🙁
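
On pandas 2.0 and later, DataFrame.append() no longer exists. A sketch of two replacements, assuming the same df and List as above (rebuilt here so the block runs on its own): wrap the dict in a one-row DataFrame and use pd.concat, which drops the labels as append did, or assign the list to a new label with .loc, which keeps them. The label '30cc' is hypothetical.

```python
import numpy as np
import pandas as pd

array = np.random.randint(0, 10, size=(2, 4))
df = pd.DataFrame(array, columns=['A', 'B', 'C', 'D'], index=['10aa', '20bb'])
List = [i**3 for i in range(df.shape[1])]  # [0, 1, 8, 27]
Dict = dict(zip(df.columns, List))

# Option 1: concat a one-row DataFrame; ignore_index=True renumbers rows 0, 1, 2.
appended = pd.concat([df, pd.DataFrame([Dict])], ignore_index=True)

# Option 2: setting with enlargement adds a row under a new (hypothetical)
# label and keeps '10aa' and '20bb'.
kept = df.copy()
kept.loc['30cc'] = List

print(appended)
print(kept)
```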

Answered By: Bem Ostap

pd.concat will serve the purpose of rbind in R.

import pandas as pd
df1 = pd.DataFrame({'col1': [1,2], 'col2':[3,4]})
df2 = pd.DataFrame({'col1': [5,6], 'col2':[7,8]})
print(df1)
print(df2)
print(pd.concat([df1, df2]))

The output will look like this:

   col1  col2
0     1     3
1     2     4
   col1  col2
0     5     7
1     6     8
   col1  col2
0     1     3
1     2     4
0     5     7
1     6     8

If you read the documentation carefully, you will see that it also covers other operations, such as the equivalent of cbind.
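
As one example, the cbind equivalent is the same call with axis=1, which puts the frames side by side and matches rows by index label. A minimal sketch reusing the df1 and df2 above (redefined so the block runs on its own), next to the rbind call with ignore_index=True, which renumbers the stacked rows instead of repeating 0, 1:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col1': [5, 6], 'col2': [7, 8]})

# rbind: stack rows; ignore_index=True gives the index 0, 1, 2, 3.
stacked = pd.concat([df1, df2], ignore_index=True)

# cbind: axis=1 places the columns side by side, matching rows by index label.
side_by_side = pd.concat([df1, df2], axis=1)
print(stacked)
print(side_by_side)
```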

Answered By: B.Mr.W.

In R, dplyr’s bind_rows() does the same thing.

In Python, you can do it the same way with datar:

>>> from datar.all import bind_rows, head, tail
>>> from datar.datasets import iris
>>> 
>>> iris >> head(3) >> bind_rows(iris >> tail(3))
   Sepal_Length  Sepal_Width  Petal_Length  Petal_Width    Species
      <float64>    <float64>     <float64>    <float64>   <object>
0           5.1          3.5           1.4          0.2     setosa
1           4.9          3.0           1.4          0.2     setosa
2           4.7          3.2           1.3          0.2     setosa
3           6.5          3.0           5.2          2.0  virginica
4           6.2          3.4           5.4          2.3  virginica
5           5.9          3.0           5.1          1.8  virginica

I am the author of the datar package. Feel free to submit issues if you have any questions.
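
For comparison, the same head-plus-tail stacking in plain pandas is a single pd.concat call. A sketch on a small hypothetical frame, since pandas itself does not bundle the iris dataset:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # hypothetical stand-in for iris

# Plain-pandas counterpart of: df >> head(3) >> bind_rows(df >> tail(3))
# ignore_index=True renumbers the rows 0 to 5, like the bind_rows output above.
ends = pd.concat([df.head(3), df.tail(3)], ignore_index=True)
print(ends["x"].tolist())  # [0, 1, 2, 7, 8, 9]
```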

Answered By: Panwen Wang

Yes, rbind() (row bind dataframes) and cbind() (column bind dataframes) in R are very simple and intuitive.

You can use the "concat()" function from the pandas library for both of them to achieve the same thing. The rbind(df1,df2) equivalent in pandas will be the following:

pd.concat([df1, df2], ignore_index=True)

For ease of use, I have also written rbind() and cbind() functions using pandas below.


    import pandas as pd

    def rbind(df1, df2):
        # Stack df2's rows under df1's and renumber the index 0, 1, ...
        return pd.concat([df1, df2], ignore_index=True)

    def cbind(df1, df2):
        # Note: this does not keep the original indexes of the DataFrames; it resets them to 0, 1, ...
        return pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
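
The reset_index(drop=True) calls in cbind matter because pd.concat(axis=1) matches rows by index label rather than by position. A small sketch with hypothetical frames whose indexes do not overlap:

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]})                   # index 0, 1
right = pd.DataFrame({"b": [3, 4]}, index=[10, 11])  # index 10, 11

# No index labels match, so the result has four rows, each half NaN.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)  # (4, 2)

# Resetting both indexes first pairs rows by position, as R's cbind does.
paired = pd.concat([left.reset_index(drop=True), right.reset_index(drop=True)], axis=1)
print(paired.shape)  # (2, 2)
```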

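If you copy, paste, and run the functions above, you can use them in Python the same way you would in R. They are meant to be used under the same assumptions as their R counterparts; for example, for rbind(df1, df2), df1 and df2 should have the same column names. Unlike R, which raises an error when the names don’t match, pd.concat does not check this: it aligns columns by name and fills any column missing from one frame with NaN.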

Below is an example of the rbind() function:

import pandas as pd

dict1 = {'Name': ['Ali', 'Craig', 'Shaz', 'Maheen'], 'Age': [36, 38, 33, 34]} 
dict2 = {'Name': ['Fahad', 'Tyler', 'Thai-Son', 'Shazmeen', 'Uruj', 'Tatyana'], 'Age': [42, 27, 29, 60, 42, 31]}

data1 = pd.DataFrame(dict1)
data2 = pd.DataFrame(dict2) 

# We now row-bind the two dataframes and save it as df_final.

df_final = rbind(data1, data2)

print(df_final)
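
If you want the Python rbind to enforce the same-column-names assumption the way R does, a stricter variant could check the names before stacking. A sketch; rbind_strict is a hypothetical name, and like R's rbind(...) it accepts any number of frames:

```python
import pandas as pd

def rbind_strict(*frames):
    """Row-bind DataFrames, raising an error if their column names differ.

    Columns are matched by name rather than position, so frames whose
    columns come in a different order are reordered to match the first.
    """
    first_cols = list(frames[0].columns)
    for i, frame in enumerate(frames[1:], start=2):
        if set(frame.columns) != set(first_cols):
            raise ValueError(f"names of frame {i} do not match previous names")
    return pd.concat([frame[first_cols] for frame in frames], ignore_index=True)

a = pd.DataFrame({'Name': ['Ali'], 'Age': [36]})
b = pd.DataFrame({'Age': [42], 'Name': ['Fahad']})  # same names, other order
print(rbind_strict(a, b))
```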

Here is a public GitHub notebook I created to write and consolidate Python equivalents of R functions in one central place:
https://github.com/CubeStatistica/Learning-Data-Science-Properly-for-Work-and-Production-Using-Python/blob/main/Writing-R-Functions-in-Python.ipynb

Feel free to contribute.

Happy coding!
