Why the column order is changing while appending pandas dataframes?

Question:

I want to append (merge) all the csv files in a folder using Python pandas.

For example: Say folder has two csv files test1.csv and test2.csv as follows:

A_Id    P_Id    CN1         CN2         CN3
AAA     111     702         709         740
BBB     222     1727        1734        1778

and

A_Id    P_Id    CN1         CN2         CN3
CCC     333     710        750          750
DDD     444     180        734          778

So the python script I wrote was as follows:

#!/usr/bin/python
import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)

all_data.to_csv('testfolder/combined.csv')

Though the combined.csv seems to have all the appended rows, it looks as follows:

      CN1       CN2         CN3    A_Id    P_Id
  0   710      750         750     CCC     333
  1   180       734         778     DDD     444     
  0   702       709         740     AAA     111
  1  1727       1734        1778    BBB     222

Where as it should look like this:

A_ID   P_Id   CN1    CN2    CN2
AAA    111    702    709    740
BBB    222    1727   1734   1778
CCC    333    110    356    123
DDD    444    220    256    223
  • Why are the first two columns moved to the end?
  • Why is it appending in the first line rather than at the last line?

What am I missing? And how can I get get of 0s and 1s in the first column?

P.S: Since these are large csv files, I thought of using pandas.

Asked By: kingmakerking

||

Answers:

I tweaked the code as below. Comments in-line.

#!/usr/bin/python
import pandas as pd
import glob

# Grab all the csv files in the folder to a list.
fileList = glob.glob('input_folder/*.csv')

#Initialize an empty dataframe to grab the csv content.
all_data = pd.DataFrame()

#Initialize an empty list to grab the dataframes.
dfList= []

for files in  fileList:
    df =  pd.read_csv(files, index_col = None, header= False)
    dfList.append(df)

#The frames will be in reverse order i.e last read file's content in the begining. So reverse it again
Reversed_dfList =  dfList[::-1]
CombinedFrame =  pd.concat(Reversed_dfList)

# The "Combined.csv" file will have combination of all the files.
CombinedFrame.to_csv('output_folder/Combined.csv', index=False)
Answered By: kingmakerking

Try this …..

all_data = all_data.append(df)[df.columns.tolist()]
Answered By: user6745154

I had the same issue and it was painfull. I managed to solve it by reorganising columns based on source dataframe after it was appended to final dataframe. It would look like this:

#!/usr/bin/python
import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)
    all_data = all_data[df.columns]

all_data.to_csv('testfolder/combined.csv') 

Since your issue was from almost two years ago, I’m posting solution which worked for me for enyone else who will also face similar issue.

Answered By: Uzzy

You can use reindex to change to the original order:

all_data = all_data.append(df)
all_data = all_data.reindex(df.columns, axis=1)

I saw this here (more details in the link): https://github.com/pandas-dev/pandas/issues/4588#issuecomment-44421883

Answered By: bubbassauro

Starting from version 0.23.0, you can prevent the append() method to sort the final appended DataFrame. In your case:

all_data = all_data.append(df, sort=False)
Answered By: Shayan Amani

all_data = all_data.append(df, ignore_index=True)

Put ignore_index=True so the order will not change.
By default is False so you need to change that.

Answered By: Muhammad Huzaifa

Please try:

df = df.append(pd.DataFrame(Added_rows).T)

Answered By: Haochen Qin

if you create the dataframe and you set the columns the way you want then append will comply to this

df = pd.DataFrame(columns = ['d','b','a','c'])
df = df.append({'d': name,
                        'b': h,
                        'a': w,
                        'c': normal_size                        
                        }
                       , ignore_index=True,sort=False)
Answered By: John
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.