Pandas Concat vs append and join columns –> ("state", "state:", "State")

Question:

I join 437 tables and I get 3 columns for state as my coworkers feel like giving it a different name each day, ("state", "state:" and "State"), is there a way that joins those 3 columns to just 1 column called "state"?.

*also my code uses append, I just saw its deprecated, will it work the same using concat? any way to make it give the same results as append?.

I tried:

excl_merged.rename(columns={"state:": "state", "State": "state"})

but it doesn’t do anything.

The code I use:

# importing the required modules
import glob
import pandas as pd
 
# specifying the path to csv files
path = "X:/.../Admission_merge"
 
# csv files in the path
file_list = glob.glob(path + "/*.xlsx")
 
# list of excel files we want to merge.
# pd.read_excel(file_path) reads the excel
# data into pandas dataframe.
excl_list = []
 
for file in file_list:
    excl_list.append(pd.read_excel(file)) #use .concat will it give the columns in the same order? 
 
# create a new dataframe to store the
# merged excel file.
excl_merged = pd.DataFrame()
 
for excl_file in excl_list:
     
    # appends the data into the excl_merged
    # dataframe.
    excl_merged = excl_merged.append(
      excl_file, ignore_index=True)


# exports the dataframe into excel file with
# specified name.
excl_merged.to_excel('X:/.../Admission_MERGED/total_admission_2021-2023.xlsx', index=False)
print("Merge finished")

Any suggestions how I can improve it? also is there a way to remove unnamed empty columns?.

Thanks a lot.

Asked By: rstudiohelpneeded

||

Answers:

You can use pd.concat:

excl_list = ['state1.xlsx', 'state2.xlsx', 'state3.xlsx']
state_map = {'state:': 'state', 'State': 'state'}

data = []
for excl_file in excl_list:
    df = pd.read_excel(excl_file).rename(columns=state_map)
    data.append(df)
excl_merged = pd.concat(data, ignore_index=True)
print(excl_merged)

# Output
  ID state
0  A     a
1  B     b
2  C     c
3  D     d
4  E     e
5  F     f
6  G     g
7  H     h
8  I     i

file1.xlsx:

  ID State
0  A     a
1  B     b
2  C     c

file2.xlsx:

  ID state
0  D     d
1  E     e
2  F     f

file3.xlsx:

  ID state:
0  G      g
1  H      h
2  I      i

If you have empty columns, you can use data.append(df.dropna(how='all', axis=1)) before appending to data list.

Answered By: Corralien