How to concatenate multiple csv file with one common column
Question:
I have multiple csv files with different columns and one common EMP_ID column. I want to merge all the files and get one file with all the columns and header from different files. I tried the following code and I get the EMP_ID column multiple times.
file 1 :
EMP_ID
EMP_name
EMP_Service
33
a
6
44
b
3
12
c
1
16
d
10
10
e
25
File 2:
EMP_ID
EMP_age
33
30
44
29
12
27
16
45
10
50
File 3:
EMP_ID
EMP_dept
33
xyz
44
abc
12
lmn
16
abc
10
lmn
required output: cat_vars.csv
EMP_ID
EMP_name
EMP_age
EMP_Service
EMP_dept
33
a
30
6
xyz
44
b
29
3
abc
12
r
27
1
lmn
16
d
45
10
abc
10
t
50
25
lmn
df1 = pd.read_csv(r'file_1.csv')
df2 = pd.read_csv(r'file_2.csv')
df3 = pd.read_csv(r'file_3.csv')
df = (pd.concat([df1, df2, df3], axis=1).to_csv('cat_vars.csv', index=False))
Answers:
Set your index as the column you want to join on~
dfs = [df1, df2, df3]
dfs = [x.set_index('EMP_ID') for x in dfs]
df = pd.concat(dfs, axis=1).reset_index()
print(df)
Output:
EMP_ID EMP_name EMP_Service EMP_age EMP_dept
0 33 a 6 30 xyz
1 44 b 3 29 abc
2 12 c 1 27 lmn
3 16 d 10 45 abc
4 10 e 25 50 lmn
I have multiple csv files with different columns and one common EMP_ID column. I want to merge all the files and get one file with all the columns and header from different files. I tried the following code and I get the EMP_ID column multiple times.
file 1 :
EMP_ID | EMP_name | EMP_Service |
---|---|---|
33 | a | 6 |
44 | b | 3 |
12 | c | 1 |
16 | d | 10 |
10 | e | 25 |
File 2:
EMP_ID | EMP_age |
---|---|
33 | 30 |
44 | 29 |
12 | 27 |
16 | 45 |
10 | 50 |
File 3:
EMP_ID | EMP_dept |
---|---|
33 | xyz |
44 | abc |
12 | lmn |
16 | abc |
10 | lmn |
required output: cat_vars.csv
EMP_ID | EMP_name | EMP_age | EMP_Service | EMP_dept |
---|---|---|---|---|
33 | a | 30 | 6 | xyz |
44 | b | 29 | 3 | abc |
12 | r | 27 | 1 | lmn |
16 | d | 45 | 10 | abc |
10 | t | 50 | 25 | lmn |
df1 = pd.read_csv(r'file_1.csv')
df2 = pd.read_csv(r'file_2.csv')
df3 = pd.read_csv(r'file_3.csv')
df = (pd.concat([df1, df2, df3], axis=1).to_csv('cat_vars.csv', index=False))
Set your index as the column you want to join on~
dfs = [df1, df2, df3]
dfs = [x.set_index('EMP_ID') for x in dfs]
df = pd.concat(dfs, axis=1).reset_index()
print(df)
Output:
EMP_ID EMP_name EMP_Service EMP_age EMP_dept
0 33 a 6 30 xyz
1 44 b 3 29 abc
2 12 c 1 27 lmn
3 16 d 10 45 abc
4 10 e 25 50 lmn