Python. Merge multiple Excel files and add a column with the original file's name

Question:

I am hoping somebody will be able to tell me why I am doing wrong, please?

My code reads three Excel files: 0122.xls, 0222.xls and 0322.xls and merges them together in a data frame. My problem is that when I create a new column that should contain the original file’s name I am expecting to get 0122.xls, 0222.xls and 0322.xls but getting something different and I can’t pinpoint what I am doing wrong. USD 1.1 comes from file 0122, USD 1.2 from 0222.xls and file 0322 contains USD 1.3.

enter image description here

Any help would be appreciated! Here is my code. Thank you.

import pandas as pd
import numpy as np
import os

os.chdir('//.../Python/data')
path = '//.../Python/data'
files = os.listdir(path)
df = pd.DataFrame()

for f in files:
    data = pd.read_excel(f, skiprows=np.r_[0:7], header=0, usecols=['Unnamed: 2','Unnamed: 3'])
    df['col4'] = f
    df = df.append(data)  

df = df.rename(columns={'Unnamed: 2': 'Curr', 'Unnamed: 3': 'USD'})
df = df.drop(df[df.Curr != 'GBP'].index)

df
Asked By: Adrian

||

Answers:

In the for loop you should change

df['col4'] = f

to

data['col4'] = f

otherwise reasign all of col4 every iteration of the for loop, hence why it ends up with the name of the final file in every row.

Answered By: tiger
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.