Python Pandas add Filename Column CSV
Question:
My Python code works correctly in the example below. It combines a directory of CSV files and matches the headers. However, I want to take it a step further: how do I add a column containing the filename of the CSV that each row came from?
import pandas as pd
import glob

globbed_files = glob.glob("*.csv")  # creates a list of all csv files
data = []  # pd.concat takes a list of dataframes as an argument
for csv in globbed_files:
    frame = pd.read_csv(csv)
    data.append(frame)

bigframe = pd.concat(data, ignore_index=True)  # don't want pandas to try to align row indexes
bigframe.to_csv("Pandas_output2.csv")
Answers:
This should work:
import os

for csv in globbed_files:
    frame = pd.read_csv(csv)
    frame['filename'] = os.path.basename(csv)  # tag every row with its source file
    data.append(frame)
frame['filename'] creates a new column named filename, and os.path.basename() turns a path like /a/d/c.txt into the filename c.txt.
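Putting the question's loop and this answer together, the whole job can be sketched as one function. This is only an illustration: the name combine_with_filename and its pattern parameter are made up here, and it assumes every matching file is a comma-separated CSV.

```python
import glob
import os

import pandas as pd


def combine_with_filename(pattern):
    """Read every CSV matching `pattern` and tag each row with its source file.

    `combine_with_filename` is a hypothetical helper name, not part of pandas.
    """
    frames = []
    # sorted() only makes the row order predictable; glob itself gives no order guarantee
    for path in sorted(glob.glob(pattern)):
        frame = pd.read_csv(path)
        frame["filename"] = os.path.basename(path)  # e.g. /a/d/c.csv -> c.csv
        frames.append(frame)
    # pd.concat raises ValueError if no file matched and the list is empty
    return pd.concat(frames, ignore_index=True)
```

Calling combine_with_filename("*.csv") and then to_csv("Pandas_output2.csv") reproduces the question's script. Note that if the output is written into the same directory, a second run will pick it up too, because it also matches *.csv.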
Mike’s answer above works perfectly. In case any googlers run into the following error:
>>> TypeError: cannot concatenate object of type "<type 'str'>";
only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
It’s possibly because the separator is not correct. I was using a custom CSV file whose separator was ^. Because of that, I needed to include the separator in the pd.read_csv call.
import os

for csv in globbed_files:
    frame = pd.read_csv(csv, sep='^')  # the files use ^ instead of a comma
    frame['filename'] = os.path.basename(csv)
    data.append(frame)
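To see what the sep argument changes, here is a small sketch using made-up in-memory data rather than real files (the column names and values are invented for illustration). With the default comma separator, a ^-delimited file is read as a single column; passing sep='^' splits it as intended.

```python
import io

import pandas as pd

# Hypothetical ^-separated content standing in for one of the custom CSV files
raw = "id^name\n1^alpha\n2^beta\n"

wrong = pd.read_csv(io.StringIO(raw))           # default sep=',' finds no commas
right = pd.read_csv(io.StringIO(raw), sep="^")  # split on the real separator

print(wrong.columns.tolist())  # ['id^name']
print(right.columns.tolist())  # ['id', 'name']
```

Checking frame.columns right after read_csv is a quick way to confirm the separator is right before concatenating.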
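The files variable contains the list of all CSV files in your current directory, such as ['FileName1.csv', 'FileName2.csv']. If you want the column to hold the name without the ".csv" extension, you can strip it with the .split() function. Below is the simple logic. Note that it writes each frame back to its own file instead of combining them.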
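import glob
import pandas as pd

files = glob.glob("*.csv")
for i in files:
    df = pd.read_csv(i)
    df['New Column name'] = i.split(".")[0]  # file name without ".csv"
    # index=False avoids writing an extra unnamed index column back to the file
    df.to_csv(i.split(".")[0] + ".csv", index=False)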
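One caveat with i.split(".")[0]: it cuts at the first dot, so a name like report.2020.csv becomes just report. A possible alternative, sketched below with a hypothetical helper called stem, is os.path.splitext, which removes only the final extension.

```python
import os


def stem(path):
    """Return the file name without its directory or final extension.

    `stem` is an illustrative helper name, not a pandas or os function.
    """
    return os.path.splitext(os.path.basename(path))[0]


print("report.2020.csv".split(".")[0])  # report
print(stem("report.2020.csv"))          # report.2020
print(stem("data/FileName1.csv"))       # FileName1
```

Inside the loop above, df['New Column name'] = stem(i) would then keep the full name for files that contain extra dots.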