Pandas run Converting program to contents of a folder

Question:

I’ve created a program so that I can split the column into 2 columns by specifying the location of the file but I would like to do this for an entire folder so that the process can be a bit faster.

Here is the code for my program:

import pandas as pd
import os

// reading csv file from url
filepath = "C:/Users/username/folder1/folder2/folder3/filename.csv"
file_encoding = 'cp1252' 
data = pd.read_csv(filepath , header=None , encoding=file_encoding)

data.columns =['Last Name', 'First Name']

// new data frame with split value columns
new = data["Name"].str.split(",", n = 1, expand = True)

// making separate first name column from new data frame
data["Last Name"]= new[0]

// making separate last name column from new data frame
data["First Name"]= new[1]

// new data frame with split value columns 
new = data["SName"].str.split(",", n = 1, expand = True)

// making separate first name column from new data frame
data["S Last Name"]= new[0]

// making separate last name column from new data frame
data["S First Name"]= new[1]

// Saving File name as its path
filename = os.path.basename(filepath) + ".xlsx"
data.to_excel(filename, index=False)

data

I created an automated text to column program so that I don’t have to manually do for hundreds of data entries. But I have done it for a specified file. I would like to take this code a bit further by allowing it to take a folder of .csv and do the conversions.

Thanks for helping

Asked By: Kenny

||

Answers:

You can place your code in a function:

import pandas as pd
import os

def splitter(folder_path):

    for file in os.listdir(folder_path):
        if file.endswith(".csv"):
        
            # reading csv file from url
            filepath = os.path.join(folder_path, file)
            file_encoding = 'cp1252' 
            data = pd.read_csv(filepath , header=None , encoding=file_encoding)

            data.columns =['Last Name', 'First Name']

            # new data frame with split value columns
            new = data["Name"].str.split(",", n = 1, expand = True)

            # making separate first name column from new data frame
            data["Last Name"]= new[0]

            # making separate last name column from new data frame
            data["First Name"]= new[1]

            # new data frame with split value columns 
            new = data["SName"].str.split(",", n = 1, expand = True)

            # making separate first name column from new data frame
            data["S Last Name"]= new[0]

            # making separate last name column from new data frame
            data["S First Name"]= new[1]

            # Saving File name as its path
            filename = os.path.basename(filepath) + ".xlsx"
            data.to_excel(filename, index=False)
            
            print(file, " is done.")


# Example
fpath = "C:/Users/username/folder1/folder2/folder3"
splitter(fpath)
Answered By: a11

Use glob library.

from glob import glob

files = glob('C:/Users/username/folder1/folder2/folder3/*')

for file in files:
    # ... your function/code
Answered By: Feline