How to read in .dta files from a csv list and then drop a specific column from all the files in the csv list using python

Question

I have a CSV containing the names of .dta files and I would like to drop a column from all the .dta files using python.

For example, R43567.dta, B12345.dta, P34567.dta and so on. These files contain a column named ‘ID’, and I would like to drop that column from all of them.

But I don’t know how to read all the files listed in the CSV, drop a column from them, and save them to another folder using a function or a loop.

I have the following code:

import pandas as pd
#read in the .dta file
dtafile = (r"C:DocumentR235401.dta")
df = pd.read_stata(dtafile) 
#list the column names 
list(df) 
#drop column 'id' 
df = df.drop('id', axis = 1)
list(df) 
#save the file back to the folder as .dta (to_stata returns None, so the result is not assigned)
df.to_stata(r"C:DocumentR235401.dta")

Could someone please advise on how to carry out the above for multiple Stata files?

Many thanks

Asked By: Rhea

Answer 1

Assuming that your *.dta files are all in the same folder, you can do something like this:

import pandas as pd
import os

folder_path = r'PATH_TO_FOLDER_CONTAINING_DTA_FILES'
counter = 0
for filename in os.listdir(folder_path):
    # check for file ending and optional other criteria
    if not filename.endswith('.dta'):
        continue
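    try:
        # read in the .dta file; os.path.join inserts the path separator
        dtafile = os.path.join(folder_path, filename)
        df = pd.read_stata(dtafile)

        # list the column names
        print(list(df))

        # drop column 'id' (the name is case-sensitive)
        df = df.drop('id', axis=1)

        # save the file back to the folder as .dta;
        # to_stata returns None, so its result is not assigned,
        # and write_index=False avoids writing an extra 'index' column
        df.to_stata(dtafile, write_index=False)
        counter += 1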
    except Exception as e:
        print(f'{filename} failed with {e}, {type(e)}')

print(f'processed {counter} files')
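
If the file names have to come from the CSV rather than from a directory listing, and the results should go to a different folder, a minimal sketch could look like the following. It assumes the CSV has no header row and holds one file name per row in its first column. The function name `drop_column_from_listed_files` and its parameters are hypothetical, chosen only for illustration.

```python
import os

import pandas as pd


def drop_column_from_listed_files(csv_path, source_dir, target_dir, column="id"):
    """Drop `column` from every .dta file named in the CSV and save a copy in target_dir.

    Assumption: the CSV has no header row and lists one file name per row
    in its first column (e.g. R43567.dta).
    """
    # read the file names from the first CSV column
    names = pd.read_csv(csv_path, header=None)[0].astype(str).str.strip()

    # create the output folder if it does not exist yet
    os.makedirs(target_dir, exist_ok=True)

    processed = []
    for name in names:
        try:
            df = pd.read_stata(os.path.join(source_dir, name))

            # errors="ignore" leaves files without the column unchanged
            df = df.drop(columns=[column], errors="ignore")

            # write_index=False avoids writing an extra 'index' column
            df.to_stata(os.path.join(target_dir, name), write_index=False)
            processed.append(name)
        except Exception as e:
            print(f"{name} failed with {e}, {type(e)}")

    return processed
```

It could then be called as `drop_column_from_listed_files("files.csv", "input_folder", "output_folder", column="ID")`, where the three paths are placeholders. The column name is matched case-sensitively, so pass it exactly as it appears in the files.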

Answered By: Doluk
