How to read in .dta files from a csv list and then drop a specific column from all the files in the csv list using python
Question:
I have a CSV file containing the names of .dta files, and I would like to drop a column from all of those files using Python.
For example: R43567.dta, B12345.dta, P34567.dta, and so on. Each of these files contains a column named 'ID', and I would like to drop that column from all of them.
However, I don't know how to read all the files listed in the CSV, drop the column from each one, and save them to another folder, using a function or a loop.
I have the following code:
import pandas as pd

# read in the .dta file
dtafile = r"C:DocumentR235401.dta"
df = pd.read_stata(dtafile)
# list the column names
list(df)
# drop column 'id'
df = df.drop('id', axis=1)
list(df)
# save the file back to the folder as .dta
# (to_stata returns None, so its result is not assigned back to df)
df.to_stata(dtafile)
Could someone please advise on how to carry out the above for multiple Stata files?
Many thanks
Answers:
Assuming that your *.dta files are all in the same folder, you can do something like this:
import os

import pandas as pd

folder_path = r'PATH_TO_FOLDER_CONTAINING_DTA_FILES'
counter = 0
for filename in os.listdir(folder_path):
    # check for file ending and optional other criteria
    if not filename.endswith('.dta'):
        continue
    try:
        # read in the .dta file; os.path.join inserts the path separator
        dtafile = os.path.join(folder_path, filename)
        df = pd.read_stata(dtafile)
        # drop column 'id'
        df = df.drop('id', axis=1)
        # save the file back to the folder as .dta
        # (to_stata returns None, so its result is not assigned to df;
        # write_index=False avoids adding an 'index' column to the file)
        df.to_stata(dtafile, write_index=False)
        counter += 1
    except Exception as e:
        print(f'{filename} failed with {e}, {type(e)}')
print(f'processed {counter} files')
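The loop above walks the whole folder. If you want to process only the files named in the CSV and save the results to a different folder, as the question asks, a minimal sketch could look like the following. The function name `drop_column_from_listed_files` and its parameters are hypothetical, and it assumes the CSV has no header row and holds only file names (on one row or several), with all the .dta files sitting in a single source folder.

```python
import csv
from pathlib import Path

import pandas as pd


def drop_column_from_listed_files(csv_path, source_dir, output_dir, column="id"):
    """Drop `column` from every .dta file named in `csv_path`.

    Hypothetical helper: reads each listed file from `source_dir`, writes the
    result to `output_dir`, and returns the names that were written.
    """
    source_dir = Path(source_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Collect every non-empty cell, so names may sit on one row or on many.
    with open(csv_path, newline="") as handle:
        names = [
            cell.strip()
            for row in csv.reader(handle)
            for cell in row
            if cell.strip()
        ]

    written = []
    for name in names:
        if not name.lower().endswith(".dta"):
            continue  # skip any stray entry that is not a .dta file name
        try:
            df = pd.read_stata(source_dir / name)
            # errors="ignore" leaves files without the column unchanged
            df = df.drop(columns=[column], errors="ignore")
            # write_index=False keeps pandas from adding an "index" column
            df.to_stata(output_dir / name, write_index=False)
            written.append(name)
        except Exception as exc:
            print(f"{name} failed with {exc!r}")
    return written
```

You would call it with your own paths, for example `drop_column_from_listed_files("files.csv", "input_folder", "output_folder", column="ID")`. Column names are case-sensitive, so pass the name exactly as it appears in the files. Writing to a separate folder also means the original files stay untouched if something goes wrong.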