Why is the leading character on this filename unexpectedly stripped by strip()?
Question:
I am working my way through Automate the Boring Stuff With Python. I just got this project working (taking data from sheets with openpyxl and putting it in CSV files). The only unexpected behavior is that my filenames do not come out exactly as I would expect. The spreadsheet filenames are provided as example files by the author. They take the form “spreadsheet-A.xlsx”. Instead of returning my expected filename, strip() takes off the leading “s”.
This is not a big deal, but I’m curious about why it’s happening and I haven’t figured it out.
Expected behavior: spreadsheet-A.xlsx becomes spreadsheet-A.csv
Actual behavior: spreadsheet-A.xlsx becomes preadsheet-A.csv
My guess is that the problem happens at lines 20 and 21 (the strip() call and the line after it, which builds csvFilename), and that there’s something about strip that I don’t know.
#!/usr/bin/python

# excelToCSV.py - Converts all excel files in a directory to CSV, one file
# per sheet

import openpyxl
from openpyxl.utils import get_column_letter
import csv
import os

for excelFile in os.listdir('.'):
    #Skip non-xlsx files, load the workbook object.
    if excelFile.endswith('.xlsx'):
        wbA = openpyxl.load_workbook(excelFile)
        #Loop through each sheet in the workbook
        for sheet in wbA.worksheets:
            sheetName = sheet.title
            sheetA = wbA.get_sheet_by_name(sheetName)
            # Create the CSV filename from the excel filename and sheet title
            excelFileStripped = excelFile.strip('.xlsx')
            csvFilename = excelFileStripped + '.csv'
            # Create the csv.writer object for this csv file
            csvFile = open(csvFilename, 'w', newline='')
            csvWriter = csv.writer(csvFile)
            # Loop through every row in the sheet
            maxRow = sheetA.max_row
            maxCol = sheetA.max_column
            for rowNum in range(1, maxRow + 1):
                rowData = []
                # Loop through each cell in the row
                for colNum in range(1, maxCol + 1):
                    # Append each cell's data to rowData
                    x = get_column_letter(colNum)
                    coordinate = str(x) + str(rowNum)
                    cellA = sheetA[coordinate]
                    #cellValue = cellA.value
                    rowData.append(cellA.value)
                # Write the rowData list to the csv file
                csvWriter.writerow(rowData)
            csvFile.close()
    else:
        continue
Answers:
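As noted in the comments, strip does not remove a substring. It takes a string of characters, treats it as a set, and removes any leading and trailing characters that belong to that set, stopping at the first character on each end that does not (see the documentation for str.strip). So strip('.xlsx') removes any '.', 'x', 'l' or 's' from both ends: the trailing ".xlsx" goes, and so does the leading "s" of "spreadsheet".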
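To see the character-set behavior in isolation, here is a small sketch using the filename from the question. The second filename and the variable names are only illustrative.

```python
name = "spreadsheet-A.xlsx"

# The argument is treated as a set of characters: '.', 'x', 'l' and 's'.
stripped = name.strip(".xlsx")
print(stripped)  # preadsheet-A

# Order and repetition in the argument make no difference.
same_result = name.strip("slx.")
print(same_result == stripped)  # True

# A name that neither starts nor ends (before the extension) with one of
# those characters only loses the extension, which hides the problem.
other = "my_file.xlsx".strip(".xlsx")
print(other)  # my_file
```

That is why the call seems to work on some filenames and quietly damages others.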
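While you could use rstrip here, it has the same character-set behavior (for example, 'sales.xlsx'.rstrip('.xlsx') gives 'sale'), so I recommend using the functions in os.path that are specifically meant for handling paths, for example: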
import os
print(os.path.splitext('my_file.xlsx'))
Output:
('my_file', '.xlsx')
Applying this to your code, you might get this:
for filename in os.listdir(os.curdir):
    name, extension = os.path.splitext(filename)
    if extension == '.xlsx':
        # Excel file: build the CSV filename from the base name
        csvFilename = name + '.csv'
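Since the script writes one CSV per sheet but builds the same filename for every sheet in a workbook, each sheet would overwrite the previous one. Below is a minimal sketch of a helper that uses os.path.splitext and also includes the sheet title. The function name csv_name_for and the "name_sheet.csv" pattern are assumptions for illustration, not something from the book or from openpyxl.

```python
import os


def csv_name_for(excel_filename, sheet_title):
    """Hypothetical helper: build a per-sheet CSV filename.

    The '<base>_<sheet>.csv' pattern is an assumed naming scheme.
    """
    # splitext splits at the last dot, so only the extension is removed.
    base, _extension = os.path.splitext(excel_filename)
    return base + "_" + sheet_title + ".csv"


print(csv_name_for("spreadsheet-A.xlsx", "Sheet1"))  # spreadsheet-A_Sheet1.csv
```

In the question's loop this would be called as csv_name_for(excelFile, sheet.title) in place of the strip() line.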
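Another way to get rid of the ending could be:
ending = ".csv"
txt = "Hello, my name is H.xlsx"
sep = '.'
# rsplit cuts at the last dot, so earlier dots in the name are kept
rest = txt.rsplit(sep, 1)[0]
new = rest + ending
This removes everything from the last dot onward and then adds on the ending variable. (Plain split(sep, 1) would cut at the first dot instead, which loses part of any filename that contains more than one dot.)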
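For filenames with more than one dot, it matters which dot you cut at. The sketch below compares cutting at the first dot, cutting at the last dot, and removing an exact suffix. The filename "report.v2.xlsx" and the helper drop_suffix are made up for illustration; on Python 3.9 and later, the built-in str.removesuffix does the same job as the helper.

```python
txt = "report.v2.xlsx"

# split with maxsplit=1 cuts at the FIRST dot.
by_first_dot = txt.split(".", 1)[0]
print(by_first_dot)  # report

# rsplit with maxsplit=1 cuts at the LAST dot.
by_last_dot = txt.rsplit(".", 1)[0]
print(by_last_dot)  # report.v2


def drop_suffix(text, suffix):
    """Hypothetical helper: remove suffix only if text actually ends with it."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(drop_suffix(txt, ".xlsx") + ".csv")  # report.v2.csv
print(drop_suffix("spreadsheet-A.xlsx", ".xlsx") + ".csv")  # spreadsheet-A.csv
```

Removing an exact suffix is probably the closest match to what the strip() call in the question was meant to do.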