Why is the leading character on this filename unexpectedly stripped by strip()?

Question:

I am working my way through Automate the Boring Stuff With Python. I just got this project working (taking data from sheets with Openpyxl and putting it in CSV files). The only unexpected behavior, is that my filenames do not come out exactly how I would expect. The spreadsheet filenames are provided as example files by the author. They take the form “spreadsheet-A.xlsx”. Instead of returning my expected filename, strip() takes off the leading “s”.

This is not a big deal, but I’m curious about why it’s happening and I haven’t figured it out.

Expected behavior: spreadsheet-A.xlsx becomes spreadsheet-A.csv
Actual behavior: spreadsheet-A.xlsx becomes preadsheet-A.csv

My guess is that the problem happens at lines 20 and 21, and that there’s something about strip that I don’t know.

#!/usr/bin/python

# excelToCSV.py - Converts all excel files in a directory to CSV, one file
# per sheet

import openpyxl
from openpyxl.utils import get_column_letter
import csv
import os

for excelFile in os.listdir('.'):
    #Skip non-xlsx files, load the workbook object.
    if excelFile.endswith('.xlsx'):
        wbA = openpyxl.load_workbook(excelFile)
        #Loop through each sheet in the workbook
        for sheet in wbA.worksheets:    
            sheetName = sheet.title
            sheetA = wbA.get_sheet_by_name(sheetName)
            # Create the CSV filename from the excel filename and sheet title
            excelFileStripped = excelFile.strip('.xlsx')
            csvFilename = excelFileStripped + '.csv'
            # Create the csv.writer object for this csv file
            csvFile = open(csvFilename, 'w', newline='')
            csvWriter = csv.writer(csvFile)
            # Loop through every row in the sheet
            maxRow = sheetA.max_row
            maxCol = sheetA.max_column
            for rowNum in range(1, maxRow + 1):
                rowData = []
                # Loop through each cell in the row
                for colNum in range(1, maxCol + 1):
                    # Append each cell's data to rowData
                    x = get_column_letter(colNum)
                    coordinate = str(x) + str(rowNum)
                    cellA = sheetA[coordinate]
                    #cellValue = cellA.value 
                    rowData.append(cellA.value)
                    # Write the rowData list to the csv file
                csvWriter.writerow(rowData)

            csvFile.close()

    else:
        continue
Asked By: partial_mask

||

Answers:

As noted in the comments, strip actually takes an iterable of individual characters (strings of length 1) and removes all instances of any of them from the start and end of the string (docs here).

While you could use rstrip, I recommend using the functions in os specifically meant for handling paths, for example:

import os

print(os.path.splitext('my_file.xlsx'))

Output:

('my_file', '.xlsx')

Applying this to your code, you might get this:

for filename in os.listdir(os.curdir):
    name, extension = os.path.splitext(filename)
    if extension == '.xlsx':
        # Excel file: do stuff...
Answered By: gmds

Another way to get rid of the ending could be:

ending = ".csv"
txt = "Hello, my name is H.xlsx"
sep = '.'
rest = txt.split(sep, 1)[0]
new = rest+ending

This removes the end part and adds on the ending variable.

Answered By: Tommy Lawrence
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.