Program for considering a word such as colour's as 2 words

Question:

I would like my Python code to treat [colour’s] as 2 words, [colour] and [s], and include both in the word count. I tried doing it this way, but it causes many errors:

import sys
from pathlib import Path
import re

text_file = Path(sys.argv[1])

if text_file.exists() and text_file.is_file():
    read = text_file.read_text()
    length = len(read.split())
    addi = len(re.search(r'*.[["a-zA-Z"]]', text_file))
    length += addi
    print(f'{text_file} has', length, 'words')
else:
    print(f'File not found: {text_file}')
Asked By: Vyshakh


Answers:

Perhaps you could compare .split() with re.findall for your purpose. With re.findall, you can count the words directly (with [colour’s] counted as 2 words) by matching every run of letters, instead of searching for individual words with a group. For example:

import re

read = "today is Color's birthday"
print(read.split())
print(len(read.split()))

read2 = re.findall(r'[a-zA-Z]+', read)
print(read2)
print(len(read2))

Output:

['today', 'is', "Color's", 'birthday']
4
['today', 'is', 'Color', 's', 'birthday']
5
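
To apply the same idea to a file, as in the question, the pattern can be wrapped in a small helper. This is only a sketch: the names count_words and count_words_in_file are illustrative, and it assumes that a "word" means a run of ASCII letters, so digits and accented letters are not counted.

```python
import re
from pathlib import Path


def count_words(text):
    """Count runs of ASCII letters, so "colour's" contributes two words."""
    return len(re.findall(r"[A-Za-z]+", text))


def count_words_in_file(path):
    """Read the file at `path` and count its words with count_words()."""
    return count_words(Path(path).read_text())


print(count_words("today is Color's birthday"))  # prints 5
```

Because any non-letter character ends a match, this works for both the plain apostrophe (') and the typographic one (’). From the command line you could call count_words_in_file(sys.argv[1]) after checking that the file exists.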
Answered By: perpetual student

You can replace the apostrophe with a whitespace character (a space, for example), then count the length of the list created by str.split().

However, you may not want to replace all apostrophes. You almost certainly only want to replace apostrophes that are bounded by letters on both sides, as in colour's.

Therefore, with a combination of re and str.split(), you could do this:

import re
import sys

def word_count(filename):
    with open(filename) as infile:
        text = infile.read()
        data = re.sub("(?<=[A-Za-z])[']+(?=[A-Za-z])", ' ', text)
        return len(data.split())

if len(sys.argv) > 1:
    print(word_count(sys.argv[1]))
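
The question writes the word with a typographic apostrophe (’), which the pattern above does not match. If the input may contain either form, the character class can be widened. This is a minimal sketch that works on a string rather than a file; the names INNER_APOSTROPHE and count_words are illustrative, and treating U+2019 like ' is an assumption about the input.

```python
import re

# Assumption: both the ASCII apostrophe and the typographic one (U+2019)
# should split a word, but only when letters sit on both sides.
INNER_APOSTROPHE = re.compile(r"(?<=[A-Za-z])['\u2019]+(?=[A-Za-z])")


def count_words(text):
    """Replace letter-bounded apostrophes with a space, then count tokens."""
    return len(INNER_APOSTROPHE.sub(" ", text).split())


print(count_words("today is colour's birthday"))       # prints 5
print(count_words("today is colour\u2019s birthday"))  # prints 5
print(count_words("'quoted' text"))                    # prints 2
```

Unlike a letters-only re.findall, this keeps anything that str.split() would keep, so a token such as 2024 still counts as a word.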
Answered By: Stuart