My strip() function is not removing

Question

My intention is to take a large block of text and first convert it to all lower case (which works), then remove the punctuation marks (which does not work), and finally print the frequency of each word. (It prints test. and test as two different words.)

from collections import Counter



text = """
Test. test test. Test Test test. 
""".lower().strip(".")



words = text.split()
counts = Counter(words)
print(counts)

Any help would be appreciated.

Asked By: user7884512


Answer 1

You need .replace('.', '') in place of strip
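As a rough sketch of what that change looks like when applied to the code from the question (only the strip call is swapped out; everything else is kept as the asker wrote it):

```python
from collections import Counter

# Same text as in the question, but using replace() instead of strip()
text = """
Test. test test. Test Test test. 
""".lower().replace(".", "")

words = text.split()
counts = Counter(words)
print(counts)  # Counter({'test': 6})
```

Note that this only handles periods; other punctuation marks would each need their own replace call or a different approach.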

Answered By: zengr

Answer 2

You can split the text into a list and then strip the punctuation from each word, or use roganjosh’s suggestion, which is to use .replace('.', ''):

Way 1:

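from collections import Counter

text = "Test. test test. Test Test test."
# Lowercase first so "Test" and "test" are counted together
word = text.lower().split()
# Strip periods from both ends of each word
the_list = [i.strip('.') for i in word]
counts = Counter(the_list)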

Note that .strip() only removes characters from the beginning and end of a string, not from the middle.
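To make that concrete, here is a small sketch (the sample string is made up for illustration) comparing strip and replace, and showing why the strip call in the question has no effect: the triple-quoted text begins and ends with a newline, not a period.

```python
# strip(chars) trims matching characters from the two ends only
sample = "test. test."
stripped = sample.strip(".")
replaced = sample.replace(".", "")
print(stripped)  # test. test
print(replaced)  # test test

# The text from the question starts and ends with "\n",
# so strip(".") finds nothing to remove at either end
question_text = "\nTest. test test. Test Test test. \n".lower()
unchanged = question_text.strip(".")
print(unchanged == question_text)  # True
```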

Way 2:

new_text = text.lower().replace('.', '')
# Split into words first; Counter(new_text) would count single characters
counts = Counter(new_text.split())

Answered By: Ajax1234

Answer 3

If all you want is to extract words (for counting or any other reason), use a regular expression with re.findall (or re.finditer if the texts are big and you don’t want to collect all the matches in memory):

import re
from collections import Counter

text = """
Test. test test. Test Test test. 
"""

# Counter({'test': 6})
counts = Counter(re.findall(r"\w+", text.lower()))

Note that this may be trickier with non-ASCII text (and doesn’t account for, e.g., words-with-dashes).
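One possible way to handle the words-with-dashes case is to widen the pattern so that a hyphen or apostrophe between word characters stays part of the word. This is only a sketch: the pattern and the sample sentence below are my own assumptions, not part of the original answer, and they will not cover every edge case. In Python 3, \w on str patterns already matches Unicode word characters by default.

```python
import re
from collections import Counter

# Hypothetical sample text, not taken from the question
sample = "Well-known test. It's a test, a well-known test!"

# Word characters, optionally followed by groups of a hyphen
# or apostrophe plus more word characters
WORD_PATTERN = re.compile(r"\w+(?:[-']\w+)*")

tokens = WORD_PATTERN.findall(sample.lower())
word_counts = Counter(tokens)
print(word_counts)
```

The group is non-capturing, so re.findall returns the whole match rather than just the last group.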

Answered By: drdaeman

Answer 4

To remove the punctuation from every word, you need to work through the text word by word.

strip is a handy function that can remove multiple characters at once, but it only removes them from the start and end of the string. It stops at the first character that is not in the set you passed, so punctuation in the middle of the text is left untouched.

word = text.split()
text_list = [i.strip('.') for i in word]
count = len(text_list)
text = " ".join(text_list)

This way you work with each word.
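The same word-by-word idea can be extended to more than periods. The sketch below assumes you want to drop any ASCII punctuation from the ends of each word, using string.punctuation from the standard library; the sample text is invented for illustration.

```python
import string
from collections import Counter

# Hypothetical sample with several kinds of punctuation
text = "Test. test, test! (Test) Test test?"

# Lowercase, split on whitespace, then trim punctuation from both ends
words = [w.strip(string.punctuation) for w in text.lower().split()]
# Drop tokens that consisted of punctuation only
words = [w for w in words if w]

counts = Counter(words)
print(counts)  # Counter({'test': 6})
```

Because strip only touches the ends, punctuation inside a word (such as the apostrophe in "don't") is kept.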

Hope this helps

Answered By: yatabani
