Turning text file into a string (Python 3)

Question:

I want to convert a text file into a string and I was given this function to start with, which was written in Python 2:

def parseOutText(f):
    f.seek(0)  
    all_text = f.read()

    content = all_text.split("X-FileName:")
    words = ""
    if len(content) > 1:
        text_string = content[1].translate(string.maketrans("", ""), string.punctuation)

        words = text_string

        ### split the text string into individual words, stem each word,
        ### and append the stemmed word to words (make sure there's a single
        ### space between each stemmed word)

    return words

As you can see, I have to add some code to this function, but it does not run under Python 3 (I get an error saying ‘string’ has no attribute ‘maketrans’). I am sure this code can easily be translated to Python 3, but I don’t really understand what it does up to the comment lines. Does it simply strip the punctuation and return the text as a string?

Asked By: Ach113

Answers:

So I found this piece of code and it works like a charm:

import string
exclude = set(string.punctuation)
# s holds the text; naming it `string` would shadow the module
s = ''.join(ch for ch in s if ch not in exclude)
Answered By: Ach113

Python 3.x maketrans and translate have all the basic functionality of their Python 2 predecessors, and more, but they have a different API. So you have to understand what they’re doing to use them.

translate in 2.x took a very simple table, made by string.maketrans, plus a separate deletechars list.

In 3.x, the table is more complicated (in large part because it’s now translating Unicode characters, not bytes, but it also has other new features). The table is made by a static method str.maketrans instead of a function string.maketrans. And the table includes the deletions list, so you don’t need a separate argument to translate.

From the docs:

static str.maketrans(x[, y[, z]])

This static method returns a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters (strings of length 1) to Unicode ordinals, strings (of arbitrary lengths) or None. Character keys will then be converted to ordinals.

If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
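As a rough illustration of the three call forms quoted above, here is a minimal sketch. The sample strings and variable names are made up for the example.

```python
# One-argument form: a dict mapping characters (or ordinals) to
# replacement strings, ordinals, or None (None means "delete").
table_from_dict = str.maketrans({"a": "A", "b": None, "c": "xyz"})
dict_result = "abcabc".translate(table_from_dict)

# Two-argument form: equal-length strings, mapped position by position.
table_from_pairs = str.maketrans("abc", "xyz")
pairs_result = "aabbcc".translate(table_from_pairs)

# Three-argument form: same mapping, plus a string of characters to delete.
table_with_deletions = str.maketrans("abc", "xyz", "!?")
deletion_result = "abc!?".translate(table_with_deletions)

print(dict_result)      # AxyzAxyz
print(pairs_result)     # xxyyzz
print(deletion_result)  # xyz
```

In every case the table is a plain dict keyed by Unicode ordinals, so you can print it to see exactly what translate will do.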


So, to make a table that deletes all punctuation and does nothing else in 3.x, you do this:

import string
table = str.maketrans('', '', string.punctuation)

And to apply it:

translated = s.translate(table)
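Putting those two steps back into the function from the question, a Python 3 port might look like the sketch below. It assumes f is a file object opened in text mode; the io.StringIO sample is invented for the demonstration, and the stemming step is still left as a placeholder, as in the original.

```python
import io
import string


def parseOutText(f):
    """Return the text after the first "X-FileName:" marker, minus ASCII punctuation."""
    f.seek(0)
    all_text = f.read()

    content = all_text.split("X-FileName:")
    words = ""
    if len(content) > 1:
        # Python 3: the deletion list is part of the table itself.
        table = str.maketrans("", "", string.punctuation)
        words = content[1].translate(table)
        # Stemming of individual words would go here (not implemented).
    return words


# Hypothetical sample input standing in for a real file.
sample = io.StringIO("To: someone\nX-FileName: notes.txt\nHi, everyone! How's it going?")
result = parseOutText(sample)
print(repr(result))  # ' notestxt\nHi everyone Hows it going'
```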

Meanwhile, since you’re dealing with Unicode, are you sure string.punctuation is what you want? As the docs say, this is:

String of ASCII characters which are considered punctuation characters in the C locale.

So, for example, curly quotes, punctuation used in languages other than English, etc. will not be removed.

If that’s an issue, you’d have to do something like this:

import unicodedata
translated = ''.join(ch for ch in s if unicodedata.category(ch)[0] != 'P')
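To make the difference concrete, here is a small comparison of the two approaches. The helper names and the sample text are made up for the illustration. Note that the category test is not a strict superset of string.punctuation: characters such as + and $ are classified as symbols (categories starting with 'S'), not punctuation, so the Unicode-aware version leaves them in place.

```python
import string
import unicodedata


def strip_ascii_punctuation(s):
    # Deletes only the ASCII characters listed in string.punctuation.
    return s.translate(str.maketrans("", "", string.punctuation))


def strip_unicode_punctuation(s):
    # Deletes every character whose Unicode category starts with 'P'.
    return "".join(ch for ch in s if not unicodedata.category(ch).startswith("P"))


# Curly quotes, an ellipsis character, and an inverted question mark.
sample = "\u201cHello\u201d, world\u2026 \u00bfqu\u00e9?"
ascii_only = strip_ascii_punctuation(sample)
unicode_aware = strip_unicode_punctuation(sample)

print(ascii_only)     # “Hello” world… ¿qué
print(unicode_aware)  # Hello world qué
```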
Answered By: abarnert

Change this line:

text_string = content[1].translate(string.maketrans("", ""), string.punctuation)

to this:

text_string = content[1].translate(str.maketrans("", "", string.punctuation))
Answered By: Sohel Khan