InMemoryUploadedFile Django File Read in Bytes – str vs utf8

Question:

I’m looking to read the binary content of a Django InMemoryUploadedFile. The file type/encoding varies, so I don’t want to assume UTF-8; I simply want to read the binary content and then encode it as UTF-8. Here’s what I’ve tried:

str(file.read()).replace('\n', '\r\n')

This looks to work, except the string still has the binary ‘b’ character. To fix this, I tried:

file.read().decode('utf8').replace('\n', '\r\n')

This works well for reading .txt files. Any other file types fail to read properly, understandably. How can I read the binary content of type "InMemoryUploadedFile" without specifying an encoding?

Asked By: user2953714


Answers:

Thinking in terms of "the string still has the binary ‘b’ character" is the wrong approach here (and, with Python and encodings in general, it will only lead to pain down the line).

Instead, use Django’s force_str helper (from django.utils.encoding) to decode the bytes into a text string, rather than calling str(), which only produces a representation of the bytestring as a text string:

force_str(file.read()).replace('\n', '\r\n')
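The difference between str() and decoding is easy to see with a plain bytes object. This sketch calls bytes.decode directly instead of force_str so it runs without Django installed; for a bytes argument, force_str with its default arguments decodes as UTF-8 in the same way.

```python
data = b"line one\nline two"

# str() on a bytes object returns its repr: the b'' prefix and an
# escaped "\n" (backslash + n) become literal characters in the text.
as_repr = str(data)

# Decoding interprets the bytes as UTF-8 text, giving a real newline.
as_text = data.decode("utf-8")

print(as_repr)            # b'line one\nline two'
print("\n" in as_repr)    # False: the repr contains no actual newline
print(as_text.replace("\n", "\r\n") == "line one\r\nline two")  # True
```

This is also why replace('\n', '\r\n') on the str() result cannot work: there is no newline character in the repr to replace.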

Decoding like this of course only works for content that is actually text: force_str decodes as UTF-8 unless you pass a different encoding argument, and binary content has no inherent encoding (after all, how would you decode e.g. a PNG image as text?).
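If the upload may be either text or binary, one option is to attempt the decode and fall back to the raw bytes when it fails. The sketch below is only an illustration under assumptions: read_upload is a hypothetical helper, the fallback-to-bytes policy is a design choice rather than anything Django does, and io.BytesIO stands in for the uploaded file (InMemoryUploadedFile.read() likewise returns bytes). Note that a successful decode does not prove the file is text; for example, ASCII-only binary data also decodes as UTF-8.

```python
import io


def read_upload(f, encoding="utf-8"):
    """Hypothetical helper: return the content as text with CRLF line
    endings, or the raw bytes if it cannot be decoded with `encoding`.

    `f` is any file-like object whose read() returns bytes, e.g. an
    InMemoryUploadedFile (simulated here with io.BytesIO).
    """
    data = f.read()
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError:
        # Not valid text in this encoding (e.g. a PNG image): keep bytes.
        return data
    # Normalise existing CRLF first so it is not turned into CR CR LF.
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


print(repr(read_upload(io.BytesIO("héllo\nworld".encode("utf-8")))))
print(repr(read_upload(io.BytesIO(b"\x89PNG\r\n\x1a\n"))))
```

Returning two different types keeps the example short; in real code you would probably branch on the result (or on the upload's declared content type) before using it.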

Answered By: AKX