InMemoryUploadedFile Django File Read in Bytes – str vs utf8
Question:
I’m looking to read the binary content of a Django type InMemoryUploadedFile. The file type/encoding is a variable, so I don’t want to assume UTF8, I simply want to read the binary content, then I want to encode with UTF8. Here’s what I’ve tried:
str(file.read()).replace('\n', '\r\n')
This looks to work, except the string still has the binary ‘b’ character. To fix this, I tried:
file.read().decode('utf8').replace('\n', '\r\n')
This works well for reading .txt files. Any other file types fail to read properly, understandably. How can I read the binary content of type "InMemoryUploadedFile" without specifying an encoding?
Answers:
"The string still has the binary ‘b’ character" is the wrong way to think about this (and, in general, will only lead to pain down the line with Python and encodings).
Instead, use Django’s force_str helper to decode a bytes object to a text string (instead of using str to generate a representation of a bytestring as a text string):
force_str(file.read()).replace('\n', '\r\n')
This of course only works for text that can be decoded; binary content has no inherent encoding (after all, how would you decode e.g. a PNG image as text?).
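To make the difference between a representation and a decode concrete, here is a minimal sketch that uses only the standard library. It assumes io.BytesIO as a stand-in for the uploaded file (both expose read()), and read_upload is a hypothetical helper name, not part of Django. It always keeps the raw bytes and only produces text when the bytes actually decode:

```python
import io


def read_upload(file_obj, encoding="utf-8"):
    """Return (raw_bytes, text_or_None) for a binary file-like object.

    Hypothetical helper: the raw bytes are always returned untouched;
    text is None when the content cannot be decoded with `encoding`.
    """
    raw = file_obj.read()
    try:
        # Decode first, then normalise line endings as in the question.
        text = raw.decode(encoding).replace("\n", "\r\n")
    except UnicodeDecodeError:
        text = None
    return raw, text


# str() on bytes builds a printable representation, it does not decode.
as_repr = str(b"hi")           # the 5-character string b'hi'
as_text = b"hi".decode("utf-8")  # the 2-character string hi

# Stand-ins for uploads: one text file, one binary file (PNG signature).
text_upload = io.BytesIO(b"line one\nline two\n")
png_upload = io.BytesIO(b"\x89PNG\r\n\x1a\n")

text_raw, text_value = read_upload(text_upload)
png_raw, png_value = read_upload(png_upload)

print(repr(as_repr), repr(as_text))
print(repr(text_value))
print(png_value)
```

For binary uploads the sensible result is usually the bytes themselves (png_raw here); whether to fall back to None, raise, or pass errors="replace" to decode is a design choice that depends on what the caller does with the content.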