I’ve been reading about encoding and the Python I/O documentation, but since I’m a bit new to programming I haven’t understood much. I’m just trying to read a text file and save each line to another text file. Some of these lines are in Japanese characters, and although they are displayed correctly when printed in the Python IDE, the resulting file is just empty.
This is what I’m trying to do:
```python
filename = 'test.txt'  # File with the Japanese characters
filename2 = 'test2.txt'
text = open(filename, 'rb')  # I've tried opening it as 'utf-8' too
text2 = open(filename2, 'w', encoding='utf-8')  # Output file
for line in text:
    new_line = line.decode()  # From bytes to utf-8
    print(new_line)  # Just to check
    text2.write(new_line)

# Checking if file was written
text3 = open(filename2, 'r', encoding='utf-8')
for line2 in text3:
    print(line2 + 'something')
```
This code prints the lines from the input file, but the last bit, which prints what’s in the output file, prints nothing. I’m trying this on Linux, and the output file, test2.txt, is just empty; it doesn’t even contain the English lines. If I run this on Windows, I get an error about charmap not being able to recognize a character (or something similar) when calling .write(). If I remove all the Japanese lines, it works just fine. I’ve also tried opening the input file with utf-8 encoding instead of as bytes (it’s already saved that way, but just in case), with the same result.
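The exact Windows traceback isn’t shown, so the following is only a sketch of one plausible cause: somewhere the text is being encoded with a legacy Windows code page instead of UTF-8. When `open()` is called without an `encoding` argument it falls back to `locale.getpreferredencoding(False)`, which on many Western Windows setups is `cp1252` (that specific code page is an assumption here). UTF-8 can represent every character, while `cp1252` cannot encode these Japanese characters and raises a `UnicodeEncodeError` whose message mentions the `'charmap'` codec.

```python
import locale

# The encoding open() uses when no encoding= argument is given.
print("Default text encoding:", locale.getpreferredencoding(False))

sample = "▣世界から解放され▣"

# UTF-8 can represent every Unicode character, so this round-trips.
utf8_bytes = sample.encode("utf-8")
print(utf8_bytes.decode("utf-8") == sample)

# cp1252 (assumed Windows default) has no mapping for these characters.
error_message = None
try:
    sample.encode("cp1252")
except UnicodeEncodeError as exc:
    error_message = str(exc)
print(error_message)
```

Passing `encoding='utf-8'` explicitly to every `open()` call, as the code above already does for the output file, removes the dependence on the platform default.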
Just in case, this is one of the Japanese lines:
I’m using Python 3.5.2.
The encoding is fine. The reason the last `print` shows nothing is that you still have `test2.txt` open for writing: what you write to `text2` sits in an in-memory buffer and isn’t guaranteed to reach the file until you explicitly close (or flush) the stream `text2`, so a second stream reading the file sees it empty. So:
```python
# close write stream
text2.close()
# now you can open the file again to read from it
text3 = open(filename2, 'r', encoding='utf-8')
```
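To make the buffering visible, here is a minimal sketch that writes one short line and reads the file through a second stream before and after flushing. It uses a temporary file (a hypothetical stand-in for `test2.txt`) and assumes CPython’s default buffering for regular files, where a short write stays in memory until `flush()` or `close()`.

```python
import os
import tempfile

# Hypothetical temporary path so the sketch doesn't touch test2.txt.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

writer = open(path, "w", encoding="utf-8")
writer.write("▣世界から解放され▣\n")

# The text is still in Python's in-memory buffer,
# so a second stream opened now sees an empty file.
with open(path, "r", encoding="utf-8") as reader:
    before_flush = reader.read()

writer.flush()  # push buffered data to the file; close() does this too
with open(path, "r", encoding="utf-8") as reader:
    after_flush = reader.read()

writer.close()
print(repr(before_flush))
print(repr(after_flush))
```

`close()` flushes before releasing the file, which is why closing `text2` before reopening the file is enough.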
Testing it on Linux and OSX yields:
```
$ echo "▣世界から解放され▣" > test.txt
$ python3.5 script.py
▣世界から解放され▣

▣世界から解放され▣
something
```
You have to close the file so that all its contents are written. Best, use the `with` statement:
```python
filename = 'test.txt'  # File with the Japanese characters
filename2 = 'test2.txt'
with open(filename, 'r', encoding='utf-8') as text:
    with open(filename2, 'w', encoding='utf-8') as text2:
        for line in text:
            text2.write(line)
```
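As a usage sketch, the same idea can be wrapped in a small helper and followed by the read-back check from the question. The helper name `copy_lines` and the temporary files standing in for `test.txt` and `test2.txt` are hypothetical; the point is that once the `with` block has exited, the output file is closed and reading it back shows every line, Japanese or not.

```python
import os
import tempfile


def copy_lines(src, dst):
    # Hypothetical helper: copies src to dst line by line as UTF-8 text.
    # Both files are closed (and therefore flushed) when the with block exits.
    with open(src, "r", encoding="utf-8") as text, \
            open(dst, "w", encoding="utf-8") as text2:
        for line in text:
            text2.write(line)


# Temporary files standing in for test.txt and test2.txt.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.txt")
dst = os.path.join(workdir, "test2.txt")

with open(src, "w", encoding="utf-8") as f:
    f.write("Hello\n▣世界から解放され▣\n")

copy_lines(src, dst)

# Safe to read now: copy_lines has already closed dst.
with open(dst, "r", encoding="utf-8") as f:
    copied = f.readlines()

for line in copied:
    print(line + "something")
```

Combining both `open()` calls in one `with` statement is equivalent to the nested form above; it just saves a level of indentation.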