Python is printing 0x90c2 instead of just 0x90 NOP
Question:
The following command is outputting 200 bytes of ‘A’ followed by one byte of 0x0a:
python3 -c "print('\x41'*200)" > out.txt
hexdump out.txt confirms this:
0000000 4141 4141 4141 4141 4141 4141 4141 4141
*
00000c0 4141 4141 4141 4141 000a
00000c9
However, whenever I try to output 200 bytes of NOP sled (0x90), for some reason, python decides to also add a series of 0xc2 after every 0x90. So I’m running this:
python3 -c "print('\x90'*200)" > out.txt
And according to hexdump out.txt:
0000000 90c2 90c2 90c2 90c2 90c2 90c2 90c2 90c2
*
0000190 000a
0000191
This is not an issue in perl as the following outputs 200 bytes of NOP sled:
perl -e 'print "\x90" x 200' > out.txt
Why is Python outputting 0x90 followed by 0xc2?
Answers:
You are outputting a str, with a codec like utf8, for text output. Prefer to output bytes when binary output is of interest, and use binary mode.
$ python3 -c "b = bytes('\x90' * 4, 'latin1'); print(len(b))"
4
$ python3 -c "b = bytes('\x90' * 4, 'utf-8'); print(len(b))"
8
Python 2 and Perl conflate the two. Python 3 draws a strong distinction between a sequence of Unicode code points and a serialized sequence of bytes.
You're not printing 200 \x90 bytes and a \x0a byte. You're printing 200 U+0090 DEVICE CONTROL STRING characters, and a newline character. Those characters get encoded to bytes in whatever encoding sys.stdout is set to, which appears to be UTF-8 here. In UTF-8, U+0090 is encoded as the two bytes 0xC2 0x90, which is where the extra 0xc2 comes from.
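To make the encoding step concrete, here is a minimal sketch that encodes a single U+0090 character by hand. It also mimics how the plain hexdump view seems to group bytes: as far as I know, the default format prints 16-bit words in host byte order, so on a little-endian machine the byte pair c2 90 shows up as 90c2. The variable names are only for illustration.

```python
ch = '\x90'                      # one code point, U+0090
utf8 = ch.encode('utf-8')        # two bytes: 0xC2 0x90
latin1 = ch.encode('latin-1')    # one byte: 0x90

print(utf8.hex(), latin1.hex())

# Read the two UTF-8 bytes as one little-endian 16-bit word, the way the
# default hexdump view is assumed to display them on a little-endian host.
word = int.from_bytes(utf8, 'little')
print(format(word, '04x'))
```

So the 0xc2 byte actually comes before each 0x90 in the file; hexdump -C shows the bytes in file order.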
If you want to write bytes to a file, open it in binary mode and write a bytestring:
with open('out.txt', 'wb') as f:
    f.write(b'\x90'*200 + b'\n')  # you can leave the b'\n' off if you don't want it
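If you want to check the result without reaching for hexdump, a rough sketch like the following writes the payload and reads it back in binary mode. The temporary directory and the names payload, path and data are my own choices here, used so the example does not overwrite a real out.txt.

```python
import os
import tempfile

payload = b'\x90' * 200 + b'\n'

# Hypothetical scratch location, so nothing in the current directory is touched.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

with open(path, 'wb') as f:      # 'wb': no text encoding is applied
    f.write(payload)

with open(path, 'rb') as f:      # 'rb': read the raw bytes back
    data = f.read()

print(len(data), data[:4], data[-1:])
```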
The following Python code resolved the issue:
python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*200)" > out.txt
This is confirmed by hexdump -C out.txt:
00000000 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 |................|
*
000000c0 90 90 90 90 90 90 90 90 |........|
000000c8
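The same idea can be wrapped in a small function for reuse in a script. This is only a sketch: write_nop_sled is a hypothetical helper name, and it takes any binary stream so it can be tried against an in-memory buffer before pointing it at sys.stdout.buffer.

```python
import io
import sys

def write_nop_sled(count, stream=None):
    """Write `count` raw 0x90 bytes to a binary stream (hypothetical helper)."""
    if stream is None:
        stream = sys.stdout.buffer   # the binary layer underneath sys.stdout
    stream.write(b'\x90' * count)
    stream.flush()

# Try it against an in-memory binary stream instead of the real stdout.
buf = io.BytesIO()
write_nop_sled(200, buf)
sled = buf.getvalue()
print(len(sled))
```

Calling write_nop_sled(200) with no stream argument should behave like the one-liner above when stdout is redirected to a file.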