Python is printing 0x90c2 instead of just 0x90 NOP

Question:

The following command is outputting 200 bytes of ‘A’ followed by one byte of 0x0a:

python3 -c "print('\x41'*200)" > out.txt

hexdump out.txt confirms this:

0000000 4141 4141 4141 4141 4141 4141 4141 4141
*
00000c0 4141 4141 4141 4141 000a
00000c9

However, whenever I try to output a 200-byte NOP sled (0x90), Python for some reason also adds a 0xc2 after every 0x90. So I’m running this:

python3 -c "print('\x90'*200)" > out.txt

And according to hexdump out.txt:

0000000 90c2 90c2 90c2 90c2 90c2 90c2 90c2 90c2
*
0000190 000a
0000191

This is not an issue in perl as the following outputs 200 bytes of NOP sled:

perl -e 'print "\x90" x 200' > out.txt

Why is Python outputting 0x90 followed by 0xc2?

Asked By: ramon


Answers:

You are printing a str, which is text: on output it gets encoded to bytes by a codec such as UTF-8.

Prefer to output bytes when binary output is of interest, and use binary mode.

$ python3 -c "b = bytes('\x90' * 4, 'latin1'); print(len(b))"
4
$ python3 -c "b = bytes('\x90' * 4, 'utf-8'); print(len(b))"
8

Python 2 and Perl conflate the two.
Python 3 draws a strong distinction between
a sequence of Unicode code points and a serialized sequence of bytes.

Answered By: J_H

You’re not printing 200 \x90 bytes and a \x0a byte. You’re printing 200 U+0090 DEVICE CONTROL STRING characters, and a newline character. Those characters get encoded to bytes in whatever encoding sys.stdout is set to, which appears to be UTF-8 here.
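As a rough illustration of that encoding step, the sketch below encodes the same four-character string with two codecs. The variable names are made up for the example; UTF-8 turns each U+0090 into two bytes, while Latin-1 maps each code point below 256 to a single byte.

```python
text = '\x90' * 4                 # four U+0090 characters, not four bytes

utf8 = text.encode('utf-8')       # each U+0090 becomes the two bytes C2 90
latin1 = text.encode('latin-1')   # Latin-1 maps U+0000..U+00FF to one byte each

print(len(text), len(utf8), len(latin1))   # 4 8 4
print(utf8.hex(' '))                       # c2 90 c2 90 c2 90 c2 90
print(latin1.hex(' '))                     # 90 90 90 90
```

The `hex(' ')` call with a separator needs Python 3.8 or later.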

If you want to write bytes to a file, open it in binary mode and write a bytestring:

with open('out.txt', 'wb') as f:
    f.write(b'\x90'*200 + b'\n') # you can leave the b'\n' off if you don't want it
Answered By: user2357112

The following Python code resolved the issue:

python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*200)" > out.txt

This is confirmed by hexdump -C out.txt:

00000000  90 90 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |................|
*
000000c0  90 90 90 90 90 90 90 90                           |........|
000000c8
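If hexdump is not at hand, the same check can be sketched in Python. This is only an illustration: the scratch file path is hypothetical, and the file is written in binary mode rather than through a shell redirect.

```python
import os
import tempfile

# Hypothetical scratch file; the name is only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'out.bin')

with open(path, 'wb') as f:       # binary mode: no text encoding is applied
    f.write(b'\x90' * 200)

with open(path, 'rb') as f:
    data = f.read()

print(len(data))                  # 200
print(set(data) == {0x90})        # True: every byte is 0x90, no 0xc2 anywhere
```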
Answered By: ramon