Sending multiple files through a TCP socket

Question:

I have a simple client-server program, in which I try to send the contents of my Documents folder that contains 2 files [“img1.jpg”, “img2.jpg”].

How it works:

The server waits for a client to connect and receives messages from it. If a message is the text "files", the server calls the makeExp() function, which receives the name of the new folder to create and the number of files it is about to receive.

With that data, I start a for loop that repeats once for each file the client announced to the server.

For loop:

This loop receives the data for each file sent by the client and then saves it to the indicated path.

Issue:

The server correctly receives a small part of the data, but then throws an error:

Traceback (most recent call last):
  File "C:\Users\Dell\Desktop\servidor_recv_archivo.py", line 53, in <module>
    if msgp.decode() == "files":
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 5: invalid start byte

server.py

import socket
import os

def bytes_to_int(b):
    result = 0
    for i in range(4):
        result += b[i]<<(i*8)
    return result

def makeExp(client):
    while True:
        FolderName = client.recv(1024).decode()
        NumberFiles = client.recv(1024).decode()

        print(FolderName,NumberFiles)
        if not os.path.exists(FolderName):
            os.mkdir(FolderName)

        for element in range(int(NumberFiles)):
            size = client.recv(4)
            size = bytes_to_int(size)
            current_size = 0
            buffer = b""
            while current_size < size:
                data = client.recv(1024)
                if not data:
                    break
                if len(data) + current_size > size:
                    data = data[:size-current_size]
                buffer += data
                current_size += len(data)
            with open(str(element),"wb") as f:
                f.write(buffer)
        break

ip = "192.168.8.8"
port = 5555
data = (ip,port)
listen = 2

server = socket.socket()
server.bind(data)
server.listen(listen)

client,direction = server.accept()

while True:
    try:
        msgp = client.recv(1024)

        print(msgp)
        client.sendall("Msg recv".encode())
        if msgp.decode() == "files":
            makeExp(client)
    except ConnectionResetError:
        print("{} ".format(direction))
        break

client.py

import socket
import os

def convert_to_bytes(length):
    result = bytearray()
    result.append(length&255)
    for i in range(3):
        length = length>>8
        result.append(length&255)
    return result

def makeFolder(client):
    rute = "C:/Users/AngelHp/Desktop/Documentos"
    FolderName = os.path.basename("C:/Users/AngelHp/Desktop/Documentos")
    NumberFiles = str(len(os.listdir("C:/Users/AngelHp/Desktop/Documentos")))

    client.sendall(FolderName.encode())
    client.sendall(NumberFiles.encode())

    for element in (os.listdir(rute)):
        length = os.path.getsize(rute+"/"+element)
        client.send(convert_to_bytes(length))
        with open(rute+"/"+element,"rb") as infile:
            d = infile.read(1024)
            while d:
                client.send(d)
                d = infile.read(1024)

ip = "192.168.8.8"
port = 5555

client = socket.socket()
client.connect((ip,port))

while True:
    msg = input("> ")
    if msg != "files": # When the save button is pressed on the server, launch the crearExpServ function
        client.sendall(msg.encode())
        reply = client.recv(1024).decode()
        print(reply)
    elif msg == "files":
        print("ok")
        makeFolder(client)

@mark – edited

import socket
with socket.socket() as s:
    s.bind(('',8000))
    s.listen(1)
    with s.accept()[0] as c:
        chunks = []
        while True:
            chunk = c.recv(4096)
            if not chunk: break
            chunks.append(chunk)
    for i in range(2):
        with open('image{}.png'.format(str(i)),'wb') as f:
            f.write(b''.join(chunks))

client.py

import socket
import os

with socket.socket() as s:
    s.connect(('localhost',8000))
    for elemento in os.listdir("img"):
        print(elemento)
        with open("img"+"/"+elemento,'rb') as f:
            s.sendall(f.read())
Asked By: Revsky01


Answers:

TCP is a streaming protocol with no concept of message boundaries, so if you print msgp you will see it received more than you expected, probably folder name, number of files, and part of the binary file data. Since that data isn’t UTF-8 encoded, you get a UnicodeDecodeError.
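
To make the missing boundaries concrete, here is a minimal sketch using a local socket pair instead of a real network connection. The folder name, file count, and the three "binary" bytes are made-up stand-ins, and read_until_eof is a hypothetical helper, not part of the code in the question.

```python
import socket

def read_until_eof(sock):
    """Collect everything the peer sent before it closed the connection."""
    chunks = []
    while True:
        chunk = sock.recv(1024)  # may return anywhere from 1 to 1024 bytes
        if not chunk:            # b'' means the peer closed the connection
            break
        chunks.append(chunk)
    return b"".join(chunks)

sender, receiver = socket.socketpair()
with sender:
    sender.sendall(b"Documentos")    # folder name
    sender.sendall(b"2")             # number of files
    sender.sendall(b"\x89\x00\x01")  # made-up stand-in for binary file data
with receiver:
    stream = read_until_eof(receiver)

# The three sends arrive as one undivided run of bytes.
print(stream)
```

Nothing in the received bytes says where the folder name ends and the file count begins, and a single recv(1024) is free to return any prefix of that run.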

You have to define a protocol and buffer data from the socket until it satisfies the protocol (read to a newline character, for example). Also see socket.makefile which wraps a socket in a file-like object so you can treat it more like a file. Methods like .readline() and .read(n) exist, so you could define a protocol like:

  1. Send Folder Name + newline
  2. Send number of files + newline
  3. Send filename #1 + newline
  4. Send file size + newline
  5. Send binary data of exactly “file size” bytes.
  6. Repeat 3-5 for remaining files.
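
Before the full example, here is a small sketch of how socket.makefile recovers those boundaries for one file. It runs over a local socket pair, and the folder name, filename, and payload are made-up values; the payload deliberately contains newline bytes to show that read(n) does not care about them.

```python
import socket

sender, receiver = socket.socketpair()
payload = b"\x89PNG\r\n binary \n bytes"  # made-up binary data with newlines in it

with sender:
    sender.sendall(b"Folder1\n")                        # folder name + newline
    sender.sendall(b"img1.jpg\n")                       # filename + newline
    sender.sendall(str(len(payload)).encode() + b"\n")  # file size + newline
    sender.sendall(payload)                             # exactly "file size" bytes

# makefile('rb') buffers the stream, so readline() and read(n) return
# complete fields no matter how the bytes were split in transit.
with receiver, receiver.makefile("rb") as stream:
    folder = stream.readline().strip().decode()
    filename = stream.readline().strip().decode()
    filesize = int(stream.readline())
    data = stream.read(filesize)

print(folder, filename, filesize, data == payload)
```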

Below is an example implementing the protocol listed above (with no error handling if a client breaks the protocol). Prepare a folder or two to send and start the server. Then, in another terminal, run client.py <folder> to transmit <folder> into a Downloads folder.

server.py

import socket
import os

s = socket.socket()
s.bind(('', 8000))
s.listen()

while True:
    client, address = s.accept()
    print(f'{address} connected')

    # client socket and makefile wrapper will be closed when with exits.
    with client, client.makefile('rb') as clientfile:
        while True:
            folder = clientfile.readline()
            if not folder:  # When client closes connection folder == b''
                break
            folder = folder.strip().decode()
            no_files = int(clientfile.readline())
            print(f'Receiving folder: {folder} ({no_files} files)')
            # put in different directory in case server/client on same system
            folderpath = os.path.join('Downloads', folder)
            os.makedirs(folderpath, exist_ok=True)
            for i in range(no_files):
                filename = clientfile.readline().strip().decode()
                filesize = int(clientfile.readline())
                data = clientfile.read(filesize)
                print(f'Receiving file: {filename} ({filesize} bytes)')
                with open(os.path.join(folderpath, filename), 'wb') as f:
                    f.write(data)
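
The server above reads each file fully into memory with clientfile.read(filesize). For large files, one possible variation is to copy the bytes in bounded chunks. The sketch below assumes a hypothetical helper named copy_exact and uses in-memory streams in place of the socket wrapper and the output file.

```python
import io

def copy_exact(src, dst, nbytes, chunk_size=64 * 1024):
    """Copy exactly nbytes from src to dst in chunks (hypothetical helper)."""
    remaining = nbytes
    while remaining:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            # The stream ended before the announced size was reached.
            raise EOFError(f"stream ended with {remaining} bytes missing")
        dst.write(chunk)
        remaining -= len(chunk)

# Demo: 10 bytes of file data followed by the next header line.
src = io.BytesIO(b"A" * 10 + b"next header\n")
dst = io.BytesIO()
copy_exact(src, dst, 10, chunk_size=4)

print(dst.getvalue())  # only the 10 file bytes
print(src.readline())  # the next header is still unread
```

In the server, this would stand in for the read-then-write pair, for example copy_exact(clientfile, f, filesize) inside the with open(...) block.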

client.py

import socket
import sys
import os

def send_string(sock, string):
    sock.sendall(string.encode() + b'\n')

def send_int(sock, integer):
    sock.sendall(str(integer).encode() + b'\n')

def transmit(sock, folder):
    print(f'Sending folder: {folder}')
    send_string(sock, folder)
    files = os.listdir(folder)
    send_int(sock, len(files))
    for file in files:
        path = os.path.join(folder, file)
        filesize = os.path.getsize(path)
        print(f'Sending file: {file} ({filesize} bytes)')
        send_string(sock, file)
        send_int(sock, filesize)
        with open(path, 'rb') as f:
            sock.sendall(f.read())

s = socket.socket()
s.connect(('localhost', 8000))
with s:
    transmit(s, sys.argv[1])
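
The client above calls f.read() on each file, which loads the whole file into memory before sending. A chunked variation might look like the following sketch; send_file_chunked is a hypothetical helper, and the demo uses a local socket pair and a small made-up temporary file rather than a real server.

```python
import os
import socket
import tempfile

def send_file_chunked(sock, path, chunk_size=64 * 1024):
    """Send a file's bytes one chunk at a time (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            sock.sendall(chunk)

# Demo with a small made-up file and a local socket pair.
payload = b"0123456789" * 100
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(payload)
    sender, receiver = socket.socketpair()
    with sender:
        send_file_chunked(sender, path, chunk_size=256)
    with receiver, receiver.makefile("rb") as stream:
        received = stream.read()  # reads until the sender closes

print(received == payload)
```

In transmit, the call would replace sock.sendall(f.read()) after the filename and size have been sent. The standard library's socket.sendfile method is another option for the same job.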

I prepared two folders then ran "client Folder1" and "client Folder2". Client terminal output:

C:\test>client Folder1
Sending folder: Folder1
Sending file: file1 (13 bytes)
Sending file: file2 (13 bytes)
Sending file: file3 (13 bytes)
Sending file: file4 (13 bytes)

C:\test>client Folder2
Sending folder: Folder2
Sending file: file5 (13 bytes)
Sending file: file6 (13 bytes)

Output (server.py):

C:\test>server
('127.0.0.1', 2303) connected
Receiving folder: Folder1 (4 files)
Receiving file: file1 (13 bytes)
Receiving file: file2 (13 bytes)
Receiving file: file3 (13 bytes)
Receiving file: file4 (13 bytes)
('127.0.0.1', 2413) connected
Receiving folder: Folder2 (2 files)
Receiving file: file5 (13 bytes)
Receiving file: file6 (13 bytes)


Answered By: Mark Tolonen

The problem is that you can always encode text as bytes, e.g. " ".encode(), but you can’t always decode an arbitrary sequence of bytes as text. This is because not all sequences of bytes are valid UTF-8 text. When you try to decode binary data to check whether it’s equal to the "files" string, Python will throw an exception if it detects byte sequences that aren’t valid under the UTF-8 standard.

# Will crash if msgp contains byte sequences that aren't defined in the UTF-8 standard.
if msgp.decode() == "files":

I was able to get your code partially working* by first converting the text to bytes, then comparing the bytes themselves**:

if msgp == "files".encode(): # comparing binary won't crash
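
A small sketch of the difference, using a made-up buffer in which the text "files" is followed immediately by one non-UTF-8 byte (the variable names are illustrative only):

```python
header = b"files"
mixed = b"files\x89PNG"  # made-up: text followed directly by binary data

# Pure text decodes without trouble.
print(header.decode() == "files")

# Decoding the mixed buffer raises, because 0x89 cannot start a UTF-8 sequence.
try:
    mixed.decode()
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print(exc)

# Comparing bytes never raises; it simply reports that the buffers differ.
print(mixed == "files".encode())
# A prefix check still finds the command at the front of the buffer.
print(mixed.startswith(b"files"))
```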

*You also need to make sure that the client actually sends the message "files" to the server in the elif statement. For testing purposes, I worked around the lack of message boundaries by adding delays after sending each message, but as Mark Tolonen suggested, it would be much better to introduce a message boundary protocol!

**This technique is good for protocols that use a magic number to distinguish the message / file, e.g. PDF files start with “%PDF”. However, note that some Unicode characters have multiple valid binary representations, so comparing Unicode text at the byte level can lead to issues.
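
As an illustration of that caveat, the sketch below encodes the same visible word in two ways, once with a precomposed "é" and once with "e" plus a combining accent. The word is an arbitrary example; normalizing both sides with unicodedata.normalize before encoding is one way to make the byte comparison agree.

```python
import unicodedata

composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# Both render the same, but their UTF-8 bytes differ.
print(composed.encode())
print(decomposed.encode())
print(composed.encode() == decomposed.encode())

# Normalizing to NFC first makes the two byte strings identical.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized.encode() == composed.encode())
```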

Answered By: anjsimmo