Detect socket hangup without sending or receiving?

Question:

I’m writing a TCP server that can take 15 seconds or more to begin generating the body of a response to certain requests. Some clients like to close the connection at their end if the response takes more than a few seconds to complete.

Since generating the response is very CPU-intensive, I’d prefer to halt the task the instant the client closes the connection. At present, I don’t find this out until I send the first payload and receive various hang-up errors.

How can I detect that the peer has closed the connection without sending or receiving any data? For recv, that means all data must remain in the kernel; for send, it means no data is actually transmitted.

Asked By: Matt Joiner


Answers:

Check out the select module.

Answered By: Rumple Stiltskin

The select module contains what you’ll need. If you only need Linux support and have a sufficiently recent kernel, select.epoll() should give you the information you need. Most Unix systems will support select.poll().

If you need cross-platform support, the standard way is to use select.select() to check if the socket is marked as having data available to read. If it is, but recv() returns zero bytes, the other end has hung up.
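A minimal sketch of that select()-then-recv() check, written for Python 3 (the examples in this answer are Python 2). The helper name `check_for_hangup` is hypothetical. Because recv() consumes whatever it reads, the helper hands the bytes back so the caller can queue them rather than lose them.

```python
import select
import socket


def check_for_hangup(sock):
    """Poll sock without blocking.

    Returns (closed, data): closed is True if the peer has hung up, and data
    holds any bytes that were read (b"" if none). recv() removes data from
    the kernel buffer, so the caller must keep whatever is returned.
    """
    # A timeout of 0 makes select() return immediately.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        # Nothing to read: no visible hangup yet.
        return False, b""
    data = sock.recv(4096)
    # Readable but zero bytes read ==> orderly shutdown by the peer.
    return len(data) == 0, data
```

For example, with `a, b = socket.socketpair()`, `check_for_hangup(a)` returns `(False, b'')` while `b` is open and idle, and `(True, b'')` once `b` has been closed.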

I’ve always found Beej’s Guide to Network Programming good (note it is written for C, but is generally applicable to standard socket operations), while the Socket Programming How-To has a decent Python overview.

Edit: The following is an example of how a simple server could be written to queue incoming commands but quit processing as soon as it finds the connection has been closed at the remote end.

import select
import socket
import time

# Create the server.
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind((socket.gethostname(), 7557))
serversocket.listen(1)

# Wait for an incoming connection.
clientsocket, address = serversocket.accept()
print 'Connection from', address[0]

# Control variables.
queue = []
cancelled = False

while True:
    # If nothing queued, wait for incoming request.
    if not queue:
        queue.append(clientsocket.recv(1024))

    # Receive data of length zero ==> connection closed.
    if len(queue[0]) == 0:
        break

    # Get the next request and strip the trailing line ending (telnet sends "\r\n").
    request = queue.pop(0).rstrip('\r\n')
    print 'Starting request', request

    # Main processing loop.
    for i in xrange(15):
        # Do some of the processing.
        time.sleep(1.0)

        # See if the socket is marked as having data ready.
        r, w, e = select.select((clientsocket,), (), (), 0)
        if r:
            data = clientsocket.recv(1024)

            # Length of zero ==> connection closed.
            if len(data) == 0:
                cancelled = True
                break

            # Add this request to the queue.
            queue.append(data)
            print 'Queueing request', data.rstrip('\r\n')

    # Request was cancelled.
    if cancelled:
        print 'Request cancelled.'
        break

    # Done with this request.
    print 'Request finished.'

# If we got here, the connection was closed.
print 'Connection closed.'
clientsocket.close()
serversocket.close()

To use it, run the script and in another terminal telnet to localhost, port 7557. The output from an example run I did, queueing three requests but closing the connection during the processing of the third one:

Connection from 127.0.0.1
Starting request 1
Queueing request 2
Queueing request 3
Request finished.
Starting request 2
Request finished.
Starting request 3
Request cancelled.
Connection closed.

epoll alternative

Another edit: I’ve worked up another example using select.epoll to monitor events. I don’t think it offers much over the original example as I cannot see a way to receive an event when the remote end hangs up. You still have to monitor the data received event and check for zero length messages (again, I’d love to be proved wrong on this statement).
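On Linux there is an event aimed at exactly this case: EPOLLRDHUP, which epoll reports when the peer has closed the connection or shut down its writing half, and which Python 3's select module exposes as `select.EPOLLRDHUP`. A minimal sketch, assuming Linux and Python 3; the helper name `wait_for_peer_close` is hypothetical. Registering only EPOLLRDHUP (not EPOLLIN) means incoming data neither wakes the poll nor gets read. Like the zero-length recv(), it only sees orderly shutdowns (a FIN), not a peer that silently disappears.

```python
import select
import socket


def wait_for_peer_close(sock, timeout):
    """Return True if epoll reports that the peer closed (or half-closed) sock.

    Linux-only: relies on select.epoll and select.EPOLLRDHUP. Pending data
    is neither read nor counted as an event.
    """
    ep = select.epoll()
    try:
        # Ask only for "peer hung up"; EPOLLERR/EPOLLHUP are always reported.
        ep.register(sock.fileno(), select.EPOLLRDHUP)
        events = ep.poll(timeout)
    finally:
        ep.close()
    return any(ev & (select.EPOLLRDHUP | select.EPOLLHUP) for _, ev in events)
```

In a real server you would register each connection once on the existing epoll object, e.g. with `select.EPOLLIN | select.EPOLLRDHUP`, rather than creating a new epoll object for every check.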

import select
import socket
import time

port = 7557

# Create the server.
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind((socket.gethostname(), port))
serversocket.listen(1)
serverfd = serversocket.fileno()
print "Listening on", socket.gethostname(), "port", port

# Make the socket non-blocking.
serversocket.setblocking(0)

# Initialise the list of clients.
clients = {}

# Create an epoll object and register our interest in read events on the server
# socket.
ep = select.epoll()
ep.register(serverfd, select.EPOLLIN)

while True:
    # Check for events.
    events = ep.poll(0)
    for fd, event in events:
        # New connection to server.
        if fd == serverfd and event & select.EPOLLIN:
            # Accept the connection.
            connection, address = serversocket.accept()
            connection.setblocking(0)

            # We want input notifications.
            ep.register(connection.fileno(), select.EPOLLIN)

            # Store some information about this client.
            clients[connection.fileno()] = {
                'delay': 0.0,
                'input': "",
                'response': "",
                'connection': connection,
                'address': address,
            }

            # Done.
            print "Accepted connection from", address

        # A socket was closed on our end.
        elif event & select.EPOLLHUP:
            print "Closed connection to", clients[fd]['address']
            ep.unregister(fd)
            del clients[fd]

        # Error on a connection.
        elif event & select.EPOLLERR:
            print "Error on connection to", clients[fd]['address']
            ep.modify(fd, 0)
            clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

        # Incoming data.
        elif event & select.EPOLLIN:
            print "Incoming data from", clients[fd]['address']
            data = clients[fd]['connection'].recv(1024)

            # Zero length = remote closure.
            if not data:
                print "Remote close on ", clients[fd]['address']
                ep.modify(fd, 0)
                clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

            # Store the input.
            else:
                print data
                clients[fd]['input'] += data

        # Run when the client is ready to accept some output. The processing
        # loop registers for this event when the response is complete.
        elif event & select.EPOLLOUT:
            print "Sending output to", clients[fd]['address']

            # Write as much as we can.
            written = clients[fd]['connection'].send(clients[fd]['response'])

            # Delete what we have already written from the complete response.
            clients[fd]['response'] = clients[fd]['response'][written:]

            # When all the the response is written, shut the connection.
            if not clients[fd]['response']:
                ep.modify(fd, 0)
                clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

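    # Processing loop.
    for client in clients.keys():
        # Skip clients whose response is already generated; otherwise the
        # response would be rebuilt (undoing partial sends) on every pass.
        if clients[client]['delay'] >= 15.0:
            continue
        clients[client]['delay'] += 0.1

        # When the 'processing' has finished.
        if clients[client]['delay'] >= 15.0:
            # Reverse the input to form the response.
            clients[client]['response'] = clients[client]['input'][::-1]

            # Register for the ready-to-send event. The network loop uses this
            # as the signal to send the response.
            ep.modify(client, select.EPOLLOUT)

    # Processing delay (once per pass, so the loop doesn't spin when idle).
    time.sleep(0.1)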

Note: This only detects proper shutdowns. If the remote end just stops listening without sending the proper messages, you won’t know until you try to write and get an error. Checking for that is left as an exercise for the reader. Also, you probably want to perform some error checking on the overall loop so the server itself is shut down gracefully if something breaks inside it.

Answered By: Blair

You can select with a timeout of zero, and read with the MSG_PEEK flag.
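A sketch of that suggestion in Python 3 (the helper name `peer_has_closed` is hypothetical): select() with a zero timeout says whether the socket is readable, and recv() with MSG_PEEK looks at the next byte without removing it, so pending request data stays in the kernel buffer. One limitation: if unread data is queued ahead of the FIN, the peek returns that data, so the hangup only becomes visible once the data has been read.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has performed an orderly shutdown.

    Leaves any pending data in the kernel buffer for a later recv().
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False
    try:
        # MSG_PEEK: inspect the data without removing it from the queue.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        # A reset from the peer also means the connection is gone.
        return True
```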

I think you really should explain precisely what you mean by “not reading”, and why the other answers are not satisfying.

Answered By: shodanex

I’ve had a recurring problem communicating with equipment that had separate TCP links for send and receive. The basic problem is that the TCP stack doesn’t generally tell you a socket is closed when you’re just trying to read; you have to try to write before you are told that the other end of the link was dropped. Partly, that is just how TCP was designed (reading is passive).

I’m guessing Blair’s answer works in the cases where the socket has been shut down nicely at the other end (i.e. they have sent the proper disconnection messages), but not in the case where the other end has impolitely just stopped listening.

Is there a fairly fixed-format header at the start of your message, that you can begin by sending, before the whole response is ready? e.g. an XML doctype? Also are you able to get away with sending some extra spaces at some points in the message – just some null data that you can output to be sure the socket is still open?
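If the protocol does tolerate harmless padding, the probe can be a tiny periodic write. A minimal Python 3 sketch; the helper `send_filler` and the choice of a single space as filler are assumptions for illustration. The first write after the peer has gone often still succeeds, because it is only queued locally; the peer answers with a reset and a later write fails, so the probe has to be repeated rather than trusted once. CPython ignores SIGPIPE, so the failure surfaces as an exception rather than killing the process.

```python
import socket


def send_filler(sock, filler=b" "):
    """Write a little filler the client is assumed to ignore.

    Returns False once the kernel reports that the peer is gone.
    """
    try:
        sock.sendall(filler)
        return True
    except (BrokenPipeError, ConnectionResetError):
        # EPIPE / ECONNRESET: the other end has dropped the connection.
        return False
```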

Answered By: asc99c

The socket KEEPALIVE option lets you detect this kind of “drop the connection without telling the other end” scenario.

You should set the SO_KEEPALIVE option at SOL_SOCKET level. In Linux, you can modify the timeouts per socket using TCP_KEEPIDLE (seconds before sending keepalive probes), TCP_KEEPCNT (failed keepalive probes before declaring the other end dead) and TCP_KEEPINTVL (interval in seconds between keepalive probes).

In Python:

import socket
...
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 1)
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 1)
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPCNT, 5)
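The same settings wrapped in a self-contained helper, as a sketch for Python 3 (the name `enable_keepalive` is hypothetical). It uses `socket.IPPROTO_TCP`, which on Linux has the same value as `socket.SOL_TCP`, and skips any of the three tuning options that the platform's socket module does not define.

```python
import socket


def enable_keepalive(sock, idle=1, interval=1, count=5):
    """Turn on TCP keepalive and, where supported, tune its timing.

    idle: seconds of inactivity before the first probe.
    interval: seconds between probes.
    count: unanswered probes before the peer is declared dead.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These are the Linux option names; skip any this platform lacks.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)
```

Keepalive only changes when the kernel notices a dead peer; on Linux the connection is then typically failed with ETIMEDOUT, which the next recv() or send() reports, so a periodic check like those in the other answers is still needed to act on it.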

netstat -tanop will show that the socket is in keepalive mode:

tcp        0      0 127.0.0.1:6666          127.0.0.1:43746         ESTABLISHED 15242/python2.6     keepalive (0.76/0/0)

while tcpdump will show the keepalive probes:

01:07:08.143052 IP localhost.6666 > localhost.43746: . ack 1 win 2048 <nop,nop,timestamp 848683438 848683188>
01:07:08.143084 IP localhost.43746 > localhost.6666: . ack 1 win 2050 <nop,nop,timestamp 848683438 848682438>
01:07:09.143050 IP localhost.6666 > localhost.43746: . ack 1 win 2048 <nop,nop,timestamp 848683688 848683438>
01:07:09.143083 IP localhost.43746 > localhost.6666: . ack 1 win 2050 <nop,nop,timestamp 848683688 848682438>

Answered By: ninjalj

After struggling with a similar problem I found a solution that works for me, but it does require calling recv() in non-blocking mode and trying to read data, like this:

bytecount=recv(connectionfd,buffer,1000,MSG_NOSIGNAL|MSG_DONTWAIT);

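MSG_NOSIGNAL is meant to stop the program from being terminated by SIGPIPE on error (strictly, SIGPIPE is raised by writing to a broken connection, so this flag matters for send() rather than recv(), where it is harmless), and MSG_DONTWAIT tells recv() not to block.
In this mode, recv() returns one of three kinds of result: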

  • -1 if there is no data to read, or on other errors.
  • 0 if the other end has hung up nicely.
  • 1 or more if there was some data waiting.

So check the return value: if it is 0, the other end has hung up.
If it is -1, check errno: if errno is EAGAIN or EWOULDBLOCK, the server’s TCP stack still believes the connection is alive.

This solution would require you to put the call to recv() into your intensive data processing loop — or somewhere in your code where it would get called 10 times a second or whatever you like, thus giving your program knowledge of a peer who hangs up.
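In Python 3 the same non-blocking read is `sock.recv(n, socket.MSG_DONTWAIT)`, which raises BlockingIOError where C would return -1 with errno EAGAIN/EWOULDBLOCK. The sketch below assumes Linux (`socket.MSG_DONTWAIT` is not available on Windows) and a socket in ordinary blocking mode with no settimeout(); the helper names `poll_peer` and `expensive_job` are hypothetical. It checks the socket once per chunk of work. MSG_NOSIGNAL is left out because CPython ignores SIGPIPE anyway. Any data that arrives is kept, since recv() has already removed it from the kernel buffer.

```python
import socket


def poll_peer(sock, bufsize=1000):
    """Non-blocking recv(). Returns (alive, data)."""
    try:
        data = sock.recv(bufsize, socket.MSG_DONTWAIT)
    except BlockingIOError:
        # EAGAIN / EWOULDBLOCK: no data, connection still believed alive.
        return True, b""
    except ConnectionResetError:
        # The peer reset the connection.
        return False, b""
    # b"" means an orderly hangup; anything else is real data.
    return len(data) > 0, data


def expensive_job(sock, chunks=150):
    """Hypothetical CPU-heavy job that stops as soon as the client hangs up.

    Returns (result, pending): result is None if the job was abandoned;
    pending holds any bytes the client sent while the job was running.
    """
    pending = bytearray()
    total = 0
    for _ in range(chunks):
        total += sum(i * i for i in range(10000))  # stand-in for real work
        alive, data = poll_peer(sock)
        pending += data
        if not alive:
            return None, bytes(pending)
    return total, bytes(pending)
```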

This of course will do no good for a peer that goes away without doing the correct connection shutdown sequence, but any properly implemented TCP client will terminate the connection correctly.

Note also that if the client sends a bunch of data then hangs up, recv() will probably have to read that data all out of the buffer before it’ll get the empty read.

Answered By: Jesse Gordon

This code is very simple: it keeps accepting new connections forever and catches Ctrl+C to end the program, closing the port. Change the port to suit your needs.

import select
import socket
import time

# Create the listening socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = ('', 2105)
print('starting up on {} port {}'.format(*server_address))
sock.bind(server_address)
sock.listen(1)

try:
    # Main loop: serve one client at a time, forever.
    while True:
        # Wait for a new connection.
        print('waiting for a connection')
        connection, client_address = sock.accept()
        try:
            print('connection from {}'.format(client_address))
            # Connection loop.
            while True:
                # Timeout of 0: just check whether the socket is readable.
                r, w, e = select.select((connection,), (), (), 0)
                if r:
                    data = connection.recv(16)
                    # Readable but zero bytes ==> the client hung up.
                    if len(data) == 0:
                        break
                    print(data)
                    # Example: echo the received data back to the client.
                    connection.sendall(data)

                # Let the socket receive some data.
                time.sleep(0.1)

        except socket.error as e:
            # E.g. connection reset by peer: drop this client, keep serving.
            print(e)

        finally:
            # Clean up the client connection.
            connection.close()

except KeyboardInterrupt:
    # Ctrl+C: stop serving.
    print('shutting down')

finally:
    # Close the listening port too.
    sock.close()

Answered By: AlejandroAlis