Python scalable chat server

Question:

I’ve just begun learning sockets with Python, so I’ve written some example chat servers and clients. Most of what I’ve seen on the internet seems to use the threading module for (asynchronous) handling of clients’ connections to the server. I do understand that for a scalable server you need to use some additional tricks, because thousands of threads can kill the server (correct me if I’m wrong, but is it due to the GIL?), but that’s not my concern at the moment.

The strange thing is that I’ve found somewhere in the Python documentation that creating subprocesses is the right way to handle sockets (unfortunately I’ve lost the reference, sorry 🙁 ).

So the question is: should I use threading or multiprocessing? Or is there an even better solution?

Please, give me the answer and explain the difference to me.

By the way: I do know that there are things like Twisted which are well-written.
I’m not looking for a pre-made scalable server, I am instead trying to understand how to write one that can be scaled or will deal with at least 10k clients.

EDIT: The operating system is Linux.

Asked By: freakish

Answers:

FriendFeed (later acquired by Facebook, which open-sourced the project) needed a scalable server, so they wrote Tornado (which uses asynchronous I/O). Twisted is also famously scalable (it also uses asynchronous I/O). Gunicorn is also a top performer (it uses multiple processes). None of the fast, scalable tools that I know about uses threading.

An easy way to experiment with the different approaches is to start with the SocketServer module in the standard library (named socketserver in Python 3): http://docs.python.org/library/socketserver.html . It lets you easily switch approaches by alternately inheriting from either ThreadingMixIn or ForkingMixIn.
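To make the mix-in swap concrete, here is a minimal sketch using the Python 3 socketserver module. It is an echo server rather than a full chat server, and the names EchoHandler, ThreadedServer, ForkedServer and roundtrip are made up for illustration; only the socketserver classes come from the standard library.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: echoes every line back to the client that sent it."""

    def handle(self):
        # rfile/wfile are file-like wrappers around the client socket.
        for line in self.rfile:
            self.wfile.write(line)


class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    """One thread per connection."""
    daemon_threads = True
    allow_reuse_address = True


# ForkingMixIn is only defined where os.fork() exists (e.g. Linux).
if hasattr(socketserver, "ForkingMixIn"):
    class ForkedServer(socketserver.ForkingMixIn, socketserver.TCPServer):
        """One child process per connection; only the mix-in changed."""
        allow_reuse_address = True


def roundtrip(server_cls, message):
    """Start server_cls on a free port, send one line, return the echoed line."""
    with server_cls(("127.0.0.1", 0), EchoHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        with socket.create_connection(server.server_address) as conn:
            conn.sendall(message)
            with conn.makefile("rb") as reader:
                reply = reader.readline()
        server.shutdown()
    return reply


if __name__ == "__main__":
    print(roundtrip(ThreadedServer, b"hello\n"))
```

The handler code is identical for both servers, which is what makes this module convenient for comparing the two models under load.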

Also, if you’re interested in learning about the async approach, the easiest way to build your understanding is to read a blog post discussing the implementation of Tornado: http://golubenco.org/2009/09/19/understanding-the-code-inside-tornado-the-asynchronous-web-server-powering-friendfeed/
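As a rough illustration of the async idea (not Tornado's actual code), the sketch below runs a tiny chat relay in a single thread using the standard selectors module, which picks epoll on Linux. The class name ChatLoop and its methods are hypothetical, and writes are deliberately simplified.

```python
import selectors
import socket


class ChatLoop:
    """Hypothetical single-threaded chat relay driven by readiness events."""

    def __init__(self, host="127.0.0.1", port=0):
        self.selector = selectors.DefaultSelector()  # epoll on Linux
        self.clients = set()
        self.listener = socket.create_server((host, port))
        self.listener.setblocking(False)
        self.address = self.listener.getsockname()
        # The callback to run is stored as the registration's data.
        self.selector.register(self.listener, selectors.EVENT_READ, self._accept)

    def _accept(self, listener):
        conn, _ = listener.accept()
        conn.setblocking(False)
        self.clients.add(conn)
        self.selector.register(conn, selectors.EVENT_READ, self._read)

    def _read(self, conn):
        try:
            data = conn.recv(4096)
        except ConnectionError:
            data = b""
        if not data:  # the peer closed the connection
            self.selector.unregister(conn)
            self.clients.discard(conn)
            conn.close()
            return
        for other in self.clients:
            if other is not conn:
                # Simplification: assumes the kernel send buffer has room.
                # A real server would queue unsent bytes and wait for EVENT_WRITE.
                other.sendall(data)

    def poll_once(self, timeout=1.0):
        """Wait for ready sockets and run the callback registered for each."""
        for key, _ in self.selector.select(timeout):
            key.data(key.fileobj)

    def serve_forever(self):
        while True:
            self.poll_once()

    def close(self):
        for sock in [self.listener, *self.clients]:
            self.selector.unregister(sock)
            sock.close()
        self.clients.clear()
        self.selector.close()


if __name__ == "__main__":
    ChatLoop(port=9000).serve_forever()  # port 9000 is an arbitrary choice
```

No thread or process is created per client here; an idle connection costs one registered file descriptor, which is the main reason this model reaches 10k clients more comfortably.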

Good luck and happy computing 🙂

Answered By: Raymond Hettinger

thousands of threads can kill the server (correct me if I’m wrong, but is it due to GIL?)

For one thing, the GIL has nothing to do with the number of threads. If you’re doing I/O within these threads, you could have hundreds of thousands of them without any problem from the GIL or otherwise, because CPython releases the GIL while a thread is blocked on I/O.

GIL comes into play when you have CPU intensive tasks.
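A small experiment can make the I/O-bound case visible. In this sketch time.sleep stands in for a blocking socket read (an assumption for the demo; CPython releases the GIL in both cases), and the helper names blocking_io and run_threads are made up.

```python
import threading
import time


def blocking_io(delay):
    # Stand-in for a blocking recv(): the thread waits without holding the GIL.
    time.sleep(delay)


def run_threads(count, delay):
    """Run `count` threads that each block for `delay` seconds; return wall time."""
    threads = [threading.Thread(target=blocking_io, args=(delay,))
               for _ in range(count)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_threads(50, 0.2)
    # Run one after another, the waits would add up to 50 * 0.2 = 10 seconds.
    print(f"50 blocking threads finished in {elapsed:.2f}s")
```

The waits overlap, so the total stays close to a single delay. Replacing the sleep with a pure-Python CPU loop would make the threads take turns holding the GIL instead of running in parallel.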

See this very informative talk from David Beazley to learn more about the GIL.

Answered By: treecoder