Way back when I was at NGINX, I worked with several people on integrating a Linux kernel patch for SO_REUSEPORT with NGINX, for something I termed “socket sharding”. In hindsight I maybe should have called it “socket load balancing”, but the term “sharding” has now stuck with that option across the industry. Whilst this option is available in many *nix kernels, it behaves very differently in each of them, so I thought I would note some details down in this blog post.

Linux

In Linux, this option was added in kernel 3.9 and basically turned the kernel into a load balancer for the socket. Your application can have multiple threads or processes, each with its own socket, listening on the same IP/port combination, and the kernel will send each incoming connection to one of these listeners. This can give a nice performance boost when the server receives a high number of connections per second, as it can reduce the thundering herd problem of many workers contending for, and being woken up by, a single shared listening socket.

The downside of the Linux implementation is that it uses a hash, taken modulo the number of listeners, to decide which socket each incoming connection goes to. As soon as a single listener fails, the whole thing collapses: the modulus of the hash is now different, so connections that were in flight can be mapped to a different socket and dropped.
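To see why a change in the number of listeners matters, here is a toy model in C. It assumes selection is simply `hash % listeners`, as described above; the hash function, client address and function names are hypothetical, and the real kernel selection code differs in detail. For a uniformly distributed hash, going from 4 listeners to 3 sends about three quarters of flows to a different socket, because `h % 4 == h % 3` only when `h % 12` is 0, 1 or 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's flow hash; the real kernel
 * hashes the full connection 4-tuple with a secret seed. */
static uint32_t toy_flow_hash(uint32_t client_ip, uint16_t client_port)
{
    uint32_t h = client_ip * 2654435761u ^ client_port;
    h ^= h >> 16;
    return h * 2246822519u;
}

/* Simplified selection model: hash modulo the number of listeners. */
static unsigned pick_listener(uint32_t hash, unsigned nlisteners)
{
    return hash % nlisteners;
}

/* Count how many of `nflows` flows land on a different listener when the
 * listener count changes from `before` to `after`. */
size_t count_remapped(size_t nflows, unsigned before, unsigned after)
{
    size_t moved = 0;
    for (size_t i = 0; i < nflows; i++) {
        /* Hypothetical clients: one IP (10.0.0.1), consecutive source ports. */
        uint32_t h = toy_flow_hash(0x0a000001u, (uint16_t)(10000 + i));
        if (pick_listener(h, before) != pick_listener(h, after))
            moved++;
    }
    return moved;
}
```

With an unchanged listener count nothing moves, but dropping from 4 to 3 listeners reshuffles most flows, which is what hurts connections that are mid-handshake when a listener disappears.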

DragonFly BSD

This brings us to DragonFly BSD, which has an implementation very similar to Linux’s but, crucially, adds hash tables to the mix. This means that if a process fails, the existing connections on other processes are not dropped, and everything can keep running smoothly.

Other BSDs

I believe this socket option was first defined by BSD, and in general other BSDs such as FreeBSD and NetBSD have an implementation that is used to help bleed off connections from old processes to new ones.

To use this, every listening socket needs to have SO_REUSEPORT set as before, but only the last listener to bind gets the incoming connections. This can help with software upgrades for server processes: when the new process is brought up, it also listens on the same IP/port, new connections are routed to it, and once the old process has finished with its current connections it can be shut down. No connections are lost.

I believe macOS also uses this behaviour, but this is not something I have tested.

FreeBSD

I have a lot of respect for what FreeBSD have done, because they have managed to implement both behaviours at the same time. When SO_REUSEPORT is used, it behaves like the other BSDs above. But around 2018 they also added SO_REUSEPORT_LB, which is very similar to the DragonFly BSD implementation (with a few minor enhancements).

This lets an application choose whichever behaviour suits its needs. I think in theory both could be used at the same time, but I do not see an advantage in doing so.

Other Operating Systems

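This socket option is not really supported in many other operating systems, such as Windows, or even in some other *nix-based ones. It is typically something an application needs to detect at compile time, and then decide what to do based on the exact operating system and kernel detected. Because a header can define the option while the running kernel still rejects it, it is also worth checking the return value of setsockopt() at runtime.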

In summary, socket programming can be a bit of a minefield, but the payoffs are really good once these things become clear.
