Socket SO_REUSEPORT and Kernel Implementations

Way back when I was at NGINX, I worked with several people on integrating a Linux kernel patch for SO_REUSEPORT with NGINX, for something I termed “socket sharding”. In hindsight I should perhaps have called it “socket load balancing”, but the term “sharding” has now stuck with that option across the industry. Whilst this is a standard option in many *nix kernels, it behaves very differently between them, so I thought I would note some details down in this blog post.

Linux

In Linux this option was added in kernel 3.9, and it basically turns the kernel into a load balancer for the socket. Your application can have multiple threads or processes, each with its own socket listening on the same IP/port combination, and the kernel will send each incoming connection to one of these listeners. This can give a nice performance boost when a high number of connections per second is coming into the server, as it can reduce the thundering herd problem.

The downside of the Linux implementation is that it uses a hash, taken modulo the number of listening sockets, to decide which socket each incoming connection goes to. As soon as a single listener fails, the modulus changes, so the hash starts mapping connections to different sockets and the whole thing collapses for connections that were already in progress.
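
To see why a changing modulus hurts, here is a deliberately simplified model in C. It assumes the listener is chosen as hash % number_of_listeners, as described above; the real kernel's hash function and the way it maps that hash onto a socket may differ, and the function names are hypothetical.

```c
#include <stdint.h>

/* Simplified model: choose a listener by taking the connection's hash
 * modulo the number of listeners in the group. */
static unsigned pick_listener(uint32_t conn_hash, unsigned num_listeners)
{
    return conn_hash % num_listeners;
}

/* Count how many of the hash values 0 .. n_hashes-1 land on a different
 * listener when the group changes from `before` to `after` listeners. */
static unsigned count_remapped(unsigned n_hashes, unsigned before,
                               unsigned after)
{
    unsigned moved = 0;
    for (uint32_t h = 0; h < n_hashes; h++)
        if (pick_listener(h, before) != pick_listener(h, after))
            moved++;
    return moved;
}
```

In this model, going from four listeners to three moves 9 of the 12 hash values 0 to 11 onto a different listener, so most connections are affected, not just the quarter that belonged to the failed listener.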

DragonFly BSD

This brings us to DragonFly BSD, which has an implementation very similar to Linux’s but, crucially, adds hash tables to the mix. This means that if a process fails, the existing connections on the other processes are not dropped and everything keeps running smoothly.

Other BSDs

I believe this socket option was first defined by BSD, and in general the other BSDs, such as FreeBSD and NetBSD, have an implementation that is used to help bleed off connections from old processes to new processes.

To use this, every listening socket needs to have SO_REUSEPORT set, as before, but only the last listener receives the incoming connections. This can help with software upgrades for server processes: when the new process is brought up it also listens on the same port, new connections are routed to it, and once the old one has finished with its current connections it can be shut down. No connections are lost.

I believe macOS also uses this behaviour, but this is not something I have tested.

FreeBSD

I have big respect for what FreeBSD have done, because they have been able to offer both types of implementation at the same time. When SO_REUSEPORT is used it behaves like the other BSDs above, but around 2018 they also added SO_REUSEPORT_LB, which is very similar to the DragonFly BSD implementation (with a few minor enhancements).

This gives the application the choice of either implementation, depending on its needs. I think in theory both could be used at the same time, but I do not see an advantage in this.

Other Operating Systems

This socket option is not really supported in many other operating systems, such as Windows or even other *nix-based ones. It is typically something that an application needs to detect at compile time, and then likely decide what to do based on the exact operating system and kernel detected.

In summary, socket programming can be a bit of a minefield, but the payoffs are really good once these things become clear.

Published by LinuxJedi