At work, I (Greg Porter) have a lot of students (hundreds) who need to use ssh to log into a couple of unix hosts. Most of the students use one particular host, vogon.csc.calpoly.edu. When vogon gets busy, or fork bombed, or hangs, all those users are SOL. It’d be nice if we had multiple ssh hosts behind some sort of ssh load balancer. Of course, we can’t afford a real load balancer.
We figured it out with iptables, and so far it seems to work.
References (Thanks, Google!):
https://securepoint.com/lists/html/NetFilter/2006-11/msg00091.html
https://lserinol.blogspot.com/2008/05/simple-load-balancing-with-iptables.html
I’m used to hardware load balancers, something like this Barracuda one. Those work as you might expect: incoming traffic is routed to one of many back end servers. Fancier ones might even realize when a back end server stops responding, and stop sending traffic its way.
The way we did it is with iptables. Iptables is basically a packet filter (or more generically a packet mangler).
To get this working you have to:
Pick a name/address for users to connect to. In this example, it’s unix0.csc.calpoly.edu. Unix0 doesn’t exist (in real life); it’s just a virtual name/address for the cluster.
Number each back end node. In this case the real back end nodes are unix1 through unix4.csc.calpoly.edu. Each node will “know” what cluster node number it is. Unix1 is node number 1, unix2 is node 2 and so on.
Give each cluster member the shared unix0 address, plus an additional “fake” cluster MAC address that they all share. That MAC is a specially crafted “multicast” MAC address, which I had never even heard of. (There’s a rough sketch of the per-node setup a little further down.) Once each member server has the additional address, you get this very strange result when you ping it (them?):
[glporter@updates ~]$ ping -c 1 unix0
PING unix0.csc.calpoly.edu (129.65.158.160) 56(84) bytes of data.
64 bytes from unix0.csc.calpoly.edu (129.65.158.160): icmp_seq=1 ttl=64 time=0.241 ms
64 bytes from unix0.csc.calpoly.edu (129.65.158.160): icmp_seq=1 ttl=64 time=0.242 ms (DUP!)
64 bytes from unix0.csc.calpoly.edu (129.65.158.160): icmp_seq=1 ttl=64 time=0.242 ms (DUP!)
64 bytes from unix0.csc.calpoly.edu (129.65.158.160): icmp_seq=1 ttl=64 time=0.479 ms (DUP!)
--- unix0.csc.calpoly.edu ping statistics ---
1 packet transmitted, 1 received, +3 duplicates, 0% packet loss, time 999ms
Betcha never seen THAT before! I know I haven’t. Ping one address and get four responses. Hmm. We got four responses because unix1 responded from its additional (unix0) address, unix2 responded from its unix0 address, unix3 responded, and unix4 responded. So each member node responds to a request sent to unix0.
So far we don’t have a “load balancer”; it’s more like a “load spammer”. Speak to one IP, and four hosts hear you.
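So how does each member node end up answering for unix0 in the first place? Here’s a rough sketch of the per-node piece, not our literal config: the interface name (eth0), the /32 prefix, and the use of “ip addr add” (rather than an old-style interface alias) are all assumptions on my part.

# On each member node (unix1 through unix4): add the shared unix0
# address as an extra address on the public interface.
# NOTE: eth0 and the /32 are assumptions for this sketch.
ip addr add 129.65.158.160/32 dev eth0

# Sanity check: the node should now answer for unix0 as well as its own name.
ip addr show dev eth0

The shared multicast MAC isn’t set on the interface at all; as far as I can tell, it comes from the CLUSTERIP iptables rule you’ll see below, which also makes ARP for unix0 resolve to that MAC. I think that’s what explains the DUPs above: a multicast destination MAC means the switch hands the echo request to every member, and since the CLUSTERIP rule only matches ssh (tcp port 22), nothing stops all four nodes from answering the ping.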
Now even more magic happens. We’re going to use some rules in iptables (sketched right after this list) to:
Take an incoming connection to unix0, look at the source IP address, and run it through a hash. The hash we use will ALWAYS produce a 1, 2, 3 or 4 (for our 4 node cluster) for any given source IP. No matter which node does the hashing, a particular source IP always gives the same answer (1, 2, 3, or 4).
Decide if a node should handle the connection. Each node “hears” the incoming connection. Each node hashes the source IP. Each node gets the same answer (1, 2, 3, or 4 for our 4 node cluster). If the source IP hashes to its own node number (“I’m node 1 and this IP hashes to 1”), the node acts on the request. If not, it drops the packet in the bit bucket.
So although all 4 nodes “hear” any particular request, only one acts on it. Voila! Load balancing! (Sort of).
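The piece of iptables that does the hash-and-decide is the CLUSTERIP target. Here’s roughly the command that would produce the unix4 rule you’ll see in the listing further down. I’m reconstructing it from the rule output, so treat the exact invocation as a sketch rather than the script we actually ran; the other nodes would run the same thing with their own --local-node number.

# unix4's copy of the CLUSTERIP rule: hash each source IP into 1..4,
# and only act on connections that hash to 4 (this node's number).
iptables -A INPUT -d 129.65.158.160 -p tcp --dport 22 \
  -j CLUSTERIP --new --hashmode sourceip \
  --clustermac 01:00:5E:41:9E:A0 \
  --total-nodes 4 --local-node 4 --hash-init 0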
This sort of handles state in a crude way. It assumes that the same user, for a particular session, will be coming from the same IP, and they will be serviced by the same node.
It doesn’t address what happens when a node goes down. In this setup, if a node goes down, the roughly 1 in 4 source IPs that hash to it don’t get serviced, ever.
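For what it’s worth, the CLUSTERIP target does have a manual escape hatch for this: it exposes a file under /proc/net/ipt_CLUSTERIP/, and, as I read the CLUSTERIP documentation, a surviving node can claim a dead node’s number by writing to it. We haven’t wired any of that up, so treat this as a pointer, and the exact syntax as an assumption to verify, not something we run:

# On a surviving node: also claim responsibility for (hypothetically dead)
# node 2's share of the hash buckets. Syntax per the CLUSTERIP docs as I
# understand them; verify on your own kernel before relying on it.
echo "+2" > /proc/net/ipt_CLUSTERIP/129.65.158.160

# Hand node 2's share back when it returns.
echo "-2" > /proc/net/ipt_CLUSTERIP/129.65.158.160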
It’s *VERY* strange. It works, but wow, how weird.
Here’s what the iptables rules look like on one of the nodes:
[root@unix4 ~]# service iptables status
Table: filter
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination
1    CLUSTERIP  tcp  --  0.0.0.0/0            129.65.158.160      tcp dpt:22 CLUSTERIP hashmode=sourceip clustermac=01:00:5E:41:9E:A0 total_nodes=4 local_node=4 hash_init=0
2    SSHRULES   tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:22

Chain FORWARD (policy ACCEPT)
num  target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination

Chain SSHRULES (1 references)
num  target     prot opt source               destination
1    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           recent: UPDATE seconds: 7 hit_count: 2 name: DEFAULT side: source reject-with icmp-port-unreachable
2    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           recent: SET name: DEFAULT side: source
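One note on that listing: the SSHRULES chain has nothing to do with the load balancing. It looks like a crude per-source throttle on new ssh connections, built with the “recent” match, so a source that opens another new connection within a few seconds gets rejected for a bit. Reconstructed from the listing above, building it would look something like this; again, a sketch rather than our literal setup script:

# Create the chain and send every brand-new ssh connection through it.
# (The CLUSTERIP rule from earlier sits above this jump in INPUT.)
iptables -N SSHRULES
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -j SSHRULES

# Reject a source that comes back too quickly (2 hits inside 7 seconds,
# per the "recent" match); otherwise remember the source and accept.
iptables -A SSHRULES -m recent --update --seconds 7 --hitcount 2 \
  --name DEFAULT --rsource -j REJECT --reject-with icmp-port-unreachable
iptables -A SSHRULES -m recent --set --name DEFAULT --rsource -j ACCEPT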