
Redundancy on the servers without BGP

A developer friend recently asked me about designing redundancy on servers. He had a valid point: running BGP can be tricky and expensive, since most colo and datacenter hosts offer simple static routing, usually with just a couple of IP addresses. Furthermore, due to IPv4 exhaustion, the price of a /24 has shot up massively. On top of this, burning a /24 on a single server or a handful of servers is a questionable design practice unless one is hosting and selling hundreds of virtual machines on those servers.

So, on to the question: how can a developer/sysadmin design the network of a server with redundancy, assuming that one is able to get multiple diverse links (from ISP A and ISP B) with static routing?

This requires some special handling. Here’s why:

  • A typical Linux server has only one default gateway in its main routing table. If it points towards ISP A, the circuit via ISP B won’t work. Packets will arrive via internet > ISP B > server, but the server will try to return them via ISP A, and ISP A might just drop them due to Unicast Reverse Path Forwarding (uRPF) checks, which are designed to prevent spoofing. It can work if uRPF is not applied, but…
  • Packets still return via ISP A, so a failure in ISP A results in a traffic drop for the ISP B IP as well.
  • One might try dynamic routing with FRR: install no static gateway at all and instead learn 0.0.0.0/0 dynamically via BGP. But BGP, as stated earlier, can be an altogether different commercial product. Plus, a majority of ISPs won’t be interested in setting up a BGP session with a customer who lacks a public, RIR-allocated AS number.
  • Ideally, in such a setup both circuits should work and stay active, so that both can be monitored from external points. That should be the goal.

Possible design choices

  1. Running a VyOS VM on the server and terminating ISP B on the VM, essentially carrying ISP B via layer 2 all the way to the VM. Layer 2 is important here, as one cannot easily terminate ISP B on the host server.
  2. Running the server with both links terminated on it, keeping the default route of the main routing table towards ISP A while having another routing table whose default route points to ISP B. Next, add a policy routing rule (ip rule) to ensure that outgoing traffic whose source is the ISP B WAN IP goes via the ISP B next-hop gateway.

#1 is easy to achieve if there is a plan to run KVM on the server anyway. Here’s a sample config for #2 in the absence of VM infra.

I have a demo server with two interfaces: ISP A is 10.50.60.60 on ens18 with gateway 10.50.60.1, and ISP B is 10.50.50.50 on ens19 with gateway 10.50.50.1.

anurag@dual-home-static-demo:~$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 5a:fb:5f:02:a4:14 brd ff:ff:ff:ff:ff:ff
3: ens19: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 9a:44:34:ac:b2:bc brd ff:ff:ff:ff:ff:ff
anurag@dual-home-static-demo:~$ 

Here’s what the interface config looks like with just ISP A:

anurag@dual-home-static-demo:~$ cat /etc/netplan/00-installer-config.yaml 
# This is the network config written by 'subiquity'
network:
  ethernets:
    ens18:
      dhcp4: false
      addresses:
           - 10.50.60.60/23
      gateway4: 10.50.60.1
      nameservers:
          addresses:      
           - 10.50.60.1      
  version: 2
anurag@dual-home-static-demo:~$ 

Now, to make ISP B work, let’s configure the ISP B IP on ens19, but without a gateway in the OS interface config.

anurag@dual-home-static-demo:~$ cat /etc/netplan/00-installer-config.yaml 
# This is the network config written by 'subiquity'
network:
  ethernets:
    ens18:
      dhcp4: false
      addresses:
           - 10.50.60.60/23
      gateway4: 10.50.60.1
      nameservers:
          addresses:      
           - 10.50.60.1     

    ens19:
      addresses:
           - 10.50.50.50/24
             
  version: 2
anurag@dual-home-static-demo:~$ 

Understanding default Linux routing behaviour

Now when we ping 10.50.50.50, i.e. the ISP B WAN IP, the reply follows the main routing table and leaves on the ISP A side, which is a problem. Packets enter via ens19 but return via ens18.
Here’s tcpdump showing that:

anurag@dual-home-static-demo:~$ sudo tcpdump -i ens19 'icmp' -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens19, link-type EN10MB (Ethernet), capture size 262144 bytes
17:24:51.713714 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 11, seq 1, length 64
17:24:52.745798 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 11, seq 2, length 64
17:24:53.769867 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 11, seq 3, length 64
17:24:54.793663 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 11, seq 4, length 64
17:24:55.817701 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 11, seq 5, length 64
^C
5 packets captured
5 packets received by filter
0 packets dropped by kernel

anurag@dual-home-static-demo:~$ sudo tcpdump -i ens18 'icmp' -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens18, link-type EN10MB (Ethernet), capture size 262144 bytes
17:25:08.882196 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 12, seq 1, length 64
17:25:09.897726 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 12, seq 2, length 64
17:25:10.921571 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 12, seq 3, length 64
17:25:11.945615 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 12, seq 4, length 64
17:25:12.969603 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 12, seq 5, length 64
^C
5 packets captured
5 packets received by filter
0 packets dropped by kernel
anurag@dual-home-static-demo:~$ 

So the problem to solve is this: if packets hit 10.50.50.50 (ISP B) via ens19, the reply should leave via the same interface.
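
A quicker way to see which path the kernel would choose, without running tcpdump, is to ask it directly. The following is a minimal sketch, assuming the iproute2 `ip` tool is available and that the source address is one configured on the server; the helper name `show_return_path` is made up for illustration.

```shell
#!/bin/sh
# show_return_path SRC DST
# Hypothetical helper: asks the kernel which route it would pick for a
# packet sent from the local address SRC to the destination DST.
show_return_path() {
    ip -4 route get "$2" from "$1"
}

# Example (run on the demo server):
#   show_return_path 10.50.50.50 10.50.60.6
```

At this point in the setup the output should name ens18. After the policy routing steps that follow, it should name ens19 instead.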

Creating a new routing table

Add “2 ispb” to /etc/iproute2/rt_tables, which defines routing table number 2 with the name ispb.
Here’s what /etc/iproute2/rt_tables looks like afterwards:

#
# reserved values
#
255	local
254	main
253	default
0	unspec
#
# local
#
#1	inr.ruhep
2 ispb
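
If this step is scripted, it helps to make it safe to run more than once so the entry is not appended twice. A minimal sketch; the function name `add_rt_table` is hypothetical, and the file path is passed in so it can be tried on a copy first.

```shell
#!/bin/sh
# add_rt_table FILE ID NAME
# Appends "ID NAME" to FILE unless an uncommented entry for NAME
# is already present.
add_rt_table() {
    file=$1 id=$2 name=$3
    if grep -Eq "^[0-9]+[[:space:]]+${name}[[:space:]]*\$" "$file"; then
        return 0
    fi
    printf '%s %s\n' "$id" "$name" >> "$file"
}

# Example (needs root for the real file):
#   add_rt_table /etc/iproute2/rt_tables 2 ispb
```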

Next, ensure that table ispb is used when the source is 10.50.50.50:

anurag@dual-home-static-demo:~$ sudo ip rule add from 10.50.50.50 lookup ispb
anurag@dual-home-static-demo:~$ sudo ip rule list
0:	from all lookup local
32765:	from 10.50.50.50 lookup ispb
32766:	from all lookup main
32767:	from all lookup default
anurag@dual-home-static-demo:~$ 

Next, point the default route for table 2 (ispb) to the ISP B gateway 10.50.50.1:

anurag@dual-home-static-demo:~$ sudo ip route add default via 10.50.50.1 dev ens19 table ispb
anurag@dual-home-static-demo:~$ 

anurag@dual-home-static-demo:~$ sudo ip route show table ispb
default via 10.50.50.1 dev ens19 
anurag@dual-home-static-demo:~$ 
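
The rule and the route can also be wrapped in one small script that is safe to re-run. This is a sketch rather than a definitive implementation: the variable names, the `run` helper and the `DRY_RUN` switch are assumptions for illustration, and the defaults are the demo values from this post.

```shell
#!/bin/sh
# Source-based routing for ISP B, written so it can be run repeatedly.
# Set DRY_RUN=1 to print the commands instead of executing them.
ISPB_ADDR=${ISPB_ADDR:-10.50.50.50}
ISPB_GW=${ISPB_GW:-10.50.50.1}
ISPB_DEV=${ISPB_DEV:-ens19}
ISPB_TABLE=${ISPB_TABLE:-ispb}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_ispb() {
    # Add the rule only if it is not listed yet, to avoid a duplicate
    # rule or an error. In dry-run mode the check is skipped.
    if [ "${DRY_RUN:-0}" = "1" ] ||
       ! ip rule list | grep -qF "from $ISPB_ADDR lookup $ISPB_TABLE"; then
        run ip rule add from "$ISPB_ADDR" lookup "$ISPB_TABLE"
    fi
    # "replace" creates the route or overwrites an existing one.
    run ip route replace default via "$ISPB_GW" dev "$ISPB_DEV" table "$ISPB_TABLE"
}

# Example (needs root unless DRY_RUN=1):
#   setup_ispb
```

With DRY_RUN=1 the function only prints the two ip commands, which is a cheap way to review them before touching the routing tables.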

Final test

So let’s ping again from the outside server and look at the tcpdump on ens19, i.e. the port connected to ISP B.

anurag@dual-home-static-demo:~$ sudo tcpdump -i ens19 'icmp' -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens19, link-type EN10MB (Ethernet), capture size 262144 bytes
18:17:51.588727 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 14, seq 1, length 64
18:17:51.588775 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 14, seq 1, length 64
18:17:52.599319 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 14, seq 2, length 64
18:17:52.599348 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 14, seq 2, length 64
18:17:53.623438 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 14, seq 3, length 64
18:17:53.623474 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 14, seq 3, length 64
18:17:54.647508 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 14, seq 4, length 64
18:17:54.647527 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 14, seq 4, length 64
18:17:55.671627 IP 10.50.60.6 > 10.50.50.50: ICMP echo request, id 14, seq 5, length 64
18:17:55.671687 IP 10.50.50.50 > 10.50.60.6: ICMP echo reply, id 14, seq 5, length 64
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel
anurag@dual-home-static-demo:~$ 

So we see packets coming in and returning via the same interface. Hence both IP endpoints, 10.50.60.60 via ISP A and 10.50.50.50 via ISP B, now reply via their respective gateways and work independently.
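
Note that rules and routes added with the ip command do not survive a reboot. Since this server is configured through netplan, one option is to declare them there using netplan's routes and routing-policy keys. The sketch below writes such a fragment; the function name and the file name in the example are hypothetical, and it assumes netplan merges the fragment with the existing 00-installer-config.yaml (the same keys could instead be added under ens19 in that file). Netplan expects the numeric table ID, so 2 is used rather than ispb.

```shell
#!/bin/sh
# write_ispb_netplan FILE
# Writes a netplan fragment that recreates the ISP B default route
# (in table 2) and the source-based rule at boot.
write_ispb_netplan() {
    cat > "$1" <<'EOF'
network:
  version: 2
  ethernets:
    ens19:
      routes:
        - to: 0.0.0.0/0
          via: 10.50.50.1
          table: 2
      routing-policy:
        - from: 10.50.50.50
          table: 2
EOF
    # Keep the file readable by its owner only.
    chmod 600 "$1"
}

# Example (hypothetical file name, needs root):
#   write_ispb_netplan /etc/netplan/60-ispb-routing.yaml
#   netplan try
```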

One can combine this with some smartness at the DNS layer to ensure that a different IP is returned if the primary goes down. PowerDNS authoritative Lua records support that. If using a cloud DNS service, one can use a failover routing policy on AWS Route 53, or DNS failover records in DNSMadeEasy. In these cases the authoritative DNS keeps an eye on multiple IPs, and a service outage on one of them is reflected in the resolution of the record.
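
The selection logic behind such a failover record can be sketched roughly as follows. This only illustrates the idea and is not how PowerDNS, Route 53 or DNSMadeEasy implement it. The function names are made up, port 443 in the example is an assumption, and the TCP check assumes a netcat build that supports the -z and -w options.

```shell
#!/bin/sh
# port_up IP PORT
# Succeeds if a TCP connection to IP:PORT opens within 3 seconds.
port_up() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# first_healthy PORT IP...
# Prints the first IP whose PORT answers; returns 1 if none does.
first_healthy() {
    port=$1
    shift
    for ip in "$@"; do
        if port_up "$ip" "$port"; then
            echo "$ip"
            return 0
        fi
    done
    return 1
}

# Example: prefer the ISP A address, fall back to the ISP B address.
#   first_healthy 443 10.50.60.60 10.50.50.50
```

Listing the ISP A address first makes it the primary, and the ISP B address is returned only when the primary stops answering.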
