22 Nov

My home network…

This is a common discussion topic when I tell friends at Indian network operators that I work from home. As soon as I say that, they ask me – “How good is the connectivity at your home?” And of course, like all answers in engineering – it depends. 🙂

So I have two links at my home: IAXN and Siti broadband. IAXN is an FTTH connection with 50Mbps down and 25Mbps up, while Siti broadband is a DOCSIS connection with ~60Mbps down and 25Mbps up.

Both have reasonable but not 100% uptime, so to get close to 100% I use both together. These are consumer-grade connections with no BGP. These days many routing platforms support running multiple WAN links for redundancy. I use a Ubiquiti EdgeRouter Lite, which my good friend Nat Morris gifted me a while ago. Both links are defined in its “load balancing” feature, where one link acts as primary and the other is used for failover only, with multiple routing tables. Policy-based routing on the LAN VLAN sub-interface then takes care of routing packets as needed. This documentation covers the setup in detail. For wifi I use an Asus device which runs purely as an access point in bridged mode with no routing.

Some other things in use on the home network:

  • A Raspberry Pi 3 stays on a dedicated VLAN & runs multiple site-to-site WireGuard VPN tunnels (over multiple WAN links) to several of my remote locations.
  • It also runs OSPF (using FRR) so that the routing table changes dynamically whenever a link changes. I can switch traffic over by adjusting the OSPF cost.
  • My server in Munich runs an NGINX proxy & apart from doing various tasks, it also hosts a test URL which is reverse proxied via the Raspberry Pi at my home over Siti broadband (only). UptimeRobot monitors that URL for availability and that’s how I monitor my Siti broadband link, which has no public IP and sits entirely behind CGNAT.
  • Site-to-site VPNs over multiple links, with OSPF dynamically moving traffic, also take care of things like SNMP monitoring of home devices. I use LibreNMS, which is hosted remotely & keeps an eye on the home network.
  • The Raspberry Pi at home also runs Smokeping, where certain predefined targets are forced out of each WAN link to plot latency. That helps in keeping an eye on latency to each ISP’s core, as well as to upstream telco cores via each link.
  • I also host a node for the Galmon project to keep an eye on (American) GPS satellites as well as European, Chinese & Russian navigation satellites. The wonderful map here shows the receivers. Lately the project has been getting good coverage for its stats (reference here)
  • I run a DNS resolver at home (again on the Raspberry Pi)
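
To illustrate the OSPF cost point from the list above, here is a minimal sketch of how changing a cost moves traffic. It is a simplification that assumes each tunnel is a single hop, so the per-interface cost is the whole path cost. The tunnel names `wg-iaxn` and `wg-siti` and the cost values are made up for illustration and are not taken from my config.

```python
def best_tunnel(costs: dict[str, int]) -> str:
    """Return the tunnel with the lowest cost (ties go to the first listed)."""
    return min(costs, key=costs.get)


# Hypothetical tunnel names and costs, one tunnel per WAN link.
costs = {"wg-siti": 100, "wg-iaxn": 10}
print(best_tunnel(costs))  # wg-iaxn

# Raising the cost on one tunnel is enough to move traffic to the other.
costs["wg-iaxn"] = 500
print(best_tunnel(costs))  # wg-siti
```

Real OSPF adds up the costs along the whole path, but with one hop per tunnel the effect is the same: whichever tunnel is cheaper carries the traffic.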

While there’s auto switching in case of failure or packet loss beyond a certain rate on the primary WAN link, I also have an Ansible playbook which can be used to change the primary/secondary choice. The playbook is available via the Semaphore web UI so that my family can switch if they need to.

So the end result is close to 100% uptime (a 30-second outage if the primary fails) with no irritating wifi switching, plus push notifications on my phone (via UptimeRobot) about an outage on either link. Usually there’s an outage once in 30 days, not because of the WAN links but because I have shut things down to clean up the dust.

11 May

Building redundancy on home network

I have posted about the home network in multiple other posts in the past. Recently I switched from the MikroTik SXT Lite 5 to the PowerBeam PBE-M5-400. This gave me a jump from 16dBi to 25dBi, which gives a much sharper beam. I also got a harness & climbed the BTS myself (after getting permission from the manager) this time to switch the gear. I think I can do a better job than wasting time finding guys from local WISPs to do it. 🙂

 

Also, Essel Group launched Siti broadband in my home area and they are using DOCSIS. The network is fine overall, though initially it faced many outages due to fibre cuts here & there. As of now, the connection is reasonably stable. I am paying 860Rs/month (~$14) for an uncapped link which gives me 10Mbps down and 1.5Mbps up. From a price point, it’s an excellent connection to have for redundancy reasons. Now the connection is stable enough to explore auto-failover. For the last few months I took both the primary and the backup link to the router in the form of tagged VLANs and pushed specific traffic based on source IP (device at home) or destination IP/port combination using policy-based routing.
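
To make the policy-based routing idea concrete, here is a rough Python sketch of first-match rule selection by source IP or destination IP/port. It only models the decision and is not how the router implements it. All addresses and rules below are hypothetical examples; the table numbers follow the convention used later in this post (table 1 for the primary link, table 2 for the secondary).

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    table: int                   # routing table to use when the rule matches
    src: Optional[str] = None    # source prefix, None means "any"
    dst: Optional[str] = None    # destination prefix, None means "any"
    dport: Optional[int] = None  # destination port, None means "any"

    def matches(self, src: str, dst: str, dport: int) -> bool:
        if self.src and ipaddress.ip_address(src) not in ipaddress.ip_network(self.src):
            return False
        if self.dst and ipaddress.ip_address(dst) not in ipaddress.ip_network(self.dst):
            return False
        if self.dport is not None and dport != self.dport:
            return False
        return True


# Hypothetical rules: one home device, and one destination/port pair,
# are pushed out of the secondary link (table 2).
RULES = [
    Rule(table=2, src="192.168.2.50/32"),
    Rule(table=2, dst="198.51.100.0/24", dport=443),
]
DEFAULT_TABLE = 1  # everything else uses the primary link


def pick_table(src: str, dst: str, dport: int) -> int:
    """Return the table of the first matching rule, else the default."""
    for rule in RULES:
        if rule.matches(src, dst, dport):
            return rule.table
    return DEFAULT_TABLE


print(pick_table("192.168.2.50", "203.0.113.9", 80))   # 2, matched on source IP
print(pick_table("192.168.2.10", "198.51.100.7", 443))  # 2, matched on destination and port
print(pick_table("192.168.2.10", "198.51.100.7", 80))   # 1, no rule matched
```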

 

 

Here both links drop on the TP-Link router which I use as a layer 2 switch. I tag both links on different VLANs and carry them to my room over a single cable. The TP-Link 1043ND is flashed with OpenWrt, which allows me to do simple layer 2 aggregation and maintain a 1Gig link with the other switch placed in my room.

It’s tricky to do auto-failover in such a static setup where I am not using BGP and hence the WAN IP changes when the connection is switched. I use a Ubiquiti EdgeRouter as the core router at home and it comes with a “load balancing” feature where one can load balance or simply put a secondary interface in failover mode.

 

Here’s what the config looks like now:

(Note: VLAN10 / routing table 1: Primary link and VLAN20 / routing table 2: Secondary link)

anurag@router01# show protocols static table 1
 description "Primary Link"
 route 0.0.0.0/0 {
     next-hop $Provider1 - Router {
     }
 }
[edit]
anurag@router01# show protocols static table 2
 route 0.0.0.0/0 {
     next-hop $Provider2 - Router {
         description "Secondary Link"
     }
 }
[edit]
anurag@router01#

 

So this simply puts two additional routing tables on the router besides the default table known as “main”. Next is the load balancing config:

anurag@router01# show load-balance group Home-HA-Zone
 interface eth2.20 {
     failover-only
     route {
         table 2
     }
     route-test {
         initial-delay 60
         interval 5
         type {
             ping {
                 target 150.107.9.54
             }
         }
     }
 }
 interface eth2.10 {
     route-test {
         count {
             failure 6
             success 12
         }
         initial-delay 60
         interval 5
         type {
             ping {
                 target 62.140.24.49
             }
         }
     }
 }
 lb-local enable
[edit]
anurag@router01#

 

So here I have eth2.20 defined for failover only and it uses routing table 2, while the primary link is eth2.10, which uses the main table. The router sends one ping every 5 seconds; if 6 out of 6 fail during a 30-second outage, the primary link is considered dead and traffic moves to the secondary link. The router then keeps pinging the defined IP, and once there are 12 successful pings (one every 5 seconds) in a 1-minute period, the primary is assumed live again. New sessions switch over to the primary while existing ones stick with the secondary to avoid an outage on them.
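
The timing above (6 failures × 5 seconds to fail over, 12 successes × 5 seconds to come back) can be checked with a small toy model. This is only a sketch of the counting logic as I described it, assuming the streak resets whenever a ping agrees with the current state; it is not EdgeOS’s actual implementation, and the class and function names are my own.

```python
from dataclasses import dataclass


@dataclass
class RouteTest:
    """Toy model of the route-test counters for the primary link."""
    failure_count: int = 6   # failed pings in a row before the link is marked down
    success_count: int = 12  # good pings in a row before it is marked up again
    interval: int = 5        # seconds between pings
    up: bool = True
    run: int = 0             # current streak of results that disagree with `up`

    def feed(self, ping_ok: bool) -> bool:
        """Record one ping result and return the link state afterwards."""
        if ping_ok == self.up:
            self.run = 0     # result agrees with the current state, streak resets
            return self.up
        self.run += 1
        needed = self.failure_count if self.up else self.success_count
        if self.run >= needed:
            self.up = not self.up
            self.run = 0
        return self.up


def seconds_until_flip(test: RouteTest, ping_ok: bool) -> int:
    """Feed identical ping results until the state flips; return elapsed seconds."""
    if ping_ok == test.up:
        raise ValueError("these results would never change the link state")
    start, pings = test.up, 0
    while test.up == start:
        test.feed(ping_ok)
        pings += 1
    return pings * test.interval


link = RouteTest()
print(seconds_until_flip(link, ping_ok=False))  # 30, primary declared dead
print(seconds_until_flip(link, ping_ok=True))   # 60, primary declared live again
```

The asymmetry is deliberate: failing over quickly keeps the outage short, while coming back slowly avoids flapping on a link that is only half recovered.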

 

Next, the load balance group is called from a firewall modify rule:

anurag@router01# show firewall modify SOURCE_ROUTE rule 30
 action modify
 description "High Availability on Production LAN"
 modify {
     lb-group Home-HA-Zone
 }
[edit]
anurag@router01#

and this “SOURCE_ROUTE” is called on the LAN-facing interface to apply this policy on the interface:

anurag@router01# show interfaces ethernet eth2  vif 2 firewall in modify
 modify SOURCE_ROUTE
[edit]
anurag@router01#

 

And that’s all there is to it. It ensures that regular internet usage (not SSH sessions), streaming, Chromecast, etc. all stay live with a maximum impact of 30 seconds in case of an issue on the primary link.

 

Some misc notes:

  1. If the primary link goes down, IPv6 would still be broken; I have yet to write a script to disable IPv6 on the LAN in case of an outage on the link.
  2. I noticed Ubnt doesn’t behave well in terms of failover if I do not specify an IPv4 test address. It tends to use a test name which points to the Amazon CDN (which is fine btw), but when the primary link fails, DNS resolution also fails and devices seem to keep retrying DNS resolution instead of assuming failure instantly.
  3. I focused on testing the primary link with an IP far away in Europe. The secondary link does not really matter because normally it’s just not being used, and when it is being used it is the only option. Hence extensive testing makes no sense on the secondary link.
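
Following note 2 above, a quick way to make sure a health-check target does not depend on DNS is to verify that it is an IP literal. A minimal sketch using Python’s standard `ipaddress` module; the hostname `ping.example.net` is a made-up example, while the two IPs are the test targets from my config.

```python
import ipaddress


def is_ip_literal(target: str) -> bool:
    """True if the target is an IPv4/IPv6 address, i.e. needs no DNS lookup."""
    try:
        ipaddress.ip_address(target)
    except ValueError:
        return False
    return True


for target in ("62.140.24.49", "150.107.9.54", "ping.example.net"):
    verdict = "ok, no DNS needed" if is_ip_literal(target) else "depends on DNS"
    print(f"{target}: {verdict}")
```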

 

Here’s the output of this load-balancing setup:

anurag@router01:~$ show load-balance watchdog
Group Home-HA-Zone
  eth2.20
  status: Running
  failover-only mode
  pings: 2857
  fails: 0
  run fails: 0/3
  route drops: 0
  ping gateway: 150.107.9.54 - REACHABLE

  eth2.10
  status: Running
  pings: 2744
  fails: 6
  run fails: 0/6
  route drops: 0
  ping gateway: 62.140.24.49 - REACHABLE

anurag@router01:~$ show load-balance status
Group Home-HA-Zone
  interface   : eth2.10
  carrier     : up
  status      : active
  gateway     : $Provider1
  route table : 201
  weight      : 100%
  flows
      WAN Out : 11767
      WAN In  : 14446
    Local Out : 2

  interface   : eth2.20
  carrier     : up
  status      : failover
  route table : 2
  weight      : 0%
  flows
      WAN Out : 0
      WAN In  : 0
    Local Out : 0

anurag@router01:~$
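
If you want to feed that watchdog output into a script or a monitoring check, it is easy to parse. Below is a minimal sketch that assumes the output always has the shape shown above (an interface name on its own line followed by `key: value` lines). I treat anything other than `REACHABLE` as not reachable; the function names are my own.

```python
SAMPLE = """\
Group Home-HA-Zone
  eth2.20
  status: Running
  failover-only mode
  pings: 2857
  fails: 0
  run fails: 0/3
  route drops: 0
  ping gateway: 150.107.9.54 - REACHABLE

  eth2.10
  status: Running
  pings: 2744
  fails: 6
  run fails: 0/6
  route drops: 0
  ping gateway: 62.140.24.49 - REACHABLE
"""


def parse_watchdog(text: str) -> dict:
    """Parse `show load-balance watchdog` output into {interface: fields}."""
    interfaces = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Group "):
            continue
        if ":" not in line:
            if line == "failover-only mode":
                if current is not None:
                    current["failover_only"] = True
            else:
                # A bare line without a colon starts a new interface section.
                current = {"failover_only": False}
                interfaces[line] = current
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if current is not None:
            current[key] = value
    return interfaces


def links_not_reachable(interfaces: dict) -> list:
    """Return interfaces whose ping target is not reported as REACHABLE."""
    bad = []
    for name, fields in interfaces.items():
        _target, _sep, state = fields.get("ping gateway", "").rpartition(" - ")
        if state != "REACHABLE":
            bad.append(name)
    return bad


info = parse_watchdog(SAMPLE)
print(info["eth2.10"]["fails"])   # 6
print(links_not_reachable(info))  # []
```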

 

 

Sidenote: I am in Bangalore for Rootconf 2017, where I will be presenting on eyeball routing measurement using RIPE Atlas. If you are around in Bangalore, drop me a message; it would be great to meet!