22 Nov

My home network…

This is a common discussion topic when I tell friends at Indian network operators that I work from home. As soon as I say that, they ask me – “How good is the connectivity at your home?” And of course, like all answers in engineering – it depends. :)

So I have two links at my home: IAXN and Siti broadband. IAXN is an FTTH connection with 50Mbps down and 25Mbps up, while Siti broadband is a DOCSIS connection with ~60Mbps down and 25Mbps up.

Both have reasonable but not 100% uptime, so to get close to 100% uptime I use both together. These are consumer-grade connections with no BGP. These days many routing platforms support running multiple WAN links for redundancy. I use a Ubnt EdgeRouter Lite which my good friend Nat Morris gifted me a while ago. Both links are defined in a “load balancing” group where one link acts as primary and the other is for failover only, with multiple routing tables. Next, policy-based routing on the LAN VLAN sub-interface takes care of routing packets as needed. This documentation covers the setup in detail. For wifi I use an Asus device which runs purely as an access point in bridged mode with no routing.

Some other things in use on the home network:

  • A Raspberry Pi 3 stays on a dedicated VLAN & runs multiple site-to-site WireGuard VPN tunnels (over multiple WAN links) to several of my remote locations.
  • It also runs OSPF over FRR to ensure the routing table changes dynamically whenever a link changes. I can switch traffic over by setting the OSPF cost.
  • My server in Munich runs an NGINX proxy & apart from doing various tasks, it also hosts a test URL which is reverse proxied via the Raspberry Pi at my home over Siti broadband (only). UptimeRobot monitors that URL for availability and that’s how I monitor my Siti broadband link, which has no public IP and sits entirely behind CGNAT.
  • Site-to-site VPNs over multiple links, with OSPF dynamically moving traffic, also take care of things like SNMP monitoring of home devices. I use LibreNMS, which is hosted remotely & keeps an eye on the home network.
  • The Raspberry Pi at home also runs Smokeping, where certain predefined targets are forced out of each WAN link to plot latency. That helps in keeping an eye on latency to each ISP’s core, as well as to upstream telco cores via each link.
  • I also host a node for the Galmon project to keep an eye on (American) GPS satellites as well as European, Chinese & Russian navigation satellites. The wonderful map here shows the receivers. Lately the project is getting good coverage for its stats (reference here).
  • I run a DNS resolver at home (again on the Raspberry Pi).
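As a rough illustration of the OSPF cost trick mentioned in the list above, here is a minimal Python sketch. It is a simplification: real OSPF sums the costs along the whole path and runs SPF, while this only picks the cheapest usable tunnel for a single hop. The tunnel names and cost values are hypothetical, not taken from my actual config.

```python
from dataclasses import dataclass


@dataclass
class Tunnel:
    name: str        # hypothetical WireGuard interface name
    cost: int        # OSPF interface cost, lower is preferred
    up: bool = True  # whether the tunnel is currently usable


def best_tunnel(tunnels):
    """Return the usable tunnel with the lowest cost, or None if all are down."""
    usable = [t for t in tunnels if t.up]
    return min(usable, key=lambda t: t.cost) if usable else None


# Hypothetical tunnels: one per WAN link.
tunnels = [Tunnel("wg-iaxn", cost=10), Tunnel("wg-siti", cost=100)]
print(best_tunnel(tunnels).name)  # the lower-cost tunnel carries traffic

# Losing the preferred tunnel (or raising its cost) moves traffic to the other.
tunnels[0].up = False
print(best_tunnel(tunnels).name)
```

Raising the cost on one tunnel above the other has the same effect as taking it down, which is why adjusting a single number is enough to move traffic.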

While there’s auto switching in case of failure or packet loss beyond a certain rate on the primary WAN link, I also have an Ansible playbook which can be used to tweak the primary/secondary choice. The playbook is available via the Semaphore web UI so that my family can switch if they need to.

So the end result is close to 100% uptime (a 30-second outage if the primary fails) with no irritating wifi switching, plus push notifications on my phone (via UptimeRobot) about an outage on either link. Usually there’s an outage about once in 30 days, not because of the WAN links but because I have shut things down to clean up the dust.

26 May

India, DOCSIS, last mile broadband and more…

In my previous post, I shared how I am running redundant uplinks at home (in a non-BGP setup) with the primary link on RF and the secondary on DOCSIS. One of my good friends asked me the reason for the sudden jump in DOCSIS-based players across India, especially in smaller cities.
Well, there are multiple reasons for it: 

  1. Fibre is no longer as hard to deploy but still remains a cost point. ISPs in large metro cities run fibre to the user’s building and go on cat5e/cat6 beyond that. That’s pretty much what a lot of FTTH players do in India. That’s not possible in smaller cities which do not have multi-dwelling units.
  2. While the cost of fibre is low, the cost of CPE (i.e. ONU in the case of GPON or media converter for active networks) is still slightly high. It has come down drastically in the last few years and is about to reach a point where its cost won’t be a consideration in the near future. Presently, at around Rs 2000-2500 (~$40 USD), it is still a hard cost for mass deployment. On the other hand, the cost of DOCSIS 2.0 as well as DOCSIS 3.0 modems is extremely low.
  3. A considerable part of the long coax plant of LCOs got converted into Hybrid Fibre Coax (HFC) as digital television broadcast is pushed further across the country. So a lot of fibre is already present now (hanging around!) near a large number of homes even in smaller cities.
  4. The demand for bandwidth is increasing, but much more in terms of GB used per month than in terms of raw speed in megabits per second. So while 2Mbps to 5-10Mbps is a major upgrade, beyond 10-20Mbps there’s just not much demand as yet. It would be great once 4k streaming and more VR stuff get popular, but at current levels demand for speed isn’t increasing beyond 10-20Mbps due to the lack of a “killer” application which takes more bandwidth.
  5. DOCSIS networks can eventually be upgraded to GPON all the way to the end user with a fair amount of changes. Operators even do hybrid DOCSIS-GPON where they move high-capacity end users onto dedicated cable headends & even onto GPON while leaving low-demand users on DOCSIS.
  6. The cost of terminating coax is low & it is relatively easy to handle compared to splicing fibre to a pigtail.

Thus DOCSIS is a cheap way to build the last mile in tier 3 cities. While the cost of RG-6/RG-11 cable is on the high side (against the cost of fibre), the equipment is priced pretty low.
#2 here is important. While $40 may not sound like much in the West, that cost is really high here. As per available data, 12.4% of the Indian population is below the poverty line. Furthermore, per capita income was Rs 93,293 (~$1400) for the year 2015-16, and thus pricing really matters. There is an extremely large unconnected population and it is an extremely price-sensitive market. Here people typically pay a little over Rs 1000 (~$16) a month for broadband in metros & large cities and a little less than that in smaller cities. Providers in Western countries typically charge 4-5x that, and much more once bundled with digital cable TV.
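To put the price sensitivity in rough numbers, here is a small Python calculation using only the figures quoted above. Dividing annual per capita income by 12 to get a "monthly income" is my own simplifying assumption, and the helper name is just for illustration.

```python
PER_CAPITA_INCOME_RS = 93_293   # annual per capita income, 2015-16 (from the text)
CPE_COST_RS = (2000, 2500)      # ONU / media converter price range (from the text)
BROADBAND_RS_PER_MONTH = 1000   # typical metro broadband bill (from the text)

# Assumption: spread the annual figure evenly over 12 months.
monthly_income_rs = PER_CAPITA_INCOME_RS / 12


def share_of_monthly_income(rupees):
    """Return a cost as a percentage of one month of per capita income."""
    return 100 * rupees / monthly_income_rs


print(f"Monthly per capita income: Rs {monthly_income_rs:.0f}")
for cost in CPE_COST_RS:
    print(f"CPE at Rs {cost}: {share_of_monthly_income(cost):.0f}% of a month's income")
print(f"Broadband bill: {share_of_monthly_income(BROADBAND_RS_PER_MONTH):.0f}% of a month's income")
```

On these assumptions the CPE alone works out to roughly a quarter to a third of a month's per capita income, which is why modem cost decides what gets deployed.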
Some of the players who did large-scale DOCSIS setups in recent times are mostly MSOs and come from the cable TV world.

  1. Hathway broadband – AS17488
  2. DEN cable – AS45184
  3. Siti broadband – AS17747
  4. Netplus broadband – AS133661
  5. Ortel – AS23772
  6. Asianet broadband – AS17465

and more!
One thing common across these players is good plans. Reliability does vary but is decent enough for the most part, with a few exceptions. Also, a large number of these players do CGNAT, and I am not aware of any of them giving IPv6 to end users as yet. :(

A quick view on plans

Hathway plans in Delhi

DEN Broadband in Delhi

If you are thinking that this kind of plan, where 50Mbps is offered for Rs 1200 (~$18), exists only in Delhi, let me share some of the plans from smaller cities.
Netplus broadband in Punjab

Speedtest by one of their users and my friend Damanjit:

So how come these networks are able to offer relatively good plans compared to the larger telcos, who are mostly on DSL? A lawyer friend from Bangalore asked me this question earlier this month. It is due to the following:

  1. The cost of IP transit is getting lower and lower across the world. While it’s still quite high in India compared to global standards, the decline is phenomenal. Thus these players are able to buy high-capacity bandwidth drops at wholesale prices.
  2. The cost of point-to-point circuits is getting lower, along with more content coming to India and high-value content getting concentrated on CDNs. Over half (actually 60% or so) of typical Indian eyeball ISP traffic can be offloaded via GGC, Google peering, Akamai cache, Microsoft, Amazon, Limelight and so on, besides generic HTTP and P2P caching. There’s decent competition these days among point-to-point circuit providers across India in all major cities. There are Govt. options (Railtel, Powergrid) as well as private players (Tata Comm, Airtel, Reliance, Vodafone, Aircel etc.) offering circuits (known as NLD or National Long Distance in India).
  3. In terms of the last mile, these networks are typically DOCSIS 3.0 (for MSOs) backhauled over GPON or GEPON (yes, DOCSIS backhauled on PON instead of direct Ethernet circuits!). From a central location where the CMTS is located, traffic is pushed towards societies over the HFC plant. In societies, remote PHY devices are usually deployed to do optical-to-coax conversion, & then lots of end users are connected on each headend. So the layer 1 aggregation happens on these PHY devices hanging around.

The capacity on last mile as well as the middle mile is quite questionable on these networks due to multiple layers of contention (DOCSIS as well as xPON) but since traffic growth is relatively predictable and happens over a period of months, this part is somewhat managed. FTTH, of course, is the way to go in the long term, but if these players can cut costs and offer reasonable bandwidth at cheap prices, DOCSIS is a fine option. In Rohtak, for instance, Siti broadband’s 5/10/20/50 Mbps packages are competing with Govt. owned BSNL’s 2/4/8Mbps plans with low FUPs.
GPON, as well as DOCSIS, offer a flexible approach of splitting the circuit coming from the OLT (in GPON) or PHY (in DOCSIS) as the need arises. So, for instance, in the case of FTTH on GPON, one can aggregate 16 or more users over a passive splitter like this:

In the case of DOCSIS, it’s the usual old-school (though improved) coax taps:

DSL is very different by comparison. The cost of aggregating a DSL userbase over DSLAMs is much higher and capacities on ADSL are quite low.
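Going back to the GPON splitter case, the contention can be sketched with a little arithmetic. The snippet below assumes the nominal GPON downstream line rate of about 2488 Mbps and ignores protocol overhead. The 50Mbps plan and the 32 and 64 split ratios are illustrative assumptions, not figures from any of the operators above.

```python
GPON_DOWNSTREAM_MBPS = 2488  # nominal GPON downstream line rate, before overhead


def worst_case_share_mbps(split_ratio, line_rate_mbps=GPON_DOWNSTREAM_MBPS):
    """Bandwidth per subscriber if everyone on the splitter is busy at once."""
    return line_rate_mbps / split_ratio


def oversubscription(split_ratio, plan_mbps, line_rate_mbps=GPON_DOWNSTREAM_MBPS):
    """Sold capacity divided by line rate; above 1.0 means the port is oversold."""
    return split_ratio * plan_mbps / line_rate_mbps


# Illustrative split ratios, each subscriber on a hypothetical 50Mbps plan.
for split in (16, 32, 64):
    share = worst_case_share_mbps(split)
    ratio = oversubscription(split, plan_mbps=50)
    print(f"1:{split} split -> {share:.1f} Mbps each, oversubscription {ratio:.2f}")
```

Even at a 1:64 split a single PON port is only mildly oversold at these plan speeds, so most of the contention on these networks comes from the shared DOCSIS segment and the uplinks behind it.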
This is what a typical DOCSIS remote PHY looks like:

Ending this post with some pictures of outdoor fibre plant from my recent Bangalore trip. That’s part of our Digital Slum here in India.

11 May

Building redundancy on home network

I have posted about the home network in multiple other posts in the past. Recently I switched from a MikroTik SXT Lite 5 to a PowerBeam PBE-M5-400. This gave me a jump from 16dBi to 25dBi, which gives a much sharper beam. I also got a harness & climbed the BTS myself (after getting permission from the manager) this time to switch the gear. I think I can do a better job than wasting time finding guys from local WISPs to do it. :)
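For a sense of what the 16dBi to 25dBi jump means, decibels convert to a linear power ratio as 10^(dB/10). A quick Python check (the function name is just for illustration):

```python
def db_to_linear(db):
    """Convert a gain difference in dB to a linear power ratio."""
    return 10 ** (db / 10)


old_gain_dbi = 16  # SXT Lite 5
new_gain_dbi = 25  # PBE-M5-400
delta_db = new_gain_dbi - old_gain_dbi

print(f"{delta_db} dB more gain is about {db_to_linear(delta_db):.1f}x in linear terms")
```

So the 9 dB difference concentrates roughly eight times more power in the main direction, which is another way of saying the beam is much narrower and alignment matters more.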
Also, Essel Group launched Siti broadband in my home area and they are using DOCSIS. The network is fine overall, though I initially faced many outages due to fibre cuts here & there. As of now, the connection is reasonably stable. I am paying Rs 860/month (~$14) for an uncapped link which gives me 10Mbps down and 1.5Mbps up. From a price point of view, it’s an excellent connection to have for redundancy. Now the connection is stable enough to explore auto-failover. For the last few months I took both the primary and the backup link to the router in the form of tagged VLANs and pushed specific traffic based on source IP (device at home) or destination IP/port combination using policy-based routing.

Here both links drop on the TP-Link router which I use as a layer 2 switch. I tag both links on different VLANs and carry them to my room over a single cable. The TP-Link 1043ND is flashed with OpenWRT, which allows me to do simple layer 2 aggregation and maintain a 1Gig link with the other switch placed in my room.
It’s tricky to do auto-failover in such a static setup where I am not using BGP and hence the WAN IP changes when the connection is switched. I use a Ubiquiti EdgeRouter as the core router at home and it comes with a “load balancing” feature where one can load balance or simply put a secondary interface in failover mode.
Here’s what the config looks like now:
(Note: VLAN10 / routing table 1: Primary link and VLAN20 / routing table 2: Secondary link)

anurag@router01# show protocols static table 1
 description "Primary Link"
 route {
     next-hop $Provider1 - Router {
anurag@router01# show protocols static table 2
 route {
     next-hop $Provider2 - Router {
         description "Secondary Link"

So this simply puts two different routing tables in the router besides the main table known as “main”. Next is the load balancing config:

anurag@router01# show load-balance group Home-HA-Zone
 interface eth2.20 {
     route {
         table 2
     route-test {
         initial-delay 60
         interval 5
         type {
             ping {
 interface eth2.10 {
     route-test {
         count {
             failure 6
             success 12
         initial-delay 60
         interval 5
         type {
             ping {
 lb-local enable

So here I have eth2.20 defined for failover only and it uses routing table 2 while the primary link is eth2.10 which uses the main table. The router sends a ping every 5 seconds, so if 6 out of 6 fail during a 30-second outage, the primary link is considered dead and traffic moves to the secondary link. The router then keeps on pinging the defined IP and once there are 12 successful pings (one every 5 seconds, i.e. over a 1-minute period), the primary is assumed live again. New sessions switch over to the primary while existing ones stick with the secondary to avoid an outage on them.
Next, the load balance config is called on a firewall modify instance:
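The timing described above can be modelled with a small Python simulation. This is only a sketch of the behaviour as I understand it (consecutive failures trigger failover, consecutive successes trigger recovery), not the actual EdgeOS watchdog code. The constants mirror the interval, failure and success values from the config, and the function name is hypothetical.

```python
INTERVAL_S = 5      # seconds between probes ("interval 5")
FAILURE_COUNT = 6   # consecutive failures before failover ("failure 6")
SUCCESS_COUNT = 12  # consecutive successes before recovery ("success 12")


def simulate(probes):
    """Replay probe results (True = reply) and return (time_s, new_active_link) switches."""
    active = "primary"
    fails = oks = 0
    events = []
    for i, ok in enumerate(probes, start=1):
        # Track runs of consecutive successes and failures.
        if ok:
            oks, fails = oks + 1, 0
        else:
            fails, oks = fails + 1, 0
        if active == "primary" and fails >= FAILURE_COUNT:
            active = "secondary"
            events.append((i * INTERVAL_S, active))
        elif active == "secondary" and oks >= SUCCESS_COUNT:
            active = "primary"
            events.append((i * INTERVAL_S, active))
    return events


# Primary fails for 20 probes (100 s), then comes back.
print(simulate([False] * 20 + [True] * 15))
# [(30, 'secondary'), (160, 'primary')]
```

In this model the failover happens 30 seconds into the outage, and the switch back happens 60 seconds after the primary starts answering again. A brief blip of fewer than 6 lost pings never triggers a switch at all.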

anurag@router01# show firewall modify SOURCE_ROUTE rule 30
 action modify
 description "High Availability on Production LAN"
 modify {
     lb-group Home-HA-Zone

and this “SOURCE_ROUTE” is called on the LAN-facing interface to apply this policy on the interface:

anurag@router01# show interfaces ethernet eth2  vif 2 firewall in modify

And that’s all there is to it. It ensures that regular internet usage (not SSH sessions), streaming, Chromecast, etc. can all stay live with a maximum impact of 30 seconds in case of an issue on the primary link.
Some misc notes:

  1. If the primary link goes down, IPv6 would still be broken and I have yet to put in a script to disable IPv6 on the LAN in the case of an outage on the link.
  2. I noticed Ubnt doesn’t behave well in terms of failover if I do not specify an IPv4 test address. It tends to use a test string which pointed to Amazon CDN (which is fine btw), but as the primary link fails, DNS resolution also fails and devices seem to keep retrying DNS resolution instead of assuming failure instantly.
  3. I focused on testing the primary link with an IP far away in Europe. The secondary link does not really matter because most of the time it’s just not being used, and when it is being used it is the only option. Hence extensive testing makes no sense on the secondary link.
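The DNS issue in note 2 suggests a simple sanity check for any health-check target: make sure it is an IP literal so that probing it never depends on name resolution. Here is a minimal sketch using Python's standard ipaddress module. The helper name and the sample targets (documentation-range addresses and an example hostname) are hypothetical.

```python
import ipaddress


def is_ip_literal(target):
    """Return True if target is an IPv4/IPv6 address, so probing it needs no DNS lookup."""
    try:
        ipaddress.ip_address(target)
        return True
    except ValueError:
        return False


# Hypothetical candidates for a failover test target.
for target in ("192.0.2.1", "2001:db8::1", "ping.example.net"):
    verdict = "ok" if is_ip_literal(target) else "needs DNS, avoid as a health-check target"
    print(f"{target}: {verdict}")
```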

Here’s the output of this load-balancing setup:

anurag@router01:~$ show load-balance watchdog
Group Home-HA-Zone
  status: Running
  failover-only mode
  pings: 2857
  fails: 0
  run fails: 0/3
  route drops: 0
  ping gateway: - REACHABLE
  status: Running
  pings: 2744
  fails: 6
  run fails: 0/6
  route drops: 0
  ping gateway: - REACHABLE
anurag@router01:~$ show load-balance status
Group Home-HA-Zone
  interface   : eth2.10
  carrier     : up
  status      : active
  gateway     : $Provider1
  route table : 201
  weight      : 100%
      WAN Out : 11767
      WAN In  : 14446
    Local Out : 2
  interface   : eth2.20
  carrier     : up
  status      : failover
  route table : 2
  weight      : 0%
      WAN Out : 0
      WAN In  : 0
    Local Out : 0

Sidenote: I am in Bangalore for Rootconf 2017. I will be presenting on eyeball routing measurement using RIPE Atlas. If you are around in Bangalore, drop me a message and it would be great to meet!