Bangladesh .bd TLD outage on 18th August 2016

The day before yesterday, i.e. on 18th August 2016, Bangladesh’s TLD .bd had an outage. It was originally reported by Jasim Alam on the bdNOG mailing list.

dig btcl.com.bd @8.8.8.8
; <<>> DiG 9.10.4-P2 <<>> btcl.com.bd @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8114
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;btcl.com.bd.                   IN      A
;; Query time: 76 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Aug 18 14:24:25 Bangladesh Standard Time 2016
;; MSG SIZE  rcvd: 40

His message shows that DNS resolution for BTCL (Bangladesh Telecommunications Company Ltd) was failing. Later Alok Das mentioned that a power problem caused the outage. Let’s ask one of the 13 root DNS servers for the NS records to see who has the delegation for .bd.

dig @k.root-servers.net. bd. ns
; <<>> DiG 9.8.3-P1 <<>> @k.root-servers.net. bd. ns
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7148
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 3
;; WARNING: recursion requested but not available
;; QUESTION SECTION:
;bd.   				IN     	NS
;; AUTHORITY SECTION:
bd.    			172800 	IN     	NS     	dns.bd.
bd.    			172800 	IN     	NS     	surma.btcl.net.bd.
bd.    			172800 	IN     	NS     	jamuna.btcl.net.bd.
;; ADDITIONAL SECTION:
dns.bd.			172800 	IN     	A      	209.58.24.3
surma.btcl.net.bd.     	172800 	IN     	A      	203.112.194.232
jamuna.btcl.net.bd.    	172800 	IN     	A      	203.112.194.231
;; Query time: 43 msec
;; SERVER: 2001:7fd::1#53(2001:7fd::1)
;; WHEN: Sat Aug 20 01:29:37 2016
;; MSG SIZE  rcvd: 136
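A quick way to see how concentrated this delegation is: group the glue addresses from the ADDITIONAL section by /24. This is a minimal Python sketch using the standard `ipaddress` module; `group_by_prefix` is a hypothetical helper. Sharing a /24 is only a heuristic for shared infrastructure: it does not prove two hosts sit at the same site, and different prefixes do not prove diverse paths.

```python
import ipaddress
from collections import defaultdict

# Glue (A) records for the .bd name servers, copied from the root's
# ADDITIONAL section above.
GLUE = {
    "dns.bd.": "209.58.24.3",
    "surma.btcl.net.bd.": "203.112.194.232",
    "jamuna.btcl.net.bd.": "203.112.194.231",
}


def group_by_prefix(glue: dict, prefixlen: int = 24) -> dict:
    """Group name servers by the enclosing IPv4 prefix of their glue address."""
    groups = defaultdict(list)
    for name, addr in glue.items():
        # strict=False lets us pass a host address and get its covering network.
        net = ipaddress.ip_network(f"{addr}/{prefixlen}", strict=False)
        groups[str(net)].append(name)
    return dict(groups)


for prefix, names in group_by_prefix(GLUE).items():
    print(prefix, sorted(names))
# 209.58.24.0/24 ['dns.bd.']
# 203.112.194.0/24 ['jamuna.btcl.net.bd.', 'surma.btcl.net.bd.']
```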

So two out of these three seem to be on the BTCL network, and on the same /24 at that. Let’s ping all three from the NLNOG RING node at bdHUB: bdhub01.ring.nlnog.net

anurag@ansible:~$ ansible -a 'ping -c 5 dns.bd'  bdhub01.ring.nlnog.net
bdhub01.ring.nlnog.net | SUCCESS | rc=0 >>
PING dns.bd (209.58.24.3) 56(84) bytes of data.
64 bytes from 209.58.24.3: icmp_req=1 ttl=60 time=0.754 ms
64 bytes from 209.58.24.3: icmp_req=2 ttl=60 time=0.728 ms
64 bytes from 209.58.24.3: icmp_req=3 ttl=60 time=0.725 ms
64 bytes from 209.58.24.3: icmp_req=4 ttl=60 time=0.726 ms
64 bytes from 209.58.24.3: icmp_req=5 ttl=60 time=0.737 ms
--- dns.bd ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 34660ms
rtt min/avg/max/mdev = 0.725/0.734/0.754/0.010 ms
anurag@ansible:~$

anurag@ansible:~$ ansible -a 'ping -c 5 surma.btcl.net.bd'  bdhub01.ring.nlnog.net
bdhub01.ring.nlnog.net | SUCCESS | rc=0 >>
PING surma.btcl.net.bd (203.112.194.232) 56(84) bytes of data.
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=1 ttl=60 time=0.775 ms
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=2 ttl=60 time=0.739 ms
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=3 ttl=60 time=1.02 ms
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=4 ttl=60 time=0.724 ms
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=5 ttl=60 time=0.724 ms
--- surma.btcl.net.bd ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4534ms
rtt min/avg/max/mdev = 0.724/0.796/1.022/0.119 ms
anurag@ansible:~$

anurag@ansible:~$ ansible -a 'ping -c 5 jamuna.btcl.net.bd'  bdhub01.ring.nlnog.net
bdhub01.ring.nlnog.net | SUCCESS | rc=0 >>
PING jamuna.btcl.net.bd (203.112.194.231) 56(84) bytes of data.
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=1 ttl=60 time=0.739 ms
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=2 ttl=60 time=0.785 ms
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=3 ttl=60 time=0.948 ms
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=4 ttl=60 time=1.26 ms
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=5 ttl=60 time=0.747 ms
--- jamuna.btcl.net.bd ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4513ms
rtt min/avg/max/mdev = 0.739/0.897/1.268/0.201 ms
anurag@ansible:~$
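The latency argument can be made concrete. Light in optical fibre covers roughly 200 km per millisecond (about two thirds of c), and a ping RTT covers the path twice, so the minimum RTT gives an upper bound on how far away a server can be. A minimal sketch, assuming the Linux iputils summary-line format shown above and that approximate 200 km/ms figure; `min_rtt_ms` and `max_distance_km` are hypothetical helpers:

```python
import re

# Approximate propagation speed of light in optical fibre (about 2/3 of c),
# in kilometres per millisecond. An assumption, not a measured value.
FIBRE_KM_PER_MS = 200.0

# Matches the Linux iputils summary line, e.g.
# "rtt min/avg/max/mdev = 0.725/0.734/0.754/0.010 ms"
RTT_RE = re.compile(r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def min_rtt_ms(ping_output: str) -> float:
    """Extract the minimum RTT (in ms) from ping's summary line."""
    m = RTT_RE.search(ping_output)
    if m is None:
        raise ValueError("no rtt summary line found")
    return float(m.group(1))


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on one-way fibre distance implied by a round-trip time.

    Only half the RTT is available for the one-way trip; queueing and
    processing delays only make the real distance smaller.
    """
    return (rtt_ms / 2.0) * FIBRE_KM_PER_MS


summaries = {
    "dns.bd": "rtt min/avg/max/mdev = 0.725/0.734/0.754/0.010 ms",
    "surma.btcl.net.bd": "rtt min/avg/max/mdev = 0.724/0.796/1.022/0.119 ms",
    "jamuna.btcl.net.bd": "rtt min/avg/max/mdev = 0.739/0.897/1.268/0.201 ms",
}
for host, line in summaries.items():
    rtt = min_rtt_ms(line)
    print(f"{host}: min RTT {rtt} ms, at most ~{max_distance_km(rtt):.1f} km of fibre away")
```

With minimum RTTs around 0.72 to 0.74 ms, every bound comes out under 75 km, which is consistent with the servers being local to Bangladesh rather than abroad. Real fibre paths are longer than straight lines, so the true distance is smaller still.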

So all three servers are clearly local to Bangladesh, given the sub-millisecond latency from the bdHUB node. Traces from outside Bangladesh (from India) also make it quite unlikely that any of them has an anycast node outside Bangladesh. This is a serious design issue: a country’s TLD should have much more resiliency. My good friend Fakrul from APNIC mentioned on the mailing list that PCH is becoming a secondary for .bd. This is now visible in the authoritative NS records served by the zone itself.

dig @dns.bd. bd. ns +short
jamuna.btcl.net.bd.
dns.bd.
bd-ns.anycast.pch.net.
surma.btcl.net.bd.

So once the same change is made to the delegation on the root DNS servers, PCH’s platform, with its large number of anycast nodes, will bring a bit more resiliency.
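The gap between the two NS sets is the whole point here: the root still delegates .bd to the three original servers, while the zone itself already lists the PCH server. A minimal Python sketch of that parent/child comparison, using the NS names from the two dig outputs above (`delegation_diff` is a hypothetical helper):

```python
# NS set for bd. as delegated by the root (parent side),
# from the k.root-servers.net answer above.
PARENT_NS = {"dns.bd.", "surma.btcl.net.bd.", "jamuna.btcl.net.bd."}

# NS set served by the .bd zone itself (child side),
# from "dig @dns.bd. bd. ns +short" above.
CHILD_NS = {"jamuna.btcl.net.bd.", "dns.bd.", "bd-ns.anycast.pch.net.", "surma.btcl.net.bd."}


def delegation_diff(parent: set, child: set) -> dict:
    """Report name servers listed on only one side of the delegation."""
    return {
        "only_in_child": sorted(child - parent),   # not yet in the root's delegation
        "only_in_parent": sorted(parent - child),  # delegated but missing from the zone
    }


print(delegation_diff(PARENT_NS, CHILD_NS))
# {'only_in_child': ['bd-ns.anycast.pch.net.'], 'only_in_parent': []}
```

Until the PCH name shows up on the parent side, a resolver starting from a root referral is told only about the three original servers. If all of those are unreachable, it cannot even fetch the zone’s own NS set to learn that the PCH server exists.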


So what was the impact of this outage?

Well, probably a lot. A .bd TLD outage would have brought down a lot of websites running on .bd domains. Any fresh DNS lookup would have failed, and any websites with low TTLs would have gone down as soon as their cached records expired. As per the bdIX traffic graph, some disturbance is visible across that day.
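How long a given name stayed reachable depends on its TTL and on when a resolver last cached it. A minimal model under stated assumptions: a single resolver cache, no serve-stale behaviour (RFC 8767), and times in seconds relative to the start of the outage. `CachedRecord`, `failure_window`, and the example name are hypothetical, and the two-hour outage is made up for illustration, since the actual duration is not given here:

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    """A record as held in one recursive resolver's cache (hypothetical model)."""
    name: str
    cached_at: float  # seconds; when the resolver last fetched the record
    ttl: int          # seconds; TTL from the authoritative answer

    def expires_at(self) -> float:
        return self.cached_at + self.ttl


def failure_window(rec: CachedRecord, outage_start: float, outage_end: float):
    """Return (start, end) of the period when lookups for rec fail, or None.

    While the cached copy is valid the resolver answers from cache; once it
    expires during the outage, a fresh lookup needs the unreachable servers.
    """
    start = max(outage_start, rec.expires_at())
    if start >= outage_end:
        return None  # the cached copy outlived the outage
    return (start, outage_end)


# Hypothetical two-hour outage starting at t=0.
OUTAGE = (0, 7200)
short_ttl = CachedRecord("www.example.com.bd.", cached_at=-100, ttl=300)
long_ttl = CachedRecord("www.example.com.bd.", cached_at=-3600, ttl=86400)
never_cached = CachedRecord("www.example.com.bd.", cached_at=float("-inf"), ttl=300)

print(failure_window(short_ttl, *OUTAGE))     # (200, 7200)
print(failure_window(long_ttl, *OUTAGE))      # None
print(failure_window(never_cached, *OUTAGE))  # (0, 7200)
```

A never-cached name (a "fresh lookup") fails for the whole outage, while a long TTL can hide the outage entirely from users of that particular resolver.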