05 Oct

Partial outage on .bd ccTLD on 5th Oct 2016


 

Bangladesh’s .bd ccTLD faced another outage. As I mentioned in one of the previous posts, the .bd domain seems to be hosted primarily on BTCL (AS17494). Delegation of the .bd zone to PCH is still pending: PCH is listed in the NS records on the authoritative DNS servers, but it has not yet been added to the delegation in the root DNS servers, as per the reply from Kabindra of PCH on the bdNOG mailing list during the last outage.

If we look at the root DNS zone, .bd has the following delegations:

bd.			172800	IN	NS	dns.bd.
bd.			172800	IN	NS	surma.btcl.net.bd.
bd.			172800	IN	NS	jamuna.btcl.net.bd.

 

After the last outage, a few people started tracking uptime using RIPE Atlas probes. Thus, this time we do know about the downtime.

The following RIPE Atlas measurements were tracking DNS checks on jamuna.btcl.net.bd: 4598529 and 4598527. As per measurement ID 4598529, the server wasn’t available from 15:44 UTC on 4th Oct till 05:44 UTC on 5th Oct 2016. This outage was visible from RIPE Atlas probes hosted in India, Singapore, Hong Kong, Japan and Germany.

jamuna.btcl.net.bd. measurement
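
To put the window reported by measurement 4598529 into perspective, here is a small sketch that computes how long jamuna.btcl.net.bd was unreachable. The two timestamps are the ones quoted above; the variable names are just illustrative.

```python
from datetime import datetime, timezone

# Start and end of the gap reported by measurement 4598529 (times in UTC).
outage_start = datetime(2016, 10, 4, 15, 44, tzinfo=timezone.utc)
outage_end = datetime(2016, 10, 5, 5, 44, tzinfo=timezone.utc)

# Subtracting two datetimes gives a timedelta; convert it to hours.
outage_duration = outage_end - outage_start
outage_hours = outage_duration.total_seconds() / 3600

print(f"jamuna.btcl.net.bd unreachable for {outage_hours:.0f} hours")
```

That works out to a 14 hour gap for this one name server.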

 

At this point, the cause of the issue is not known. The outage was mentioned on the bdNOG mailing list today.

20 Aug

Bangladesh .bd TLD outage on 18th August 2016

 


The day before yesterday, i.e. on 18th August 2016, Bangladesh’s TLD .bd had an outage. It was originally reported by Jasim Alam on the bdNOG mailing list.

dig btcl.com.bd @8.8.8.8

; <<>> DiG 9.10.4-P2 <<>> btcl.com.bd @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8114
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;btcl.com.bd.                   IN      A

;; Query time: 76 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Thu Aug 18 14:24:25 Bangladesh Standard Time 2016
;; MSG SIZE  rcvd: 40

 

His message shows that DNS resolution of BTCL (Bangladesh Telecommunications Company Ltd) was failing. Later, Alok Das mentioned that a power problem had caused the outage.

Let’s ask one of the 13 root DNS servers for the NS records, to see who has the delegation for .bd.

dig @k.root-servers.net. bd. ns

; <<>> DiG 9.8.3-P1 <<>> @k.root-servers.net. bd. ns
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7148
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 3, ADDITIONAL: 3
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;bd.   				IN     	NS

;; AUTHORITY SECTION:
bd.    			172800 	IN     	NS     	dns.bd.
bd.    			172800 	IN     	NS     	surma.btcl.net.bd.
bd.    			172800 	IN     	NS     	jamuna.btcl.net.bd.

;; ADDITIONAL SECTION:
dns.bd.			172800 	IN     	A      	209.58.24.3
surma.btcl.net.bd.     	172800 	IN     	A      	203.112.194.232
jamuna.btcl.net.bd.    	172800 	IN     	A      	203.112.194.231

;; Query time: 43 msec
;; SERVER: 2001:7fd::1#53(2001:7fd::1)
;; WHEN: Sat Aug 20 01:29:37 2016
;; MSG SIZE  rcvd: 136

So two out of these three seem to be on the BTCL network, and that too in the same /24.
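
The same-/24 observation can be checked mechanically. A minimal sketch using Python’s standard ipaddress module, with the glue addresses copied from the ADDITIONAL section above (the helper name covering_slash24 is mine):

```python
import ipaddress

# Glue addresses from the ADDITIONAL section of the root server reply.
glue = {
    "dns.bd.": "209.58.24.3",
    "surma.btcl.net.bd.": "203.112.194.232",
    "jamuna.btcl.net.bd.": "203.112.194.231",
}


def covering_slash24(address):
    """Return the /24 network that contains the given IPv4 address."""
    return ipaddress.ip_network(f"{address}/24", strict=False)


# Group the name servers by the /24 they sit in.
by_prefix = {}
for name, address in glue.items():
    by_prefix.setdefault(covering_slash24(address), []).append(name)

for prefix, names in by_prefix.items():
    print(prefix, names)
```

Three name servers collapse into only two /24s, with surma and jamuna sharing 203.112.194.0/24.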

 

Let’s ping all three from bdHUB’s NLNOG Ring node: bdhub01.ring.nlnog.net

anurag@ansible:~$ ansible -a 'ping -c 5 dns.bd'  bdhub01.ring.nlnog.net
bdhub01.ring.nlnog.net | SUCCESS | rc=0 >>
PING dns.bd (209.58.24.3) 56(84) bytes of data.
64 bytes from 209.58.24.3: icmp_req=1 ttl=60 time=0.754 ms
64 bytes from 209.58.24.3: icmp_req=2 ttl=60 time=0.728 ms
64 bytes from 209.58.24.3: icmp_req=3 ttl=60 time=0.725 ms
64 bytes from 209.58.24.3: icmp_req=4 ttl=60 time=0.726 ms
64 bytes from 209.58.24.3: icmp_req=5 ttl=60 time=0.737 ms

--- dns.bd ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 34660ms
rtt min/avg/max/mdev = 0.725/0.734/0.754/0.010 ms

anurag@ansible:~$


anurag@ansible:~$ ansible -a 'ping -c 5 surma.btcl.net.bd'  bdhub01.ring.nlnog.net
bdhub01.ring.nlnog.net | SUCCESS | rc=0 >>
PING surma.btcl.net.bd (203.112.194.232) 56(84) bytes of data.
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=1 ttl=60 time=0.775 ms
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=2 ttl=60 time=0.739 ms
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=3 ttl=60 time=1.02 ms
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=4 ttl=60 time=0.724 ms
64 bytes from host232.btcl.net.bd (203.112.194.232): icmp_req=5 ttl=60 time=0.724 ms

--- surma.btcl.net.bd ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4534ms
rtt min/avg/max/mdev = 0.724/0.796/1.022/0.119 ms

anurag@ansible:~$

anurag@ansible:~$ ansible -a 'ping -c 5 jamuna.btcl.net.bd'  bdhub01.ring.nlnog.net
bdhub01.ring.nlnog.net | SUCCESS | rc=0 >>
PING jamuna.btcl.net.bd (203.112.194.231) 56(84) bytes of data.
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=1 ttl=60 time=0.739 ms
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=2 ttl=60 time=0.785 ms
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=3 ttl=60 time=0.948 ms
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=4 ttl=60 time=1.26 ms
64 bytes from host231.btcl.net.bd (203.112.194.231): icmp_req=5 ttl=60 time=0.747 ms

--- jamuna.btcl.net.bd ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4513ms
rtt min/avg/max/mdev = 0.739/0.897/1.268/0.201 ms

anurag@ansible:~$

So clearly all three servers are local to Bangladesh, going by the very low latency from the bdHUB node. Going by traces from outside India, it is quite unlikely that there is any other anycast node outside Bangladesh. This is a serious design issue. A country’s TLD should have much more resiliency.
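
Why does sub-millisecond latency imply “local”? A rough back-of-the-envelope sketch: a round trip cannot cover more distance than light travels in that time. The figure of about 200,000 km/s for light in optical fibre (roughly two thirds of its speed in vacuum) is an assumption I am adding here, and the RTTs are the minimums from the three ping runs above.

```python
# Assumption: light in optical fibre travels at roughly 200,000 km/s,
# i.e. about 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def max_one_way_km(rtt_ms):
    """Upper bound on the one-way fibre distance for a given round-trip time."""
    return (rtt_ms / 2) * FIBRE_KM_PER_MS


# Minimum RTTs (ms) reported by the three ping runs from bdhub01.
min_rtt_ms = {
    "dns.bd": 0.725,
    "surma.btcl.net.bd": 0.724,
    "jamuna.btcl.net.bd": 0.739,
}

bounds_km = {name: max_one_way_km(rtt) for name, rtt in min_rtt_ms.items()}
for name, km in bounds_km.items():
    print(f"{name}: at most ~{km:.0f} km from the bdHUB node")
```

Each server comes out at well under 100 km from the bdHUB node, and real paths are never that straight. Note that this only bounds the instance answering the bdHUB node; it says nothing by itself about instances elsewhere.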

My good friend Fakrul from APNIC mentioned on the mailing list that PCH is becoming a secondary for .bd. The same is now visible in the authoritative NS records of the domain.

dig @dns.bd. bd. ns +short
jamuna.btcl.net.bd.
dns.bd.
bd-ns.anycast.pch.net.
surma.btcl.net.bd.

 

So once the same is added in the root DNS servers, it will bring a bit more resiliency, thanks to PCH’s platform with its large number of anycast nodes.
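
The gap between what the root hands out and what the zone itself publishes is easy to spot as a set difference. A small sketch with the two NS sets copied from the dig outputs above (variable names are illustrative):

```python
# NS set returned by k.root-servers.net (the delegation in the root zone).
root_delegation = {
    "dns.bd.",
    "surma.btcl.net.bd.",
    "jamuna.btcl.net.bd.",
}

# NS set returned by dns.bd itself (the zone's own view).
zone_ns = {
    "jamuna.btcl.net.bd.",
    "dns.bd.",
    "bd-ns.anycast.pch.net.",
    "surma.btcl.net.bd.",
}

# Servers the zone lists but the root does not delegate to, and vice versa.
missing_from_root = zone_ns - root_delegation
missing_from_zone = root_delegation - zone_ns

print("Only in the zone:", sorted(missing_from_root))
print("Only in the root:", sorted(missing_from_zone))
```

Only bd-ns.anycast.pch.net. differs, which is exactly the pending addition in the root.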

So what was the impact of this outage?
Well, probably a lot. The .bd TLD outage would have brought down a lot of websites running on .bd domains. Any fresh DNS lookup would have failed, and any website with a low TTL would have gone down as soon as its cached records expired. As per the bdIX traffic graph, some disturbance is visible across that day.

bdix drop
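
To illustrate why low-TTL sites suffer first, here is a deliberately simplified sketch: a resolver can keep answering from cache only until the cached record expires. The outage length and the TTLs below are hypothetical values picked for illustration, not figures from this incident, and the model ignores the separate caching of the delegation records.

```python
def survives_outage(cached_at_s, ttl_s, outage_end_s):
    """True if a record cached at cached_at_s is still valid when the outage ends.

    All times are in seconds relative to the start of the outage.
    """
    return cached_at_s + ttl_s >= outage_end_s


# Hypothetical scenario: a 4 hour outage, with each record cached
# one minute before the outage started.
OUTAGE_END_S = 4 * 3600
CACHED_AT_S = -60

# Hypothetical sites and their record TTLs in seconds.
example_ttls = {"low-ttl-site": 300, "high-ttl-site": 86400}

results = {
    site: survives_outage(CACHED_AT_S, ttl, OUTAGE_END_S)
    for site, ttl in example_ttls.items()
}
for site, ok in results.items():
    print(site, "stays resolvable" if ok else "fails once its cache entry expires")
```

In this toy scenario the 5 minute TTL record drops out of cache four minutes into the outage, while the 24 hour one rides it out.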

 

27 Mar

SMW4 Cable outage

Today a friend from Pakistan informed me about an SMW4 outage. He reported issues in Pakistan.

It seems SMW4 is damaged near Egypt, and that is what is causing high load on the East Asian routes, resulting in pretty high latency.

 

I am at home, sitting on BSNL’s network, and latency to Europe has jumped terribly to 700-800 ms. Right now I do not see a direct route to Europe; traffic is instead taking East Asia > US > Europe routes on other cable networks.

 

A quick look at some of the traceroutes:

 

To Facebook.com

anurag:~ anurag$ traceroute -a www.facebook.com
traceroute to star.c10r.facebook.com (69.171.229.25), 64 hops max, 52 byte packets
1 [AS65534] router02 (10.10.0.1) 1.759 ms 1.018 ms 0.869 ms
2 [AS9829] 117.220.160.1 (117.220.160.1) 18.184 ms 18.809 ms 17.962 ms
3 [AS9829] 218.248.169.126 (218.248.169.126) 28.761 ms 28.648 ms 28.352 ms
4 [AS4755] 115.114.57.165.static-mumbai.vsnl.net.in (115.114.57.165) 77.803 ms 63.059 ms 61.319 ms
5 [AS3549] 172.29.250.33 (172.29.250.33) 63.106 ms 62.755 ms 63.853 ms
6 * * *
7 [AS4755] 115.114.85.233 (115.114.85.233) 64.694 ms 63.013 ms 61.133 ms
8 * [AS0] if-7-2.tcore1.cxr-chennai.as6453.net (180.87.36.34) 531.243 ms *
9 [AS0] if-5-2.tcore1.svw-singapore.as6453.net (180.87.12.53) 566.615 ms 906.432 ms *
10 * * *
11 [AS0] if-2-2.tcore1.tv2-tokyo.as6453.net (180.87.180.1) 577.953 ms 542.487 ms *
12 * [AS0] if-9-2.tcore2.pdi-paloalto.as6453.net (180.87.180.17) 538.170 ms 617.144 ms
13 * [AS3549] te1-4-10g.ar1.pao2.gblx.net (208.51.134.97) 673.785 ms *
14 * [AS22566] xe10-3-1-10g.scr3.snv2.gblx.net (67.17.79.169) 563.667 ms 631.657 ms
15 [AS22566] e8-1-20g.ar5.sjc2.gblx.net (67.16.145.118) 554.785 ms * *
16 [AS3549] 64.208.158.30 (64.208.158.30) 535.164 ms 573.485 ms 546.552 ms
17 [AS32934] ae1.bb02.sjc1.tfbnw.net (204.15.21.164) 580.511 ms * 529.838 ms
18 [AS32934] ae12.bb02.prn1.tfbnw.net (74.119.79.109) 543.454 ms 569.572 ms
[AS32934] ae16.bb01.prn1.tfbnw.net (31.13.24.254) 659.153 ms
19 [AS32934] ae1.dr02.prn1.tfbnw.net (74.119.79.107) 567.662 ms *
[AS32934] ae1.dr05.prn1.tfbnw.net (204.15.23.61) 560.851 ms
20 * * *
21 * * *
22 *^C
anurag:~ anurag$

 

Route to Europe:

anurag:~ anurag$ traceroute -a server7.anuragbhatia.com
traceroute to server7.anuragbhatia.com (178.238.225.247), 64 hops max, 52 byte packets
1 [AS65534] router02 (10.10.0.1) 1.797 ms 0.989 ms 1.015 ms
2 [AS9829] 117.220.160.1 (117.220.160.1) 21.046 ms 18.046 ms 18.068 ms
3 [AS9829] 218.248.169.126 (218.248.169.126) 244.155 ms 28.669 ms 28.922 ms
4 [AS4755] 115.114.57.165.static-mumbai.vsnl.net.in (115.114.57.165) 62.840 ms 61.595 ms 60.564 ms
5 [AS0] 172.31.16.193 (172.31.16.193) 91.433 ms 94.132 ms 94.564 ms
6 [AS6453] if-2-606.tcore1.njy-newark.as6453.net (66.198.70.121) 529.370 ms * *
7 * [AS6453] 66.110.59.66 (66.110.59.66) 566.573 ms *
8 [AS1299] nyk-bb1-link.telia.net (80.91.252.226) 614.390 ms * *
9 * * [AS1299] ffm-bb1-link.telia.net (213.155.131.146) 697.499 ms
10 * [AS1299] mcn-b2-link.telia.net (213.155.134.13) 733.122 ms 721.410 ms
11 * [AS1299] gw02.contabo.net (213.248.101.78) 731.281 ms *
12 * * [AS51167] server7.anuragbhatia.com (178.238.225.247) 702.811 ms
anurag:~ anurag$
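
To see where the latency is picked up, one can look for the largest RTT increase between consecutive hops. A rough sketch, with the best (lowest) RTT per hop typed in by hand from the trace to server7 above; the function name biggest_jump is mine:

```python
# (hop number, router, lowest RTT in ms) taken from the trace above.
hops = [
    (1, "router02", 0.989),
    (2, "117.220.160.1", 18.046),
    (3, "218.248.169.126", 28.669),
    (4, "115.114.57.165.static-mumbai.vsnl.net.in", 60.564),
    (5, "172.31.16.193", 91.433),
    (6, "if-2-606.tcore1.njy-newark.as6453.net", 529.370),
    (7, "66.110.59.66", 566.573),
    (8, "nyk-bb1-link.telia.net", 614.390),
    (9, "ffm-bb1-link.telia.net", 697.499),
    (10, "mcn-b2-link.telia.net", 721.410),
    (11, "gw02.contabo.net", 731.281),
    (12, "server7.anuragbhatia.com", 702.811),
]


def biggest_jump(hops):
    """Return (hop, router, delta_ms) for the largest RTT increase between neighbours."""
    best = None
    for (_, _, prev_rtt), (hop, router, rtt) in zip(hops, hops[1:]):
        delta = rtt - prev_rtt
        if best is None or delta > best[2]:
            best = (hop, router, delta)
    return best


jump_hop, jump_router, jump_ms = biggest_jump(hops)
print(f"Largest jump: +{jump_ms:.1f} ms arriving at hop {jump_hop} ({jump_router})")
```

The big step is at hop 6, where the path lands on Tata’s Newark router, so most of the delay is added before the traffic ever reaches Europe.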

 

 

The issue does not seem to be isolated to BSNL or Tata; Airtel is affected as well.

 

E.g. Airtel Delhi PoP to London:

 

Wed Mar 27 16:28:59 GMT+05:30 2013
traceroute 62.239.237.1

Type escape sequence to abort.
Tracing the route to 62.239.237.1

1 203.101.100.29 [MPLS: Label 716197 Exp 0] 84 msec
182.79.254.242 [MPLS: Label 716197 Exp 0] 84 msec
203.101.95.146 [MPLS: Label 677302 Exp 0] 80 msec
2 125.21.80.161 [MPLS: Label 406905 Exp 0] 156 msec
203.101.95.141 [MPLS: Label 406905 Exp 0] 76 msec
202.56.223.205 [MPLS: Label 406905 Exp 0] 92 msec
3 203.101.95.117 [MPLS: Label 569896 Exp 0] 120 msec 40 msec
203.101.100.205 [MPLS: Label 389360 Exp 0] 52 msec
4 182.79.255.18 92 msec
182.79.255.14 88 msec 88 msec
5 BHA-0007.gw1.sin0.asianetcom.net (203.192.168.53) 176 msec 180 msec 172 msec
6 te0-3-0-0.wr1.sin0.asianetcom.net (61.14.157.233) [AS 10026] 184 msec 184 msec 216 msec
7 gi3-0-0.cr2.nrt1.asianetcom.net (61.14.157.158) [AS 10026] 248 msec 248 msec 244 msec
8 po5-0-0.gw3.lax1.asianetcom.net (202.147.0.38) [AS 10026] 428 msec 432 msec 420 msec
9 linx7.ukcore.bt.net (195.66.224.56) [AS 10026] 388 msec 388 msec 388 msec
10 *
core1-te0-3-0-1.ealing.ukcore.bt.net (62.172.102.2) [AS 2856] 384 msec 392 msec
11 core1-pos1-0.birmingham.ukcore.bt.net (62.172.103.81) [AS 2856] 384 msec 384 msec 384 msec
12 iar1-gig5-4.birmingham.ukcore.bt.net (62.6.196.94) [AS 2856] 392 msec 388 msec 448 msec
13 62.172.57.218 [AS 2856] 384 msec 432 msec 392 msec
14 * * *

 

 

If we look at Tata AS6453’s routing table at Mumbai for a Europe-based IP:

BGP routing table entry for 178.238.224.0/21
Bestpath Modifiers: deterministic-med
Paths: (3 available, best #3)
Multipath: eBGP
     11         12        
  3356 51167, (aggregated by 51167 gw02.contabo.net.)
    ldn-icore1. (metric 9713) from mlv-tcore2. (66.110.10.215)
      Origin IGP, valid, internal, atomic-aggregate
      Community: 
      Originator: ldn-icore1.
  3356 51167, (aggregated by 51167 gw02.contabo.net.)
    ldn-icore1. (metric 9713) from mlv-tcore1. (66.110.10.202)
      Origin IGP, valid, internal, atomic-aggregate
      Community: 
      Originator: ldn-icore1.
  3356 51167, (aggregated by 51167 gw02.contabo.net.)
    ldn-icore1. (metric 9713) from cxr-tcore1. (66.110.10.113)
      Origin IGP, valid, internal, atomic-aggregate, best
      Community: 
      Originator: ldn-icore1.



There seems to be a direct path via mlv-tcore1 (Mumbai > Europe), but overall it is less preferred and cxr-tcore1 is given preference (Chennai > East Asian route). The same applies to most other Europe-based prefixes. I tried pulling data from my RIPE Probe #1032 but was not able to log in to the RIPE Atlas site, which is hosted in Europe!

 

That’s all for now. Will post updates as things improve.