15 Jun

Routing with North East India!

A few weeks back I got in touch with Marc from Meghalaya. He offered to host a RIPE Atlas probe in Shillong, an excellent location that RIPE Atlas doesn’t cover yet. It took around 5 days for the probe to reach Shillong from Haryana. I think this is probably the probe at the most beautiful place in India. 🙂

Now that the probe is connected, I thought I’d look into routing, which is super exciting for far-off places like Shillong. Marc has a BSNL FTTH connection and mentioned not-so-good latency. Let’s trace to the first IP of the /24 pool the probe is hosted on:

traceroute -A 117.247.134.1
traceroute to 117.247.134.1 (117.247.134.1), 30 hops max, 60 byte packets
1 172.16.0.1 (172.16.0.1) [*] 0.552 ms 0.738 ms 0.909 ms
2 192.168.0.1 (192.168.0.1) [*] 3.070 ms 3.330 ms 3.266 ms
3 10.8.64.1 (10.8.64.1) [*] 18.401 ms 18.497 ms 18.097 ms
4 192.168.250.2 (192.168.250.2) [*] 18.655 ms 18.778 ms 24.928 ms
5 125.23.236.157 (125.23.236.157) [AS24560/AS9498] 25.758 ms 25.532 ms 25.251 ms
6 182.79.234.57 (182.79.234.57) [*] 181.730 ms 174.327 ms 174.075 ms
7 182.79.205.173 (182.79.205.173) [*] 50.784 ms 41.060 ms 41.120 ms
8 182.79.178.150 (182.79.178.150) [*] 177.159 ms 176.908 ms 182.79.234.195 (182.79.234.195) [*] 179.786 ms
9 182.79.247.165 (182.79.247.165) [*] 50.855 ms 182.79.177.99 (182.79.177.99) [*] 50.611 ms 182.79.247.115 (182.79.247.115) [*] 50.299 ms
10 182.79.206.46 (182.79.206.46) [*] 175.914 ms 182.79.190.125 (182.79.190.125) [*] 178.664 ms 182.79.222.189 (182.79.222.189) [*] 178.358 ms
11 149.3.183.129 (149.3.183.129) [AS6762] 175.873 ms 175.950 ms 149.3.183.125 (149.3.183.125) [AS6762] 174.901 ms
12 xe11-1-0.marsig2.mar.seabone.net (213.144.176.214) [AS6762] 196.814 ms xe7-3-0.marsig2.mar.seabone.net (213.144.176.172) [AS6762] 169.245 ms xe11-0-0.marsig2.mar.seabone.net (213.144.176.224) [AS6762] 161.673 ms
13 * * *
14 103-16-152-26-noc.bsccl.com (103.16.152.26) [AS132602] 249.369 ms 245.773 ms 249.437 ms
15 163.47.80-138-noc.bsccl.com (163.47.80.138) [AS132602] 213.663 ms 214.164 ms 213.697 ms
16 218.248.255.1 (218.248.255.1) [AS9829] 213.832 ms 210.132 ms 209.828 ms
17 * * *
18 218.248.170.229 (218.248.170.229) [AS9829] 213.674 ms 217.609 ms 217.687 ms
19 117.247.134.1 (117.247.134.1) [AS9829] 223.293 ms 223.366 ms 222.937 ms
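
A trace this long is easier to read once it is boiled down to one number per hop. The snippet below is only a rough sketch: it assumes the `traceroute -A` text layout shown above, and the function names `min_rtt_per_hop` and `largest_jump` are made up for illustration. It keeps the lowest RTT of each answering hop and reports where the biggest increase happens.

```python
import re

# Matches the "25.758 ms" style samples printed by traceroute.
RTT = re.compile(r"(\d+\.\d+) ms")

def min_rtt_per_hop(trace_text):
    """Return [(hop_number, lowest_rtt_ms)] for every hop that answered."""
    hops = []
    for line in trace_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # header line or blank line
        rtts = [float(value) for value in RTT.findall(line)]
        if rtts:  # hops shown as "* * *" have no samples and are skipped
            hops.append((int(fields[0]), min(rtts)))
    return hops

def largest_jump(hops):
    """Return (from_hop, to_hop, increase_ms) for the biggest RTT increase
    between two consecutive answering hops, or None if there are fewer than two."""
    best = None
    for (hop_a, rtt_a), (hop_b, rtt_b) in zip(hops, hops[1:]):
        increase = rtt_b - rtt_a
        if best is None or increase > best[2]:
            best = (hop_a, hop_b, increase)
    return best
```

Per-hop numbers are noisy (routers answer traceroute probes at low priority, and the hops above alternate between roughly 50 ms and 175 ms), so treat the result as a pointer to where to look rather than as proof.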

 

This is interesting output. There are two parts to it:

  1. Traffic going via Bangladesh
  2. Traffic to Bangladesh going via Europe!

 

While #1 may look like a routing issue, it’s actually the desired result of a deal between BSNL & BSCCL. I blogged about it last year when it became visible in BGP tables. Eventually, this link was launched by Indian Prime Minister Modi.

 

From the map, it seems like an ideal choice, but I really wish BSNL had gone for some kind of circuit instead of IP transit from BSCCL. The reason is the poor routing across Asian backbones, which we see in #2.

 

Coming to #2 – this clearly is bad and broken. Traffic flows Siti Broadband > Airtel > Telecom Italia > BSCCL, so it goes from India to Europe first before returning to South Asia.

 

Let’s trace to the same first IP of the pool from all Indian RIPE Atlas probes for a detailed picture:

Measurement result: https://atlas.ripe.net/measurements/8844267/

 

As we can see, latency numbers are quite decent from within BSNL’s AS9829 itself. 60-70ms seems fine considering these probes are in North or South India and the target is far away in the North East. Let’s look at some of these traces from probes on BSNL itself:

 

 

This shows that there is indeed a direct BSNL backbone circuit to that location. There’s a low chance of it running on top of BSCCL infrastructure.

 

Except for BSNL, all other Indian networks are routing towards that BSNL segment in Meghalaya via Europe or Singapore/Hong Kong. All the ones going via Europe pass through Marseille in France. That’s the landing station for 11 cable systems:

  • SEACOM
  • SEA-ME-WE-4
  • EIG
  • I-ME-WE
  • Ariane 2
  • Atlas Offshore
  • Med Cable
  • TE North
  • Tamares Telecom
  • Alexandros
  • AAE-1 (Asia Africa Europe)

 

Out of these, SEA-ME-WE-4 lands in Bangladesh and I guess that is what BSCCL is using for this traffic. So, coming back to the question: why is routing so terrible from Indian networks towards BSNL in the North East? To understand that, we need to look at the uplinks of BSCCL.

Well, BSNL is announcing 117.247.134.0/24 to BSCCL AS132602 only. BSCCL is buying transit from Telecom Italia AS6762 and NTT AS2914.

http://bgp.he.net/AS132602#_graph4

 

Looking at one of the few traces from Europe:

 

213.144.176.194/31 TIS – BSCCL connectivity
213.144.176.194 – 10Gig port on TIS AS6762 router in Marseille
213.144.176.195 – TIS’s IP on the BSCCL router somewhere in Bangladesh
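
The /31 reasoning above is easy to double-check with Python’s standard `ipaddress` module. This is just a small sanity check on the addresses quoted in this post: a /31 point-to-point link holds exactly two addresses, one for each end, and the traced target sits inside the /24 that BSNL announces.

```python
import ipaddress

# The TIS-BSCCL interconnect: iterating a /31 yields both of its addresses.
link = ipaddress.ip_network("213.144.176.194/31")
ends = [str(address) for address in link]
print(ends)

# The traced target should fall inside the pool BSNL is announcing.
probe_pool = ipaddress.ip_network("117.247.134.0/24")
target = ipaddress.ip_address("117.247.134.1")
print(target in probe_pool)
```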

 

Next, looking at NTT AS2914 transit of BSCCL:

 

Here the traffic handoff from Tata AS6453 to NTT AS2914 happens in Singapore (logical and correct!), and NTT hands off to BSCCL within Singapore as well. The latency is high due to the bad return path. The forward path is slightly bad too, but probably not as bad as the return.

Let’s look at the return trace to the 2nd hop, 115.118.168.1, from the RIPE probe at the destination (measurement here):

So clearly the return path, i.e. Shillong to Hyderabad, goes via Europe because BSCCL uses TIS for its forwarding path.

 

So, keeping the above traces in mind, here are the reasons for the high latency:

  1. BSNL is routing traffic over its own backbone, but all other traffic, i.e. traffic that is not going towards BSNL, is being routed via the Bangladeshi provider BSCCL.
  2. BSCCL is announcing routes to NTT AS2914 in Singapore & TIS AS6762 in France. Thus, to send any traffic to BSNL’s segment in Meghalaya, one has to send it either via the TIS router in Marseille, France or via NTT in Singapore. This adds significant latency for Indian networks (excluding BSNL itself) towards BSNL Meghalaya.
  3. BSCCL is using TIS AS6762 to reach Tata AS6453, which results in a very bad return route: traffic from Meghalaya to any other network in India that is a Tata AS6453 downstream goes via Marseille, France.
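
To put these latencies in perspective, here is a back-of-the-envelope sketch. It assumes the usual rule of thumb that light in fibre covers about 200 km per millisecond (roughly two thirds of the speed of light in vacuum) and ignores queuing and equipment delay, so it only gives an upper bound on how long the one-way path can be. The function name is made up for illustration.

```python
# Assumption: ~200 km of fibre per millisecond (about 2/3 of c).
FIBER_KM_PER_MS = 200.0

def max_one_way_km(rtt_ms):
    """Upper bound on the one-way fibre path length implied by a round-trip time."""
    return rtt_ms / 2 * FIBER_KM_PER_MS

# RTTs taken from the traces in this post (65 ms is the middle of the 60-70 ms range).
for label, rtt_ms in (("BSNL to BSNL", 65), ("Siti Broadband via Marseille", 223)):
    print(f"{label}: {rtt_ms} ms RTT, at most {max_one_way_km(rtt_ms):,.0f} km one way")
```

A bound of several thousand kilometres is plausible for a path that stays inside India. A bound above twenty thousand kilometres only makes sense for a path that leaves the region and comes back.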

Quite a lot seems messed up. BSNL should at least start announcing 117.247.134.0/24 across the NIXI exchanges immediately, subject to capacity on their core network into the North East. If capacity is a constraint, they should use L1 circuits from BSCCL to connect the network in Shillong instead of IP transit.

 

How is BSNL in North East reaching Google?

It seems to go directly over BSNL’s PNI with Google within India.

31 May

BGP Administrative Shutdown Communication

I recently came across an excellent draft at the IETF by Job Snijders & friends. It addresses scenarios where a network might miss the communication about a maintenance activity when a BGP shutdown happens. Once implemented, it lets an operator send the peer a message of up to 128 bytes with info about the shutdown, like “Ticket XXX: We are upgrading the router, will be back live in 1hr”. It works by appending this data to the NOTIFICATION (Cease) message that is already part of the BGP protocol, which is the message sent just before the session is shut down. So it is similar to the way you see a session being torn down due to prefix limits etc. This has already been implemented in some of the open source routing implementations like OpenBGPD, GoBGP, pmacct, ExaBGP etc.
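
To make the mechanism concrete, here is a minimal sketch of what such a message could look like on the wire, as I read the draft: a standard BGP header, a NOTIFICATION with error code Cease and subcode Administrative Shutdown, then one length byte followed by the UTF-8 text. The function name is made up for illustration and the field values should be checked against the draft before relying on them. This is not taken from any of the implementations mentioned above.

```python
import struct

BGP_MARKER = b"\xff" * 16      # 16-byte all-ones marker in every BGP header
BGP_HEADER_LEN = 19            # marker (16) + length (2) + type (1)
TYPE_NOTIFICATION = 3
ERR_CEASE = 6
SUB_ADMIN_SHUTDOWN = 2
MAX_COMMUNICATION = 128        # limit on the free-form text, in bytes

def build_shutdown_notification(text):
    """Build a Cease/Administrative Shutdown NOTIFICATION carrying `text`."""
    payload = text.encode("utf-8")
    if len(payload) > MAX_COMMUNICATION:
        raise ValueError("shutdown communication is limited to 128 bytes")
    # Error code, subcode, then a one-byte length in front of the message.
    body = struct.pack("!BBB", ERR_CEASE, SUB_ADMIN_SHUTDOWN, len(payload)) + payload
    header = BGP_MARKER + struct.pack("!HB", BGP_HEADER_LEN + len(body), TYPE_NOTIFICATION)
    return header + body
```

Note that the limit applies to encoded bytes and not to characters, so a message with non-ASCII text runs out of room sooner than its character count suggests.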

 

Here’s the latest draft of this change: https://tools.ietf.org/html/draft-ietf-idr-shutdown-09. And here’s Job’s talk from NANOG conference at the start of this year.

 

Hopefully, we will see this implemented across large vendor routers!

19 Dec

Google Public DNS and Akamai issues in India

A quick blog post on an interesting issue caused by a combination of two problems: CDN mapping failing behind Google Public DNS, and bad Akamai performance due to a Tata-NTT peering issue.

I was trying Zimbra mail since there’s no more free Google Apps edition, and a friend asked me to get basic email up on his domain. It was more or less a straightforward task: install Zimbra, which has a decent GUI.

 

I downloaded it on my Europe-based server and realized during installation that the package was for 64-bit, so I turned to my other server in India.
I started the download again and it was slow. DEAD SLOW!

 

Something like this:

root@server2:~# wget http://files2.zimbra.com/downloads/8.0.1_GA/zcs-8.0.1_GA_5438.UBUNTU12_64.20121105164409.tgz
--2012-12-18 14:02:59-- http://files2.zimbra.com/downloads/8.0.1_GA/zcs-8.0.1_GA_5438.UBUNTU12_64.20121105164409.tgz
Resolving files2.zimbra.com (files2.zimbra.com)... 23.32.241.26, 125.56.200.51
Connecting to files2.zimbra.com (files2.zimbra.com)|23.32.241.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 701053545 (669M) [binary/octet-stream]
Saving to: `zcs-8.0.1_GA_5438.UBUNTU12_64.20121105164409.tgz.1'

0% [ ] 5,545,378 67.5K/s eta 2h 28m ^C
root@server2:~#

 

It would have taken 2+ hours at roughly 512Kbps, while the server is on a 100Mbps connection and I usually get 20Mbps or so from US/Europe-based sources. I had downloaded the same 700MB file on the Europe-based server and it was quite fast at 40Mbps+, while here it was just 512Kbps.

 

I looked at the route from the Indian server:

traceroute to 23.32.241.26 (23.32.241.26), 30 hops max, 60 byte packets
1 103.6.87.1 (103.6.87.1) [AS36236] 1.310 ms 1.426 ms 1.716 ms
2 180.179.33.245 (180.179.33.245) [AS17439/AS9584] 0.843 ms 0.951 ms 0.958 ms
3 180.179.37.93 (180.179.37.93) [AS17439] 0.761 ms 0.762 ms 0.750 ms
4 * * *
5 180.179.37.137 (180.179.37.137) [AS17439] 0.840 ms 0.938 ms 1.091 ms
6 59.163.105.170.static-chennai.vsnl.net.in (59.163.105.170) [AS4755] 4.441 ms 3.891 ms 3.848 ms
7 * * *
8 ix-0-100.tcore2.MLV-Mumbai.as6453.net (180.87.39.25) [*] 27.175 ms 27.178 ms 27.927 ms
9 if-6-2.tcore1.L78-London.as6453.net (80.231.130.5) [AS6453] 143.917 ms 145.977 ms if-2-2.tcore1.MLV-Mumbai.as6453.net (180.87.38.1) [*] 135.972 ms
10 if-9-5.tcore1.WYN-Marseille.as6453.net (80.231.217.17) [AS6453] 132.133 ms 133.902 ms 133.286 ms
11 if-8-1600.tcore1.PYE-Paris.as6453.net (80.231.217.6) [AS6453] 134.763 ms 133.217 ms 136.691 ms
12 if-2-2.tcore1.PVU-Paris.as6453.net (80.231.154.17) [AS6453] 134.674 ms 137.558 ms *
13 * * *
14 ae-1.r21.parsfr01.fr.bb.gin.ntt.net (129.250.2.224) [AS2914] 153.657 ms 155.550 ms 152.412 ms
15 as-4.r22.amstnl02.nl.bb.gin.ntt.net (129.250.3.84) [AS2914] 155.136 ms 151.697 ms as-0.r25.tokyjp01.jp.bb.gin.ntt.net (129.250.3.79) [AS2914] 383.668 ms
16 * * *
17 * xe-3-2.a16.tokyjp01.jp.ra.gin.ntt.net (203.105.72.78) [AS2914] 271.296 ms 270.202 ms
18 a23-32-241-26.deploy.akamaitechnologies.com (23.32.241.26) [AS20940] 280.674 ms 282.407 ms xe-3-2.a16.tokyjp01.jp.ra.gin.ntt.net (203.105.72.78) [AS2914] 263.254 ms

 

Akamai CDN node in Japan and route via Europe!!

 

This poor performance is the result of multiple issues:

  1. The datacenter is not running its own DNS server but is instead relying on Google Public DNS 8.8.8.8 and 8.8.4.4, which has its upstream/full-table connectivity outside India, in East Asia. Thus Google DNS resolvers always return IPs of nodes outside India, in Japan or sometimes in Malaysia. I blogged about it in detail some time back here.
  2. The datacenter is picking Tata-VSNL AS4755 as upstream for this route (not Airtel or Reliance), and interestingly enough Tata does NOT peer with NTT Communications in Japan. They peer everywhere else except in NTT’s home market, which I guess will be the case with other ISPs as well. Thus the nearest point for Tata to hand off traffic to NTT is Europe, which we can see in the traceroute.
  3. Since the IP belongs to Akamai Japan, NTT brings the traffic back to Asia over its own network and eventually passes it to Akamai. Strange that Akamai has not fixed this problem for such a long time. They must know about it, since I posted this problem on the NANOG mailing list last year and also blogged about it here.

 

OK – the fix!

I can surely do better than waiting 2 hours to download that package!
I quickly installed BIND, and since BIND runs as a “recursive resolver” by default, I simply pointed /etc/resolv.conf to 127.0.0.1

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# nameserver 8.8.8.8
# nameserver 8.8.4.4

nameserver 127.0.0.1
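
To see the effect of the resolver choice without touching resolv.conf, one can ask each resolver directly and compare the answers. Below is a rough standard-library sketch of that idea. The function names are made up for illustration, and it is a bare-bones DNS client (plain UDP, no retries, no TCP fallback, no EDNS), so treat it as a way to illustrate the comparison and not as a proper resolver library.

```python
import socket
import struct

def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for the A record of `name`, recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def skip_name(packet, offset):
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:   # compression pointer, always two bytes
            return offset + 2
        offset += 1 + length

def parse_a_records(packet):
    """Extract the IPv4 addresses from the answer section of a DNS reply."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4      # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack("!HHIH", packet[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength                          # CNAMEs etc. are skipped
    return addresses

def resolve_via(resolver, name, timeout=3.0):
    """Ask one specific resolver for the A records of `name` over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver, 53))
        reply, _ = sock.recvfrom(4096)
    return parse_a_records(reply)
```

Calling `resolve_via("8.8.8.8", "files2.zimbra.com")` and `resolve_via("127.0.0.1", "files2.zimbra.com")` on the server should then show whether the two resolvers hand out different Akamai nodes.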

 

OK – now running download again, let’s see how it works:

 

root@server2:~/tmp2# wget http://files2.zimbra.com/downloads/8.0.1_GA/zcs-8.0.1_GA_5438.UBUNTU12_64.20121105164409.tgz
--2012-12-18 14:31:55-- http://files2.zimbra.com/downloads/8.0.1_GA/zcs-8.0.1_GA_5438.UBUNTU12_64.20121105164409.tgz
Resolving files2.zimbra.com (files2.zimbra.com)... 125.252.226.97, 125.252.226.106
Connecting to files2.zimbra.com (files2.zimbra.com)|125.252.226.97|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 701053545 (669M) [binary/octet-stream]
Saving to: `zcs-8.0.1_GA_5438.UBUNTU12_64.20121105164409.tgz'

100%[============================================================================================================>] 701,053,545 3.07M/s in 3m 45s

2012-12-18 14:35:43 (2.98 MB/s) - `zcs-8.0.1_GA_5438.UBUNTU12_64.20121105164409.tgz' saved [701053545/701053545]

root@server2:~/tmp2# 

 

Fast? Yeah a lot!

 

How?

Simply doing a trace to the destination this time takes me to:

traceroute to 125.252.226.97 (125.252.226.97), 30 hops max, 60 byte packets
1 103.6.87.1 (103.6.87.1) [AS36236] 2.089 ms 2.060 ms 2.047 ms
2 180.179.33.245 (180.179.33.245) [AS17439/AS9584] 2.012 ms 2.002 ms 1.997 ms
3 180.179.37.89 (180.179.37.89) [AS17439] 1.940 ms 1.949 ms 1.937 ms
4 180.179.37.38 (180.179.37.38) [AS17439] 2.258 ms 2.668 ms 3.144 ms
5 218.100.48.143 (218.100.48.143) [*] 2.199 ms 2.181 ms 2.174 ms
6 * 182.79.220.182 (182.79.220.182) [*] 1.683 ms 1.802 ms
7 a125-252-226-97.deploy.akamaitechnologies.com (125.252.226.97) [AS9498] 1.821 ms 1.775 ms 1.947 ms

An interesting point here is that NTT owns a majority stake in Netmagic, the datacenter where this server is located! But likely they can’t do much, since they would need a license in India to offer their own network in Netmagic, or could they simply peer more? 🙂

 

This is how I increased my download speed from 512Kbps to 24Mbps! 😉
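
For anyone who wants to check the arithmetic behind those two figures, here is a small sketch. It assumes that wget’s “K” means 1024 bytes and that network speeds are quoted in decimal megabits. The helper name is made up for illustration.

```python
def bytes_per_sec_to_mbps(bytes_per_second):
    """Convert a transfer rate in bytes per second to megabits per second."""
    return bytes_per_second * 8 / 1_000_000

KIB = 1024                       # assumption: wget's "K" is 1024 bytes

slow = 67.5 * KIB                # "67.5K/s" from the first run
fast = 701_053_545 / (3 * 60 + 45)   # whole file in the reported 3m 45s

print(f"before: {bytes_per_sec_to_mbps(slow):.2f} Mbps")
print(f"after:  {bytes_per_sec_to_mbps(fast):.2f} Mbps")
```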