F root server, Chennai down for 5 months. Who cares?

05-10-2012

Time for a quick follow-up blog post. On 26th April this year I blogged about the broken connectivity of the F root server hosted at NIXI Chennai. Apart from that blog post, I also informed ISC, which operates F root (NIXI was hosting it on their behalf in India). In reply to my open email on the APNIC mailing list, the ISC Network Operations Center said they would verify the problem and take the necessary action. Within 48 hours of that email they had figured out the root cause, and since they couldn’t fix it right then, they pulled the plug on that root server.

This was one of 3 global root DNS servers hosted in India. I am sad to report that to date they have not been able to bring the server back up. No blame to ISC, but this shows how serious Indian bodies are about the Internet and its infrastructure.

My current traceroute to F root:

traceroute to f.root-servers.net (192.5.5.241), 30 hops max, 60 byte packets
1 router.local (10.0.0.1) [AS1] 0.969 ms 1.168 ms 1.488 ms
2 117.206.176.1 (117.206.176.1) [AS9829] 19.203 ms 20.001 ms 22.286 ms
3 218.248.169.122 (218.248.169.122) [AS9829] 26.905 ms 28.801 ms 29.490 ms
4 115.114.131.137.static-mumbai.vsnl.net.in (115.114.131.137) [AS4755] 64.299 ms 66.175 ms 68.068 ms
5 172.31.16.193 (172.31.16.193) [*] 95.702 ms 172.31.19.145 (172.31.19.145) [*] 96.813 ms 98.304 ms
6 ix-4-2.tcore1.CXR-Chennai.as6453.net (180.87.36.9) [*] 304.038 ms 280.526 ms 280.544 ms
7 if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [*] 330.969 ms if-5-2.tcore1.SVW-Singapore.as6453.net (180.87.12.53) [*] 327.010 ms if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [*] 333.282 ms
8 if-5-2.tcore2.SVW-Singapore.as6453.net (180.87.15.69) [*] 319.188 ms if-2-2.tcore2.SVW-Singapore.as6453.net (180.87.12.2) [*] 319.458 ms if-5-2.tcore2.SVW-Singapore.as6453.net (180.87.15.69) [*] 341.489 ms
9 Vlan1870.icore1.HK2-HongKong.as6453.net (180.87.15.61) [*] 339.646 ms Vlan1850.icore1.HK2-HongKong.as6453.net (180.87.15.18) [*] 337.416 ms Vlan1779.icore1.HK2-HongKong.as6453.net (180.87.15.38) [*] 338.317 ms
10 isc2-FE.hkix.net (202.40.161.200) [AS2687/AS4862/AS9498/AS10026/AS1221] 340.247 ms 339.589 ms 344.179 ms
11 f.root-servers.net (192.5.5.241) [AS55440/AS3557/AS23708/AS8167] 340.218 ms 341.172 ms 341.604 ms

So I am still hitting Hong Kong.
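To see where the delay builds up in the trace above, one can compare the best round-trip time at each hop with the hop before it. The snippet below is only a rough sketch: the RTT values are copied from the traceroute output, while the function name `largest_jump` and the "use the minimum of the three probes" choice are my own assumptions, not part of any tool.

```python
# Round-trip times (ms) per hop, copied from the traceroute output above.
RTTS_MS = {
    1: [0.969, 1.168, 1.488],
    2: [19.203, 20.001, 22.286],
    3: [26.905, 28.801, 29.490],
    4: [64.299, 66.175, 68.068],
    5: [95.702, 96.813, 98.304],
    6: [304.038, 280.526, 280.544],
    7: [330.969, 327.010, 333.282],
    8: [319.188, 319.458, 341.489],
    9: [339.646, 337.416, 338.317],
    10: [340.247, 339.589, 344.179],
    11: [340.218, 341.172, 341.604],
}


def largest_jump(rtts_by_hop):
    """Return (hop, increase_ms) for the hop whose best RTT rose most over the previous hop.

    Hypothetical helper: it takes the minimum of the probes at each hop to
    damp jitter, then looks for the biggest hop-to-hop increase.
    """
    hops = sorted(rtts_by_hop)
    best = {hop: min(rtts_by_hop[hop]) for hop in hops}
    jumps = [(best[cur] - best[prev], cur) for prev, cur in zip(hops, hops[1:])]
    increase, hop = max(jumps)
    return hop, increase


hop, increase = largest_jump(RTTS_MS)
print(f"largest jump at hop {hop}: +{increase:.1f} ms")
```

Keep in mind that the RTT reported for a hop covers the forward path to that hop plus whatever path the reply takes back, so a jump like this does not by itself say which direction is slow.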

Please note that the ultra high latency here is due to the usual old BSNL problem of broken return paths. We can see that as soon as traffic is handed over to AS6453 at hop 6, there is a huge spike in latency. Since AS6453 (Tata) has a publicly available looking glass, I can traceroute back to my IP from there and see the path:

Router: gin-cfo-core1
Site: IN, Chennai - CFO, VSNL
Command: traceroute ip 117.206.184.217

Tracing the route to 117.206.184.217

1 if-11-0-2-0.tcore1.CXR-Chennai.as6453.net (180.87.36.26) [MPLS: Label 613458 Exp 0] 268 msec
if-1-0-0-0.tcore1.CXR-Chennai.as6453.net (180.87.36.13) [MPLS: Label 613458 Exp 0] 252 msec
if-1-3-0-0.tcore1.CXR-Chennai.as6453.net (180.87.36.17) [MPLS: Label 613458 Exp 0] 280 msec
2 if-7-2.tcore1.MLV-Mumbai.as6453.net (180.87.36.33) [MPLS: Label 508693 Exp 0] 248 msec 304 msec
if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [MPLS: Label 557305 Exp 0] 400 msec
3 if-9-2.tcore2.MLV-Mumbai.as6453.net (180.87.37.10) [MPLS: Label 320866 Exp 0] 412 msec 404 msec 404 msec
4 if-6-2.tcore1.L78-London.as6453.net (80.231.130.5) [MPLS: Label 731443 Exp 0] 400 msec 248 msec
if-2-2.tcore2.WYN-Marseille.as6453.net (80.231.217.2) [MPLS: Label 404482 Exp 0] 244 msec
5 if-2-2.tcore2.L78-London.as6453.net (80.231.131.1) [MPLS: Label 515300 Exp 0] 256 msec 256 msec 256 msec
6 if-20-2.tcore2.NYY-NewYork.as6453.net (216.6.99.13) [MPLS: Label 300800 Exp 0] 260 msec 268 msec 260 msec
7 if-9-0-0-19.mcore4.NYY-NewYork.as6453.net (209.58.60.149) 252 msec 252 msec 252 msec
8 ix-14-2.mcore4.NYY-NewYork.as6453.net (64.86.71.58) 484 msec 476 msec 480 msec
9 218.248.255.109 [AS 9829] 500 msec 612 msec 604 msec
10 218.248.169.121 [AS 9829] 624 msec 504 msec 504 msec
11 218.248.169.121 [AS 9829] 500 msec 504 msec 504 msec
12 * * *
13 * * *
14 * * *
15 * * *

So the path is Chennai > Mumbai > London > New York > back to BSNL in India. This is entirely due to BSNL’s negligence. They announce their routes via BGP only at New York, which is why packets from India to Hong Kong go straight there while the return comes via New York, pushing latency very high. Anyway, this is a separate issue of its own. Coming back to the main issue of this post, i.e. the F root server: it is not yet up, and things are still “moving”, but slowly.

Here is last week’s latency to the F root server from the RIPE NCC probe hosted at my home:

[Image: latency graph from the RIPE NCC probe]

What exactly was the cause of the problem?

The cause of the problem was forced MLP (multi-lateral peering), and regional-only MLP at that. Here’s NIXI’s exact policy, which says:

An ISP at any NIXI node must at a minimum announce all its regional routes to the NIXI router at that NIXI location. All ISPs connecting to that NIXI node are entitled to receive these routes using a single BGP session with the NIXI router. This will guarantee the exchange of regional traffic within a NIXI node. This is referred to as forced regional multi-lateral peering under the policy.

Now, ISC was running the F root server without any transit and was relying completely on peering sessions in the Chennai region. If you recall, at that time the problem was affecting only a few networks. For networks like Sify and IDEA Cellular everything was running well, while for BSNL it was failing. The reason is that when ISPs like BSNL participate at NIXI, they announce ONLY regional routes. So BSNL was receiving the BGP announcement from ISC, which sat behind the NIXI Chennai router, while BSNL itself was announcing prefixes like mine only at the New Delhi exchange (the one closest to Haryana) and not at the Chennai exchange. Since the node had no transit, it could not reach BSNL users outside the Chennai region at all (and the same goes for many other big ISPs). As of now ISC is working on a deal with NIXI to get a basic transit pipe from STPI (well, another Govt. ISP). Since it will be a transit pipe, it will provide a full global routing table feed, including BSNL Haryana and other routes.
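The failure mode described above can be shown with a toy model. This is only an illustrative sketch: the prefix labels and the announcement table are made up for the example (they are not real routing data), and `has_return_route` is a hypothetical helper, not any router's or library's API. The idea is simply that a peering-only node can reply only to prefixes it has learned at its own exchange.

```python
# Illustrative only: which prefixes each ISP announces at each NIXI location.
# The labels below are invented for this example.
ANNOUNCEMENTS = {
    "Chennai": {
        "Sify": {"sify-users"},          # assumed to be announced at Chennai
        "BSNL": {"bsnl-chennai-users"},  # regional routes only
    },
    "New Delhi": {
        "BSNL": {"bsnl-haryana-users"},  # announced here, not at Chennai
    },
}


def has_return_route(node_location, client_prefix, announcements, transit=False):
    """Return True if a node at node_location has a route back to client_prefix.

    With transit, the node gets a full routing table and can reach everyone.
    Without it, the node only knows prefixes announced at its own exchange.
    """
    if transit:
        return True
    learned = set().union(*announcements.get(node_location, {}).values())
    return client_prefix in learned


for prefix in ("sify-users", "bsnl-chennai-users", "bsnl-haryana-users"):
    ok = has_return_route("Chennai", prefix, ANNOUNCEMENTS)
    print(f"{prefix}: {'reply can return' if ok else 'no route back'}")

# With a transit feed the same node can answer the Haryana users too.
print(has_return_route("Chennai", "bsnl-haryana-users", ANNOUNCEMENTS, transit=True))
```

In this model the queries still arrive, because the client's ISP has learned the node's route, but the replies have nowhere to go. That one-way reachability is what makes this kind of failure hit only some networks.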

It is truly absurd that the Indian Govt. is so terribly slow with this critical part of Internet infrastructure and yet has as much as $5 billion to invest in connecting Gram panchayats over fiber, even when quite a few of them have no electricity. The prime problem for now is that there are SOOOOO many Govt. departments dealing with the “problem” that they themselves constitute a significant part of the “problem”. Coordination between all these Govt. bodies & companies is terrible.

List of Govt. bodies involved in telecommunications:

So many departments. BSNL holds domestic fiber everywhere except the metros, where the Govt. relies on MTNL. Then we have PowerGrid, which lays fiber along power lines, and RailTel, which does the same along railway lines. I wonder why this work can’t be done by a single body, BSNL, alone? RailTel has the ambition of building a nationwide broadband network via its RailWire project. I thought the Govt. relied on BSNL’s 4.5 lakh exchanges, with over 100 pairs of fiber capable of running 10G DWDM pipes, for that!

Then on top of that we have NIXI, which is a different thing altogether, i.e. an IXP and not an ISP, and it has no direct relation with BSNL or the other fiber-holding bodies. STPI, i.e. Software Technology Parks of India, itself sounds like a funny name for an ISP, but it exists and is actually more popular for layer 2 circuits near NIXI exchanges. NIC hosts datacenters for Govt. websites (do BSNL and the other bodies above have no clue how to run a datacenter?), and then we get NKN, which runs MPLS over BSNL + RailTel + PowerGrid to provide 1 Gbps connectivity to IITs, IIMs, NITs etc. And if you are from a small private state college like mine, you can’t do much other than write blog posts like this one to vent your frustration after years! :)

I hope we will get some better policies and governance; the private sector would do far better than these Govt. bodies. Time for me to get back to my work! :)