04 Jun

F-root DNS node back up in Chennai!

And finally, the ACN (Advanced Computer Networks) exam is next. Hopefully there's less to cram for this one, and the syllabus is pretty interesting.

 

Speaking of networks, I am very happy to post this update: the F root server’s node in Chennai is finally back up!

ISC did not update me about this development, but I can always assume they were busy banging their heads against Indian bureaucratic bodies. 🙂

If you follow my blog, you may have seen my earlier post about the “Broken connectivity of F root server” caused by NIXI’s routing policies. When I informed ISC (the root server operator for F root) about it, they took down the Indian anycast instance in order to work on a fix.

 

Quoting the publicly available reply from Mr Leo of ISC on the RIPE mailing list, in response to my original post about the routing glitch:

ISC has experienced a number of routing challenges at our Chennai node over the past year resulting in problems such as this one. We’ve been working with APNIC (the node Sponsor) and NIXI (the node host) to get these resolved. We’ve found the Indian market to be rather unique in the way various large ISP’s chose to announce (or not announce, as the case may be) their routes at various exchanges. In this particular case it appears that the traffic is making it to our Chennai node, but not making it back.

I will work with Anurag directly in the ticket to get this issue resolved. If anyone else is seeing issues in India please mail noc at isc.org to open a ticket. The more reports we have of problems the more attention we can get this issue with the various parties involved.

 

 

Here’s the updated route:

anurag:~ anurag$ traceroute -a f.root-servers.net
traceroute to f.root-servers.net (192.5.5.241), 64 hops max, 52 byte packets
1 [AS65534] router.home (10.10.0.1) 1.809 ms 0.915 ms 0.981 ms
2 [AS9829] 117.222.224.1 (117.222.224.1) 17.017 ms 18.209 ms 18.170 ms
3 [AS9829] 218.248.169.126 (218.248.169.126) 25.560 ms 29.347 ms 26.233 ms
4 [AS9829] 218.248.250.86 (218.248.250.86) 87.839 ms 86.766 ms 86.485 ms
5 [AS0] 218.100.48.142 (218.100.48.142) 97.747 ms 104.656 ms 98.620 ms
6 [AS55440] f.root-servers.net (192.5.5.241) 97.615 ms 97.153 ms 97.531 ms
anurag:~ anurag$

 

A latency of 97 ms seems OK. It should probably be slightly lower, but I assume that’s a BSNL problem (the return path goes via Airtel) rather than an issue with NIXI or F root.

I guess the Indian instance was brought up around mid-May, last month (about 20 days ago). I base that guess on data collected from RIPE Atlas probe #1032, hosted on my home network, along with the BGP session details at NIXI’s route server in Chennai.
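The inference above (telling when the local instance came back from latency data alone) can be sketched as finding the first day after which RTT stays below a threshold. This is a hypothetical helper, not a RIPE Atlas API call; the sample series is made up for illustration (loosely echoing the ~340 ms Hong Kong vs ~97 ms Chennai levels seen in traceroutes on this blog), and the 200 ms threshold is an arbitrary assumption.

```python
from datetime import date

def settled_below(samples, threshold_ms):
    """Return the first day from which every later sample stays below threshold_ms.

    samples: (day, rtt_ms) pairs sorted by day. Returns None if the series
    does not end below the threshold."""
    start = None
    for day, rtt in samples:
        if rtt < threshold_ms:
            if start is None:
                start = day  # candidate switch-over day
        else:
            start = None  # went back above the threshold; discard the candidate
    return start

# Made-up daily median RTTs (ms), for illustration only.
samples = [
    (date(2013, 5, 12), 341.0),
    (date(2013, 5, 13), 339.5),
    (date(2013, 5, 14), 98.1),
    (date(2013, 5, 15), 97.4),
    (date(2013, 5, 16), 97.6),
]
print(settled_below(samples, threshold_ms=200))  # 2013-05-14
```

Requiring the series to stay low (rather than taking the first low sample) avoids being fooled by a brief flap of the local instance.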

 

[Screenshot, 2013-06-04 8:56 PM]

 

 

It seems the ISC team has done a good job and brought the node back with full connectivity. I can’t blame them for the time it took, since it involved Indian bureaucracy.

 

Going into some of the technical details…

 

Original Problem

The original problem with this node was that NIXI enforces a regional route policy. Under it, ISPs are forced to announce their regional routes and may choose whether or not to announce routes from other regions. For example, BSNL was participating at NIXI Chennai and announcing only its South Indian prefixes, not its BSNL Haryana (North India) prefixes. Since BSNL participated at NIXI and had a BGP session with NIXI’s route server on AS24029, it was receiving routes to F root, while in return it was NOT announcing BSNL Haryana’s prefixes. So queries from BSNL Haryana users reached the Chennai node, but the node had no route to send the replies back.
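To make the one-way failure concrete, here is a toy Python model of the regional announcement policy. It is a sketch under stated assumptions, not how NIXI’s route server actually works: the helper names (`routes_at_exchange`, `can_reply`) are hypothetical, the prefixes are RFC 5737 documentation ranges standing in for BSNL’s South Indian and Haryana space, and BGP attributes, path selection and multiple exchanges are ignored.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Prefix:
    network: ipaddress.IPv4Network
    region: str  # region the member treats this prefix as belonging to

def routes_at_exchange(member_prefixes, exchange_region, announce_other_regions=False):
    """Prefixes a member sends to the route server at one exchange.

    Regional routes are mandatory; other regions' routes are optional."""
    return [p.network for p in member_prefixes
            if p.region == exchange_region or announce_other_regions]

def can_reply(dst, rib):
    """True if any route learned by the node covers the destination address."""
    addr = ipaddress.ip_address(dst)
    return any(addr in net for net in rib)

# Hypothetical stand-ins: documentation ranges, not real BSNL address space.
bsnl = [
    Prefix(ipaddress.ip_network("198.51.100.0/24"), "south"),  # "South India"
    Prefix(ipaddress.ip_network("203.0.113.0/24"), "north"),   # "Haryana"
]

# The Chennai (south) node learns routes only via NIXI peering, with no transit.
chennai_rib = routes_at_exchange(bsnl, "south")

# Queries arrive in both cases (BSNL did learn the F root prefix; not modelled),
# but only one destination has a return route.
print(can_reply("198.51.100.10", chennai_rib))  # True
print(can_reply("203.0.113.10", chennai_rib))   # False
```

Setting `announce_other_regions=True` corresponds to option 1 in the list below.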

 

 

There are several possible solutions to this problem:

  1. ISPs like BSNL start announcing all prefixes at all NIXI nodes. (Hard, because they won’t let others “use their backbone” just by peering.)
  2. ISPs like BSNL peer directly with ISC’s router at Chennai and announce their entire table to it. (A bit hard, given the number of ISPs and the difficulty of convincing them.)
  3. ISC adds a transit link and defaults to the transit interface for any destination it has no route for. This does NOT mean announcing F root server prefixes to the transit provider, but rather using transit as a “default” in case of partial connectivity.
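Option 3 boils down to ordinary longest-prefix matching with a default route. The Python sketch below assumes a hypothetical routing table for the Chennai node (documentation prefixes and made-up next-hop labels `nixi-peer` and `transit`); it only shows how lookups fall back to transit when peering offers no more-specific route, not ISC’s actual router configuration.

```python
import ipaddress

def lookup(dst, table):
    """Longest-prefix match: next hop of the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None  # no route at all: the reply is dropped
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

# Routes learned via NIXI peering only (hypothetical prefix).
peering_only = {
    ipaddress.ip_network("198.51.100.0/24"): "nixi-peer",
}

# Same routes plus a default pointing at transit. The default only affects
# forwarding of replies; nothing is announced to the transit provider.
with_default = dict(peering_only)
with_default[ipaddress.ip_network("0.0.0.0/0")] = "transit"

print(lookup("198.51.100.10", with_default))  # nixi-peer (more specific wins)
print(lookup("203.0.113.10", peering_only))   # None
print(lookup("203.0.113.10", with_default))   # transit
```

Because peering routes are more specific than `0.0.0.0/0`, they still win whenever they exist; transit only carries replies the peering fabric cannot deliver.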

 

 

I think ISC went with #3. The F root server uses IP address 192.5.5.241, which comes from the 192.5.5.0/24 subnet announced by AS3557. In India, AS3557 sits behind another ISC AS, AS24049, whose router connects to NIXI’s shared peering switch at IP 218.100.48.142.

Netnod’s I root server in Mumbai had exactly the same problem, and they also took it down after I informed them last year (blog post here). It still seems to be down. Let’s hope it will be up again very soon.

 

So that’s about it. Back to studies for now…

 

Disclaimer: I work for PCH, and part of what we do involves backend DNS hosting. My involvement here comes purely from personal interest in infrastructure in India and has nothing to do with my employer. All comments are completely personal.

06 Oct

F root server in Chennai down for 5 months. Who cares?

Time for a quick follow-up blog post. On 26th April of this year I blogged about the broken connectivity of the F root server hosted at NIXI Chennai. Apart from that blog post, I also informed ISC, which operates F root (NIXI hosted it on ISC’s behalf in India). In response to my open email on the APNIC mailing list, I got a reply from ISC’s Network Operations Center saying they would verify the issue and take the necessary action. Within 48 hours of that email they had figured out the root cause, and since they couldn’t fix it right away, they pulled the plug on that root server.

This was one of the 3 global root DNS server instances hosted in India. I am sad to report that to date they have not been able to bring the server back online. No blame to ISC, but this is how serious Indian bodies are about the Internet and its infrastructure.
 

My current traceroute to F root:

traceroute to f.root-servers.net (192.5.5.241), 30 hops max, 60 byte packets
1 router.local (10.0.0.1) [AS1] 0.969 ms 1.168 ms 1.488 ms
2 117.206.176.1 (117.206.176.1) [AS9829] 19.203 ms 20.001 ms 22.286 ms
3 218.248.169.122 (218.248.169.122) [AS9829] 26.905 ms 28.801 ms 29.490 ms
4 115.114.131.137.static-mumbai.vsnl.net.in (115.114.131.137) [AS4755] 64.299 ms 66.175 ms 68.068 ms
5 172.31.16.193 (172.31.16.193) [*] 95.702 ms 172.31.19.145 (172.31.19.145) [*] 96.813 ms 98.304 ms
6 ix-4-2.tcore1.CXR-Chennai.as6453.net (180.87.36.9) [*] 304.038 ms 280.526 ms 280.544 ms
7 if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [*] 330.969 ms if-5-2.tcore1.SVW-Singapore.as6453.net (180.87.12.53) [*] 327.010 ms if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [*] 333.282 ms
8 if-5-2.tcore2.SVW-Singapore.as6453.net (180.87.15.69) [*] 319.188 ms if-2-2.tcore2.SVW-Singapore.as6453.net (180.87.12.2) [*] 319.458 ms if-5-2.tcore2.SVW-Singapore.as6453.net (180.87.15.69) [*] 341.489 ms
9 Vlan1870.icore1.HK2-HongKong.as6453.net (180.87.15.61) [*] 339.646 ms Vlan1850.icore1.HK2-HongKong.as6453.net (180.87.15.18) [*] 337.416 ms Vlan1779.icore1.HK2-HongKong.as6453.net (180.87.15.38) [*] 338.317 ms
10 isc2-FE.hkix.net (202.40.161.200) [AS2687/AS4862/AS9498/AS10026/AS1221] 340.247 ms 339.589 ms 344.179 ms
11 f.root-servers.net (192.5.5.241) [AS55440/AS3557/AS23708/AS8167] 340.218 ms 341.172 ms 341.604 ms

 

So I am still hitting the Hong Kong instance.

Please note that the ultra-high latency here is due to BSNL’s usual old problem of broken return paths. As soon as traffic is handed over to AS6453 on hop 6, there is a huge spike in latency. Since AS6453 (Tata) has a publicly available looking glass, I can traceroute back to my IP from there and see the path:

 

Router: gin-cfo-core1
Site: IN, Chennai – CFO, VSNL
Command: traceroute ip 117.206.184.217

Tracing the route to 117.206.184.217

1 if-11-0-2-0.tcore1.CXR-Chennai.as6453.net (180.87.36.26) [MPLS: Label 613458 Exp 0] 268 msec
if-1-0-0-0.tcore1.CXR-Chennai.as6453.net (180.87.36.13) [MPLS: Label 613458 Exp 0] 252 msec
if-1-3-0-0.tcore1.CXR-Chennai.as6453.net (180.87.36.17) [MPLS: Label 613458 Exp 0] 280 msec
2 if-7-2.tcore1.MLV-Mumbai.as6453.net (180.87.36.33) [MPLS: Label 508693 Exp 0] 248 msec 304 msec
if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [MPLS: Label 557305 Exp 0] 400 msec
3 if-9-2.tcore2.MLV-Mumbai.as6453.net (180.87.37.10) [MPLS: Label 320866 Exp 0] 412 msec 404 msec 404 msec
4 if-6-2.tcore1.L78-London.as6453.net (80.231.130.5) [MPLS: Label 731443 Exp 0] 400 msec 248 msec
if-2-2.tcore2.WYN-Marseille.as6453.net (80.231.217.2) [MPLS: Label 404482 Exp 0] 244 msec
5 if-2-2.tcore2.L78-London.as6453.net (80.231.131.1) [MPLS: Label 515300 Exp 0] 256 msec 256 msec 256 msec
6 if-20-2.tcore2.NYY-NewYork.as6453.net (216.6.99.13) [MPLS: Label 300800 Exp 0] 260 msec 268 msec 260 msec
7 if-9-0-0-19.mcore4.NYY-NewYork.as6453.net (209.58.60.149) 252 msec 252 msec 252 msec
8 ix-14-2.mcore4.NYY-NewYork.as6453.net (64.86.71.58) 484 msec 476 msec 480 msec
9 218.248.255.109 [AS 9829] 500 msec 612 msec 604 msec
10 218.248.169.121 [AS 9829] 624 msec 504 msec 504 msec
11 218.248.169.121 [AS 9829] 500 msec 504 msec 504 msec
12 * * *
13 * * *
14 * * *
15 * * *
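The hostnames in the looking-glass output already spell out the return path. A small Python sketch that collapses them into a sequence of places; it assumes the `<code>-<City>.as6453.net` naming seen in this output (an observation, not a documented scheme), takes only the first router listed for each hop, and falls back to the `[AS nnnn]` tag when a hop has no such hostname.

```python
import re

HOP_RE = re.compile(r"^\s*\d+\s")  # a line that starts a new hop
CITY_RE = re.compile(r"\.[A-Z0-9]+-([A-Za-z]+)\.as6453\.net")
ASN_RE = re.compile(r"\[AS ?(\d+)\]")

def city_path(lg_output):
    """Collapse looking-glass traceroute output into the places it passes through."""
    path = []
    for line in lg_output.splitlines():
        if not HOP_RE.match(line):
            continue  # headers and extra routers answering for the same hop
        city = CITY_RE.search(line)
        asn = ASN_RE.search(line)
        if city:
            label = city.group(1)
        elif asn:
            label = "AS" + asn.group(1)
        else:
            continue  # e.g. "12 * * *": no reply for this hop
        if not path or path[-1] != label:
            path.append(label)  # merge consecutive hops in the same place
    return path

# Trimmed copy of the Tata looking-glass output above.
LG_OUTPUT = """\
1 if-11-0-2-0.tcore1.CXR-Chennai.as6453.net (180.87.36.26) [MPLS: Label 613458 Exp 0] 268 msec
if-1-0-0-0.tcore1.CXR-Chennai.as6453.net (180.87.36.13) [MPLS: Label 613458 Exp 0] 252 msec
2 if-7-2.tcore1.MLV-Mumbai.as6453.net (180.87.36.33) [MPLS: Label 508693 Exp 0] 248 msec 304 msec
if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [MPLS: Label 557305 Exp 0] 400 msec
3 if-9-2.tcore2.MLV-Mumbai.as6453.net (180.87.37.10) [MPLS: Label 320866 Exp 0] 412 msec 404 msec 404 msec
4 if-6-2.tcore1.L78-London.as6453.net (80.231.130.5) [MPLS: Label 731443 Exp 0] 400 msec 248 msec
if-2-2.tcore2.WYN-Marseille.as6453.net (80.231.217.2) [MPLS: Label 404482 Exp 0] 244 msec
5 if-2-2.tcore2.L78-London.as6453.net (80.231.131.1) [MPLS: Label 515300 Exp 0] 256 msec 256 msec 256 msec
6 if-20-2.tcore2.NYY-NewYork.as6453.net (216.6.99.13) [MPLS: Label 300800 Exp 0] 260 msec 268 msec 260 msec
8 ix-14-2.mcore4.NYY-NewYork.as6453.net (64.86.71.58) 484 msec 476 msec 480 msec
9 218.248.255.109 [AS 9829] 500 msec 612 msec 604 msec
10 218.248.169.121 [AS 9829] 624 msec 504 msec 504 msec
12 * * *
"""

print(" > ".join(city_path(LG_OUTPUT)))  # Chennai > Mumbai > London > NewYork > AS9829
```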

 

So the path is Chennai > Mumbai > London > New York > back to BSNL in India. This is entirely due to BSNL’s negligence. They make their BGP announcements only in New York, which is why packets from India to Hong Kong go straight there while the return path goes via New York, pushing latency sky high. Anyway, this is a separate issue of its own. Coming back to the main issue of this post, i.e. the F root server: it is not up yet, and things are still “moving”, but slowly.
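Finding the hop where latency jumps, like the spike at hop 6 mentioned above, is easy to automate. A minimal Python sketch, assuming Linux-style `traceroute` output with one hop per line and RTTs written as `<number> ms`; the function names are hypothetical. It keeps the lowest RTT per hop to dampen noise and reports the largest increase between consecutive responding hops. Each RTT includes the return path, which is exactly why a detour on the way back shows up as a jump on the forward trace; routers that deprioritize ICMP can also cause misleading spikes.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s")  # hop number at the start of a line
RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms\b")  # "95.702 ms" style RTT values

def min_rtts(traceroute_text):
    """Map hop number -> lowest RTT (ms) reported for that hop."""
    hops = {}
    for line in traceroute_text.splitlines():
        hop = HOP_RE.match(line)
        rtts = [float(v) for v in RTT_RE.findall(line)]
        if hop and rtts:  # "* * *" hops have no RTTs and are skipped
            hops[int(hop.group(1))] = min(rtts)
    return hops

def biggest_jump(hops):
    """(hop, increase_ms) for the largest RTT increase between consecutive hops.

    Needs at least two responding hops."""
    ordered = sorted(hops.items())
    jumps = [(h2, r2 - r1) for (_, r1), (h2, r2) in zip(ordered, ordered[1:])]
    return max(jumps, key=lambda j: j[1])

# Hops 4-6 of the F root traceroute above.
TRACE = """\
4 115.114.131.137.static-mumbai.vsnl.net.in (115.114.131.137) [AS4755] 64.299 ms 66.175 ms 68.068 ms
5 172.31.16.193 (172.31.16.193) [*] 95.702 ms 172.31.19.145 (172.31.19.145) [*] 96.813 ms 98.304 ms
6 ix-4-2.tcore1.CXR-Chennai.as6453.net (180.87.36.9) [*] 304.038 ms 280.526 ms 280.544 ms
"""

hop, jump = biggest_jump(min_rtts(TRACE))
print(f"largest jump at hop {hop}: +{jump:.1f} ms")  # largest jump at hop 6: +184.8 ms
```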

 

Here is last week’s latency to the F root server, as seen from the RIPE NCC probe hosted at my home:

 

 

What exactly caused the problem?

The cause of the problem was forced, regional-only MLP (multilateral peering). Here’s the exact NIXI policy, which says:

An ISP at any NIXI node must at a minimum announce all its regional routes to the NIXI router at that NIXI location. All ISPs connecting to that NIXI node are entitled to receive these routes using a single BGP session with the NIXI router. This will guarantee the exchange of regional traffic within a NIXI node. This is referred to as forced regional multi-lateral peering under the policy.

 

ISC was running the F root server without any transit, relying entirely on peering sessions in the Chennai region. If you recall, at that time the problem affected only a few networks. For networks like Sify and IDEA Cellular everything worked fine, while for BSNL it failed. The reason is that when ISPs like BSNL participate at NIXI, they announce ONLY regional routes. So BSNL was receiving ISC’s BGP announcement (ISC sat behind the NIXI Chennai router), while BSNL itself announced its Haryana prefixes only at the New Delhi exchange (closest to Haryana) and not at the Chennai exchange. Since the node had no transit, it could not reach BSNL users outside Chennai at all (and the same was true for many other big ISPs). As of now, ISC is working on a deal with NIXI to get a basic transit pipe from STPI (yet another Govt. ISP). Since it will be a transit pipe, it will provide a full global routing table feed, including BSNL Haryana and other routes.

 

It is truly absurd that the Indian Govt. is terribly slow with this critical part of Internet infrastructure and yet has as much as $5 billion to invest in connecting Gram Panchayats over fiber, even though quite a few of them have no electricity. The prime problem right now is that there are SOOOOO many Govt. departments dealing with the “problem” that they themselves constitute a significant part of the “problem”. Coordination between all these Govt. bodies and companies is terrible.

 

List of Govt. bodies involved in telecommunications:

 

So many departments. BSNL holds domestic fiber everywhere except the metros, where the Govt. relies on MTNL. Then we have PowerGrid, which lays fiber along power lines, and RailTel, which does the same along railway lines. I wonder why this work can’t be done by a single body, BSNL, alone. RailTel has the ambition of building a nationwide broadband network via its RailWire project. I thought the Govt. relied on BSNL’s 4.5 lakh exchanges, with over 100 pairs of fiber capable of running 10G DWDM pipes, for that!

Then, on top of that, we have NIXI, which is something else entirely, i.e. an IXP and not an ISP, with no direct relation to BSNL or the other fiber-holding bodies. STPI, i.e. Software Technology Parks of India, sounds like a funny name for an ISP, but it exists and is actually better known for layer 2 circuits near NIXI exchanges. NIC runs datacenters for Govt. websites (do BSNL and the other bodies above have no clue how to run a datacenter?), and then we get NKN, which runs MPLS over BSNL + RailTel + PowerGrid to provide 1 Gbps connectivity to IITs, IIMs, NITs, etc. And if you are from a small private state college like mine, you can’t do much other than write blog posts like these to vent the resulting frustration after years! 🙂
 

Hopefully we will get better policies and governance; the private sector would do far better than these Govt. bodies. Time for me to get back to work! 🙂