28 Oct

Akamai CDN and DNS resolution analysis

Open DNS resolvers are getting quite popular these days. By open DNS resolvers I mean public resolvers such as OpenDNS and Google Public DNS.

One of the major issues with these resolvers is that they integrate poorly with CDN providers like Akamai, Limelight etc. In this post I will analyse a sample Akamai client site, the Malaysia Airlines website – http://www.malaysiaairlines.com.

 

Here are its DNS records as returned by OpenDNS, Google Public DNS and my ISP’s (BSNL) DNS resolver:

OpenDNS 

;; QUESTION SECTION:
;www.malaysiaairlines.com. IN A

;; ANSWER SECTION:
www.malaysiaairlines.com. 12169 IN CNAME www.malaysiaairlines.com.edgesuite.net.
www.malaysiaairlines.com.edgesuite.net. 12169 IN CNAME a1456.b.akamai.net.
a1456.b.akamai.net. 20 IN A 125.252.225.158
a1456.b.akamai.net. 20 IN A 125.252.225.151

 

Google Public DNS

;; QUESTION SECTION:
;www.malaysiaairlines.com. IN A

;; ANSWER SECTION:
www.malaysiaairlines.com. 12312 IN CNAME www.malaysiaairlines.com.edgesuite.net.
www.malaysiaairlines.com.edgesuite.net. 12318 IN CNAME a1456.b.akamai.net.
a1456.b.akamai.net. 10 IN A 58.27.22.154
a1456.b.akamai.net. 10 IN A 58.27.22.138

 

BSNL’s DNS resolver

;; QUESTION SECTION:
;www.malaysiaairlines.com. IN A

;; ANSWER SECTION:
www.malaysiaairlines.com. 20410 IN CNAME www.malaysiaairlines.com.edgesuite.net.
www.malaysiaairlines.com.edgesuite.net. 20410 IN CNAME a1456.b.akamai.net.
a1456.b.akamai.net. 20 IN A 117.239.141.35
a1456.b.akamai.net. 20 IN A 117.239.141.10

 

Notice the different IPs returned by the different DNS resolvers.

OpenDNS gives me 125.252.225.151, which is announced by Singtel in Singapore.
Google gives me 58.27.22.154, which is announced by Tmnet in Malaysia.
BSNL’s DNS resolver gives me 117.239.141.35, which is announced by BSNL-NIB itself and is within India (yay!) 🙂

This results in a latency of 300ms to www.malaysiaairlines.com when using OpenDNS or Google, against 60ms when using the ISP’s default resolver.
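The comparison above can also be checked offline. Here is a minimal Python sketch, assuming the A record lines from the three answers are pasted in as text. The function name `a_records` and the dictionary layout are my own illustration, not part of dig or any other tool.

```python
# A record lines copied from the three dig answers above (CNAME lines trimmed).
ANSWERS = {
    "OpenDNS": """
a1456.b.akamai.net. 20 IN A 125.252.225.158
a1456.b.akamai.net. 20 IN A 125.252.225.151
""",
    "Google Public DNS": """
a1456.b.akamai.net. 10 IN A 58.27.22.154
a1456.b.akamai.net. 10 IN A 58.27.22.138
""",
    "BSNL": """
a1456.b.akamai.net. 20 IN A 117.239.141.35
a1456.b.akamai.net. 20 IN A 117.239.141.10
""",
}


def a_records(answer_section):
    """Return the set of IPv4 addresses found in the A records of an answer."""
    ips = set()
    for line in answer_section.splitlines():
        fields = line.split()
        # dig prints five columns here: name, TTL, class, type, data
        if len(fields) == 5 and fields[2] == "IN" and fields[3] == "A":
            ips.add(fields[4])
    return ips


per_resolver = {name: a_records(text) for name, text in ANSWERS.items()}
for name, ips in per_resolver.items():
    print(name, sorted(ips))

# Check every pair of resolvers for a shared address.
names = list(per_resolver)
overlap = any(
    per_resolver[a] & per_resolver[b]
    for i, a in enumerate(names)
    for b in names[i + 1:]
)
print("any overlap:", overlap)
```

For the answers in this post no address is shared between any two resolvers, which is the whole point: the same hostname maps to a different edge node depending on who asks.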

 

How and why is this happening?

The answer lies in the underlying DNS layer, which is doing this magic. In all cases www.malaysiaairlines.com. is a CNAME (alias record) to www.malaysiaairlines.com.edgesuite.net., which in turn is a CNAME to a1456.b.akamai.net. The real magic comes here – “b.akamai.net.” is itself a DNS zone. Let’s look at the name servers of this zone via all 3 DNS resolvers:

 

anurag@laptop:/$ dig b.akamai.net. ns +short @208.67.222.222
n6b.akamai.net.
n7b.akamai.net.
n1b.akamai.net.
n2b.akamai.net.
n4b.akamai.net.
n3b.akamai.net.
n5b.akamai.net.
n0b.akamai.net.

anurag@laptop:/$ dig b.akamai.net. ns +short @8.8.8.8
n1b.akamai.net.
n4b.akamai.net.
n8b.akamai.net.
n3b.akamai.net.
n2b.akamai.net.
n6b.akamai.net.
n5b.akamai.net.
n0b.akamai.net.
n7b.akamai.net.

anurag@laptop:/$ dig b.akamai.net. ns +short @10.0.0.1
n0b.akamai.net.
n1b.akamai.net.
n2b.akamai.net.
n3b.akamai.net.
n4b.akamai.net.
n5b.akamai.net.
n6b.akamai.net.
n7b.akamai.net.
n8b.akamai.net.

 

Nearly identical sets of names (OpenDNS returned eight of the nine, without n8b). Let’s pick one at random and analyse it:

n0b.akamai.net

 

anurag@laptop:/$ dig n0b.akamai.net a @208.67.222.222 +short
124.155.223.36

anurag@laptop:/$ dig n0b.akamai.net a @8.8.8.8 +short
202.175.5.150

anurag@laptop:/$ dig n0b.akamai.net a @10.0.0.1 +short
124.124.201.156

 

All different IPs!
At this stage everything seems very confusing.

 

Let’s recap what we have so far

www.malaysiaairlines.com. is a CNAME to www.malaysiaairlines.com.edgesuite.net., and www.malaysiaairlines.com.edgesuite.net. is a CNAME to a1456.b.akamai.net. Now a1456.b.akamai.net. is a hostname under the DNS zone “b.akamai.net”, and it resolves to different IPs when checked via different DNS resolvers. The b.akamai.net zone has several DNS servers, and I randomly picked one of them, n0b.akamai.net. Since n0b.akamai.net itself returns different A records, I am going up to the parent zone, akamai.net, to find out how this is happening.
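To make the chain concrete, here is a small Python sketch that follows the CNAMEs the way a resolver would. The `RECORDS` table is hand-built from the answers in this post (using the A record seen via BSNL’s resolver), and `follow` is a name I made up for illustration; a real resolver does all of this for us.

```python
# Records as seen in the dig answers in this post (hand-copied, not live data).
RECORDS = {
    "www.malaysiaairlines.com.": ("CNAME", "www.malaysiaairlines.com.edgesuite.net."),
    "www.malaysiaairlines.com.edgesuite.net.": ("CNAME", "a1456.b.akamai.net."),
    "a1456.b.akamai.net.": ("A", "117.239.141.35"),  # value seen via BSNL's resolver
}


def follow(name, records, max_depth=10):
    """Follow CNAMEs from `name` until an A record; return (chain, address)."""
    chain = [name]
    for _ in range(max_depth):
        rtype, value = records[name]
        if rtype == "A":
            return chain, value
        # CNAME: continue the lookup with the alias target
        name = value
        chain.append(name)
    raise RuntimeError("CNAME chain too long or looping")


chain, address = follow("www.malaysiaairlines.com.", RECORDS)
print(" -> ".join(chain), "=", address)
```

Only the last step of the chain, the A record inside b.akamai.net, changes from resolver to resolver; the two CNAMEs are the same everywhere.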

 

Let’s see DNS servers of akamai.net:

To avoid further confusion from these interesting DNS lookups, let’s use the whois record of the akamai.net domain, rather than a DNS query, to see which authoritative DNS servers it uses:

anurag@laptop:~$ whois akamai.net

Whois Server Version 2.0

Domain names in the .com and .net domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.

Domain Name: AKAMAI.NET
Registrar: TUCOWS.COM CO.
Whois Server: whois.tucows.com
Referral URL: http://domainhelp.opensrs.net
Name Server: NS1-1.AKAMAITECH.NET
Name Server: NS2-193.AKAMAITECH.NET
Name Server: NS3-193.AKAMAITECH.NET
Name Server: NS4-193.AKAMAITECH.NET
Name Server: NS5-193.AKAMAITECH.NET
Name Server: NS6-193.AKAMAITECH.NET
Name Server: NS7-193.AKAMAITECH.NET
Name Server: ZC.AKAMAITECH.NET
Name Server: ZD.AKAMAITECH.NET
Name Server: ZE.AKAMAITECH.NET
Name Server: ZG.AKAMAITECH.NET
Name Server: ZH.AKAMAITECH.NET
Name Server: ZI.AKAMAITECH.NET
Status: clientTransferProhibited
Status: clientUpdateProhibited
Updated Date: 18-jun-2012
Creation Date: 03-mar-1999
Expiration Date: 03-mar-2022

>>> Last update of whois database: Sun, 28 Oct 2012 16:56:03 UTC <<<

 

Now again let’s pick one at random – NS1-1.AKAMAITECH.NET – and see what it tells us for the hostname “n0b.akamai.net”:

 

anurag@laptop:~$ dig @NS1-1.AKAMAITECH.NET n0b.akamai.net +short
123.201.147.5

 

 

Wow! Akamai’s DNS setup can make a boring Sunday evening very interesting. 😉

 

Now since NS1-1.AKAMAITECH.NET. is itself on a different domain name (and so in a different DNS zone), let’s put in a bit more effort to get to the core of it. NS1-1.AKAMAITECH.NET. is simply an A record on the DNS servers of the AKAMAITECH.NET. zone.

 

Let’s look at that zone now:

anurag@laptop:/$ dig AKAMAITECH.NET ns +short
zh.AKAMAITECH.NET.
ns3-193.AKAMAITECH.NET.
ns2-193.AKAMAITECH.NET.
zm-1.AKAMAITECH.NET.
zg.AKAMAITECH.NET.
zb.AKAMAITECH.NET.
ze.AKAMAITECH.NET.
zf.AKAMAITECH.NET.
ns5-193.AKAMAITECH.NET.
zd.AKAMAITECH.NET.
zi.AKAMAITECH.NET.
ns4-193.AKAMAITECH.NET.
za.AKAMAITECH.NET.
zc.AKAMAITECH.NET.

 

Again, let’s pick one – zh.AKAMAITECH.NET. and query it for NS1-1.AKAMAITECH.NET.

anurag@laptop:/$ dig NS1-1.AKAMAITECH.NET. @zh.AKAMAITECH.NET.  +short
193.108.88.1

Finally a consistent result (YAY!). So is the server with IP 193.108.88.1 playing games? Remember that two steps back this same server, NS1-1.AKAMAITECH.NET., returned yet another IP for the hostname n0b.akamai.net. I SMELL ANYCASTING! 🙂

Let’s do a traceroute to 193.108.88.1 from my location (BSNL Haryana), Airtel Delhi node & my Europe server (where this blog is hosted!):

 

BSNL

traceroute to 193.108.88.1 (193.108.88.1), 30 hops max, 60 byte packets
1 10.0.0.1 (10.0.0.1) [AS1] 0.644 ms 1.022 ms 1.150 ms
2 117.220.160.1 (117.220.160.1) [AS9829] 19.467 ms 20.335 ms 21.824 ms
3 218.248.169.122 (218.248.169.122) [AS9829] 27.180 ms 29.092 ms 30.510 ms
4 115.254.1.138 (115.254.1.138) [AS18101] 61.354 ms 63.244 ms 64.209 ms
5 115.255.239.53 (115.255.239.53) [AS18101] 68.160 ms 68.907 ms 69.847 ms
6 115.248.226.21 (115.248.226.21) [AS18101] 72.336 ms 54.497 ms 54.633 ms
7 203.101.100.213 (203.101.100.213) [AS9498/AS7617] 80.766 ms 82.390 ms 83.732 ms
8 AES-Static-010.194.22.125.airtel.in (125.22.194.10) [AS24560/AS9498] 87.199 ms 88.580 ms 90.314 ms
9 * * *
10 * * *

 

Europe server

traceroute to 193.108.88.1 (193.108.88.1), 30 hops max, 60 byte packets
1 gw.giga-dns.com (91.194.90.1) [AS51167] 0.639 ms 0.637 ms 0.623 ms
2 host-93-104-204-33.customer.m-online.net (93.104.204.33) [AS8767] 0.600 ms 0.592 ms 0.585 ms
3 xe-1-1-0.rt-decix-2.m-online.net (82.135.16.102) [AS8767] 7.784 ms 7.740 ms 7.727 ms
4 xe-1-1-0.rt-decix-2.m-online.net (82.135.16.102) [AS8767] 7.464 ms 7.461 ms 7.452 ms
5 decix-fra6.netarch.akamai.com (80.81.192.28) [AS6695] 8.434 ms 8.916 ms 8.407 ms
6 * * *
7 * * *
8 * * *

 

Here we go! Surely anycasting. 193.108.88.1 comes from the prefix 193.108.88.0/24, which Akamai’s AS21342 announces at different locations.
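The check I did by eye can be sketched in Python: take traceroutes to the same IP from two vantage points and compare the last hop that actually answered. The traces below are shortened copies of the ones above, and `last_responding_hop` is my own helper name. Traces ending in different networks are a strong hint of anycast, not proof of it.

```python
import re

# Shortened copies of the two traceroutes to 193.108.88.1 (last hops only).
BSNL_TRACE = """
7 203.101.100.213 (203.101.100.213) [AS9498/AS7617] 80.766 ms 82.390 ms 83.732 ms
8 AES-Static-010.194.22.125.airtel.in (125.22.194.10) [AS24560/AS9498] 87.199 ms 88.580 ms 90.314 ms
9 * * *
10 * * *
"""
EUROPE_TRACE = """
4 xe-1-1-0.rt-decix-2.m-online.net (82.135.16.102) [AS8767] 7.464 ms 7.461 ms 7.452 ms
5 decix-fra6.netarch.akamai.com (80.81.192.28) [AS6695] 8.434 ms 8.916 ms 8.407 ms
6 * * *
"""

# Matches "hop-number hostname (ip)"; lines of "* * *" do not match.
HOP = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")


def last_responding_hop(trace):
    """Return (hop number, hostname, ip) of the last hop that answered."""
    last = None
    for line in trace.splitlines():
        match = HOP.match(line)
        if match:
            last = (int(match.group(1)), match.group(2), match.group(3))
    return last


for label, trace in (("BSNL", BSNL_TRACE), ("Europe", EUROPE_TRACE)):
    print(label, last_responding_hop(trace))
```

From BSNL the trace dies inside Airtel’s network in India, while from Europe it dies at an Akamai router on DE-CIX Frankfurt, even though both were heading for the very same IP.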

 

Summary:

Let’s go in forward mode now:

Akamai has an interesting DNS setup built on a mix of anycasted DNS servers, where the “edge servers” carry different A records for a given hostname. E.g. at the core Akamai has a set of anycasted DNS servers like zh.AKAMAITECH.NET, which hold the A records for another set of DNS servers like NS1-1.AKAMAITECH.NET., which act as DNS servers for the akamai.net domain name. Next, these DNS servers hold different values for another set of DNS servers like n0b.akamai.net, which hold the delegation for a subzone like b.akamai.net, which holds hostnames like a1456.b.akamai.net, to which hostnames like www.malaysiaairlines.com.edgesuite.net. point! 🙂

 

Why does Akamai have such a complex setup?

My strong guess is that the multiple zones and cross dependencies are simply there to spread load and avoid a single point of failure. The important thing is that Akamai uses anycasting at the core of its DNS, but not for serving content from its web servers. E.g. I am getting IP 117.239.141.10 for Akamai’s client site, which is a unicast IP from BSNL’s 117.239.128.0/20 prefix announcement. Akamai is NOT using anycasting for edge distribution, and my strong guess is that the current setup is far easier for Akamai to manage than putting caching servers on anycast IPs. E.g. if the Akamai node on BSNL gets choked, they can simply redistribute traffic by having the DNS servers hand BSNL users the BSNL node’s A record 1 out of 4 times and the IP of a caching node on Airtel the rest of the time. With anycasting that is not possible: traffic simply follows the shortest AS/hop path, so load cannot be partially redistributed. Again, that’s my guess. 🙂
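Here is a toy Python sketch of the “1 out of 4 times” idea. It is only an illustration of my guess, not how Akamai’s mapping system actually works. The Airtel address is a hypothetical placeholder from the documentation range, and `weighted_cycle` is a made-up helper.

```python
import itertools

BSNL_NODE = "117.239.141.10"  # edge IP seen in this post
AIRTEL_NODE = "192.0.2.10"    # hypothetical placeholder, not a real Akamai node


def weighted_cycle(weights):
    """Return an endless iterator of addresses, repeated according to integer weights."""
    pattern = [ip for ip, weight in weights for _ in range(weight)]
    return itertools.cycle(pattern)


# Answer with the BSNL node 1 time out of 4 and the Airtel node the other 3.
answers = weighted_cycle([(BSNL_NODE, 1), (AIRTEL_NODE, 3)])
first_eight = [next(answers) for _ in range(8)]
print(first_eight.count(BSNL_NODE), "of 8 answers point at the BSNL node")
```

A DNS-based setup can pick any such ratio per node and change it at will. With anycast the split is decided by BGP path selection, which offers no knob like this.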

Time for me to change DNS resolver in my router now! 

11 Oct

i root server Mumbai node offline

Super dull time here. No classes are going on due to the “TCS Placement session” at college, which has me sitting in my room for most of the day.

Yesterday I tested connectivity to all 13 global root DNS servers and found that i root was having an issue.

Here’s yesterday’s traceroute to i root:

traceroute to i.root-servers.net. (192.36.148.17), 30 hops max, 60 byte packets
1 router.local (10.0.0.1) 1.470 ms 1.965 ms 2.452 ms
2 117.200.48.1 (117.200.48.1) 26.030 ms 28.857 ms 31.243 ms
3 218.248.173.46 (218.248.173.46) 34.673 ms 37.091 ms 41.025 ms
4 218.248.246.130 (218.248.246.130) 72.853 ms 75.272 ms 77.959 ms
5 * * *
6 * * *

 

Since i root is another root server hosted within India by NIXI, I was quite sure this was again an issue caused by NIXI’s regional route enforcement policy along with a missing transit link on i root. You can see my last blog post about the same issue with F root here.

What was happening here was that the Swedish provider Netnod had an anycast node of the i root server at NIXI Mumbai. Netnod uses IP 192.36.148.17 from the 192.36.148.0/24 subnet announced by their AS 29216. In this setup the Netnod router was connected to NIXI’s Mumbai subnet and was announcing that prefix. Thus all providers including BSNL were getting the prefix in their routing tables, and hence there was a forward path from BSNL to the Netnod Mumbai router.
But since ISPs like BSNL are forced to announce regional routes only, BSNL was NOT announcing the prefixes it uses in Haryana at Mumbai (it does that at the nearest regional exchange, which is NIXI Noida), and thus the Netnod router had no return path. This is true for many other big Indian providers who participate at more than one NIXI.
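The forward path / return path mismatch can be modelled in a few lines of Python. This is a simplified sketch of the situation described above, not a BGP simulation. The region labels (such as "BSNL Mumbai region") and the function name `reachability` are hypothetical, chosen only for illustration.

```python
# Which BSNL regions announce their prefixes at which exchange (illustrative labels).
ANNOUNCED_AT = {
    "NIXI Noida": {"BSNL Haryana"},
    "NIXI Mumbai": {"BSNL Mumbai region"},
}
# Exchanges where the i root node announces 192.36.148.0/24.
ROOT_PREFIX_AT = {"NIXI Mumbai"}


def reachability(client_region, root_exchange, has_transit=False):
    """Return (forward_ok, return_ok) for a client talking to the root node."""
    # Forward path: the root prefix announced at the exchange reaches the whole ISP.
    forward_ok = root_exchange in ROOT_PREFIX_AT
    # Return path: without transit, the node only knows routes heard at its own exchange.
    return_ok = has_transit or client_region in ANNOUNCED_AT[root_exchange]
    return forward_ok, return_ok


print(reachability("BSNL Haryana", "NIXI Mumbai"))
print(reachability("BSNL Mumbai region", "NIXI Mumbai"))
print(reachability("BSNL Haryana", "NIXI Mumbai", has_transit=True))
```

The first case is mine: packets get to the Mumbai node but replies have nowhere to go. A client whose prefixes are announced at Mumbai works fine, and a transit link on the node would fix the return path for everyone.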

 

I informed the Netnod Network Operations Center about the issue and they acted promptly by taking the Mumbai anycast instance down. As per their last email to me, they are keeping the root server instance down until they figure out what can be done to prevent this problem.

It is important to note that taking a node down in an anycast setup is fine, since traffic is routed to the next nearest node, whereas keeping a faulty node up does damage.

 

Here’s my updated traceroute:

traceroute to 192.36.148.17 (192.36.148.17), 30 hops max, 60 byte packets
1 router.local (10.0.0.1) 1.486 ms 1.965 ms 2.472 ms
2 117.200.48.1 (117.200.48.1) 26.766 ms 30.029 ms 32.558 ms
3 218.248.173.38 (218.248.173.38) 83.640 ms 83.920 ms 84.336 ms
4 115.114.57.165.static-Mumbai.vsnl.net.in (115.114.57.165) 92.011 ms 92.447 ms 92.964 ms
5 ix-0-100.tcore2.MLV-Mumbai.as6453.net (180.87.39.25) 85.625 ms 88.078 ms 90.528 ms
6 180.87.39.58 (180.87.39.58) 227.061 ms 236.796 ms 237.210 ms
7 195.229.3.193 (195.229.3.193) 238.669 ms 196.731 ms 197.479 ms
8 195.229.2.67 (195.229.2.67) 205.832 ms 207.994 ms 210.133 ms
9 195.229.27.22 (195.229.27.22) 204.067 ms 206.465 ms 208.859 ms
10 80.88.240.121 (80.88.240.121) 211.274 ms 213.719 ms 216.668 ms
11 80.88.241.170 (80.88.241.170) 223.069 ms 224.352 ms 225.494 ms
12 i.root-servers.net (192.36.148.17) 227.769 ms 229.160 ms 231.765 ms

 

With this, India has lost the I root server along with F root for the time being, until Netnod is able to work things out with NIXI. Good luck to the last one, i.e. K root in Delhi, in handling the load! 🙂

06 Oct

F root server, Chennai down for 5 months. Who cares?

Time for a quick follow-up blog post. On 26th April of this year I blogged about the broken connectivity of the F root server which was hosted at NIXI Chennai. Apart from that blog post, I also informed ISC, which operates F root (NIXI was hosting it on their behalf in India). In reply to my open email on the APNIC mailing list, the Network Operations Center of ISC said that they would verify and take the necessary action. Within 48 hours of that email they figured out the root cause, and since they couldn’t fix it right at that point, they pulled the plug on that root server.

This was one of 3 global root DNS servers hosted in India. I am sad to report that to date they have not been able to bring the server back live. No blame to ISC, but this is how serious Indian bodies are about the Internet and its infrastructure.
 

My current traceroute to F root:

traceroute to f.root-servers.net (192.5.5.241), 30 hops max, 60 byte packets
1 router.local (10.0.0.1) [AS1] 0.969 ms 1.168 ms 1.488 ms
2 117.206.176.1 (117.206.176.1) [AS9829] 19.203 ms 20.001 ms 22.286 ms
3 218.248.169.122 (218.248.169.122) [AS9829] 26.905 ms 28.801 ms 29.490 ms
4 115.114.131.137.static-mumbai.vsnl.net.in (115.114.131.137) [AS4755] 64.299 ms 66.175 ms 68.068 ms
5 172.31.16.193 (172.31.16.193) [*] 95.702 ms 172.31.19.145 (172.31.19.145) [*] 96.813 ms 98.304 ms
6 ix-4-2.tcore1.CXR-Chennai.as6453.net (180.87.36.9) [*] 304.038 ms 280.526 ms 280.544 ms
7 if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [*] 330.969 ms if-5-2.tcore1.SVW-Singapore.as6453.net (180.87.12.53) [*] 327.010 ms if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [*] 333.282 ms
8 if-5-2.tcore2.SVW-Singapore.as6453.net (180.87.15.69) [*] 319.188 ms if-2-2.tcore2.SVW-Singapore.as6453.net (180.87.12.2) [*] 319.458 ms if-5-2.tcore2.SVW-Singapore.as6453.net (180.87.15.69) [*] 341.489 ms
9 Vlan1870.icore1.HK2-HongKong.as6453.net (180.87.15.61) [*] 339.646 ms Vlan1850.icore1.HK2-HongKong.as6453.net (180.87.15.18) [*] 337.416 ms Vlan1779.icore1.HK2-HongKong.as6453.net (180.87.15.38) [*] 338.317 ms
10 isc2-FE.hkix.net (202.40.161.200) [AS2687/AS4862/AS9498/AS10026/AS1221] 340.247 ms 339.589 ms 344.179 ms
11 f.root-servers.net (192.5.5.241) [AS55440/AS3557/AS23708/AS8167] 340.218 ms 341.172 ms 341.604 ms

 

So I am still hitting Hong Kong.

Please note that the ultra high latency here is due to BSNL’s usual old problem of broken return paths. We can see that as soon as traffic is handed over to AS6453 at hop 6, there is a huge spike in latency. Since AS6453 (Tata) has a publicly available looking glass, I can traceroute back to my IP from there and see the path:

 

Router: gin-cfo-core1
Site: IN, Chennai – CFO, VSNL
Command: traceroute ip 117.206.184.217

Tracing the route to 117.206.184.217

1 if-11-0-2-0.tcore1.CXR-Chennai.as6453.net (180.87.36.26) [MPLS: Label 613458 Exp 0] 268 msec
if-1-0-0-0.tcore1.CXR-Chennai.as6453.net (180.87.36.13) [MPLS: Label 613458 Exp 0] 252 msec
if-1-3-0-0.tcore1.CXR-Chennai.as6453.net (180.87.36.17) [MPLS: Label 613458 Exp 0] 280 msec
2 if-7-2.tcore1.MLV-Mumbai.as6453.net (180.87.36.33) [MPLS: Label 508693 Exp 0] 248 msec 304 msec
if-3-3.tcore2.CXR-Chennai.as6453.net (180.87.36.6) [MPLS: Label 557305 Exp 0] 400 msec
3 if-9-2.tcore2.MLV-Mumbai.as6453.net (180.87.37.10) [MPLS: Label 320866 Exp 0] 412 msec 404 msec 404 msec
4 if-6-2.tcore1.L78-London.as6453.net (80.231.130.5) [MPLS: Label 731443 Exp 0] 400 msec 248 msec
if-2-2.tcore2.WYN-Marseille.as6453.net (80.231.217.2) [MPLS: Label 404482 Exp 0] 244 msec
5 if-2-2.tcore2.L78-London.as6453.net (80.231.131.1) [MPLS: Label 515300 Exp 0] 256 msec 256 msec 256 msec
6 if-20-2.tcore2.NYY-NewYork.as6453.net (216.6.99.13) [MPLS: Label 300800 Exp 0] 260 msec 268 msec 260 msec
7 if-9-0-0-19.mcore4.NYY-NewYork.as6453.net (209.58.60.149) 252 msec 252 msec 252 msec
8 ix-14-2.mcore4.NYY-NewYork.as6453.net (64.86.71.58) 484 msec 476 msec 480 msec
9 218.248.255.109 [AS 9829] 500 msec 612 msec 604 msec
10 218.248.169.121 [AS 9829] 624 msec 504 msec 504 msec
11 218.248.169.121 [AS 9829] 500 msec 504 msec 504 msec
12 * * *
13 * * *
14 * * *
15 * * *

 

So the path is Chennai > Mumbai > London > New York > back to India on BSNL. This is completely due to negligence on BSNL’s part. They are doing BGP announcement only at New York, which is why India to Hong Kong packets go straight but the return is via New York, pushing latency super high. Anyway, this is a separate issue of its own. Coming back to the main issue of this post, i.e. the F root server – it is not yet up, and things are still “moving”, but slowly.
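The city sequence can be pulled out of the looking glass output mechanically. This Python sketch takes the first hostname of each hop from the trace above and reads the city from the name. The "three-character code, dash, city" pattern is my inference from these hostnames, not a documented naming convention, and `city_path` is a made-up helper.

```python
import re

# First hostname of each hop from the looking glass trace above.
HOPS = [
    "if-11-0-2-0.tcore1.CXR-Chennai.as6453.net",
    "if-7-2.tcore1.MLV-Mumbai.as6453.net",
    "if-9-2.tcore2.MLV-Mumbai.as6453.net",
    "if-6-2.tcore1.L78-London.as6453.net",
    "if-2-2.tcore2.L78-London.as6453.net",
    "if-20-2.tcore2.NYY-NewYork.as6453.net",
    "if-9-0-0-19.mcore4.NYY-NewYork.as6453.net",
    "ix-14-2.mcore4.NYY-NewYork.as6453.net",
]

# Assumed naming pattern: "<router>.<3-char site code>-<City>.as6453.net"
CITY = re.compile(r"\.[A-Z0-9]{3}-([A-Za-z]+)\.as6453\.net$")


def city_path(hostnames):
    """Return the list of cities visited, collapsing consecutive repeats."""
    path = []
    for host in hostnames:
        match = CITY.search(host)
        if match and (not path or path[-1] != match.group(1)):
            path.append(match.group(1))
    return path


print(" > ".join(city_path(HOPS)))
```

For this trace it gives Chennai, Mumbai, London and NewYork in that order, after which the packets are handed to BSNL (AS9829) and come back to India.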

 

Looking at last week’s latency to the F root server from my home-hosted RIPE NCC probe:

 

 

What exactly was the cause of the problem?

The cause of the problem was forced, regional-only multi-lateral peering (MLP). Here’s NIXI’s exact policy, which says:

An ISP at any NIXI node must at a minimum announce all its regional routes to the NIXI router at that NIXI location. All ISPs connecting to that NIXI node are entitled to receive these routes using a single BGP session with the NIXI router. This will guarantee the exchange of regional traffic within a NIXI node. This is referred to as forced regional multi-lateral peering under the policy.

 

Now ISC was running the F root server without any transit and was relying completely on peering sessions in the Chennai region. If you recall, at that time the problem was affecting only a few networks. For networks like Sify and IDEA Cellular it was all running well, while for BSNL it was failing. The reason is that when ISPs like BSNL participate at NIXI, they announce ONLY regional routes. So BSNL was getting the BGP announcement of ISC, which was sitting behind the NIXI Chennai router, while BSNL itself was announcing the relevant prefixes only at the New Delhi exchange (the one closest to Haryana) and not at the Chennai exchange. Since the node was without any transit, it was not able to reach BSNL users outside Chennai at all (and the same goes for many other big ISPs). As of now ISC is working on a deal with NIXI to get a basic transit pipe from STPI (well, another Govt. ISP). Since it will be a transit pipe, it will provide a full global routing table feed, including BSNL Haryana and other routes.

 

It is truly absurd that the Indian Govt. is terribly slow with this critical part of Internet infrastructure and yet has as much as $5 billion to invest in connecting Gram panchayats over fiber, even when quite a few of them have no electricity. The prime problem for now is that there are SOOOOO many Govt. departments dealing with the “problem” that they themselves constitute a significant part of the “problem”. There is terrible co-ordination between all these Govt. bodies & companies.

 

List of Govt. bodies involved in telecommunications:

 

So many departments. BSNL holds domestic fiber everywhere except the metros, where the Govt. relies on MTNL. Then we have PowerGrid, which lays fiber along power lines, and RailTel, which does the same along railway lines. I wonder why this work can’t be done by a single body, BSNL alone? RailTel has the ambition of building a nationwide broadband network via the RailWire project. I thought the Govt. relied on BSNL’s 4.5 lakh exchanges, with over 100 pairs of fiber capable of running 10G DWDM pipes, for that!

Then on top of that we have NIXI, which is altogether different, i.e. an IXP and not an ISP, and it has no direct relation with BSNL or the other fiber-holding bodies. STPI, i.e. Software Technology Parks of India, itself sounds like a funny name for an ISP, but it exists and is actually more popular for layer 2 circuits near NIXI exchanges. NIC works to host datacenters for Govt. websites (do BSNL and the other bodies above have no clue how to run a datacenter?), and then we get NKN, which is running MPLS over BSNL+RailTel+PowerGrid to provide 1Gbps connectivity to IITs + IIMs + NITs etc. And if you are from a small private state college like mine – you can’t do much other than write blog posts like these to yell out the resulting frustration after years! 🙂
 

I hope we will get better policies and governance, and that the private sector will do far better than these Govt. bodies. Time for me to get back to my work! 🙂