06 Oct

K root server – Noida anycast and updates

K root in Noida seems not to have been getting enough traffic for quite some time, and connectivity does seem a bit broken. This blog post is a follow-up to Dyn’s excellent and detailed post about how TIC leaked the world famous address space used by AS25152. It was good to read this post from RIPE NCC written by my friend Emile (and thanks to him for crediting me for signalling about traffic hitting outside!)

The route leak…

TIC AS48159 was supposed to keep the route within its IGP, but it leaked it to Omantel AS8529, a large international backbone, which propagated the leak further into the global table. It was a mistake by both players, primarily by TIC for leaking the route.
If we look at the IPv4 route propagation graph of Omantel AS8529 on the Hurricane Electric BGP toolkit, it shows two import ASNs:
Omantel IPv4 routing
This has AS9498 (Bharti Airtel) and AS6453 (Tata Communications). Both of these are extremely important networks and are two of the large international and domestic IP transit providers in India. Very likely Omantel is a customer of Bharti Airtel, which we can check in the IRR record Airtel publishes in its PeeringDB record, AS9498:AS-BHARTI-IN:

Anurags-MacBook-Pro:~ anurag$ whois -h whois.apnic.net AS9498:AS-BHARTI-IN |grep -w AS8529
members: AS38476,AS45219,AS45264,AS45283,AS45514,AS45451,AS37662,AS45491,AS7642,AS45517,AS45514:AS-TELEMEDIA-SMB,AS45609,AS38740,As131210,AS45335,AS23937,AS132045,AS8529,AS132486,AS8164,AS133967,AS37048
Anurags-MacBook-Pro:~ anurag$

This confirms the same. Airtel did pick up this route, and since it was a customer route it had a higher local preference than the peering route Airtel learnt from its NIXI Noida peering with K root. For now the route leak is fixed and Airtel seems to have good routing to the K root anycast instance in Noida.
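
The local preference point can be sketched in a few lines of Python. This is only a simplified model of BGP best-path selection (it compares local preference and then AS-path length, nothing else), and the preference values of 200 for customers and 100 for peers are illustrative assumptions, not Airtel’s actual policy. The prefix label is a placeholder.

```python
from dataclasses import dataclass

# Illustrative local-preference values; every network chooses its own.
LOCAL_PREF = {"customer": 200, "peer": 100, "transit": 50}


@dataclass
class Route:
    prefix: str
    learned_from: str   # "customer", "peer" or "transit"
    as_path: tuple      # ASNs, nearest neighbour first


def best_path(routes):
    """Pick the highest local preference first, then the shortest AS path."""
    return max(routes, key=lambda r: (LOCAL_PREF[r.learned_from], -len(r.as_path)))


# The leaked route arrives via the customer (Omantel), the clean one via NIXI peering.
leaked = Route("k-root-prefix", "customer", (8529, 48159, 25152))
via_nixi = Route("k-root-prefix", "peer", (25152,))

chosen = best_path([leaked, via_nixi])
print(chosen.learned_from, chosen.as_path)
```

Even though the peering route has a one-hop AS path, the longer customer route wins in this model because local preference is compared before path length.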

Current status

Tata Communications is still not picking up the announcement of the K root anycast instance in Noida, since its peering session at NIXI Noida has been down for a long time. NIXI moved from STPI to Netmagic, Sector 63, Noida in August (see the heavy drop of traffic in the NIXI Noida graphs here). From that time onwards the peering session of Tata’s domestic backbone AS4755 seems to be down.

NIXI Looking Glass - show ip bgp summary
Router: NIXI Delhi (Noida)
Command: show ip bgp summary
BGP router identifier, local AS number 24029
BGP table version is 541676, main routing table version 541676
10616 network entries using 1528704 bytes of memory
13657 path entries using 1092560 bytes of memory
1546/1197 BGP path/bestpath attribute entries using 210256 bytes of memory
1275 BGP AS-PATH entries using 40472 bytes of memory
566 BGP community entries using 22196 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 2894188 total bytes of memory
BGP activity 523875/512278 prefixes, 1016379/1001610 paths, scan interval 60 secs
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
                4        25152   35502  102431   541675    0    0 3w3d              1
                4        10029    8285   15774   541675    0    0 2d16h           194
                4         9583    4750    9899   541675    0    0 2d16h          1969
                4        17439  109297  191050   541675    0    0 9w5d             32
                4         9829     713    2669   541675    0    0 11:04:52        857
                4        17426    1205    3995   541675    0    0 19:57:29         17
                4         9498  190999  159646   541675    0    0 3w3d           7254
                4         4637   63761  141723   541675    0    0 6w2d              5
                4        63829   30808   80566   541675    0    0 2w5d              5
                4        17754   20071   50107   541675    0    0 1w5d            102
                4        18101   14641   29277   541675    0    0 5d00h           190
                4        17488   22887   58026   541675    0    0 2w0d            354
                4        55410   58592  107852   541675    0    0 2w2d           2637
                4        10201       0       0        1    0    0 2d08h    Active
                4        55836    9164   23591   541675    0    0 6d08h             7
                4        45528   38354  107593   541675    0    0 3w5d             18
                4       132215   27000   56646   541675    0    0 1w2d             15
                4       132453       0       0        1    0    0 2d07h    Idle
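
A quick way to read a summary like this is to flag every row whose last column is a state (Active, Idle) rather than a prefix count, and to check whether a given ASN shows up at all. The sketch below assumes the rows look like the output pasted in this post, i.e. without the neighbour address column; the function names are my own, and only four sample rows are included.

```python
def down_sessions(summary_rows):
    """Return (asn, state) for rows whose last column is a state, not a prefix count."""
    down = []
    for row in summary_rows:
        fields = row.split()
        if len(fields) < 9:
            continue
        # Fields: V, AS, MsgRcvd, MsgSent, TblVer, InQ, OutQ, Up/Down, State/PfxRcd
        asn, state = int(fields[1]), fields[-1]
        if not state.isdigit():
            down.append((asn, state))
    return down


def peer_asns(summary_rows):
    """Return the set of ASNs that have a configured session in the summary."""
    return {int(row.split()[1]) for row in summary_rows if len(row.split()) >= 9}


rows = [
    "4  25152   35502  102431  541675  0  0  3w3d      1",
    "4   9498  190999  159646  541675  0  0  3w3d   7254",
    "4  10201       0       0       1  0  0  2d08h  Active",
    "4 132453       0       0       1  0  0  2d07h  Idle",
]
print(down_sessions(rows))
print(4755 in peer_asns(rows))
```

In the full output above AS4755 does not appear in the AS column at all, which is a stronger signal than a session sitting in Active or Idle.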

NIXI’s connected parties page lists an IP for Tata Comm, but from NIXI’s looking glass there seems to be no peer on that IP!

NIXI Looking Glass - show ip bgp neighbors routes
Router: NIXI Delhi (Noida)
Command: show ip bgp neighbors routes
% No such neighbor or address family

Hence for now Tata Comm isn’t getting the route from the Noida instance at all, and that explains the bad outbound path.
An example trace from Tata Comm to K root:

## AS4755/TATACOMM-AS - TATA Communications formerly VSNL is Leading ISP (2.7% of browser users in IN)
#prb:15840 dst:
1 () [0.344, 0.426, 17.445]
2 err:{u'x': u'*'}
3 (AS4755) [2.73, 2.916, 2.921] |Pune,Maharashtra,IN|
4 () [5.659, 5.789, 6.274]
5 (AS6453) ix-0-100.tcore1.mlv-mumbai.as6453.net [5.143, 5.168, 5.755]
6 (AS6453) if-9-5.tcore1.wyn-marseille.as6453.net [125.474, 125.554, 125.596] |Marseille,Provence-Alpes-C?te d'Azur,FR|
7 (AS6453) if-2-2.tcore2.wyn-marseille.as6453.net [125.723, 125.739, 126.525] |Marseille,Provence-Alpes-C?te d'Azur,FR|
8 (AS6453) if-7-2.tcore2.fnm-frankfurt.as6453.net [126.535, 126.788, 127.22]
9 (AS6453) if-12-2.tcore1.fnm-frankfurt.as6453.net [125.75, 125.828, 125.871]
10 (AS6453) [262.957, 265.3, 266.39]
11 (AS20485) spb03.transtelecom.net [297.919, 297.954, 302.452] |Saint-Petersburg,St.-Petersburg,RU|
12 (AS20485) selectel-gw.transtelecom.net [288.789, 296.574, 298.442]
13 (AS25152) k.root-servers.net [296.981, 297.042, 297.118]

The same holds for its downstream customers who have outbound via TCL:

## AS45528/TDN - Tikona Digital Networks Pvt Ltd. (1.4% of browser users in IN)
#prb:22793 dst:
1 () [0.521, 0.539, 0.814]
2 (AS45528) [5.774, 7.721, 8.195]
3 (AS4755) [7.282, 14.754, 48.013] |Mumbai,Maharashtra,IN|
4 (AS6453) if-2-590.tcore2.l78-london.as6453.net [121.089, 122.755, 124.416] |London,England,GB|
5 (AS6453) if-2-2.tcore1.l78-london.as6453.net [121.828, 122.077, 123.869] |London,England,GB|
6 (AS6453) if-17-2.tcore1.ldn-london.as6453.net [120.716, 122.008, 122.768] |London,England,GB|
7 (AS6453) [122.039, 123.532, 125.424]
8 (AS8468) te2-2.interxion.core.enta.net [125.262, 126.587, 127.04]
9 (AS8468) 188-39-11-66.static.enta.net [122.424, 123.028, 123.163]
10 (AS5459) ge0-1-101.tr1.linx.net [121.656, 124.826, 125.182] |London,England,GB|
11 (AS5459) fe3-0.tr4.linx.net [120.654, 120.721, 138.858] |London,England,GB|
12 (AS5459) g00.router.linx.k.ripe.net [123.306, 123.536, 125.486] |London,England,GB|
13 (AS25152) k.root-servers.net [121.285, 122.653, 122.942]
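
To see where the latency goes in traces like these, one can take the median RTT per hop and list the hops where it jumps sharply. The sketch below is a minimal parser for the line format shown above (hop number, then the RTT samples in square brackets); the 50 ms threshold is an arbitrary assumption, and the sample lines are four hops from the Tata Comm trace.

```python
import re
from statistics import median

# Hop number at line start, RTT samples inside the square brackets.
HOP_RE = re.compile(r"^(\d+)\s.*\[([\d.,\s]+)\]")


def hop_medians(trace_lines):
    """Return [(hop_number, median_rtt_ms)] for hops that carry RTT samples."""
    hops = []
    for line in trace_lines:
        match = HOP_RE.match(line.strip())
        if match:
            rtts = [float(x) for x in match.group(2).split(",")]
            hops.append((int(match.group(1)), median(rtts)))
    return hops


def jumps_over(hops, threshold_ms=50.0):
    """Return (from_hop, to_hop, added_ms) wherever the median RTT rises by more than the threshold."""
    return [(a[0], b[0], b[1] - a[1])
            for a, b in zip(hops, hops[1:])
            if b[1] - a[1] > threshold_ms]


trace = [
    "5 (AS6453) ix-0-100.tcore1.mlv-mumbai.as6453.net [5.143, 5.168, 5.755]",
    "6 (AS6453) if-9-5.tcore1.wyn-marseille.as6453.net [125.474, 125.554, 125.596]",
    "9 (AS6453) if-12-2.tcore1.fnm-frankfurt.as6453.net [125.75, 125.828, 125.871]",
    "10 (AS6453) [262.957, 265.3, 266.39]",
]
jumps = jumps_over(hop_medians(trace))
for from_hop, to_hop, added in jumps:
    print(f"hop {from_hop} -> hop {to_hop}: +{added:.1f} ms")
```

On this sample it flags two jumps: Mumbai to Marseille, and then the hop after Frankfurt, which together account for almost all of the roughly 297 ms to K root.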

Another issue causing serious trouble around K root is what appears to be a broken IP transit pipe at K root Noida. Due to the way NIXI works, K root must have an IP transit pipe. I pointed out long back the broken connectivity of root DNS servers due to return path problems. After that both K root and I root got transit, but it seems that after NIXI moved, IP transit has been broken for the current setup in Netmagic.
Why does a “local node” of a root server need IP transit?
It needs transit because:

    1. NIXI has a weird “x-y” pricing where the requester pays, and this leads to quite a high settlement amount for a network with high inbound traffic (an eyeball network), even a few times that of transit (paying 5Rs/GB!). This leads to a scenario where networks do “partial prefix announcement” to keep their traffic balanced (or slightly in the outbound direction) and avoid a high settlement cost. Hence most such eyeball networks announce only their regional routes rather than all their routes, while they still learn K root’s route and inject it into their IGP. As a result K root’s route is learnt by these networks in West and South India too, so there is a forward path from customers >>> K root Noida node. But since these networks aren’t announcing their West or South Indian routes at NIXI Noida, there’s no return path for the packets. Thus for root DNS servers to stay operationally stable (which they should, since they are critical) they must have transit / a default route to return packets, as a last resort, to IPs which aren’t visible via peering.
    2. A similar case arises with other leaked routes, e.g. if a large ISP decides to learn the K root route and announce it into its customers’ table, leading to a Customer > Large network > K root Noida path, while not announcing that customer’s route at NIXI, resulting in no return path.

So in short: it does need transit, but just for outbound traffic, not for announcing routes on the transit.
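
The return path problem can be shown with a tiny longest-prefix-match lookup. This is a sketch under stated assumptions: the prefixes are documentation ranges standing in for an ISP’s regional blocks, the next-hop labels are made up, and a real router does far more than this.

```python
import ipaddress


def next_hop(routing_table, dst):
    """Longest-prefix match; returns the next-hop label, or None when no route exists."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routing_table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Placeholder prefixes: the "north" block is announced at NIXI Noida,
# the "west" client sits in a block the ISP does NOT announce there.
north_block = ipaddress.ip_network("192.0.2.0/24")
west_client = "198.51.100.10"

peering_only = {north_block: "NIXI peer"}
with_transit = {**peering_only, ipaddress.ip_network("0.0.0.0/0"): "transit"}

print(next_hop(peering_only, "192.0.2.10"))   # reply goes back over peering
print(next_hop(peering_only, west_client))    # no route: the reply is dropped
print(next_hop(with_transit, west_client))    # default route saves the reply
```

With the default route in place, peering is still preferred for everything visible at the exchange because the more specific prefix wins; transit only catches what peering cannot return.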
I have informed RIPE NCC of the broken connectivity issue and their team is actively working on the fix. Hopefully it will be fixed very soon!
With the hope that your DNS is not getting resolved from the other side of the world, good night! 🙂
Disclaimer: As usual – thoughts & comments are completely personal.

05 Mar

Different CDN technologies: DNS Vs Anycast Routing

And I am back from Malaysia after attending APRICOT 2014. It was a slightly slow event this time, as fewer people came due to the change of location from Thailand to Malaysia. But I kind of enjoy having APRICOT at the start of the year. 🙂

It has been quite some time since I last blogged. After joining Spectranet I got relatively busier, along with a bit of travelling to Delhi NCR which has been taking a lot of time. I wish to blog more over time.

Recently I got a chance to understand in detail how CDNs work from the point of view of delivery, and that brings me to this post, where I will lay out how the popular CDN networks work, where they depend on DNS recursors and where on anycast routing.


Understanding CDN

CDNs, as we know, are Content Delivery Networks: specialized networks designed to deliver content to edge networks by serving it from as close a location as possible. The location of servers and the type of connectivity depend heavily on each CDN provider and its business model. E.g. Google maintains its own delivery network consisting of a large number of GGC (Google Global Cache) nodes placed on ISPs’ networks, which help in serving Google’s static content, while other large networks like Akamai (whose core business is cache delivery) put their servers on a large number of edge networks, but these stay as small disconnected islands. Newcomers in the industry like Limelight and Cloudflare follow a deployment model of putting nodes in major datacenters with direct connections to major networks via peering at IXPs.


The key features of almost all these CDNs are:
  1. Low latency delivery of content, giving very fast throughput.
  2. Making networks more efficient by caching near the point of serving and not consuming long haul international bandwidth.
  3. Ensuring that content is delivered with optimum performance and with as little dependency as possible on middle networks/backbones.
  4. Ensuring that there is no single point of distribution, so that traffic serving can be optimized during high load.


Technical side of “edge cache serving”

In order to make the “edge delivery” concept work, CDN providers have multiple options, and it gets slightly trickier here. The challenge is to ensure that all users go to their nearest CDN node and get served from there rather than from a node far away from them.


Here we have ISP A with Cache A deployed very near to it, ISP B with Cache B deployed just next to it, and ISP C with Cache C right next to it. Assume that end users visit a website which uses services from the CDN provider. The end user will get a URL like “http://cdn.website.com/images/image1.jpg”, and cdn.website.com is supposed to go to the “nearest node”. Thus we expect that when users on ISP A try to reach cdn.website.com they hit Cache A, users on ISP B hit Cache B, and so on (under normal circumstances).


Two fundamental ways to achieve that:

  1. Have DNS do the magic, i.e. when users on ISP A look up cdn.website.com they should get the unicast IP address of Cache A in return; similarly, for users coming from ISP B’s network, Cache B’s unicast IP should be returned.
  2. Have routing reach the nearest cache node based on the “anycast routing” concept. Here Cache A, Cache B and Cache C use the same IP address and routing takes care of reaching the closest one.


Both of these approaches have their own advantages as well as challenges. Some very large CDN providers like Akamai and Amazon Cloudfront rely on DNS, while some new entrants like Cloudflare rely very much on anycast routing. I have discussed DNS and its importance in CDNs and node selection in some previous posts, but will go through it quickly in this one.


Making use of DNS for CDN

DNS is a pretty basic protocol. Its role is simply “hostname to IP resolution” (and vice versa). What makes it powerful is that, based on certain logic, we can influence this “hostname to IP resolution” and do many cool things like load balancing, high availability, and more. However there are two key challenges in doing all that. First, the result of a DNS change usually is not instant, since there is a lot of caching by the “recursive DNS servers”. Second, since it is the recursive DNS servers that contact the authoritative DNS servers, the authoritative DNS servers (by default protocol design) don’t really know about end users. They only know which DNS recursor they are talking to (based on the source IP of the DNS recursor), which many times has a relation with the end users, since primarily ISPs run the recursive DNS servers. But in the modern world of large open DNS recursors like OpenDNS and Google Public DNS, that relation fades out.


Here’s how DNS based CDN services work




Here we have users on ISP A requesting the IP address of “cdn.website.com”. Requests go to the DNS recursor of the ISP, which in turn hits the authoritative DNS servers of the CDN provider via the DNS hierarchy. The green lines here show the flow of DNS information. Eventually, based on the IP of the requesting DNS recursor, the authoritative DNS server replies with the IP address of a cache node close to network A.


Some key features of this approach:
  1. The optimization logic sits pretty much with the authoritative DNS server, which can change the returned IP in order to give a location which can serve the request in an optimum manner. If one of the edge servers is down, the algorithm can take care of it by serving another location.
  2. In most such deployments cdn.domain.com points to cdnxx.cdn-provider.com via a CNAME record, and thus the actual resolution logic stays within the domain cdn-provider.com. Records like cdnxx.cdn-provider.com have a very low TTL (less than a minute) so that changes take effect quickly.
  3. This approach fails significantly if end users do not use the DNS recursors of their ISP, since the reply is very much dependent on the location/GeoIP parameters of the source IP of the DNS recursor.
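
The decision an authoritative server makes in this model can be sketched as a lookup keyed on the recursor’s source IP. Everything concrete here is hypothetical: the prefix-to-cache table, the addresses (documentation and benchmarking ranges), the fallback node and the 20 second TTL are assumptions for illustration, and real CDNs use far richer mapping data.

```python
import ipaddress

# Hypothetical mapping of resolver source prefixes to cache node addresses.
RESOLVER_TO_CACHE = {
    ipaddress.ip_network("192.0.2.0/24"): "203.0.113.10",     # ISP A recursors -> Cache A
    ipaddress.ip_network("198.51.100.0/24"): "203.0.113.20",  # ISP B recursors -> Cache B
}
DEFAULT_CACHE = "203.0.113.30"   # fallback when the recursor is unknown
TTL_SECONDS = 20                 # short TTL so changes take effect quickly


def answer(qname, resolver_ip, down=frozenset()):
    """Return (qname, ttl, cache_ip) chosen from the recursor's source address."""
    src = ipaddress.ip_address(resolver_ip)
    for prefix, cache_ip in RESOLVER_TO_CACHE.items():
        if src in prefix and cache_ip not in down:
            return (qname, TTL_SECONDS, cache_ip)
    return (qname, TTL_SECONDS, DEFAULT_CACHE)


print(answer("cdn.website.com", "192.0.2.53"))                         # ISP A recursor
print(answer("cdn.website.com", "198.18.0.53"))                        # unknown/open recursor
print(answer("cdn.website.com", "192.0.2.53", down={"203.0.113.10"}))  # Cache A is down
```

The second call shows the weakness from point 3: a user of ISP A who queries through a recursor outside ISP A’s prefixes gets the fallback node, not Cache A.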


Some newer CDN networks have come up with a fully anycast based setup with very little dependency on DNS, e.g. Cloudflare.


Here’s how anycast routing based CDN providers work




Here we have User 1 & User 2 on ISP A connected to the ISP A router, User 3 & User 4 on ISP B connected to the ISP B router, and finally User 5 & User 6 on ISP C connected to the ISP C router. All of these routers have CDN provider caches nearby and get multiple routes. So e.g. for the ISP A router, CDN server A is 1 hop away, while CDN server B is 2 hops away and CDN server C is 3 hops away. If all servers use the same IP, then ISP A will prefer going to CDN server A, ISP B will go to CDN server B, and so on with C.


Some key features of this approach:

  1. Optimization is based on BGP routing and announcement with little role of DNS. 
  2. This setup is very hard to build and scale, since for anycast to work perfectly at a global level one needs lots and lots of peering and consistent transit providers at each location. If any peer leaks a route to its upstream or to other peers, there can be a lot of unexpected traffic on a given cluster because anycast breaks.
  3. This setup has no dependency on the DNS recursor, and hence Google DNS or OpenDNS work just fine.
  4. This saves a significant number of IP addresses, since the same pools are used at multiple locations.
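
The anycast selection can be sketched as “pick the announcing cache with the fewest hops”. The hop counts for ISP A come from the example above; the rows for ISP B and ISP C are my assumption that they mirror it. Real BGP compares more attributes than hop count, so treat this as a toy model.

```python
# Hop counts from each ISP router to each cache. ISP A's row follows the
# example in the text; the other two rows are assumed for illustration.
HOPS = {
    "ISP A": {"CDN server A": 1, "CDN server B": 2, "CDN server C": 3},
    "ISP B": {"CDN server A": 2, "CDN server B": 1, "CDN server C": 2},
    "ISP C": {"CDN server A": 3, "CDN server B": 2, "CDN server C": 1},
}


def anycast_target(isp, announcing=None):
    """Return the nearest cache still announcing the shared prefix, or None."""
    candidates = {server: hops for server, hops in HOPS[isp].items()
                  if announcing is None or server in announcing}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


for isp in HOPS:
    print(isp, "->", anycast_target(isp))

# If CDN server A withdraws its announcement, ISP A falls over to the next nearest.
print(anycast_target("ISP A", announcing={"CDN server B", "CDN server C"}))
```

Note that no DNS answer changes in the failover case: the same IP simply becomes reachable via a different, longer path.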



With that being said, I hope you are getting served from the nearest cache for the static content of my blog (since I use Amazon Cloudfront for static content). 🙂


Disclaimer: This is my personal blog and does not necessarily reflect thoughts of my employer.

28 Oct

Akamai CDN and DNS resolution analysis

These days open DNS resolvers are getting quite popular. By open DNS resolvers I mean resolvers including OpenDNS as well as Google Public DNS.

One of the major issues these resolvers suffer from is poor integration with CDN providers like Akamai, Limelight etc. In this post I will analyse a sample client site of Akamai, the Malaysia Airlines website: http://www.malaysiaairlines.com.


Looking at the DNS records returned by OpenDNS, Google Public DNS and my ISP’s (BSNL’s) DNS resolver:


OpenDNS

;www.malaysiaairlines.com. IN A

www.malaysiaairlines.com. 12169 IN CNAME www.malaysiaairlines.com.edgesuite.net.
www.malaysiaairlines.com.edgesuite.net. 12169 IN CNAME a1456.b.akamai.net.
a1456.b.akamai.net. 20 IN A
a1456.b.akamai.net. 20 IN A


Google Public DNS

;www.malaysiaairlines.com. IN A

www.malaysiaairlines.com. 12312 IN CNAME www.malaysiaairlines.com.edgesuite.net.
www.malaysiaairlines.com.edgesuite.net. 12318 IN CNAME a1456.b.akamai.net.
a1456.b.akamai.net. 10 IN A
a1456.b.akamai.net. 10 IN A


BSNL’s DNS resolver

;www.malaysiaairlines.com. IN A

www.malaysiaairlines.com. 20410 IN CNAME www.malaysiaairlines.com.edgesuite.net.
www.malaysiaairlines.com.edgesuite.net. 20410 IN CNAME a1456.b.akamai.net.
a1456.b.akamai.net. 20 IN A
a1456.b.akamai.net. 20 IN A


Notice the different IPs returned when asking different DNS resolvers.

OpenDNS gives me an IP which is announced by Singtel in Singapore.
Google gives me an IP which is announced by Tmnet in Malaysia.
BSNL’s DNS resolver gives me an IP announced by BSNL-NIB itself, which is within India (yay!) 🙂

This results in a latency of 300ms for www.malaysiaairlines.com when using OpenDNS & Google, versus 60ms when using the ISP’s default resolver.


How and why is this happening?

The answer lies in the underlying DNS layer which is doing this magic. In all cases www.malaysiaairlines.com. is a CNAME (alias record) to www.malaysiaairlines.com.edgesuite.net. Further, www.malaysiaairlines.com.edgesuite.net. is a CNAME to a1456.b.akamai.net. The real magic comes here: “b.akamai.net.” is itself a DNS zone. Let’s look at this zone from all 3 DNS resolvers:


anurag@laptop:/$ dig b.akamai.net. ns +short @

anurag@laptop:/$ dig b.akamai.net. ns +short @

anurag@laptop:/$ dig b.akamai.net. ns +short @


All identical names. Let’s pick one randomly and analyse:



anurag@laptop:/$ dig n0b.akamai.net a @ +short

anurag@laptop:/$ dig n0b.akamai.net a @ +short

anurag@laptop:/$ dig n0b.akamai.net a @ +short


All different IPs!
At this stage everything seems very confusing.


Let’s revise what we have till now

www.malaysiaairlines.com. is a CNAME to www.malaysiaairlines.com.edgesuite.net., and www.malaysiaairlines.com.edgesuite.net. is a CNAME to a1456.b.akamai.net. Now a1456.b.akamai.net. is a fully qualified hostname under the DNS zone “b.akamai.net”, which gives different IPs when checked from different DNS resolvers. The b.akamai.net zone has several DNS servers, and I randomly picked one of them, n0b.akamai.net. We see that n0b.akamai.net itself gives different A records, and thus I am going back to the parent zone, akamai.net, to find out how this is happening.
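
The alias chain recapped here can be followed mechanically from the answer section of a dig reply. The sketch below parses “name ttl IN type value” lines and walks the CNAMEs; the two records are copied from the OpenDNS output earlier in this post, while the function names are my own.

```python
def parse_records(text):
    """Parse 'name ttl IN type value' lines into {name: (type, value)}."""
    records = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[2] == "IN":
            records[fields[0]] = (fields[3], fields[4])
    return records


def cname_chain(records, name):
    """Follow CNAMEs from name until a name without a CNAME record is reached."""
    chain = [name]
    while records.get(chain[-1], ("", ""))[0] == "CNAME":
        target = records[chain[-1]][1]
        if target in chain:   # guard against alias loops
            break
        chain.append(target)
    return chain


answer_section = """
www.malaysiaairlines.com. 12169 IN CNAME www.malaysiaairlines.com.edgesuite.net.
www.malaysiaairlines.com.edgesuite.net. 12169 IN CNAME a1456.b.akamai.net.
"""
chain = cname_chain(parse_records(answer_section), "www.malaysiaairlines.com.")
print(" -> ".join(chain))
```

The last name in the chain is the one whose A records differ per resolver, so that is the zone worth digging into.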


Let’s see DNS servers of akamai.net:

To avoid further confusion from these interesting DNS lookups, let’s use the whois record of the akamai.net domain, rather than a DNS query, to see which authoritative DNS servers it is using:

anurag@laptop:~$ whois akamai.net

Whois Server Version 2.0

Domain names in the .com and .net domains can now be registered
with many different competing registrars. Go to http://www.internic.net
for detailed information.

Domain Name: AKAMAI.NET
Registrar: TUCOWS.COM CO.
Whois Server: whois.tucows.com
Referral URL: http://domainhelp.opensrs.net
Name Server: NS2-193.AKAMAITECH.NET
Name Server: NS3-193.AKAMAITECH.NET
Name Server: NS4-193.AKAMAITECH.NET
Name Server: NS5-193.AKAMAITECH.NET
Name Server: NS6-193.AKAMAITECH.NET
Name Server: NS7-193.AKAMAITECH.NET
Status: clientTransferProhibited
Status: clientUpdateProhibited
Updated Date: 18-jun-2012
Creation Date: 03-mar-1999
Expiration Date: 03-mar-2022

>>> Last update of whois database: Sun, 28 Oct 2012 16:56:03 UTC <<<


Now again let’s pick one randomly, NS1-1.AKAMAITECH.NET, and see what it tells us for the hostname “n0b.akamai.net”:


anurag@laptop:~$ dig @NS1-1.AKAMAITECH.NET n0b.akamai.net +short



Wow! Akamai’s DNS setup can make a boring Sunday evening very interesting. 😉


Now since NS1-1.AKAMAITECH.NET. is itself on a different domain name (and so a different DNS zone), let’s put in a bit more effort to get to the core of it. NS1-1.AKAMAITECH.NET. is simply an A record on the DNS servers of the AKAMAITECH.NET. zone.


Let’s look at that zone now:

anurag@laptop:/$ dig AKAMAITECH.NET ns +short


Again, let’s pick one, zh.AKAMAITECH.NET., and query it for NS1-1.AKAMAITECH.NET.

anurag@laptop:/$ dig NS1-1.AKAMAITECH.NET. @zh.AKAMAITECH.NET.  +short

Finally a consistent result (YAY!). So is the server with this IP playing games? Remember that in the second-last step this server was giving different IPs for hostname NS1-1.AKAMAITECH.NET. I SMELL ANYCASTING! 🙂

Let’s do a traceroute to it from my location (BSNL Haryana), an Airtel Delhi node & my Europe server (where this blog is hosted!):



traceroute to (, 30 hops max, 60 byte packets
1 ( [AS1] 0.644 ms 1.022 ms 1.150 ms
2 ( [AS9829] 19.467 ms 20.335 ms 21.824 ms
3 ( [AS9829] 27.180 ms 29.092 ms 30.510 ms
4 ( [AS18101] 61.354 ms 63.244 ms 64.209 ms
5 ( [AS18101] 68.160 ms 68.907 ms 69.847 ms
6 ( [AS18101] 72.336 ms 54.497 ms 54.633 ms
7 ( [AS9498/AS7617] 80.766 ms 82.390 ms 83.732 ms
8 AES-Static- ( [AS24560/AS9498] 87.199 ms 88.580 ms 90.314 ms
9 * * *
10 * * *


Europe server

traceroute to (, 30 hops max, 60 byte packets
1 gw.giga-dns.com ( [AS51167] 0.639 ms 0.637 ms 0.623 ms
2 host-93-104-204-33.customer.m-online.net ( [AS8767] 0.600 ms 0.592 ms 0.585 ms
3 xe-1-1-0.rt-decix-2.m-online.net ( [AS8767] 7.784 ms 7.740 ms 7.727 ms
4 xe-1-1-0.rt-decix-2.m-online.net ( [AS8767] 7.464 ms 7.461 ms 7.452 ms
5 decix-fra6.netarch.akamai.com ( [AS6695] 8.434 ms 8.916 ms 8.407 ms
6 * * *
7 * * *
8 * * *


Here we go! Surely anycasting. The IP comes from a prefix originated by Akamai AS21342 and announced at different locations.



Let’s go in forward mode now:

Akamai has an interesting DNS setup, with a mix of anycasted DNS servers where the “edge servers” carry different A records for a given hostname. E.g. at the core Akamai has a set of anycasted DNS servers like zh.AKAMAITECH.NET, which hold the A records for another set of DNS servers like NS1-1.AKAMAITECH.NET., which act as DNS servers for the akamai.net domain name. Next, these DNS servers hold different values for another set of DNS servers like n0b.akamai.net, which hold the delegation for a subzone like b.akamai.net, which in turn holds hostnames like a1456.b.akamai.net, to which hostnames like www.malaysiaairlines.com.edgesuite.net. point! 🙂


Why does Akamai have such a complex setup?

My strong guess is that the multiple zones and cross dependencies are simply there to spread load and avoid a single point of failure. The important thing is that at the core of its DNS Akamai uses anycasting, but for serving content from its web servers there’s no anycasting. E.g. the IP I am getting for Akamai’s client site is a unicast IP from a BSNL prefix announcement. Akamai is NOT using anycasting for edge distribution, and my strong guess is that it’s far easier for Akamai to manage things in the current setup than to put caching servers on anycast IPs. E.g. if in the current situation the Akamai node on BSNL is choked, they can simply redistribute traffic by having the DNS servers return the BSNL node’s A record 1 out of 4 times and the IP of the caching node on Airtel the rest of the time. With anycasting that is not possible: traffic will simply follow the shortest AS/hop path, and partial distribution of load is not possible. Again, that’s my guess. 🙂
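
The “1 out of 4 times” idea is easy to sketch as a weighted rotation of answers. To be clear, this is my illustration of the guess above and not Akamai’s actual algorithm; the node names and the 1:3 weights are hypothetical.

```python
from collections import Counter
from itertools import cycle


def weighted_rotation(weights):
    """Return an endless iterator of names, repeated in proportion to integer weights."""
    pattern = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(pattern)


# Hypothetical node names; the 1:3 split mirrors the "1 out of 4 times" example.
rotation = weighted_rotation({"cache-on-BSNL": 1, "cache-on-Airtel": 3})
answers = [next(rotation) for _ in range(8)]
print(Counter(answers))
```

Combined with a short TTL on the A record, changing the weights shifts load within seconds, which is exactly the kind of control anycast routing does not offer.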

Time for me to change DNS resolver in my router now!