Mapping major CDNs across Indian networks
I was recently discussing Jio’s FIFA streaming issues with a friend. Considering the PNI capacity challenges with other telcos, I wondered whether they were serving FIFA streams from inside their own network or via a CDN like Akamai. While testing, I noticed a couple of megabytes of flow data to an IP inside my provider’s local network. It turned out to be a local Google GGC (Google Global Cache) node in Rohtak, and when I tried connecting to it, it replied on TCP ports 80 (HTTP) and 443 (HTTPS). The port 443 response is more interesting: while connecting to the bare IP throws an error, the TLS handshake still hands over the SSL certificate, and that certificate confirms it is indeed Google! :)
Here’s how it looks:
I tested the same logic on Facebook and it works there as well. In principle, it should work for all popular CDNs, since their nodes must answer TLS with their own certificate to serve traffic. This suggested an idea: scan the entire Indian routing table to find popular CDN nodes and the networks hosting them. Due to the sheer size of the IPv6 address space, this is not feasible for IPv6, so I ran the check only on IPv4. I already collect Indian prefixes as part of my daily RPKI ROA check (detailed post on the logic here). In short, I take the ASNs assigned to India as per the APNIC delegation file (both directly by APNIC and via the NIR, IRINN), and then map the prefixes originated by each ASN using the RIPE RIS MRT dump.
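A rough Python sketch of these two steps follows. It assumes the delegation file uses the standard RIR statistics format (`registry|cc|type|start|value|date|status`) and that the RIS MRT dump has already been converted to text with `bgpdump -m`, where the sixth field is the prefix and the seventh is the AS path; the origin ASN is taken as the last hop of that path. The function names are illustrative only.

```python
def indian_asns(delegation_lines):
    """Return the set of ASNs whose country code is IN in a delegation file."""
    asns = set()
    for line in delegation_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("|")
        # Version and summary lines have a different shape (too few fields or
        # no "IN" in the country column), so this check filters them out too.
        if len(fields) < 7 or fields[1] != "IN" or fields[2] != "asn":
            continue
        start, count = int(fields[3]), int(fields[4])
        asns.update(range(start, start + count))  # one record may cover a block of ASNs
    return asns


def origins_by_asn(bgpdump_lines, asns):
    """Map each ASN in `asns` to the IPv4 prefixes it originates.

    Assumes `bgpdump -m` text lines: TYPE|time|B|peer_ip|peer_as|prefix|as_path|...
    """
    result = {}
    for line in bgpdump_lines:
        fields = line.strip().split("|")
        if len(fields) < 7:
            continue
        prefix, as_path = fields[5], fields[6].split()
        if ":" in prefix or not as_path:
            continue  # IPv4 only, as in this post
        origin = as_path[-1]
        if not origin.isdigit():
            continue  # skip AS_SET origins such as "{64496,64497}"
        if int(origin) in asns:
            result.setdefault(int(origin), set()).add(prefix)
    return result
```

Feeding the same prefix set from several RIS peers is harmless here, since prefixes are collected into sets.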
The extracted data gives me 42,129 prefixes. The number of unique prefixes is much lower once more-specific prefixes already covered by a larger prefix are removed.
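One way to do that reduction, sketched with Python's `ipaddress` module: sort by network address and prefix length so that every covering prefix comes before the prefixes it contains, then drop anything that is a subnet of the last kept prefix. Unlike `ipaddress.collapse_addresses`, this does not merge adjacent prefixes, so each surviving entry is still a prefix as announced in the routing table.

```python
import ipaddress


def drop_covered(prefixes):
    """Keep only IPv4 prefixes that are not contained in another input prefix."""
    nets = sorted(
        {ipaddress.IPv4Network(p) for p in prefixes},  # the set removes exact duplicates
        key=lambda n: (int(n.network_address), n.prefixlen),
    )
    kept = []
    for net in nets:
        # After sorting, a covering prefix always precedes the prefixes it
        # covers, so only the most recently kept prefix needs checking.
        if kept and net.subnet_of(kept[-1]):
            continue
        kept.append(net)
    return [str(n) for n in kept]
```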
After a few iterations, I settled on this as the most efficient approach:
- Find IPs that reply to ICMP echo (ping)
- From #1, find the ones with TCP port 443 open
- From #2, fetch the SSL certificate details
The reason to do #1 rather than jumping directly to #2 is that an ICMP sweep is faster. There can certainly be cases where port 443 is open but ICMP is filtered, but most of the popular CDNs don’t filter ICMP.
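A minimal Python sketch of this three-step funnel is below. It is not the exact tooling used for the scan: the Linux iputils `ping` flags, the timeouts and the function names are assumptions. Step #3 disables hostname checking (we connect to a bare IP, which is why a normal HTTPS request errors out) but still validates the chain against the system CA store, so `getpeercert()` returns the parsed subject and subjectAltName of publicly trusted certificates.

```python
import socket
import ssl
import subprocess


def answers_ping(ip, timeout_s=1):
    """Step 1: one ICMP echo via the system ping (assumes Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def port_443_open(ip, timeout_s=2.0):
    """Step 2: a plain TCP connect to port 443."""
    try:
        with socket.create_connection((ip, 443), timeout=timeout_s):
            return True
    except OSError:
        return False


def cert_names(ip, timeout_s=3.0):
    """Step 3: TLS handshake with the bare IP; return subject CN and DNS SANs."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # no hostname to match; chain is still verified
    with socket.create_connection((ip, 443), timeout=timeout_s) as sock:
        with ctx.wrap_socket(sock) as tls:
            cert = tls.getpeercert()
    names = [v for rdn in cert.get("subject", ()) for k, v in rdn if k == "commonName"]
    names += [v for kind, v in cert.get("subjectAltName", ()) if kind == "DNS"]
    return names


def funnel(ips, ping_ok=answers_ping, tcp_ok=port_443_open, get_names=cert_names):
    """Run the steps per IP, skipping later steps once an earlier one fails."""
    results = {}
    for ip in ips:
        if not ping_ok(ip) or not tcp_ok(ip):
            continue
        try:
            results[ip] = get_names(ip)
        except OSError:  # ssl.SSLError is a subclass of OSError
            continue  # handshake failed or certificate not publicly trusted
    return results
```

At the scale of a whole routing table, a dedicated mass scanner for steps #1 and #2 would be far faster than one `ping` process per address; the funnel structure stays the same.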
To make it fast, I split the unique Indian prefixes into batches of a thousand and launched parallel containers across a bunch of servers to process them. The output of these containers was stored centrally in a Backblaze B2 bucket in Amsterdam.
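A sketch of the batching step, assuming a hypothetical `scan_batch` job per batch. Here a thread pool on one machine stands in for the separate containers, and the upload to B2 is left out. Threads are a reasonable fit because the work is dominated by waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size=1000):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def scan_batch(prefixes):
    """Hypothetical per-container job (placeholder).

    A real job would run the ICMP, port 443 and certificate steps for every
    address in these prefixes and write its output to central storage.
    """
    return {"prefixes": len(prefixes)}


def run_all(prefixes, workers=4):
    """Process all batches in parallel; results come back in batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scan_batch, batches(prefixes)))


if __name__ == "__main__":
    demo = [f"10.{i // 256}.{i % 256}.0/24" for i in range(2500)]
    print(run_all(demo))  # three batches: 1000, 1000 and 500 prefixes
```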
Scanning results
Before getting to the results, a few important notes:
- The starting point here is the set of ASNs allocated to India by APNIC. This method excludes any non-Indian ASNs operating in India and still includes Indian ASNs operating outside India. In both cases, I don’t expect a large number of CDN nodes, so they should not materially impact this data.
- Cloudflare will not show up in the data because their caching nodes inside other networks use Cloudflare’s own anycast address space. I cannot locate anycast node presence by looking at the routing table from a single vantage point.
- Similarly, Fastly and Limelight will not show up either, since they mostly sit on their own address space and ASN.
CDN Provider | No. of unique Indian networks | SSL certificate domains | Details list URL |
---|---|---|---|
Google GGC | 228 | googlevideo.com | List here |
Facebook FNA | 181 | fbcdn.net, fna.whatsapp.net | List here |
Netflix OCA | 107 | oca.nflxvideo.net, assets.nflxext.com, fast.com, netflix.com | List here |
Akamai* | 70 | *.akamai.net | List here |
Microsoft (via Akamai) | 10 | microsoft.com (and many more!) | List here |
Apple (via Akamai) | 11 | Many sub-domains under apple.com | List here |
*The actual number of Akamai nodes is much higher. Akamai has many large customers with dedicated IPs and dedicated SSL certificates, served by Akamai nodes dedicated to those clients. These include Microsoft, Apple, Alibaba, Hotstar etc. The list here shows only the general nodes serving multiple clients.
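To turn certificate names into the provider rows of the table, one option is a suffix match against the SSL domains listed there. This is a simplified sketch: the `PROVIDER_SUFFIXES` table is hypothetical, it treats any `microsoft.com` or `apple.com` certificate as an Akamai-hosted customer node, and it assumes one "network" means one origin ASN.

```python
# Hypothetical suffix table built from the SSL domains in the table above.
# Order matters: customer-specific entries come before generic Akamai.
PROVIDER_SUFFIXES = [
    ("Google GGC", ("googlevideo.com",)),
    ("Facebook FNA", ("fbcdn.net", "fna.whatsapp.net")),
    ("Netflix OCA", ("oca.nflxvideo.net", "assets.nflxext.com", "fast.com", "netflix.com")),
    ("Microsoft (via Akamai)", ("microsoft.com",)),
    ("Apple (via Akamai)", ("apple.com",)),
    ("Akamai", ("akamai.net",)),
]


def provider_for(names):
    """Return the first provider (in table order) matching any certificate name."""
    for provider, suffixes in PROVIDER_SUFFIXES:
        for name in names:
            host = name.lower()
            if host.startswith("*."):
                host = host[2:]  # drop the wildcard label
            if any(host == s or host.endswith("." + s) for s in suffixes):
                return provider
    return None


def unique_networks(records):
    """records: iterable of (asn, cert_names). Count distinct ASNs per provider."""
    seen = {}
    for asn, names in records:
        provider = provider_for(names)
        if provider is not None:
            seen.setdefault(provider, set()).add(asn)
    return {p: len(asns) for p, asns in seen.items()}
```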
Do the numbers feel low?
They are not as low as they look. This data does not give the complete picture, because the number of nodes varies a lot across these ISPs, and here we count unique networks only. E.g., while I see just 70 unique networks with Akamai, the actual number of nodes is extremely high. That is because these CDN nodes are deployed massively inside large telco networks like Airtel, Jio, BSNL etc. In many cases, those top three have more nodes than the remaining networks due to their customer base, reach and traffic.
Keep in mind that India has fewer than 3 crore (30 million) fixed-line broadband users but over 74 crore (749 million) 4G LTE users, and these mobile internet users are all on Jio, Airtel and VI (with a tiny share on BSNL). On fixed-line too, Airtel, Jio and BSNL still have a large presence. Given this massive asymmetry, it is expected to see a high number of CDN nodes across the top 5 networks by user base.
Initially, I thought I could look for IPs in unique /24s and treat each /24 as a cluster/location. Facebook FNA shows that is not the case. E.g., Facebook has just 31 unique /24s in Bharti Airtel, but because Facebook uses airport codes in its node names, I can see 43 unique airport codes.
Take, for example:
9498|BHARTI Airtel Ltd|116.119.32.0/24|116.119.32.32| *.fjai2-2.fna.fbcdn.net
9498|BHARTI Airtel Ltd|116.119.32.0/24|116.119.32.81| *.fixc1-3.fna.fbcdn.net
JAI is Jaipur while IXC is Chandigarh (as per the IATA airport code convention), yet both IPs are part of the same /24. These nodes are roughly 500 km apart and cannot be counted as a single cluster.
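The airport code can be pulled out of the certificate name with a regular expression, and grouping by /24 then shows whether a single /24 spans several cities. The pattern below (`f` + three-letter IATA code + site number + `-` + node number) is an assumption inferred from the sample names in this post, not a documented Facebook format.

```python
import ipaddress
import re

# Assumed FNA naming pattern inferred from the samples, e.g. fjai2-2 -> JAI.
FNA_RE = re.compile(r"\bf([a-z]{3})\d+-\d+\.fna\.fbcdn\.net$")


def airport_code(cert_name):
    """Return the upper-case airport code from an FNA cert name, or None."""
    m = FNA_RE.search(cert_name.strip().lower())
    return m.group(1).upper() if m else None


def airports_per_24(lines):
    """Group FNA airport codes by the /24 of the node IP.

    Each line is assumed to be ASN|name|prefix|IP|certificate, as in the samples.
    """
    groups = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue
        code = airport_code(fields[4])
        if code is None:
            continue
        slash24 = ipaddress.ip_network(fields[3] + "/24", strict=False)
        groups.setdefault(str(slash24), set()).add(code)
    return groups
```

Any /24 that maps to more than one code is exactly the case that rules out /24 as a cluster boundary.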
Similarly in the case of BSNL:
9829|BSNL-NIB|117.254.132.0/24|117.254.132.43| *.fccu17-1.fna.fbcdn.net
9829|BSNL-NIB|117.254.132.0/24|117.254.132.82| *.fbbi4-1.fna.fbcdn.net
Here, the first one (CCU) is Kolkata while the second one (BBI) is Bhubaneswar, again within the same /24. Thus I had to drop the plan to use /24 as the boundary. For Facebook, one can surely identify locations from airport codes, and I have done this the other way around in the past (check out this post from July 2022). But the Google and Netflix certificates only show generic domains (googlevideo.com, oca.nflxvideo.net) with no such location hint, so I cannot determine how many unique Google GGC or Netflix OCA clusters exist in Airtel or Jio. People sitting on these networks locally might be able to tell by looking at the latency.
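For anyone on these networks who wants to try the latency route, here is a rough sketch: time a few TCP handshakes to port 443 as an RTT proxy, then split the sorted RTTs wherever the gap exceeds a threshold. The 2 ms gap and 5 samples are arbitrary assumptions, and similar RTTs from one vantage point only hint at a shared site; they do not prove it.

```python
import socket
import statistics
import time


def tcp_rtt_ms(ip, port=443, samples=5, timeout_s=2.0):
    """Median TCP connect time to ip:port in milliseconds (None if unreachable)."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            with socket.create_connection((ip, port), timeout=timeout_s):
                times.append((time.perf_counter() - start) * 1000)
        except OSError:
            pass  # ignore failed attempts
    return statistics.median(times) if times else None


def group_by_rtt(rtts, gap_ms=2.0):
    """Split IPs into groups wherever consecutive sorted RTTs differ by > gap_ms."""
    ordered = sorted((rtt, ip) for ip, rtt in rtts.items() if rtt is not None)
    groups = []
    prev = None
    for rtt, ip in ordered:
        if prev is None or rtt - prev > gap_ms:
            groups.append([])  # start a new candidate cluster
        groups[-1].append(ip)
        prev = rtt
    return groups
```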
If you think OTTs are not building infrastructure to support high traffic flows, think again after looking at the data in this post. The raw data shows over 32,000 endpoints serving part of the traffic for these large CDN players.
Raw data
Raw data for Google, Facebook, Netflix, Akamai, Microsoft and Apple is published here (Google Sheets) and here (text file).
Update - 02 Dec 2022 - 00:53
My good friend Vivin (from AS18229) suggested a better format for this data, grouping it by ASN. So I put together some quick code to prepare that, and the summary data can be found here.
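A minimal sketch of such a per-ASN grouping, assuming the raw text file follows the `ASN|name|prefix|IP|certificate` layout of the samples above. The output fields (`endpoints`, `slash24s`, `domains`) are hypothetical and not necessarily the format of the published summary.

```python
import ipaddress
from collections import defaultdict


def summarise_by_asn(lines):
    """Aggregate raw ASN|name|prefix|IP|certificate lines per ASN."""
    name = {}
    ips = defaultdict(set)
    domains = defaultdict(set)
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip headers or malformed lines
        asn = int(fields[0])
        name[asn] = fields[1]
        ips[asn].add(fields[3])
        domains[asn].add(fields[4])
    return {
        asn: {
            "name": name[asn],
            "endpoints": len(ips[asn]),
            # distinct /24s, shown alongside endpoints for comparison
            "slash24s": len({ipaddress.ip_network(ip + "/24", strict=False) for ip in ips[asn]}),
            "domains": sorted(domains[asn]),
        }
        for asn in sorted(name)
    }
```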