BGP

Facebook cache FNA updates - July 2022

As returning readers of this blog will know, during the APRICOT 2018 hackathon I found a trick to locate Facebook caching servers around the world. Since then, I have been re-running my code every year to see the changes and publish this report.

Previous reports

  1. March 2018 here
  2. Nov 2019 here
  3. April 2021 here

Facebook knows!

Back in 2019, I was in San Francisco, California for NANOG 75. While I was walking around the lobby, someone read the NANOG badge hanging around my neck and greeted me. His next line was “Oh, I know that name, you are the guy who mapped our caching nodes!” and we both laughed. I must say this particular series of posts has drawn some attention.

Algorithm to detect a transit-free network

In a recent episode of the Network AF podcast, Avi Freedman (Kentik) joked with the guest about how he figures out which networks are transit-free (tier 1). He said, “I ask everyone who they think is a tier 1 network. Everyone includes their own name plus other names.” He then ignores the self-nominations and looks at the names common to everyone’s lists to find who actually is a tier 1 network. It is funny, intuitive, and gives a useful clue.
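The heuristic is simple enough to write down. Below is a minimal Python sketch. It assumes each survey answer is recorded as a set of network names keyed by the respondent. The function name `likely_tier1`, the `threshold` knob and the sample networks A, B, X and Y are all hypothetical. The sketch drops each respondent’s self-nomination, counts the remaining votes, and keeps the networks named by every other respondent.

```python
from collections import Counter


def likely_tier1(answers: dict[str, set[str]], threshold: float = 1.0) -> set[str]:
    """Return the networks named by at least `threshold` of the other respondents.

    `answers` maps each respondent to the set of networks it calls tier 1.
    With threshold=1.0 a network must appear on every other respondent's list.
    """
    votes = Counter()
    for respondent, named in answers.items():
        # Ignore the self-nomination: everyone tends to include their own name.
        for network in named - {respondent}:
            votes[network] += 1

    result = set()
    for network, count in votes.items():
        # A respondent cannot vote for itself, so leave it out of its own denominator.
        eligible = len(answers) - (1 if network in answers else 0)
        if eligible > 0 and count / eligible >= threshold:
            result.add(network)
    return result


# Hypothetical survey: A and B nominate themselves plus X and Y.
answers = {
    "A": {"A", "X", "Y"},
    "B": {"B", "X", "Y"},
    "X": {"X", "Y"},
    "Y": {"Y", "X"},
}
print(sorted(likely_tier1(answers)))  # ['X', 'Y']
```

Setting `threshold` below 1.0 tolerates a respondent who forgets a name. That tweak is an assumption layered on top of the joke, not part of what was described on the podcast.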

GGN Summit | Bangalore | IPv6 transitioning & more!

I am in Bangalore for two days. Of the many things packed into this short schedule, one of the most exciting is the Google Global Network India Innovation Summit. Google has presented its AS15169 backbone at various events in the past, but this is the first summit where they are covering it in detail, and in the Indian context too!

I must say I find AS15169 quite fascinating on the BGP side of things. It is a massive network that follows “cold potato” routing, i.e. it carries the majority of traffic over its own IGP between its large locations. It terminates BGP sessions on virtual appliances backed by SDN, and it has a pretty robust failover design in which BGP and DNS handle the failure of individual servers and even entire PoPs. I blogged about them back in 2020 here. So this should be fun!

Inefficient IGP can make eBGP go wild!

Lately, I have been struggling to keep latency in check between my servers in India and Europe. Since November 2021, multiple submarine cables have been down, taking out significant capacity between Europe and India. Earlier the impact was largely on Airtel, but Tata Communications was also affected for a short duration. As of now, Airtel is still routing Europe-to-India traffic towards downstream networks over the Pacific, via an EU > US East > US West > Singapore path. Anyway, this blog post is not about the submarine cable issue.

Welcome to India, Vultr!

Vultr announced the launch of their Mumbai location on the 12th of this month. It’s great to see them entering India; it is always a good thing for the growth of on-demand cloud computing in the country.
Besides Vultr, we already have Amazon AWS, Microsoft Azure, Google Cloud, DigitalOcean, Linode, Oracle Cloud, etc. in India. I have heard OVH is also planning an Indian location, so we will have to see how that goes.

Meanwhile, let’s take a quick look at Vultr’s network connectivity. I just created a virtual machine in Mumbai to look at routing and connectivity. I got the following for my test VM:

NIXI expansion & some thoughts

Background

Lately, NIXI has been making a bit of news in the Indian peering ecosystem. NIXI, for those who may not be aware, is the National Internet Exchange of India. It was founded in 2003 to provide a layer 2 peering fabric for interconnecting local Indian ISPs. It was supposed to ensure that domestic Indian traffic is exchanged within India rather than outside it. In my previous post, I covered how that is currently not the case. NIXI never attracted much interconnection due to a number of fundamental issues with its policies.

Redundancy on the servers without BGP

A developer friend recently asked me how to design redundancy for servers. He had a valid point: running BGP can be tricky and expensive, since most colo and datacenter hosts offer simple static routing, usually with just a couple of IP addresses. Furthermore, due to IPv4 exhaustion, the price of a /24 has shot up massively. On top of that, burning a /24 on a single server or a few servers is also a questionable design practice unless one is hosting and selling hundreds of virtual machines on them.

Why does Indian internet traffic route outside of India?

After my last post about home networking, I am jumping back into global routing. More specifically, I am looking at how Indian traffic travels around the globe when it does not need to. This is an old discussion among senior management in telcos, policymakers, and others. It is about “Does Indian internet traffic route outside of India?” and, if the answer is yes, “Why?” and “How much?”

It became a hot topic, especially after the Snowden leaks. There was even an advisory back in 2018 from the Deputy National Security Advisor to ensure Indian internet traffic stays local (news here). Over time, this has come up a few dozen times in my discussions with senior members of the Indian ISP community, individuals, and even latency-sensitive gamers. So I am going to document some of it here. I will include only what can be verified publicly and will leave out any private discussions I had with friends at these networks. The data, especially traceroutes, will carry RIPE Atlas measurement IDs so that other network engineers can verify it independently.

Tracking Indian RPKI data

Prompted by a tweet from my friend Abdul Awal, I started looking at the latest RPKI ROA data for India. His tweet came while I was in the middle of moving my blog from WordPress running in LXC containers to WordPress on Docker using the Bitnami image. A bit of optimisation is still pending.
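For readers new to RPKI, ROA data is consumed through route origin validation as defined in RFC 6811. A route is Valid if a covering ROA authorises its origin AS at that prefix length. It is Invalid if ROAs cover it but none match, and NotFound if no ROA covers it. The following is a minimal Python sketch of that classification. It is not the tooling used for this analysis. The `Roa` class, the `validate_origin` function and the documentation prefixes and ASNs in the example are illustrative assumptions. Real validators such as Routinator or rpki-client fetch and cryptographically verify ROAs before applying this logic.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Roa:
    prefix: str       # ROA prefix, e.g. "203.0.113.0/24"
    max_length: int   # longest announced prefix length the ROA authorises
    asn: int          # authorised origin AS


def validate_origin(route_prefix: str, origin_asn: int, roas: Iterable[Roa]) -> str:
    """Classify a BGP route as "Valid", "Invalid" or "NotFound" (RFC 6811 states)."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA only covers routes of the same address family inside its prefix.
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A match needs the same origin AS and a prefix no longer than max_length.
        # An AS0 ROA never authorises any origin.
        if roa.asn != 0 and roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "Valid"
    # Covered by at least one ROA but matched by none -> Invalid.
    return "Invalid" if covered else "NotFound"


# Hypothetical ROAs using documentation prefixes and ASNs.
roas = [
    Roa("203.0.113.0/24", 24, 64496),
    Roa("2001:db8::/32", 48, 64497),
]
for prefix, asn in [("203.0.113.0/24", 64496), ("203.0.113.128/25", 64496),
                    ("203.0.113.0/24", 64511), ("192.0.2.0/24", 64496)]:
    print(prefix, asn, validate_origin(prefix, asn, roas))
```

An announcement more specific than the ROA’s max length comes out Invalid even when the origin AS is correct. This is why the /25 in the example fails.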

Espresso: Google's peering edge architecture

Back in 2017, Google shared details about Espresso, their SDN solution for scaling up their routing.
I saw this fascinating presentation from Google at SIGCOMM 2017. This blog post covers it in detail, in addition to the talk itself.

Key design principles for their routing platform

  1. A hierarchical control plane consisting of both global and local control. Global control takes care of overall traffic flow, using inputs such as performance metrics, while local control handles failures of BGP sessions, ports/devices, etc.