14 Jul

Tracking Indian RPKI data

Prompted by a tweet from my friend Abdul Awal, I started looking at the latest RPKI ROA data for India. His tweet came while I was in the middle of moving my blog from WordPress running on LXC containers to WordPress on Docker with Bitnami’s image. A bit of optimisation is still pending.

Firstly, I wanted to validate the claim. I have seen some data here and there, but nothing comprehensive enough to compare various countries across the region, so I thought I would prepare it. As I started, I realised it makes more sense to write a tool and automate it, so that the tool can look up the data every day and keep a webpage updated.

How can one compare RPKI ROAs across the region?

I thought of a few possible ways and settled on using APNIC’s delegation file to find ASNs, and then looked at the announcements from those ASNs for selected countries. My code checked the data for Afghanistan, Pakistan, India, Nepal, Bhutan, Bangladesh, Sri Lanka, Myanmar, Thailand, China, Taiwan, Cambodia, Vietnam, Malaysia and Singapore.
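To make the first step concrete, here is a minimal sketch of pulling ASNs per country out of a delegation file. The file is pipe-separated (registry, country code, type, start, count, date, status). The sample rows below are made up and use documentation ASNs, and the function name `asns_by_country` is only illustrative, not taken from my actual tool.

```python
from collections import defaultdict

# Made-up sample in the delegation file layout:
# registry|cc|type|start|value|date|status
SAMPLE = """\
2|apnic|20200714|5|19830613|20200713|+1000
apnic|*|asn|*|3|summary
apnic|IN|asn|64496|1|20100101|allocated
apnic|IN|asn|64500|4|20110101|allocated
apnic|NP|asn|64511|1|20120101|assigned
apnic|IN|ipv4|192.0.2.0|256|20100101|allocated
"""


def asns_by_country(text, countries):
    """Return {country_code: set of ASNs} for the requested country codes."""
    result = defaultdict(set)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split("|")
        # Keep only full ASN records; this also skips the version header,
        # summary lines and the ipv4/ipv6 records.
        if len(fields) < 7 or fields[2] != "asn" or fields[1] not in countries:
            continue
        start, count = int(fields[3]), int(fields[4])
        # One record can cover a block of consecutive ASNs.
        result[fields[1]].update(range(start, start + count))
    return dict(result)


by_cc = asns_by_country(SAMPLE, {"IN", "NP"})
print({cc: sorted(asns) for cc, asns in by_cc.items()})
```

In practice the text would come from the downloaded delegation file rather than an inline string.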

The next step was to find prefixes. For this, I relied on RIPE RIS RRC01. At the time of writing this blog post, it has 27 full table feeds from its members.

Then, to run validation, I relied on the super fast RPKI API from my friend Louis. Results go into a database, and Grafana then graphs this data.
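For readers who want to see what the validation step decides, here is a rough local sketch of route origin validation along the lines of RFC 6811. This is not the API I used; the ROA tuples, prefixes and ASNs are made-up documentation values, and `validate` is a hypothetical helper name.

```python
import ipaddress


def validate(prefix, origin_asn, roas):
    """Classify one announcement as 'valid', 'invalid' or 'not-found'.

    roas is an iterable of (roa_prefix, max_length, asn) tuples.
    """
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_length, asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        # A ROA only matters if its prefix covers the announced prefix.
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue
        covered = True
        # Valid needs a matching origin AND a length within maxLength.
        if asn == origin_asn and net.prefixlen <= max_length:
            return "valid"
    # Covered by some ROA but never matched: invalid. No covering ROA: not-found.
    return "invalid" if covered else "not-found"


ROAS = [
    ("192.0.2.0/24", 24, 64496),
    ("198.51.100.0/24", 24, 64500),
]

for pfx, asn in [
    ("192.0.2.0/24", 64496),     # matching origin and length
    ("192.0.2.0/25", 64496),     # more specific than maxLength allows
    ("198.51.100.0/24", 64496),  # wrong origin ASN
    ("203.0.113.0/24", 64496),   # no ROA covers it
]:
    print(pfx, asn, validate(pfx, asn, ROAS))
```

The percentage on the graphs is then simply the count of "valid" results divided by the total number of announcements checked.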

With only 12% valid signed prefixes (against total announcements) in India, we are looking at pretty low levels of ROAs. 🙁
So Awal does indeed have a point. In comparison, Bhutan is at the 100% level, Nepal and Sri Lanka at the 90% level, Pakistan at 73% and Myanmar at 79%. China seems to be doing equally badly at just 5%. When I looked at the data for unique ASNs visible in the routing table, India, China and Japan clearly seem to be lagging.

Another noticeable thing here is that India has 1873 unique ASNs and Japan has 3135, compared with only 629 in China, yet China has 427 million unique IPv4 addresses visible in the routing table. India has only 47 million addresses, announced by 1800+ ASNs.
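A note on counting unique addresses: overlapping announcements (a /8 plus a more specific /16 inside it, for example) must be counted once. Here is a minimal sketch of one way to do that with Python’s standard `ipaddress` module. The prefixes are illustrative, the helper name is hypothetical, and my tool may well do this differently.

```python
import ipaddress


def unique_ipv4_addresses(prefixes):
    """Count distinct IPv4 addresses covered by a list of announced prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    v4 = [n for n in nets if n.version == 4]
    # collapse_addresses merges adjacent networks and drops more specifics
    # that are already covered, so no address is counted twice.
    return sum(n.num_addresses for n in ipaddress.collapse_addresses(v4))


announced = ["10.0.0.0/8", "10.1.0.0/16", "192.0.2.0/24", "192.0.2.0/25"]
total = unique_ipv4_addresses(announced)
print(total)  # the /16 and the /25 add nothing beyond the /8 and the /24
```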

I have published this and some more dedicated data on this page here, which will be auto-updated every 24 hours (around 1 am IST). It also has a list of Indian invalids, and I will try to use it to actively get some cleanup done for those invalids.

Next logical steps for now…

  1. Contact the 60-odd origin ASNs which are announcing 300 or so invalids in India and try to get those cleaned up.
  2. There seems to be zero documentation about RPKI on the IRINN website. In fact, there’s not even a mention of RPKI there, which is bad. I will try to reach out to friends at IRINN and request them to put up documentation about RPKI.
  3. Reach out to telcos that hold a large set of IP blocks and try to convince them to create ROAs as the first logical step.

Limitations of this data

  1. I am looking at prefixes originated by Indian ASNs. Some of these prefixes might be originated outside of India. So a very small % of these numbers might be Indian prefixes which are used in the US or Europe by an Indian ASN (e.g. a web hosting company).
  2. We miss a small % of prefixes in this data which are originated in India by non-Indian ASNs like Google, Cloudflare, Microsoft etc.
  3. I see only what the collector gets. Thus, hypothetically speaking, if Tata, Airtel, Jio, Sify, BSNL and Vodafone/IDEA all start dropping invalids, I will not see any of these invalids even though they may still exist. That is unlikely, though, because people would notice a drop in connectivity to all endpoints outside India, and that would result in those getting fixed anyway.

Finishing this at 5:24am. Time to get some sleep!

08 Apr

Espresso: Google's peering edge architecture

Back in 2017, Google shared details about Espresso, their SDN solution for scaling up their routing.
I watched this fascinating presentation from Google at SIGCOMM 2017. This blog post covers it in detail, in addition to the talk.

Key design principles for their routing platform

  1. Hierarchical control plane consisting of both global and local control. Global takes care of overall traffic flow, inputs coming from performance metrics etc., while local takes care of failure of BGP sessions, port/device failure etc.
  2. Fail static – To ensure that any part of the system can fail and the system keeps working as it was before.
  3. Software Programmability

Key features of the Espresso platform

  1. Peers physically terminate on an MPLS switch, while the BGP function is in software, hosted on a set of hosts (servers). Sessions are spread across different hosts to avoid a single point of failure. If a host fails, only a subset of the peering sessions fails, not all of them. Plus, they keep backup hosts in the event of failure of the primary.
  2. A single BGP instance runs in software and the table sits in the server’s RAM, which gives very high scalability for holding large routing tables.
  3. Google “sprays” small amounts of traffic across all available paths (non-best paths) to have a picture of all available paths and based on that data as well as inputs from applications, it selects the path.
  4. This platform proves that SDN is not only for walled gardens and can be used for BGP routing optimisation. Many people believed SDN was for the “internal network” only.
  5. Back in 2017, this platform was being used for around 22% of their existing capacity, and all new buildout was using it. Now, in 2020, the number is probably much higher.
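To picture the “spraying” idea from point 3, here is a toy sketch, purely my own illustration and not Google’s implementation. It assumes a fixed small fraction of flows is sent over non-best paths to keep measurements fresh, and that the best path is the one with the lowest mean latency. The function names, the 5% fraction, the peer names and the latency metric are all assumptions.

```python
import random


def pick_path(paths, best, explore_fraction, rng):
    """Pick a path for one flow: mostly `best`, occasionally a non-best path."""
    others = [p for p in paths if p != best]
    if others and rng.random() < explore_fraction:
        return rng.choice(others)  # sprayed onto an alternative path
    return best


def best_by_latency(samples):
    """samples: {path: [latency_ms, ...]}; return the path with the lowest mean."""
    return min(samples, key=lambda p: sum(samples[p]) / len(samples[p]))


rng = random.Random(42)  # seeded so the sketch is repeatable
paths = ["peer-a", "peer-b", "transit-c"]
measurements = {"peer-a": [20, 22], "peer-b": [15, 16], "transit-c": [40, 38]}

best = best_by_latency(measurements)
flows = [pick_path(paths, best, 0.05, rng) for _ in range(10_000)]
sprayed_share = sum(1 for p in flows if p != best) / len(flows)
print(best, round(sprayed_share, 3))
```

In a real system the measurements would keep changing as sprayed traffic reports back, and application inputs would feed into the choice as well.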

The talk ended with a nice Q&A where someone asked how they know capacity on other paths because on an “unloaded path” they may see it’s all good but as soon as they send traffic it may actually choke that path. Clearly that is something which does not happen with Google peering that often and hence I must say their platform is very quick in determining and re-routing traffic.

While the presenter did not mention it in response to the question, I think that because BGP sessions are distributed across various hosts and a large table is carried in such a scalable way, they probably do not have BGP convergence issues. Also, since it’s outbound heavy, they can pick the path to send traffic on. It will work in all cases where the other side is able to send traffic back to Google (TCP traffic) and their selected path is not dead.

Think about the peer on the other side when you bring up your BGP session with AS15169 next time. 🙂