08 Apr

Espresso: Google's peering edge architecture

Back in 2017 Google shared details about Espresso, its SDN solution for scaling up routing at the peering edge.
I saw this fascinating presentation from Google at SIGCOMM 2017. This blog post covers it in detail, in addition to the talk.

Key design principles for their routing platform

  1. Hierarchical control plane consisting of both global and local control. The global controller takes care of overall traffic flow, using inputs such as performance metrics, while the local controller takes care of BGP session failures, port/device failures, etc.
  2. Fail static – any part of the system can fail and the system keeps working as it was before.
  3. Software Programmability
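
To make the first two principles concrete, here is a minimal sketch of how a local controller could "fail static". This is not Google's code: the `LocalController` class, its method names, and the example prefixes and peer names are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LocalController:
    """Hypothetical per-site controller (illustration only, not Espresso's API)."""

    # prefix -> egress peer currently programmed in the data plane
    forwarding: Dict[str, str] = field(default_factory=dict)

    def apply_global_update(self, update: Optional[Dict[str, str]]) -> None:
        # `update` is None when the global controller is unreachable.
        if update is None:
            return  # fail static: keep forwarding on the last known state
        self.forwarding = dict(update)

    def handle_peer_down(self, peer: str, fallback: str) -> None:
        # Local reaction: move prefixes off a failed peer without waiting
        # for the global controller.
        for prefix, egress in self.forwarding.items():
            if egress == peer:
                self.forwarding[prefix] = fallback


site = LocalController()
site.apply_global_update({"203.0.113.0/24": "peer-a", "198.51.100.0/24": "peer-b"})
site.apply_global_update(None)               # global controller lost: nothing changes
site.handle_peer_down("peer-a", "peer-b")    # local failure handled locally
```

The point of the split is that losing the global controller never empties the forwarding state, while a local failure such as a dead BGP session can still be handled on the spot.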

Key features of the Espresso platform

  1. Peers physically terminate on an MPLS switch, while BGP runs in software hosted on a set of hosts (servers). Sessions are spread across different hosts to avoid a single point of failure: if a host fails, only a subset of the peering sessions fails, not all of them. They also keep backup hosts in case a primary fails.
  2. A single BGP implementation runs in software and the table is held in the RAM of the server, which gives very high scalability for holding large routing tables.
  3. Google “sprays” small amounts of traffic across all available paths (including non-best paths) to build a picture of each path, and selects the path based on that data as well as inputs from applications.
  4. This platform proves that SDN is not only for walled gardens and can be used for BGP routing optimisation. Many people believed SDN was for the “internal network” only.
  5. Back in 2017, this platform was being used for around 22% of their existing capacity and all new buildout was using it. Now, in 2020, the number is probably much higher.
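
The "spraying" idea in point 3 can be sketched roughly as below. The talk does not spell out the actual selection algorithm, so this is only my illustration: the function names, the `probe_fraction` parameter, and the use of mean latency as the score are all assumptions.

```python
import random
from typing import Dict, List


def choose_path(paths: List[str], best: str, probe_fraction: float,
                rng: random.Random) -> str:
    """Send most flows to `best`, but spray a small fraction over the other paths."""
    alternatives = [p for p in paths if p != best]
    if alternatives and rng.random() < probe_fraction:
        return rng.choice(alternatives)
    return best


def pick_best(latency_ms: Dict[str, List[float]]) -> str:
    """Pick the path with the lowest mean measured latency.

    Paths with no samples yet are skipped.
    """
    measured = {p: sum(s) / len(s) for p, s in latency_ms.items() if s}
    return min(measured, key=measured.get)


rng = random.Random(0)
paths = ["peer-a", "peer-b", "transit-c"]
# Roughly 2% of flows go to non-best paths so that they keep getting measured.
path_for_flow = choose_path(paths, best="peer-a", probe_fraction=0.02, rng=rng)

samples = {"peer-a": [30.0, 40.0], "peer-b": [10.0, 20.0], "transit-c": []}
new_best = pick_best(samples)
```

The trade-off is the usual one: a small probe fraction costs little real traffic on possibly worse paths, but keeps fresh measurements available for the moment the current best path degrades.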

The talk ended with a nice Q&A where someone asked how they know the capacity on other paths: an “unloaded path” may look fine, but it may choke as soon as they send real traffic over it. Clearly that does not happen often with Google peering, so I must say their platform is very quick at detecting this and re-routing traffic.

While the presenter did not mention it in response to the question, I think that because BGP sessions are distributed across various hosts and a large table is carried in such a scalable way, they probably do not have BGP convergence issues. Also, since their traffic is outbound heavy, they can pick the path on which to send it. This works in all cases where the other side is able to send traffic back to Google (TCP traffic) and the selected path is not dead.

Think about the peer on the other side the next time you bring up your BGP session with AS15169. 🙂

02 Apr

Railtel-Google free railway station wifi using 49Gbps!


Railtel (the telecom arm of Indian Railways) is running free wifi hotspots across the country in collaboration with Google. The project has been running for the last two years and started with the MoU between Railtel and Google (news here) back in 2015.
Fast forward to 2018 – the free wifi project at railway stations seems to be doing quite well, with many users on it. The project covers 361 stations and is expected to reach its target of 400 stations soon. The IP network for the service is under the name “Mahataa Information India Private Limited” and originates IP pools from AS134426 – https://bgp.he.net/AS134426#_asinfo. It is a single-homed network behind Railtel’s AS24186.
 
 

Google’s free wifi at Indian railway stations is better than most of the country’s paid services


 
I filed an RTI request with Railtel asking for the MoU details as well as the bandwidth consumption for each state. In their reply, Railtel declined to share the MoU, citing an exemption from disclosure as well as the NDAs they have with Google, but they did share details of state-wise bandwidth consumption.
 

 
 

Some interesting points

 

  1. This data is peak bandwidth usage and not average bandwidth.
  2. The highest usage seems to be in Maharashtra, very likely because of high usage in Mumbai.
  3. The second highest is Uttar Pradesh, which isn’t surprising given the size of the state.
  4. Rajasthan and Punjab seem quite low for their size.
  5. It seems to be mostly 0 for the North East states – Arunachal Pradesh, Mizoram, Meghalaya, Manipur, Tripura & Sikkim. The only traffic is in Assam (450Mbps) and Nagaland (90Mbps). In Assam there are 5 active stations under the project and in Nagaland there’s just one (Dimapur). 90Mbps of usage for one station is interesting.
  6. Total bandwidth consumption of 49.68Gbps looks like a nice number. It is hard to predict the cost of the bandwidth since a significant part of it would be locally cached/peered traffic from Google, Facebook, Akamai, Amazon etc. My guess would be that 35% of the 49.68Gbps, i.e. ~17Gbps, would be the IP transit part of the expense, which would be much cheaper than the long-haul network Railtel is maintaining.
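
The back-of-the-envelope split in the last point can be reproduced as below. Only the 49.68Gbps total comes from the RTI reply. The 35% transit share is purely my guess, and the variable names are just for illustration.

```python
total_gbps = 49.68      # total peak usage from the RTI reply
transit_share = 0.35    # my guess, not a figure from Railtel

transit_gbps = total_gbps * transit_share   # traffic that goes out via paid IP transit
local_gbps = total_gbps - transit_gbps      # traffic served from caches / peering

print(f"IP transit:    {transit_gbps:.2f} Gbps")
print(f"Cached/peered: {local_gbps:.2f} Gbps")
```

That works out to roughly 17.4Gbps of transit and 32.3Gbps of cached/peered traffic. Changing `transit_share` shows how sensitive the transit estimate is to that single assumption.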

 
For anyone interested in the raw RTI, I have posted the reply from Railtel here, which includes my questions & their replies. The document hides my personal details like phone number & address. So far the impact seems good, but I would very much like to know the cost of offering such a service for free and whether it is sustainable.