A developer friend recently asked me about designing redundancy for servers. He had a valid point: running BGP can be tricky and expensive, since most colo and datacenter hosts offer simple static routing, usually with just a couple of IP addresses. Furthermore, due to IPv4 exhaustion, the price of a /24 has shot up massively. On top of that, burning a /24 on a single server or a handful of servers is also a questionable design practice, unless one is hosting and selling hundreds of virtual machines on those servers.
James Quinn from Facebook gave an excellent presentation, “Being Open How Facebook Got Its Edge”, at NANOG68, and the video is available on YouTube.
Some key points mentioned by James:
BGP routing becomes inefficient as scale grows, especially when it comes to distributing traffic. A lot of traffic can end up concentrated on a specific PoP, and BGP's best path, which is largely driven by the shortest AS_PATH, can simply be an inefficient path.
Facebook came up with the cool idea of “evolving beyond BGP with BGP”, where they use BGP concepts to beat some of the BGP-related problems.
He also points to the fact that IPv6 has a much larger address space, and heavy summarization can cause egress problems for them: a single route announcement can have almost an entire network behind it!
Traffic management is based on a local and a global controller. The local controller picks efficient routes, injects them via BGP, and balances traffic across local circuits within a given PoP/city. The global controller, on the other hand, is based on DNS logic and helps move traffic across cities.
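To make the BGP points above a bit more concrete, here is a minimal Python sketch of how shortest-AS_PATH selection can pile traffic onto one circuit, and how a local controller could shift prefixes away from it. It assumes a heavily simplified BGP decision process (highest LOCAL_PREF, then shortest AS_PATH; real routers apply several more tie-breakers) and a hypothetical greedy controller that overrides routes by injecting them with a raised LOCAL_PREF. The names `Route`, `best_path` and `rebalance`, the circuit names, and the traffic numbers are all illustrative assumptions, not Facebook's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    circuit: str           # hypothetical egress circuit/peer this route uses
    as_path_len: int
    local_pref: int = 100  # common default LOCAL_PREF


def best_path(routes):
    """Simplified BGP decision: highest LOCAL_PREF, then shortest AS_PATH.
    The circuit name stands in for the remaining tie-breakers (e.g. router ID)."""
    return min(routes, key=lambda r: (-r.local_pref, r.as_path_len, r.circuit))


def rebalance(candidates, demand_mbps, capacity_mbps, override_pref=200):
    """Hypothetical greedy local controller (illustration only).

    candidates:    prefix -> list of Route options learned via BGP
    demand_mbps:   prefix -> traffic towards that prefix
    capacity_mbps: circuit -> usable capacity
    Returns prefix -> chosen Route; overrides carry a raised LOCAL_PREF.
    """
    chosen = {p: best_path(rs) for p, rs in candidates.items()}
    load = {c: 0.0 for c in capacity_mbps}
    for p, r in chosen.items():
        load[r.circuit] += demand_mbps[p]

    # Consider the largest prefixes first when relieving overloaded circuits.
    for p in sorted(chosen, key=lambda q: -demand_mbps[q]):
        current = chosen[p]
        if load[current.circuit] <= capacity_mbps[current.circuit]:
            continue  # this prefix's circuit is not overloaded
        alternates = [
            r for r in candidates[p]
            if r.circuit != current.circuit
            and load[r.circuit] + demand_mbps[p] <= capacity_mbps[r.circuit]
        ]
        if not alternates:
            continue  # nowhere with spare capacity to move this prefix
        alt = best_path(alternates)
        # "Inject" the override as a route with higher LOCAL_PREF so BGP prefers it.
        chosen[p] = Route(alt.prefix, alt.circuit, alt.as_path_len, override_pref)
        load[current.circuit] -= demand_mbps[p]
        load[alt.circuit] += demand_mbps[p]
    return chosen


# Example with documentation prefixes: both prefixes prefer peer-A (AS_PATH length 1),
# which would push 9000 Mbps onto an 8000 Mbps circuit.
candidates = {
    "203.0.113.0/24": [Route("203.0.113.0/24", "peer-A", 1),
                       Route("203.0.113.0/24", "transit-B", 3)],
    "198.51.100.0/24": [Route("198.51.100.0/24", "peer-A", 1),
                        Route("198.51.100.0/24", "transit-B", 2)],
}
demand = {"203.0.113.0/24": 6000, "198.51.100.0/24": 3000}
capacity = {"peer-A": 8000, "transit-B": 10000}

result = rebalance(candidates, demand, capacity)
for prefix, route in result.items():
    print(prefix, route.circuit, route.local_pref)
```

Plain best-path selection sends both prefixes to peer-A; the sketch moves the larger prefix to transit-B with a higher LOCAL_PREF and leaves the other on peer-A. Moving the largest prefix first is just one simple greedy choice; a real controller would also weigh performance measurements, hysteresis and how many overrides it is willing to keep.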
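The IPv6 summarization point is easier to appreciate with some arithmetic. The short Python sketch below uses the standard `ipaddress` module to count how many more-specific networks sit behind one aggregate announcement. The prefixes are documentation ranges (2001:db8::/32 and 192.0.2.0/24) used as stand-ins, and the /48 and /64 sizes are illustrative assumptions, not Facebook's real prefixes.

```python
import ipaddress

# Documentation prefixes used as stand-ins for real announcements.
v6_aggregate = ipaddress.ip_network("2001:db8::/32")
v4_block = ipaddress.ip_network("192.0.2.0/24")


def subnets_behind(aggregate, new_prefix):
    """Number of /new_prefix networks covered by a single aggregate announcement."""
    return 2 ** (new_prefix - aggregate.prefixlen)


print(v4_block.num_addresses)                 # addresses in one IPv4 /24
print(subnets_behind(v6_aggregate, 48))       # /48 sites behind one IPv6 /32
print(subnets_behind(v6_aggregate, 64))       # /64 LANs behind one IPv6 /32
print(v6_aggregate.num_addresses == 2 ** 96)  # total addresses in the /32

# Any more-specific network is covered by the aggregate, so whichever egress
# route is chosen for the /32 applies to all of it at once.
site = ipaddress.ip_network("2001:db8:1234::/48")
print(site.subnet_of(v6_aggregate))
```

A single IPv4 /24 covers 256 addresses, while one IPv6 /32 covers 65,536 /48s, so steering egress traffic per announcement is far coarser when the announcement is that large.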
It’s wonderful to see that Facebook is solving performance and load-related challenges using fundamental building blocks like BGP (local controller) and DNS (global controller). :)