27 Oct

Being Open How Facebook Got Its Edge

An excellent presentation by James Quinn of Facebook, “Being Open How Facebook Got Its Edge”, given at NANOG68. The YouTube link is here, and the video is embedded in the post below.

 
 
Some key points mentioned by James:

  1. BGP routing becomes inefficient as scale grows, especially when it comes to distributing traffic. A lot of traffic can end up concentrated on a specific PoP, and on top of that, the BGP best path, chosen for its short AS_PATH, can simply be an inefficient one.
  2. Facebook came up with the cool idea of “evolving beyond BGP with BGP”, where they use BGP concepts to beat some of the BGP-related problems.
  3. He also points to the fact that IPv6 has a much larger address space, and heavy summarization can cause egress problems for them. A single route announcement can have almost an entire network behind it!
  4. Traffic management is based on a local and a global controller. The local controller picks efficient routes, injects them via BGP and takes care of balancing traffic across local circuits within a given PoP/city. The global controller, on the other hand, is based on DNS logic and helps in moving traffic across cities.
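
To make the first and fourth points concrete, here is a minimal Python sketch contrasting a plain shortest-AS_PATH choice with a performance-aware choice of the kind a local controller might make. This is not Facebook's implementation: the `Route` fields, the peer names, the latency and utilization figures, and the 0.9 utilization threshold are all hypothetical, and the AS numbers come from the range reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    next_hop: str       # hypothetical peer/circuit name
    as_path: tuple      # AS numbers from the documentation range
    rtt_ms: float       # assumed: the controller has latency measurements
    utilization: float  # assumed: fraction of circuit capacity in use


def bgp_style_choice(routes):
    """Pick the route with the shortest AS_PATH, ignoring performance."""
    return min(routes, key=lambda r: len(r.as_path))


def controller_choice(routes, max_utilization=0.9):
    """Pick the lowest-latency route among circuits that still have headroom."""
    usable = [r for r in routes if r.utilization < max_utilization]
    candidates = usable or routes  # fall back to all routes if every circuit is full
    return min(candidates, key=lambda r: r.rtt_ms)


# Made-up example: the short path is slow and nearly full.
routes = [
    Route("peer-a", (64500,), rtt_ms=180.0, utilization=0.95),
    Route("transit-b", (64501, 64500), rtt_ms=40.0, utilization=0.50),
]

print(bgp_style_choice(routes).next_hop)   # shortest AS_PATH wins
print(controller_choice(routes).next_hop)  # latency and headroom win
```

In this toy case the shortest AS_PATH points at a congested, high-latency circuit, while the controller-style choice picks the longer path that actually performs better.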

 
It’s wonderful to see that Facebook is solving the performance and load related challenges using fundamental blocks like BGP (local controller) and DNS (global controller). 🙂

05 Oct

Partial outage on .bd ccTLD on 5th Oct 2016

 
Bangladesh’s .bd ccTLD faced another outage. As I mentioned in one of the previous posts, the .bd domain seems to be primarily on BTCL (AS17494). Zone delegation of .bd is still pending with PCH: while PCH is listed in the NS records on the authoritative DNS servers, the delegation is still pending in the root DNS servers, as per the reply from Kabindra of PCH on the bdNOG mailing list during the last outage.
If we look at the root DNS zone, .bd has the following delegations:

bd.			172800	IN	NS	dns.bd.
bd.			172800	IN	NS	surma.btcl.net.bd.
bd.			172800	IN	NS	jamuna.btcl.net.bd.
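
As a small illustration, the delegation above can be parsed with a few lines of Python to see how many of the listed name servers sit under btcl.net.bd. This is only a sketch: `parse_ns_records` is a hypothetical helper that handles just this simple five-field form, not a full zone-file parser.

```python
# The three NS records for bd. exactly as listed in the root zone above.
ROOT_ZONE_SNIPPET = """\
bd.			172800	IN	NS	dns.bd.
bd.			172800	IN	NS	surma.btcl.net.bd.
bd.			172800	IN	NS	jamuna.btcl.net.bd.
"""


def parse_ns_records(text):
    """Return (owner, ttl, target) tuples for simple 'owner ttl IN NS target' lines."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[2] == "IN" and fields[3] == "NS":
            owner, ttl, _cls, _rtype, target = fields
            records.append((owner, int(ttl), target))
    return records


records = parse_ns_records(ROOT_ZONE_SNIPPET)
nameservers = [target for _owner, _ttl, target in records]
btcl_hosted = [ns for ns in nameservers if ns.endswith(".btcl.net.bd.")]

print(f"{len(nameservers)} name servers in the root delegation")
print(f"{len(btcl_hosted)} of them under btcl.net.bd.: {btcl_hosted}")
```

All three name servers are inside .bd itself, and two of the three are under btcl.net.bd.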

 
After the last outage, a few people started tracking uptime using RIPE Atlas probes, so this time we do know about the downtime.
The following RIPE Atlas measurements were tracking DNS checks on jamuna.btcl.net.bd.: 4598529 and 4598527. As per measurement ID 4598529, the server wasn’t available from 15:44 UTC on 4th Oct till 05:44 UTC on 5th Oct 2016. This outage was visible from RIPE Atlas probes hosted in India, Singapore, Hong Kong, Japan and Germany.
jamuna.btcl.net.bd. measurement
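
For a quick sense of scale, the length of the outage window reported by measurement 4598529 can be worked out from the two timestamps. A minimal Python sketch, using only the times quoted above:

```python
from datetime import datetime, timezone

# Start and end of the unavailability window, as quoted from measurement 4598529.
start = datetime(2016, 10, 4, 15, 44, tzinfo=timezone.utc)
end = datetime(2016, 10, 5, 5, 44, tzinfo=timezone.utc)

outage = end - start
hours = outage.total_seconds() / 3600

print(f"jamuna.btcl.net.bd. was unreachable for about {hours:.0f} hours")
```

That works out to roughly 14 hours of downtime as seen by the probes.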
 
At this point, the cause of the issue is not known. It was mentioned on the bdNOG mailing list today.