Understanding multiple routes from the same ASN

A while back, bgp.he.net added a live route propagation graph for a given prefix. Besides being near real-time, it is also specific to a prefix. For example, let’s look up 2401:4900:87f0::/44 (Airtel mobility - 5G prefix):

https://bgp.he.net/net/2401:4900:87f0::/44#_graph

What sometimes confuses people is why there are multiple paths from a given ASN towards the originating ASN. Many people assume multiple paths exist only for different prefixes, and that is not true. For example, in the above route propagation graph AS3356 has a direct route to AS9498 as well as routes via AS2914, AS6939, AS6453, etc. And this kind of multiple paths is not limited to AS3356 (which is run by three companies as of now - Lumen in North America, Colt in the EU and Cirion in South America). Notice that GTT AS3257 is also learning it via Cogent AS174 as well as Tata Comm AS6453. Which one is actually the “best path” as determined by BGP? Quick short answer: both, each on a different router!



Why does this happen?

This kind of routing often happens when the network being looked at is a large one, with multiple routers doing eBGP with multiple other large networks, and is not itself anywhere in the upstream path. Let’s take the GTT AS3257 example first before coming to the AS3356 case (AS3356 has something more to it as well, which I will cover after this one).

Let’s query 2401:4900:87f0::/44 on Hurricane Electric’s super-lg and restrict the output to paths with AS3257 in them - output here

Path 2 shows GTT AS3257 learning it from Cogent AS174 but path 4 shows it’s learning it from Tata Comm AS6453. Both of these are “best paths” as determined by BGP in the respective routers that feed the routes. Path 2 via Cogent is being fed into route-views.chicago while path 4 is being fed into RIPE RIS RRC12 (DE-CIX, Frankfurt).

Let’s look at the GTT looking glass and trace to this prefix from the Chicago and Frankfurt routers:

GTT Chicago > 2401:4900:87f0::1

IPv6 traceroute to 2401:4900:87f0::1
HOST: cr1-chi1-re1                Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 2001:668:0:2:ffff:0:d5c8:7ff  0.0%     5    0.5   1.2   0.5   1.9   0.6 <- GTT AS3257
 2. 2001:550:3::18d              40.0%     5    2.7   1.9   1.3   2.7   0.7 <- Cogent AS174
 3. 2001:550:0:1000::9a36:2eb1   40.0%     5    1.2   1.3   1.2   1.6   0.3
 4. ???                          100.0     5    0.0   0.0   0.0   0.0   0.0
 5. 2001:550:0:1000::9a36:5f6d   80.0%     5   31.9  31.9  31.9  31.9   0.0
 6. 2001:550:0:1000::9a36:5a15   80.0%     5   29.1  29.1  29.1  29.1   0.0
 7. 2001:550:0:1000::9a36:a3fa   60.0%     5   38.5  50.8  38.5  63.0  17.3
 8. 2001:550:0:1000::9a36:59a     0.0%     5  181.7 181.5 180.7 183.1   1.0
 9. 2001:550:0:1000::9a36:a8fd    0.0%     5  188.2 188.4 187.9 189.8   0.8
10. 2001:550:0:1000::9a36:a902    0.0%     4  188.0 188.3 187.7 189.6   0.8
11. 2001:550:0:1000::9a36:1996    0.0%     4  222.2 204.6 187.3 222.2  19.0
12. 2001:550:0:1000::9a36:5e71   75.0%     4  182.2 182.2 182.2 182.2   0.0
13. static-36-2-68-128.xxxxx.svi  0.0%     4  182.7 182.7 181.9 184.2   1.0
14. 2404:a800::106                0.0%     4  280.8 270.9 266.9 280.8   6.6
15. 2404:a800:1a00:800::72        0.0%     4  262.1 263.2 262.0 265.1   1.4
16. ???                          100.0     4    0.0   0.0   0.0   0.0   0.0

GTT Frankfurt > 2401:4900:87f0::1

IPv6 traceroute to 2401:4900:87f0::1
HOST: cr10-fra2-re0               Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 2001:668:0:2:ffff:0:8d88:6be  0.0%     5    1.3  20.2   1.3  89.5  38.8 <- GTT AS3257
 2. 2001:668:0:3:ffff:0:9a0e:9aa  0.0%     5    1.1   1.1   1.0   1.2   0.1 <- GTT AS3257
 3. 2a01:3e0:ff20::110           80.0%     5    7.3   7.3   7.3   7.3   0.0 <- Tata Comm AS6453
 4. 2a01:3e0:ff20:110::3         80.0%     5    6.8   6.8   6.8   6.8   0.0
 5. ???                          100.0     5    0.0   0.0   0.0   0.0   0.0
 6. ???                          100.0     5    0.0   0.0   0.0   0.0   0.0
 7. 2a01:3e0:3900::48             0.0%     5   15.6  15.7  15.5  16.0   0.3
 8. 2a01:3e0:3900::17            80.0%     5  230.8 230.8 230.8 230.8   0.0
 9. 2405:2000:d00:100::14         0.0%     5  231.1 231.1 230.9 231.3   0.1
10. 2001:5a0:2300:200::d2         0.0%     5  129.6 129.4 129.2 129.6   0.2
11. 2404:a800::106                0.0%     4  160.6 162.8 160.3 169.7   4.6
12. 2404:a800:1a00:800::72        0.0%     4  162.3 162.3 162.2 162.4   0.1
13. ???                          100.0     4    0.0   0.0   0.0   0.0   0.0

Essentially, what is happening here is that on GTT’s router in Chicago the route via Cogent AS174 became the best path, while on GTT’s router in Frankfurt the route via Tata Comm AS6453 became the best path.

Revisiting the BGP route selection algorithm:

  1. Weight - Cisco specific, higher weight wins
  2. Local preference, higher localpref wins
  3. Locally Originated wins (over externally originated)
  4. AS_Path - Shorter AS_PATH wins
  5. Origin code preference i>e>? - Not common these days
  6. MED, low MED wins (used in multiple sessions across the same set of ASNs)
  7. eBGP wins over iBGP (enables hot potato)
  8. Lowest IGP metric to next-hop
  9. Oldest route
  10. Lowest router-id

So if weight is different, the higher weight wins; if weight is the same, localpref is compared, and so on. In this case #1 to #7 are likely all the same, as far as I can think. So the deal breaker becomes #8, the lowest IGP metric to the next-hop. If, for example, for this specific router in Chicago the IGP metric is lower to the router connected to Cogent than to the one connected to Tata Comm, it will prefer Cogent, and vice versa in Frankfurt. If the IGP metric is the same and both Cogent & Tata Comm are on the same set of devices, then the next one, #9, the oldest route, takes over. It could be that the Chicago router first learnt this route from Cogent while the Frankfurt router first learnt it from Tata Comm, and hence the difference.
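The step-by-step comparison above can be sketched in a few lines of Python. This is a simplified illustration and not any vendor's implementation: the `Route` class and its field names are made up for this post, the IGP metric values are invented numbers, and the AS paths as seen by GTT are assumed to be of equal length (only the neighbours, Cogent and Tata Comm, come from the output above). It also compares MED across all routes, whereas routers by default compare MED only between routes from the same neighbouring AS.

```python
from dataclasses import dataclass

# Origin codes ranked so that a lower number is preferred: i > e > ?
ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}


@dataclass
class Route:
    """Hypothetical, simplified view of one BGP path for a prefix."""
    via: str                       # label for the neighbour the path was learnt from
    as_path: tuple                 # e.g. (174, 9498, 45609)
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    origin: str = "i"
    med: int = 0
    is_ebgp: bool = False
    igp_metric: int = 0            # IGP cost to the BGP next-hop
    age_seconds: int = 0           # how long ago the route was learnt
    router_id: str = "0.0.0.0"


def preference_key(r: Route):
    """Sort key following the ten steps; the smallest key is the best path."""
    return (
        -r.weight,                                       # 1. higher weight wins
        -r.local_pref,                                   # 2. higher localpref wins
        not r.locally_originated,                        # 3. locally originated wins
        len(r.as_path),                                  # 4. shorter AS_PATH wins
        ORIGIN_RANK[r.origin],                           # 5. i > e > ?
        r.med,                                           # 6. lower MED wins
        not r.is_ebgp,                                   # 7. eBGP over iBGP
        r.igp_metric,                                    # 8. lowest IGP metric to next-hop
        -r.age_seconds,                                  # 9. oldest route
        tuple(int(x) for x in r.router_id.split(".")),   # 10. lowest router-id
    )


def best_path(routes):
    return min(routes, key=preference_key)


# Same two candidate paths on both routers; only the (invented) IGP metrics differ.
chicago_view = [
    Route(via="Cogent AS174", as_path=(174, 9498, 45609), igp_metric=10),
    Route(via="Tata Comm AS6453", as_path=(6453, 9498, 45609), igp_metric=900),
]
frankfurt_view = [
    Route(via="Cogent AS174", as_path=(174, 9498, 45609), igp_metric=850),
    Route(via="Tata Comm AS6453", as_path=(6453, 9498, 45609), igp_metric=20),
]

print(best_path(chicago_view).via)    # Cogent AS174
print(best_path(frankfurt_view).via)  # Tata Comm AS6453
```

Because the key is a tuple, Python compares it element by element and only looks at a later step when all earlier ones are equal, which is the same "move to the next tie-breaker" behaviour as in the list.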



What about multiple paths to AS3356?

AS3356 is a rather interesting case because a direct path exists, as visible in the route propagation chart. So based on BGP route selection step #4 (shorter AS_PATH wins), the direct path should win if everything before it is the same.

Let’s query 2401:4900:87f0::/44 again and filter all routes with 3356 in the path - output here:

Let’s compare a direct path vs an indirect path:

2401:4900:87f0::/44 - 38008 3356 9498 45609 fed via rrc25.ripe.net (RIPE RRC in Amsterdam)

This has BGP communities:

3356:4 - APAC
3356:666 - Peer route
3356:703 - Singapore
3356:2172 - SNG3 - Singapore3

So this is a “peer route” in Singapore.

Vs

2401:4900:87f0::/44 - 209 3356 174 9498 45609

This has the following communities:

3356:3 - North America
3356:575 - USA
3356:666 - Peer route
3356:2003 - LAX1 - Los Angeles

So both are “peer routes” - one learnt from a peer in Singapore (shorter AS_PATH) and the other learnt from a peer in the US (longer AS_PATH). The localpref has to be the same in these cases, because if localpref were higher on either route, AS_PATH would not be compared at all and that route would become the best path. To find out what is happening, let’s query the Los Angeles router of AS3356 via their looking glass.
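To make the comparison concrete, here is a small Python sketch that takes the two AS paths and community lists quoted above, cuts each path at AS3356 and decodes the communities. The community meanings are exactly the ones listed in this post; the helper names `path_from` and `summarize` are made up for illustration.

```python
# Community meanings exactly as listed in the text above.
COMMUNITY_MEANING = {
    "3356:3": "North America",
    "3356:4": "APAC",
    "3356:575": "USA",
    "3356:666": "Peer route",
    "3356:703": "Singapore",
    "3356:2003": "LAX1 - Los Angeles",
    "3356:2172": "SNG3 - Singapore3",
}

# The two observations from the text: AS_PATH as seen at the collector, plus communities.
observations = [
    {"as_path": [38008, 3356, 9498, 45609],
     "communities": ["3356:4", "3356:666", "3356:703", "3356:2172"]},
    {"as_path": [209, 3356, 174, 9498, 45609],
     "communities": ["3356:3", "3356:575", "3356:666", "3356:2003"]},
]


def path_from(asn, as_path):
    """Return the part of the AS_PATH beyond `asn`, i.e. the path as `asn` sees it."""
    return as_path[as_path.index(asn) + 1:]


def summarize(obs, asn=3356):
    onward = path_from(asn, obs["as_path"])
    return {
        "learned_from": onward[0],            # neighbour AS3356 learnt the route from
        "hops_from_asn": len(onward),         # AS_PATH length as seen by AS3356
        "is_peer_route": "3356:666" in obs["communities"],
        "tags": [COMMUNITY_MEANING.get(c, "unknown") for c in obs["communities"]],
    }


for obs in observations:
    print(summarize(obs))
```

The first observation comes out as learnt from AS9498 with a path length of 2, the second as learnt from AS174 with a path length of 3, and both carry the peer route community.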

Output is here. Basically, the Singapore route is not there at all. Hence it seems like a special setup where AS3356 peers with AS9498 in Singapore, keeps those routes local to APAC and does not export them to its US/EU PoPs, etc. The same is reflected by Asia vs non-Asia downstreams of AS3356 when they feed routes to public collectors, and hence the multiple paths. Since there isn’t a peering with AS9498 in the US/EU, the routers there would pick a path via one of the few upstreams of AS9498, depending on the lowest IGP metric cost (or the oldest route if the IGP metric cost is the same).
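A toy model of that guess, in Python. The scoping rule here is purely my assumption based on the looking glass output, not AS3356's documented policy, and the `scope` field and function names are made up. The two AS paths are the ones seen earlier, cut at AS3356.

```python
# Paths as seen by AS3356, taken from the two collector paths quoted earlier.
# The "scope" values are an assumption made for illustration only.
routes_learned = [
    {"as_path": (9498, 45609), "learned_in": "APAC", "scope": "regional"},
    {"as_path": (174, 9498, 45609), "learned_in": "North America", "scope": "global"},
]


def visible_at(pop_region, routes):
    """Routes a PoP in `pop_region` can choose from under the assumed scoping rule."""
    return [
        r for r in routes
        if r["scope"] == "global" or r["learned_in"] == pop_region
    ]


def best_by_as_path(routes):
    """Pick the shortest AS_PATH, assuming every earlier tie-breaker is equal."""
    return min(routes, key=lambda r: len(r["as_path"]))


for region in ("APAC", "North America"):
    candidates = visible_at(region, routes_learned)
    print(region, len(candidates), best_by_as_path(candidates)["as_path"])
```

Under this assumption an APAC PoP sees both routes and picks the direct one on AS_PATH length, while a North American PoP never sees the direct route and so the longer path wins by default.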

BGP is fascinating!

Disclaimer: This is my personal blog, and hence, posts made here are in my personal capacity. These do not represent the views of my employer.