BSNL-Level3 bad routing case

Quick analysis of BSNL-Level3 bad routing issue

I can see BSNL having pretty high latency to most of Europe again. It seems they are using Level3 Communications (AS 3356) along with Tata-VSNL as upstreams. With Level3 transit, BSNL has a badly screwed-up reverse path, causing very high latency and awful bandwidth.

anurag@laptop:~$ ping server7 -c 5
PING server7.anuragbhatia.com (178.238.225.247) 56(84) bytes of data.
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=1 ttl=52 time=320 ms
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=2 ttl=52 time=320 ms
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=3 ttl=52 time=319 ms
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=4 ttl=52 time=327 ms
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=5 ttl=52 time=320 ms
--- server7.anuragbhatia.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 319.880/**321.765**/327.384/2.828 ms
anurag@laptop:~$

Expected latency here should be around 150 ms. A packet should not take more than 150 ms for a round trip between Radaur, Haryana and a server located in Munich.
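
As a sanity check on that figure, a rough physical floor for the round trip can be computed from the great-circle distance. This is a minimal sketch, assuming approximate coordinates for Radaur and Munich and light travelling through fibre at about c/1.47. Real cable routes between India and Europe are much longer than the great-circle path and add router hops, so the practical expectation sits well above this floor.

```python
import math

EARTH_RADIUS_KM = 6371.0
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBRE_REFRACTIVE_INDEX = 1.47  # assumption: typical value for single-mode fibre


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km):
    """Lower bound on RTT: light in fibre covering the distance twice, zero queuing."""
    speed_km_per_ms = SPEED_OF_LIGHT_KM_S / FIBRE_REFRACTIVE_INDEX / 1000
    return 2 * distance_km / speed_km_per_ms


# Approximate coordinates (assumed values, not taken from the measurements above)
RADAUR = (30.03, 77.15)
MUNICH = (48.14, 11.58)

distance = haversine_km(*RADAUR, *MUNICH)
floor = min_rtt_ms(distance)
print(f"great-circle distance: {distance:.0f} km")
print(f"theoretical RTT floor: {floor:.1f} ms")
print(f"measured avg / floor: {321.765 / floor:.1f}x")
```

The measured 321.765 ms average is several times this floor, far more than a normal cable detour would explain.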


A quick look at the traceroute:

anurag@laptop:~$ traceroute server7
traceroute to server7 (178.238.225.247), 30 hops max, 60 byte packets
1 router.local (192.168.1.1) 1.440 ms 1.954 ms 2.433 ms
2 117.207.48.1 (117.207.48.1) 26.071 ms 29.648 ms 32.078 ms
3 218.248.173.42 (218.248.173.42) 34.528 ms 36.011 ms 39.674 ms
**4 218.248.246.130 (218.248.246.130) 70.313 ms 72.635 ms 75.355 ms**
**5 so-6-0-2.edge1.London1.Level3.net (212.113.12.29) 324.058 ms 350.902 ms 351.340 ms**
6 ae-1-51.edge5.London1.Level3.net (4.69.139.75) 349.419 ms 348.461 ms ae-2-52.edge5.London1.Level3.net (4.69.139.107) 348.564 ms
7 Telia (4.68.111.182) 349.354 ms ldn-b5-link.telia.net (213.248.96.37) 294.946 ms 296.696 ms
8 ldn-bb1-link.telia.net (80.91.246.96) 346.667 ms ldn-bb1-link.telia.net (80.91.248.217) 301.921 ms 304.189 ms
9 prs-bb1-link.telia.net (213.155.134.40) 426.722 ms prs-bb1-link.telia.net (80.91.247.34) 315.777 ms 318.168 ms
10 ffm-bb1-link.telia.net (80.91.245.102) 345.072 ms ffm-bb1-link.telia.net (213.155.132.157) 345.609 ms ffm-bb1-link.telia.net (80.91.245.102) 346.060 ms
11 mcn-b2-link.telia.net (80.91.248.29) 347.000 ms 348.939 ms 351.277 ms
12 gigahosting-ic-138043-mcn-b2.c.telia.net (213.248.101.78) 355.053 ms 356.168 ms 324.647 ms
13 server7.anuragbhatia.com (178.238.225.247) 321.058 ms 323.318 ms 332.473 ms
anurag@laptop:~$

Clearly hop 3 is in New Delhi (around 30 ms latency) and hop 4 is in Mumbai (again going by the latency values). Hop 5 is Level3 in London. It seems BSNL used the Europe-India Gateway link here (a submarine cable from Mumbai to London owned by multiple providers, including BSNL and Bharti Airtel, along with Global Crossing, which is now owned by Level3). Also, as far as I know, Level3 does not have an ISP license in India (as per DoT’s list) and thus cannot sell bandwidth in Mumbai. Most likely BSNL is using its own ILD (International Long Distance) license in this case, which makes BSNL responsible for purchasing bandwidth in London.

So, going by that traceroute and the fact that BSNL is the one purchasing transit from Level3 in London, BSNL should have a BGP session with Level3 in London, announcing its own routes in return for the global routing table provided by the transit. Yet latency jumps as soon as we hit London. Since every traceroute RTT also includes the reply’s trip back, this points at the return leg: the BSNL > Level3 path seems OK, while the return path Level3 > BSNL is faulty.
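
Each traceroute hop’s RTT covers the probe going out and the ICMP reply coming back, so a sudden jump shows where the round trip became expensive, not by itself which direction. A small sketch that finds the largest hop-to-hop jump using the minimum RTT per hop; hops 1 to 5 are copied from the traceroute above, and the parsing assumes the Linux traceroute output format shown there.

```python
import re

# Hops 1-5 of the traceroute above, one line per hop
TRACE = """\
1 router.local (192.168.1.1) 1.440 ms 1.954 ms 2.433 ms
2 117.207.48.1 (117.207.48.1) 26.071 ms 29.648 ms 32.078 ms
3 218.248.173.42 (218.248.173.42) 34.528 ms 36.011 ms 39.674 ms
4 218.248.246.130 (218.248.246.130) 70.313 ms 72.635 ms 75.355 ms
5 so-6-0-2.edge1.London1.Level3.net (212.113.12.29) 324.058 ms 350.902 ms 351.340 ms
"""

# A number immediately followed by " ms" is one probe's RTT
RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms")


def min_rtts(trace_text):
    """Return {hop_number: lowest RTT in ms} for every hop line."""
    result = {}
    for line in trace_text.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue
        rtts = [float(x) for x in RTT_RE.findall(line)]
        if rtts:
            result[int(parts[0])] = min(rtts)
    return result


def biggest_jump(hop_rtts):
    """Return (previous_hop, hop, delta_ms) for the largest increase in min RTT."""
    hops = sorted(hop_rtts)
    pairs = zip(hops, hops[1:])
    return max(((a, b, hop_rtts[b] - hop_rtts[a]) for a, b in pairs), key=lambda t: t[2])


rtts = min_rtts(TRACE)
prev_hop, hop, delta = biggest_jump(rtts)
print(f"largest jump: hop {prev_hop} -> hop {hop}, +{delta:.1f} ms")
# prints "largest jump: hop 4 -> hop 5, +253.7 ms"
```

Using the minimum of the three probes filters out most queuing noise, so the jump reflects the path rather than a momentarily busy router.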


Using Level3’s looking glass, we can quickly run a traceroute back towards my connection (the BSNL gateway 117.207.48.1, hop 2 in the traceroute above):

Show Level 3 (London, England) Traceroute to 117.207.48.1  
 1 ae-51-51.csw1.London1.Level3.net (4.69.139.88) 0 msec  
 ae-52-52.csw2.London1.Level3.net (4.69.139.120) 0 msec  
 ae-51-51.csw1.London1.Level3.net (4.69.139.88) 0 msec  
 2 ae-227-3603.edge3.London1.Level3.net (4.69.166.154) 0 msec  
 ae-117-3503.edge3.London1.Level3.net (4.69.166.138) 0 msec  
 **ae-226-3602.edge3.London1.Level3.net (4.69.166.150) 32 msec**  
 **3 gblx-level3-50g.London1.Level3.net (4.68.110.158) 8 msec 4 msec 0 msec**  
 4 ae6.scr4.LON3.gblx.net (67.17.106.150) [AS3549 {GBLX}] 0 msec 0 msec  
 ae5.scr3.LON3.gblx.net (67.17.72.22) [AS3549 {GBLX}] 4 msec  
 **5 so5-0-0-2488M.ar1.NYC1.gblx.net (67.17.64.146) [AS3549 {GBLX}] 104 msec**  
 so6-0-0-2488M.ar1.NYC1.gblx.net (67.17.64.154) [AS3549 {GBLX}] 68 msec 68 msec  
 **6 BHARTIBSNL.so-7-0-0.ar1.NYC1.gblx.net (64.210.30.70) [AS3549 {GBLX}] 268 msec 268 msec 264 msec**  
 7 218.248.255.101 [AS9829 {APNIC-AS-3-BLOCK}] 276 msec 272 msec 276 msec  
 8 117.207.48.1 [AS9829 {APNIC-AS-3-BLOCK}] 272 msec 280 msec 276 msec

Hop 3 is Level3 (its interconnect with GBLX), hop 4 is GBLX (which is now owned by Level3), hop 5 is GBLX New York and hop 6 is the BSNL router in New York. The target BSNL IP belongs to 117.207.48.0/20. The interesting thing here is that BSNL uses both Level3 and GBLX for transit, so a return path via GBLX is not an issue in itself, but the London > New York > India path surely is.
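
Placing these hops in London and New York relies on the location codes embedded in the router hostnames (London1, LON3, NYC1). A minimal sketch of that heuristic, using a small hypothetical lookup table that only covers the names in this looking-glass output. Reverse DNS naming is operator-specific and can be stale, so treat the result as a hint, not proof.

```python
# Hypothetical lookup table: location tokens seen in the hostnames above -> city.
# Real naming schemes differ per operator; this is not an authoritative list.
LOCATION_TOKENS = {
    "london1": "London",
    "lon3": "London",
    "nyc1": "New York",
}

# (hostname or IP, lowest RTT in ms) for hops 3-8 of the Level3 looking-glass trace
LG_HOPS = [
    ("gblx-level3-50g.London1.Level3.net", 0),
    ("ae6.scr4.LON3.gblx.net", 0),
    ("so6-0-0-2488M.ar1.NYC1.gblx.net", 68),
    ("BHARTIBSNL.so-7-0-0.ar1.NYC1.gblx.net", 264),
    ("218.248.255.101", 272),
    ("117.207.48.1", 272),
]


def guess_city(hostname):
    """Return the city for the first known location token in the hostname, or None."""
    for label in hostname.lower().split("."):
        if label in LOCATION_TOKENS:
            return LOCATION_TOKENS[label]
    return None


for name, rtt in LG_HOPS:
    print(f"{rtt:>4} ms  {guess_city(name) or 'unknown':<9} {name}")
```

The bare BSNL IPs come out as unknown, which is where RTT values (the ~200 ms step after the New York handoff) have to fill in.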


Looking up the prefix 117.207.48.0/20 on Level3’s London router:

BGP routing table entry for 117.207.48.0/20  
Paths: (2 available, best #1)  
 3549 9829  
 AS-path translation: { GBLX APNIC-AS-3-BLOCK }  
 edge3.London1 (metric 20020)  
 Origin IGP, metric 100000, localpref 88, valid, internal, best  
 Community: Europe Lclprf_86 United_Kingdom Level3_Peer London 3549:4351 3549:7000 3549:30840  
 Originator: edge3.London1  
 3549 9829  
 AS-path translation: { GBLX APNIC-AS-3-BLOCK }  
 edge3.London1 (metric 20020)  
 Origin IGP, metric 100000, localpref 88, valid, internal  
 Community: Europe Lclprf_86 United_Kingdom Level3_Peer London 3549:4351 3549:7000 3549:30840  
 Originator: edge3.London1

Only two paths, and both via GBLX (AS 3549). No direct return path to BSNL (AS 9829). Again, that is not a big issue in itself, since GBLX could have a return path to BSNL right within London (or somewhere else in Europe).
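
The "no direct return path" observation boils down to looking at the neighbour AS, i.e. the leftmost AS in each path. A minimal sketch using the two AS paths from the Level3 output above; it deliberately ignores the rest of the BGP decision process and only asks whether Level3 London learned the prefix straight from BSNL.

```python
BSNL_AS = 9829

# AS paths of the two routes Level3 London holds for 117.207.48.0/20,
# copied from the output above. Leftmost AS = neighbour the route was learned from.
LEVEL3_LONDON_PATHS = [
    (3549, 9829),
    (3549, 9829),
]


def neighbour_as(as_path):
    """AS the route was learned from: the first AS in the path."""
    return as_path[0]


def has_direct_path(as_paths, origin_as):
    """True if any path was learned straight from the origin AS."""
    return any(neighbour_as(p) == origin_as for p in as_paths)


print("learned via AS:", sorted({neighbour_as(p) for p in LEVEL3_LONDON_PATHS}))
print("direct path to BSNL:", has_direct_path(LEVEL3_LONDON_PATHS, BSNL_AS))
```

Both paths enter through AS 3549, so whatever GBLX does with the traffic decides the return path.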

Let’s check a GBLX router in Europe for its entry for 117.207.48.0/20:

route-server.ams2>show ip bgp 117.207.48.0/20 long  
BGP table version is 176033437, local router ID is 67.17.81.187  
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,  
 r RIB-failure, S Stale, m multipath, b backup-path, x best-external  
Origin codes: i - IGP, e - EGP, ? - incomplete  
Network Next Hop Metric LocPrf Weight Path  
*>i117.207.48.0/20 67.16.147.121 300 0 9829 i  
* i 67.16.147.121 300 0 9829 i  
route-server.ams2>
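
The table lists two entries, but apart from the best-path marker they carry the same next hop, LocPrf, weight and AS path. A minimal sketch that collapses such duplicates; because the column alignment of this output is lost once whitespace is collapsed, it assumes the first dotted-quad not followed by "/" is the next hop and compares everything after it as one string rather than parsing individual columns.

```python
import re

# The two table entries from the route-server output above
ROUTE_SERVER_OUTPUT = """\
*>i117.207.48.0/20 67.16.147.121 300 0 9829 i
* i 67.16.147.121 300 0 9829 i
"""

# Dotted-quad next hop that is not part of a prefix (i.e. not followed by "/"),
# then the remaining attributes, AS path and origin code
NEXT_HOP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b(?!/)\s+(.+?)\s*$")


def unique_routes(text):
    """Collapse entries sharing the same next hop and trailing attributes/AS path."""
    routes = set()
    for line in text.splitlines():
        m = NEXT_HOP_RE.search(line)
        if m:
            routes.add((m.group(1), m.group(2)))
    return routes


routes = unique_routes(ROUTE_SERVER_OUTPUT)
print(len(routes), "distinct route(s):", routes)
```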

Effectively just one path: both entries share the same next hop and AS path. Let’s run a traceroute to see the actual path, since I don’t know where that next hop is located! :)

route-server.ams2>traceroute 117.207.48.1  
Type escape sequence to abort.  
Tracing the route to 117.207.48.1  
1 67.16.147.121 0 msec 72 msec 4 msec  
 2 BHARTIBSNL.so-7-0-0.ar1.NYC1.gblx.net (64.210.30.70) 380 msec 376 msec 380 msec  
 3 218.248.255.101 [AS 9829] 376 msec 376 msec 376 msec  
 4 117.207.48.1 [AS 9829] 384 msec 384 msec 384 msec  
route-server.ams2>

Clearly, here’s the issue. BSNL is once again doing selective BGP announcements: this prefix reaches GBLX only at New York, and that is why Europe to India traffic is being routed via New York. BSNL allows entry into its network from outside India only at New York and a few other selected locations, which seriously damages latency.
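
Why does an announcement only at New York drag Amsterdam traffic across the Atlantic? Assuming all other BGP attributes tie, a transit network hands traffic to the customer at the announcing PoP with the lowest internal (IGP) cost from where the traffic enters, i.e. hot-potato routing. A minimal sketch with made-up internal costs; the `COST_FROM_AMSTERDAM` values are illustrative, not measured.

```python
# Hypothetical internal costs from a GBLX router in Amsterdam to PoPs where
# BSNL could hand off traffic. Illustrative numbers only, not measurements.
COST_FROM_AMSTERDAM = {
    "Amsterdam": 0,
    "London": 8,
    "New York": 80,
}


def exit_pop(announced_at, costs):
    """Pick the announcing PoP with the lowest internal cost (hot-potato style)."""
    candidates = [pop for pop in announced_at if pop in costs]
    if not candidates:
        return None  # prefix not announced anywhere this network can reach
    return min(candidates, key=costs.get)


print(exit_pop({"New York"}, COST_FROM_AMSTERDAM))            # New York
print(exit_pop({"London", "New York"}, COST_FROM_AMSTERDAM))  # London
```

With the prefix announced only in New York, there is no closer candidate, so every European GBLX router hauls the traffic to New York before it heads to India.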

Time for me to get back to work routing packets! Thanks for reading. :)