29 May

What makes BSNL AS9829 the most unstable ASN in the world?!

Over the weekend I was looking at BGP Instability Report data. As usual (and unfortunately) BSNL tops that list, which makes AS9829 the most unstable autonomous system in the world by that report. I have written previously about how AS9829 is the rotten IP backbone.

This isn’t a surprise since they keep coming out on top, but I think it’s well worth checking what exactly is causing it. So I looked at the BGP table updates published by Oregon route-views from 21st May to 27th May and pulled data specifically for AS9829. I see zero withdrawals, which is very interesting. I thought there would be a lot of announcements and withdrawals as they switch transits to balance traffic.
If I plot the data, I get the following chart of updates against timestamp. It is a summarised view in 15-minute intervals, taken from 653 routing update dumps. Graphing all 653 dumps did not seem feasible, so I picked the top 300. Here’s how it looks:

Except for a few large spikes, the pattern seems relatively consistent. We can see close to 50,000 fresh announcements daily.
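A minimal sketch of how such a 15-minute summary could be built. It assumes the update dumps were first converted to pipe-separated text lines in the style of bgpdump -m output (the post does not say which tool was used), and the function name is hypothetical. Withdrawal lines carry no AS path, so the sketch only counts a withdrawal for a prefix it has previously seen announced with AS9829 as origin from the same peer.

```python
from collections import Counter

BIN_SECONDS = 15 * 60  # route-views update dumps cover 15 minutes each


def bin_updates(lines, origin_asn="9829"):
    """Count announcements (A) and withdrawals (W) per 15-minute bin.

    Assumed line layout (bgpdump -m style):
    BGP4MP|epoch|A or W|peer_ip|peer_as|prefix|as_path|...
    """
    announced = set()  # (peer, prefix) pairs last seen with origin_asn as origin
    counts = {}        # bin start (epoch seconds) -> Counter of "A" / "W"
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) < 6 or fields[0] != "BGP4MP":
            continue
        ts, kind, peer, prefix = int(fields[1]), fields[2], fields[3], fields[5]
        key = (peer, prefix)
        bucket = ts - ts % BIN_SECONDS
        if kind == "A" and len(fields) > 6:
            path = fields[6].split()
            if path and path[-1] == origin_asn:
                announced.add(key)
                counts.setdefault(bucket, Counter())["A"] += 1
            else:
                announced.discard(key)  # prefix now originated by another AS
        elif kind == "W" and key in announced:
            announced.discard(key)
            counts.setdefault(bucket, Counter())["W"] += 1
    return counts
```

Feeding it every line from a week of dumps gives one Counter per 15-minute bin, which can then be plotted against the bin timestamp.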
This data on its own doesn’t tell me much. So instead of looking at updates, I pulled last week’s RIBs and extracted the AS9829 announcements. The idea here is to map announcements to each upstream against timestamp, along with announcements across the various subnet masks.

Here’s the total route announcement graph:
The graph above clearly shows that total route announcements increased significantly on 23rd May at 06:00 UTC, from 127664 to 129298. They then dipped significantly at 14:00 on 26th May to 124301. So between 10:00 and 14:00 on the 26th, routes dropped by close to 4%, clearly indicating a large outage in their network.
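The percentages behind those route counts can be checked with a few lines of Python. The helper name is illustrative; the three route counts are the ones quoted above.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


rise = pct_change(127664, 129298)  # increase on 23rd May
drop = pct_change(129298, 124301)  # dip on 26th May

print(f"{rise:+.2f}%")  # +1.28%
print(f"{drop:+.2f}%")  # -3.86%
```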

The next part is to look at how they tweak their announcements to upstreams.

So clearly they are announcing a large number of routes to Tata AS6453, and these are IPLC links where they are buying IP transit outside India. Some of the key spikes are mirrored on other transits, a clear hint of circuit balancing by moving route announcements.
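A per-upstream breakdown like this can be derived from the AS paths in a RIB dump by taking the AS that sits just before the origin. A minimal sketch, assuming the AS paths are already available as space-separated strings (the function name is hypothetical):

```python
from collections import Counter


def upstream_counts(as_paths, origin="9829"):
    """Count routes per upstream, i.e. the AS seen just before the origin."""
    counts = Counter()
    for path in as_paths:
        hops = path.split()
        if not hops or hops[-1] != origin:
            continue  # not originated by the AS we care about
        # Drop trailing repeats of the origin so prepends do not hide the upstream.
        while hops and hops[-1] == origin:
            hops.pop()
        if hops:
            counts[hops[-1]] += 1
    return counts
```

Running this once per RIB snapshot and plotting the counts against the snapshot time gives one line per upstream.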
The next part is to view their announcements in terms of prefix size.

/20 as well as /22 both seem relatively consistent, except for a dip on the 26th.
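Grouping announcements by mask length is a small job for Python's ipaddress module. A minimal sketch; the function name is hypothetical, and the prefixes in the usage note are private-range placeholders, not BSNL routes:

```python
import ipaddress
from collections import Counter


def prefix_length_histogram(prefixes):
    """Return a Counter mapping prefix length (e.g. 20) to number of routes."""
    return Counter(
        ipaddress.ip_network(p, strict=False).prefixlen for p in prefixes
    )
```

For example, prefix_length_histogram(["10.0.0.0/18", "10.0.64.0/20", "10.0.80.0/22", "10.0.84.0/22"]) counts one /18, one /20 and two /22s.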
So all I can say based on the above data is the following:

  1. BSNL had some issues last week. Possibly one of their upstream pipes had issues and they increased their announcements on Tata AS6453 during that time.
  2. They are the only large operator buying transit from as many as 9 upstreams. This results in capacity fragmented across at least 9 and possibly 30-40 circuits, creating a major capacity management challenge across these upstreams.
  3. They are announcing a large number of prefix sizes: /18, /20, /22, /23 and even /24s. This isn’t good practice at their scale.
  4. They need to start peering. They are the only network of that scale that isn’t peering, except with a couple of content players like Google AS15169. They need to peer aggressively inside India and do the same outside India if they keep running such a network. Otherwise even buying only domestic transit would be a better strategy.

Most of these problems can be fixed if BSNL consolidates its number of transits (and circuits per transit) along with aggregating its routes. For a three-transit scenario, they can follow a /18, /20 and /22 strategy and leave /24s only for emergency cases to balance traffic.
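To make the /18, /20 and /22 idea concrete, here is a minimal sketch using Python's ipaddress module. The 10.0.0.0/18 block is a hypothetical stand-in, not a BSNL prefix. The sketch shows how many more-specifics each level produces and how contiguous /24s collapse back into a single aggregate.

```python
import ipaddress

# Hypothetical aggregate; a private range is used purely for illustration.
aggregate = ipaddress.ip_network("10.0.0.0/18")

# More-specifics that could be spread across transits for traffic balancing.
plan = {
    plen: list(aggregate.subnets(new_prefix=plen)) for plen in (20, 22, 24)
}

for plen, nets in plan.items():
    print(f"/{plen}: {len(nets)} prefixes, first {nets[0]}")

# Aggregation in the other direction: 64 contiguous /24s become one /18.
collapsed = list(ipaddress.collapse_addresses(plan[24]))
print(collapsed)
```

One /18 breaks into 4 /20s, 16 /22s or 64 /24s, so every level of de-aggregation multiplies the number of routes the rest of the internet has to carry.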

28 Jun

BSNL routing glitch and updates

Today I noticed some traffic on my blog coming from a link on Broadband forum.


Here’s what the poster wrote:

I made a thread a few days ago complaining about BSNL’s horrible routing. Well it looks like it has been fixed. I thank all the guys who made efforts to bring this to BSNL’s notice. Especially Anurag Bhatia who highlighted the issue with much detail on his blog

anuragbhatia.com !!! » Blog Archive » BSNL > Softlayer connectivity problem & possible fix



Always good to see links to my blog. This was an interesting update and I can see the forward path does seem good for now.


Here’s an updated traceroute from India to Singapore (BSNL > Softlayer):

anurag:~ anurag$ traceroute -a hostgator.in
traceroute to hostgator.in (, 64 hops max, 52 byte packets
1 [AS65534] router.home ( 1.183 ms 1.290 ms 0.849 ms
2 [AS9829] ( 17.517 ms 18.056 ms 17.163 ms
3 [AS9829] ( 71.872 ms 52.246 ms 114.018 ms
4 [AS4755] ( 49.644 ms 50.151 ms 49.265 ms
5 [AS0] ( 83.261 ms * 82.361 ms
6 [AS0] ix-4-2.tcore1.cxr-chennai.as6453.net ( 197.469 ms 199.161 ms 196.580 ms
7 [AS0] if-5-2.tcore1.svw-singapore.as6453.net ( 318.931 ms 307.292 ms
[AS0] if-3-3.tcore2.cxr-chennai.as6453.net ( 306.836 ms
8 [AS0] if-5-2.tcore2.svw-singapore.as6453.net ( 330.831 ms
[AS0] if-2-2.tcore2.svw-singapore.as6453.net ( 306.926 ms
[AS0] if-6-2.tcore2.svw-singapore.as6453.net ( 227.751 ms
9 [AS0] ( 230.692 ms 265.758 ms 241.768 ms
10 [AS4637] i-1-0-0.6ntp-core01.bi.telstraglobal.net ( 245.100 ms 235.299 ms 274.206 ms
11 [AS4637] i-0-1-0-0.istt02.bi.telstraglobal.net ( 307.158 ms 304.905 ms 307.080 ms
12 [AS4637] unknown.telstraglobal.net ( 307.409 ms 304.740 ms 307.178 ms
13 [AS36351] ae5.dar02.sr03.sng01.networklayer.com ( 307.167 ms 306.263 ms
[AS36351] ae5.dar01.sr03.sng01.networklayer.com ( 307.456 ms
14 [AS36351] po1.fcr01.sr03.sng01.networklayer.com ( 238.486 ms
[AS36351] po2.fcr01.sr03.sng01.networklayer.com ( 234.005 ms
[AS36351] po1.fcr01.sr03.sng01.networklayer.com ( 306.823 ms
15 * * *
16 * *^C
anurag:~ anurag$



So the forward path does seem good, but latency is still way higher than the expected value of 120-150ms (from North India). There’s a jump as soon as we hit the Chennai router of AS6453.


Quick ping output:

anurag:~ anurag$ ping -c 5 hostgator.in
PING hostgator.in ( 56 data bytes
64 bytes from icmp_seq=0 ttl=45 time=232.593 ms
64 bytes from icmp_seq=1 ttl=45 time=233.120 ms
64 bytes from icmp_seq=2 ttl=45 time=259.231 ms
64 bytes from icmp_seq=3 ttl=45 time=281.217 ms
64 bytes from icmp_seq=4 ttl=45 time=305.450 ms

--- hostgator.in ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 232.593/262.322/305.450/28.154 ms
anurag:~ anurag$


We can ignore any value above 232ms because that’s simply latency added by routers, which do not prioritise ICMP. But overall 232ms is quite high, and it seems like there is an issue in the reverse path. I am running this test from within BSNL’s autonomous system, AS9829.
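Taking the minimum RTT as the baseline can be done mechanically. A minimal sketch that pulls the time= values out of ping output like the one above (the function name is hypothetical, and the regular expression assumes the "time=... ms" format shown):

```python
import re


def min_rtt_ms(ping_output):
    """Return the smallest round-trip time in ms, or None if none is found."""
    times = [float(t) for t in re.findall(r"time=([\d.]+) ms", ping_output)]
    return min(times) if times else None


sample = """\
64 bytes from icmp_seq=0 ttl=45 time=232.593 ms
64 bytes from icmp_seq=1 ttl=45 time=233.120 ms
64 bytes from icmp_seq=2 ttl=45 time=259.231 ms
64 bytes from icmp_seq=3 ttl=45 time=281.217 ms
64 bytes from icmp_seq=4 ttl=45 time=305.450 ms
"""

print(min_rtt_ms(sample))  # 232.593
```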


Looking at the BGP table at Softlayer Singapore for this prefix via the Softlayer Looking Glass, we get:



bbr01.eq01.sng02> show route protocol bgp table inet.0
inet.0: 461775 destinations, 1662681 routes (461773 active, 1 holddown, 1 hidden)
+ = Active Route, - = Last Active, * = Both
*[BGP/170] 6:50:39, MED 1, localpref 160
AS Path: 4637 6453 9829 I

> to via xe-0/2/0.0
[BGP/170] 6:50:41, MED 5, localpref 10
AS Path: 2914 6453 9829 I

> to via ae11.0


So the AS path is AS4637 > AS6453 > AS9829


AS4637 is Reach/Telstra while AS6453 is Tata Comm, and right next to it is AS9829, which again (as per my earlier post) is an IPLC link. The AS6453 > AS9829 connection is from outside India for sure; for an actual direct route from Singapore to India it should rather be AS6453 > AS4755 (VSNL) > AS9829.


Just to confirm this, let’s run a trace to a random IP from Softlayer Singapore:

bbr01.eq01.sng02> traceroute
HOST: bbr01.eq01.sng02-re0 Loss% Snt Last Avg Best Wrst StDev
1. 0.0% 5 1.6 2.9 1.6 5.6 1.5
2. 0.0% 5 1.5 16.3 1.4 43.8 20.6
3. 0.0% 5 4.9 4.2 3.1 4.9 0.9
4. 0.0% 5 44.1 11.6 2.6 44.1 18.2
5. 0.0% 5 261.5 261.4 260.6 263.0 0.9
6. 0.0% 5 260.3 256.6 255.4 260.3 2.1
7. 0.0% 5 255.9 256.3 255.9 256.8 0.4
8. 0.0% 5 255.7 257.7 255.7 263.5 3.3
9. 60.0% 5 258.1 257.7 257.4 258.1 0.5
10. 80.0% 5 263.5 263.5 263.5 263.5 0.0
11. 0.0% 5 256.1 256.2 256.0 256.4 0.2
12. 0.0% 5 380.6 380.6 380.6 380.6 0.0
13. 0.0% 5 397.6 388.9 381.2 401.3 9.8
14. 0.0% 5 394.1 397.8 394.1 412.6 8.2
15. 20.0% 5 397.3 404.9 394.1 424.0 13.4
16. ??? 100.0 5 0.0 0.0 0.0 0.0 0.0
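One way to spot where the latency is added is to look at hop-to-hop differences in the Avg column. A minimal sketch, with the averages copied from hops 1-15 of the output above (hop 16 did not respond); the function name is illustrative:

```python
# Avg column (ms) for hops 1..15, copied from the trace above.
avg_ms = [2.9, 16.3, 4.2, 11.6, 261.4, 256.6, 256.3, 257.7,
          257.7, 263.5, 256.2, 380.6, 388.9, 397.8, 404.9]


def largest_jump(avgs):
    """Return (hop_number, increase_ms) for the biggest rise between hops."""
    # avgs[i] belongs to hop i + 1, so the jump into avgs[i] is at hop i + 1.
    jumps = [(avgs[i] - avgs[i - 1], i + 1) for i in range(1, len(avgs))]
    delta, hop = max(jumps)
    return hop, delta


hop, delta = largest_jump(avg_ms)
print(hop, round(delta, 1))  # 5 249.8
```

On this data the largest jump is at hop 5 (roughly 250 ms), with a second jump of about 124 ms at hop 12.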




Clearly a high-latency route, but unfortunately the Softlayer looking glass does not do rDNS PTR mapping of IP to hostname. Let’s look up some specific hops using the dig command (with the -x argument for PTR):

anurag:~ anurag$
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$ dig +short -x
anurag:~ anurag$
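For reference, dig -x works by turning the IP address into a name under in-addr.arpa and querying its PTR record. A minimal sketch of that name construction with Python's ipaddress module; the address used is a documentation-range placeholder, not one of the hops above:

```python
import ipaddress


def ptr_name(ip):
    """Return the reverse-DNS name that a PTR lookup for this IP would query."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_name("192.0.2.10"))  # 10.2.0.192.in-addr.arpa
```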




So the return path for packets is:


Telstra (Singapore) > Tata AS6453 (Singapore) > Tata AS6453 (Chennai via Tata Indicom cable link) > Tata AS6453 (Mumbai) > Tata AS6453 (Marseille, France) > Tata AS6453 (London) > IPLC Link >>> BSNL AS9829 India.


So basically BSNL fixed the forward path but the return path is badly messed up. They are not announcing this prefix (along with many more prefixes) to their transit providers’ IP links. They are relying on NIXI alone for domestic traffic, while for transit they are relying on IPLC ports, which in this case seems to be with Tata AS6453 in London.


Here’s what the Tata AS6453 router in Mumbai is getting:


AS6453 IPv4 and IPv6 Looking Glass
show ip bgp

Router: gin-mlv-core1
Site: IN, Mumbai, MLV
Command: show ip bgp

BGP routing table entry for
Bestpath Modifiers: deterministic-med
Paths: (3 available, best #3)
Multipath: eBGP
11 12
l78-mcore3. (metric 2968) from mlv-tcore2. (
Origin IGP, valid, internal
Originator: Loopback5.mcore3.L78-London.as6453.net.
l78-mcore3. (metric 2968) from mlv-tcore1. (
Origin IGP, valid, internal
Originator: Loopback5.mcore3.L78-London.as6453.net.
l78-mcore3. (metric 2968) from cxr-tcore1. (
Origin IGP, valid, internal, best
Originator: Loopback5.mcore3.L78-London.as6453.net.



So clearly in all three cases Tata AS6453 is learning the routes via the loopback interface of its router in London (mcore3, London). There’s not even a single route via the m-core routers in Chennai/Mumbai through VSNL AS4755.



So what’s the possible fix?

Likely something like this:

  1. BSNL should maintain good capacity with IP ports along with IPLC ports.
  2. They should at least announce all prefixes to IP ports, without any preferred more-specific announcement on IPLC (for example, announcing a /18 on the IP port and a more specific /20 on IPLC).
  3. BSNL should implement BGP blackholing to avoid East Asian traffic via their IPLC ports since most of their ports are in London, New York and Los Angeles and not really in East Asia (as far as I can see from routes).
  4. BSNL “could” do a basic one-degree prepend for IPLC routes, especially with Tata AS6453, since AS6453 > AS9829 is a shorter AS path than AS6453 > AS4755 > AS9829. With a one-degree prepend they would have AS6453 > AS9829 > AS9829 (repeating their own AS once), lengthening the AS path to make the route less preferred.
  5. Buying an IPLC port to reach Equinix Singapore + Hong Kong Internet Exchange (HKIX), which is where they can find a lot of local Asian traffic.
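The effect of the prepend in point 4 can be illustrated by comparing AS path lengths. This is a simplified sketch: it only looks at path length, whereas real routers evaluate local preference first, and the helper name is hypothetical.

```python
def prepend_origin(path, times=1):
    """Return the AS path with the origin AS repeated `times` more times."""
    return path + [path[-1]] * times


direct_iplc = [6453, 9829]      # AS6453 > AS9829 (IPLC link)
via_vsnl = [6453, 4755, 9829]   # AS6453 > AS4755 > AS9829

for times in (0, 1, 2):
    iplc = prepend_origin(direct_iplc, times)
    print(times, iplc, len(iplc), "vs", len(via_vsnl))
```

Without a prepend the IPLC path is shorter (2 vs 3). With a single prepend the two paths come out the same length, so in this simplified model the choice falls to later tie-breakers; two prepends make the IPLC path strictly longer.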