15 Jun

Routing with North East India!

A few weeks back I got in touch with Marc from Meghalaya. He offered to host a RIPE Atlas probe in Shillong, an excellent location that the RIPE Atlas network does not cover yet. It took around 5 days for the probe to reach Shillong from Haryana. I think this is probably the probe at the most beautiful place in India. 🙂

Now that the probe is connected, I thought I would look into routing, which is super exciting for far-flung places like Shillong. Marc has a BSNL FTTH connection & mentioned not-so-good latency. Let’s trace to the 1st IP of the /24 pool on which the probe is hosted:

 

This is interesting output. There are two parts to it:

  1. Traffic going via Bangladesh
  2. Traffic to Bangladesh going via Europe!

 

While #1 may look like a routing issue, it’s actually the desired result of a deal between BSNL & BSCCL. I blogged about it last year when it became visible in BGP tables. Eventually, this link was launched by Indian Prime Minister Modi.

 

From the map, it seems like an ideal choice, but I really wish BSNL had gone for some kind of circuits instead of transit with BSCCL. The reason is the poor routing across Asian backbones, which we see in #2.

 

Coming to #2 – this is clearly bad and broken. Traffic goes Siti Broadband > Airtel > Telecom Italia > BSCCL, so it travels from India to Europe first before returning to South Asia.

 

Let’s trace to the same 1st IP of the pool from all Indian RIPE Atlas probes for a detailed picture:

Measurement result: https://atlas.ripe.net/measurements/8844267/
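
For anyone who wants to summarise such a measurement themselves, here is a minimal sketch of pulling the final-hop latency per probe out of traceroute results. It assumes the usual RIPE Atlas traceroute result layout (a `result` list of hops, each holding a list of replies with `from` and `rtt`, and `{"x": "*"}` for timeouts). The sample data and the function name `last_hop_rtt` are made up for illustration and are not taken from measurement 8844267.

```python
def last_hop_rtt(result):
    """Return the lowest RTT (ms) seen on the final hop, or None if all replies timed out."""
    hops = result.get("result", [])
    if not hops:
        return None
    replies = hops[-1].get("result", [])
    # Timed-out replies have no "rtt" key, so they are skipped here.
    rtts = [r["rtt"] for r in replies if "rtt" in r]
    return min(rtts) if rtts else None


# Hypothetical sample shaped like Atlas traceroute results (not real data).
sample = [
    {"prb_id": 1001, "result": [
        {"hop": 1, "result": [{"from": "192.0.2.1", "rtt": 1.2}]},
        {"hop": 2, "result": [{"from": "198.51.100.1", "rtt": 64.8},
                              {"from": "198.51.100.1", "rtt": 66.1}]},
    ]},
    {"prb_id": 1002, "result": [
        {"hop": 1, "result": [{"from": "192.0.2.9", "rtt": 0.9}]},
        {"hop": 2, "result": [{"x": "*"}, {"x": "*"}]},
    ]},
]

rtt_by_probe = {r["prb_id"]: last_hop_rtt(r) for r in sample}
print(rtt_by_probe)
```

Note that the final hop recorded is not necessarily the target itself; a fuller version would compare the replying address with the destination before trusting the number.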

 

As we can see, latency numbers are quite decent from within BSNL’s AS9829 itself. 60-70ms seems fine considering the probes are in North or South India and the target is far away in the North East. Let’s look at some of these traces from probes on BSNL itself:

 

 

This shows that BSNL indeed has a direct backbone circuit to that location. There’s a low chance of it riding on top of BSCCL infra.

 

Except for BSNL, all other Indian networks are routing towards that BSNL segment in Meghalaya via Europe or Singapore/Hong Kong. All the paths via Europe go through Marseille in France. That’s the landing station for 11 cable systems:

  • SEACOM
  • SEA-ME-WE-4
  • EIG
  • I-ME-WE
  • Ariane 2
  • Atlas Offshore
  • Med Cable
  • TE North
  • Tamares Telecom
  • Alexandros
  • AAE-1 (Asia Africa Europe)

 

Out of these, SEA-ME-WE-4 lands in Bangladesh, and I guess that is what BSCCL uses for this traffic. So, coming back to the question: why is routing so terrible from Indian networks towards BSNL in the North East? To understand that, we need to look at the uplinks of BSCCL.

Well, BSNL is announcing 117.247.134.0/24 to BSCCL AS132602 only. BSCCL is buying transit from Telecom Italia AS6762 and NTT AS2914.

http://bgp.he.net/AS132602#_graph4

 

Looking at one of the few traces from Europe:

 

213.144.176.194/31 TIS – BSCCL connectivity
213.144.176.194 – 10Gig port on TIS AS6762 router in Marseille
213.144.176.195 – TIS’s IP on the BSCCL router somewhere in Bangladesh
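
As a quick sanity check that these two addresses really are the two ends of one point-to-point /31, here is a small sketch using Python's standard `ipaddress` module. Only the prefix comes from the trace above; the variable names are just for illustration.

```python
import ipaddress

# The TIS <-> BSCCL link prefix from the trace.
link = ipaddress.ip_network("213.144.176.194/31")

# A /31 holds exactly two addresses, one for each end of the link.
ends = [str(ip) for ip in link]
print(ends)
```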

 

Next, looking at NTT AS2914 transit of BSCCL:

 

Here the traffic handoff from Tata AS6453 to NTT AS2914 happens in Singapore (logical and correct!), and NTT hands off to BSCCL within Singapore as well. The latency is high due to the bad return path. The forward path is slightly bad here, but possibly not as bad as the return.

Let’s look at the return trace to the 2nd hop, 115.118.168.1, from the RIPE probe at the destination (measurement here):

So clearly the return path, i.e. Shillong to Hyderabad, is via Europe because BSCCL forwards that traffic via TIS.

 

So, keeping the above traces in mind, here are the reasons for the high latency:

  1. BSNL is routing traffic over its backbone, but all other traffic, i.e. whatever is not going towards BSNL, is being routed via the Bangladeshi provider BSCCL.
  2. BSCCL is announcing routes to NTT AS2914 in Singapore & TIS AS6762 in France. Thus to send any traffic to BSNL’s segment in Meghalaya, one has to send it via either the TIS router in Marseille, France or NTT in Singapore. This adds significant latency for Indian networks (excluding BSNL itself) towards BSNL Meghalaya.
  3. BSCCL is using TIS AS6762 to reach Tata AS6453, which results in a very bad return route: traffic from Meghalaya to any other network in India that is a Tata AS6453 downstream goes via Marseille, France.

Quite a lot seems messed up. BSNL should at least start announcing 117.247.134.0/24 across NIXI exchanges immediately, subject to capacity between their core network and the North East. If there’s a capacity constraint, they should use L1 circuits from BSCCL to connect the network in Shillong instead of IP transit.

 

How is BSNL in North East reaching Google?

It seems to go directly over BSNL’s PNI with Google within India.

29 May

What makes BSNL AS9829 as most unstable ASN in the world?!

Over the weekend I was looking at the BGP Instability Report data. As usual (and unfortunately), BSNL tops that list: it is the most unstable autonomous system in the world. I have written previously about how AS9829 is the rotten IP backbone.

 

This isn’t a surprise since they keep coming out on top, but I think it’s well worth checking what exactly is causing that. So I looked into the BGP table updates published by Oregon route-views from 21st May to 27th May and pulled data specifically for AS9829. I see zero withdrawals, which is very interesting. I expected a lot of announcements & withdrawals as they switch transits to balance traffic.

If I plot the data, I get the following chart of announcements against timestamp. It is a summarised view in 15-minute intervals, taken from 653 routing update dumps. Graphing all 653 dumps did not seem feasible, so I picked the top 300. Here’s how it looks:


 

Except for a few large spikes, it has a relatively consistent pattern. We can see close to 50,000 fresh announcements daily.
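
The 15-minute summary described above can be sketched roughly as follows. This is an illustration, not the exact script used for the chart: it assumes the update dumps have already been converted to the pipe-delimited one-line text format produced by `bgpdump -m` (timestamp in field 2, update type in field 3, AS path in field 7), and the sample lines are hypothetical.

```python
from collections import Counter

BIN_SECONDS = 15 * 60  # 15-minute buckets


def bin_updates(lines, origin_as="9829"):
    """Count announcements originated by origin_as per 15-minute bucket."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split("|")
        # Withdrawal lines carry no AS path (fewer fields), so only
        # announcements ("A") can be matched against an origin AS here.
        if len(fields) < 7 or fields[2] != "A":
            continue
        path = fields[6].split()
        if not path or path[-1] != origin_as:
            continue
        ts = int(fields[1])
        counts[ts - ts % BIN_SECONDS] += 1
    return counts


# Hypothetical sample lines in the assumed format (not real route-views data).
sample = [
    "BGP4MP|1495324800|A|192.0.2.1|64500|198.51.100.0/24|64500 6453 9829|IGP",
    "BGP4MP|1495324900|A|192.0.2.1|64500|203.0.113.0/24|64500 6453 9829|IGP",
    "BGP4MP|1495325700|A|192.0.2.1|64500|198.51.100.0/24|64500 64511 9829|IGP",
    "BGP4MP|1495325710|A|192.0.2.1|64500|192.0.2.0/24|64500 64511|IGP",
    "BGP4MP|1495325720|W|192.0.2.1|64500|198.51.100.0/24",
]

bins = bin_updates(sample)
for start, count in sorted(bins.items()):
    print(start, count)
```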

This data gives no clear idea, and I can’t say much by looking at it. So instead of looking at updates, I pulled last week’s RIBs and extracted AS9829’s announcements. The idea here is to map announcements to each upstream against time, along with announcements across various subnet masks.

Here’s total route announcement graph:

The graph above clearly shows that total route announcements increased significantly on 23rd May at 06:00 UTC, from 127664 to 129298. They then dipped significantly at 14:00 on 26th May, to 124301. So between 10:00 and 14:00 on the 26th, routes dropped by as much as 4%, clearly indicating a large outage in their network.
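
The percentages behind those numbers are easy to check. A small sketch, using only the three route counts quoted above (the helper name `pct_change` is just for illustration):

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


rise = pct_change(127664, 129298)  # 23rd May increase
drop = pct_change(129298, 124301)  # 26th May dip

print(f"rise: {rise:.2f}%  drop: {drop:.2f}%")
# rise: 1.28%  drop: -3.86%
```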

Next part is to look at how they tweak their announcements to upstream.

So clearly they are announcing a large number of routes to Tata AS6453, and these are over IPLC links where they are buying IP transit outside India. Some of the key spikes are mirrored on other transits, giving a clear hint of circuit balancing by moving route announcements.
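
One way to get per-upstream counts like these is to take, for each RIB entry originated by AS9829, the AS that sits directly before it in the AS path. A minimal sketch, assuming AS paths as plain space-separated strings (as in `bgpdump -m` output); the sample paths are hypothetical and AS64500/AS64511 are documentation/private ASNs, not real BSNL upstreams.

```python
from collections import Counter


def upstream_of(as_path, origin="9829"):
    """Return the AS directly before the origin, ignoring origin prepends."""
    hops = as_path.split()
    if not hops or hops[-1] != origin:
        return None
    # Strip the origin itself plus any prepended copies of it.
    while hops and hops[-1] == origin:
        hops.pop()
    return hops[-1] if hops else None


# Hypothetical AS paths as seen by a route collector peer.
paths = [
    "64500 6453 9829",
    "64500 6453 9829 9829 9829",
    "64500 64511 9829",
    "64500 64511",  # not originated by AS9829, ignored
]

per_upstream = Counter(
    up for up in (upstream_of(p) for p in paths) if up is not None
)
print(per_upstream)
```

Running this over each RIB snapshot and plotting the counts per upstream against the snapshot time gives the kind of view discussed here.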

 

Next part is to view their announcements in terms of prefix size.


Both /20 and /22 seem relatively consistent, except for a dip on the 26th.
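
The breakdown by prefix size can be sketched with the standard `ipaddress` module. The prefixes below are hypothetical placeholders from private address space, not BSNL's actual announcements.

```python
import ipaddress
from collections import Counter

# Hypothetical prefixes standing in for one RIB snapshot's AS9829 routes.
prefixes = [
    "10.1.0.0/18",
    "10.1.64.0/20",
    "10.1.80.0/20",
    "10.1.96.0/22",
    "10.1.100.0/24",
]

# Count how many routes are announced at each prefix length.
by_length = Counter(ipaddress.ip_network(p).prefixlen for p in prefixes)

for length, count in sorted(by_length.items()):
    print(f"/{length}: {count}")
```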

 

So all I can say based on the above data is the following:

  1. BSNL had some issues last week. Possibly one of their upstream pipes had issues and they increased their announcements on Tata AS6453 during that time.
  2. They are the only large operator buying transit from as many as 9 upstreams. This would result in capacity being fragmented across at least 9 and possibly 30-40 circuits, making capacity management across these upstreams a major challenge.
  3. They are announcing a large number of prefix sizes: /18, /20, /22, /23 and even /24s. This isn’t good practice at their scale.
  4. They need to start peering. They are the only network of that scale that isn’t peering, except with a couple of content players like Google AS15169. They need to peer aggressively inside India & do the same outside India if they actually keep running such a network. Otherwise, even buying only domestic transit would be a better strategy.

 

Most of these problems can be fixed if BSNL consolidates its number of transits (and circuits per transit) and aggregates its routes. For a three-transit scenario, they can follow a /18, /20 and /22 strategy and leave /24s only for emergency cases to balance traffic.
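
To illustrate what route aggregation means in practice, here is a small sketch using `ipaddress.collapse_addresses`. Only 117.247.134.0/24 is a prefix mentioned earlier; the three neighbouring /24s are assumed purely for the sake of the example and may not match what BSNL actually announces.

```python
import ipaddress

# 117.247.134.0/24 is from the trace above; the other three /24s are
# hypothetical neighbours added only to show aggregation.
more_specifics = [
    ipaddress.ip_network(p)
    for p in (
        "117.247.132.0/24",
        "117.247.133.0/24",
        "117.247.134.0/24",
        "117.247.135.0/24",
    )
]

# Four contiguous, aligned /24s collapse into a single /22.
aggregates = list(ipaddress.collapse_addresses(more_specifics))
print(aggregates)
```

One announcement instead of four is the whole point: fewer routes to churn, with the more specific /24s held back for the emergency traffic-balancing cases.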