13 Nov

Understanding the game of bandwidth pricing

I thought about this question long ago: “Who pays whom for internet bandwidth?” I have been working in this domain for some time, and so far I have learnt that it’s really complex. I will try to write a series of blog posts to share some thoughts on this subject.

First, we have to understand that when we talk about “bandwidth price”, it’s usually layer 3 bandwidth, which you buy as capacity over Ethernet (GigE, 10GigE and so on, or STMs if you are in India). As we know from networking class back in school, layer 3 works over layer 2, so to deliver “bandwidth” on layer 3 one needs a layer 2 physical circuit. What companies pay for layer 2 vs layer 3 varies significantly based on their location, type of business, goals etc. E.g. a content-heavy company like Google pays a hell of a lot of money for layer 2 circuits, while it is strongly believed in the networking community that Google is a tier 1 network, hence “transit free”, and does not pay anything for layer 3. In general the trend is that big networks have a larger footprint with “PoPs” connected over layer 2 (leading to a higher layer 2 bill and a relatively lower layer 3 bill), while small networks depend mostly on transit bandwidth (layer 3) and have a very small layer 2 footprint.

 

What’s more interesting here is the trend on “Who pays for the layer 2?”

Well, the trend here is less clear, and the entire setup rests on the fact that the internet started in the US and a significant share of content was hosted in the US for a long time. That is changing fast with the way the web is evolving, but if you look at traffic flowing to/from Asia, you will often find that it rides on the Asian telco’s layer 2 and not the Western telco’s. E.g. traffic from New York to Mumbai, say from AT&T in New York to Tata Comm in Mumbai, will be handed over at Tata’s PoP in New York, and Tata will carry it all the way to Mumbai over its own layer 2. Packets travelling from Mumbai to New York will be carried over Tata’s layer 2 (yet again!) to New York and handed off to AT&T there. Now, considering that Tata and AT&T are both tier 1 settlement-free ISPs, why is it that only Tata pays? The same applies to the relation between, say, NTT & Level3. And the same even applies to traffic between smaller regional networks that goes via big tier 2 networks, e.g. traffic between Level3 and Airtel (Airtel pays for layer 2).

 

Here’s a list of reasons I can think of and have come across so far in my numerous discussions with people in the industry:

  1. It has a lot to do with a sense of power. The US is believed to be the “center of the backbone”; that has changed a lot over time, but the image remains.
  2. Except for Singapore & Hong Kong, most major regions in Asia do not host much content and in fact consume significant content from the Western world. E.g. in India 70% of content still comes from outside (but don’t assume that means 70% of bandwidth, since the 30% of domestically hosted content has a lot to do with CDNs, which carry a much larger share of traffic).
  3. Most countries in the region are very restrictive about letting foreign players come in and operate (India, China, etc).
  4. There is no major exchange point acting as a center of regional/global traffic exchange like DE-CIX in Frankfurt or the world-famous 60 Hudson Street in New York.
  5. Somehow Asian businesses like this to continue, and they have no issue paying for layer 2 as long as they can keep their rivals away from the domestic market.
  6. Western players like AT&T, Verizon, CenturyLink, Level3, Telia etc. still hold a more dominant position than the Asian majors.

 

 

Will share more on this subject in upcoming blog posts.

 

Sidenote: Visited the Amsterdam Botanic Garden today, and ending this post with some pics from there. 🙂

 

 

Apart from that, I also saw snowfall at Klagenfurt, Austria, which was a very nice experience. Here’s how it looked:

 

06 Aug

AS Number hijacking due to misconfiguration

 

This Sunday I was looking at a global routing table dump and found AS1 announcing some very weird prefixes.

 

AS1, i.e. Autonomous System Number 1, belongs to Level3, but as far as I know they are not actively using it. They use AS3356 globally (along with Global Crossing’s AS3549). I noticed quite a few prefixes of a Brazil-based telecom provider, Netvip Telecomunicações, being announced by AS1.

 

Some of the entries in the global routing table originated by AS1 (picked from a BGP table dump in the route-views archive):

Anurags-MacBook-Pro:Downloads anurag$ grep -w '1 i' oix-full-snapshot-latest.dat | cut -f 3 -d ' ' | sort -u

177.185.100.0/23
177.185.100.0/24
177.185.101.0/24
177.185.102.0/23
177.185.102.0/24
177.185.103.0/24
177.185.104.0/23
177.185.104.0/24
177.185.105.0/24
177.185.106.0/23
177.185.106.0/24
177.185.107.0/24
177.185.108.0/23
177.185.108.0/24
177.185.109.0/24
177.185.110.0/23
177.185.110.0/24
177.185.111.0/24
177.185.96.0/23
177.185.96.0/24
177.185.97.0/24
177.185.98.0/23
177.185.98.0/24
177.185.99.0/24
186.251.240.0/21
186.65.112.0/20
190.185.108.0/22
4.31.236.64/29
4.34.12.0/24
4.34.13.0/24
94.31.44.0/24
Anurags-MacBook-Pro:Downloads anurag$

 

So there are quite a few prefixes belonging to different network providers being originated by AS1. Only 4.34.12.0/24 and 4.34.13.0/24 seem to be with Level3. The 177.185.x.0 prefixes (the /23s and /24s) all belong to Netvip. This appeared very strange to me: why would Level3 let anyone use AS1 to announce their own prefixes? Could it be a hijacked ASN, i.e. someone using AS1 without any relation to Level3? My past experience tells me that when an ASN might be hijacked, the easiest way forward is to look at the AS path and find who is providing upstream to that ASN.

 

Asking the BGP table dump what it knows about one of those prefixes:

 

Anurags-MacBook-Pro:Downloads anurag$ grep -w 177.185.100.0/24 oix-full-snapshot-latest.dat
* 177.185.100.0/24 85.114.0.217 0 0 0 8492 9002 16735 52931 1 i
* 177.185.100.0/24 213.144.128.203 1 0 0 13030 16735 52931 1 i
* 177.185.100.0/24 66.185.128.1 556 0 0 1668 6762 26615 28309 52931 i
* 177.185.100.0/24 208.51.134.246 14233 0 0 3549 16735 52931 1 i
* 177.185.100.0/24 206.24.210.102 0 0 0 3561 6762 26615 28309 52931 i
* 177.185.100.0/24 67.17.82.114 14023 0 0 3549 16735 52931 1 i
* 177.185.100.0/24 134.222.87.1 0 0 0 286 6762 26615 28309 52931 i
* 177.185.100.0/24 157.130.10.233 0 0 0 701 3549 16735 52931 1 i
* 177.185.100.0/24 203.62.252.186 0 0 0 1221 4637 3549 16735 52931 1 i
* 177.185.100.0/24 198.129.33.85 0 0 0 293 16735 52931 1 i
* 177.185.100.0/24 216.18.31.102 0 0 0 6539 577 3549 16735 52931 1 i
* 177.185.100.0/24 154.11.11.113 0 0 0 852 2914 3549 16735 52931 1 i
* 177.185.100.0/24 137.164.16.84 0 0 0 2152 3356 3549 16735 52931 1 i
* 177.185.100.0/24 89.149.178.10 10 0 0 3257 3549 16735 52931 1 i
* 177.185.100.0/24 154.11.98.225 0 0 0 852 2914 3549 16735 52931 1 i
* 177.185.100.0/24 194.153.0.253 1015 0 0 5413 1299 3549 16735 52931 1 i
* 177.185.100.0/24 129.250.0.11 6 0 0 2914 3549 16735 52931 1 i
* 177.185.100.0/24 96.4.0.55 0 0 0 11686 11164 3549 16735 52931 1 i
* 177.185.100.0/24 195.22.216.188 100 0 0 6762 26615 28309 52931 i
* 177.185.100.0/24 216.218.252.164 0 0 0 6939 16735 52931 1 i
* 177.185.100.0/24 168.209.255.23 0 0 0 3741 2914 3549 16735 52931 1 i
* 177.185.100.0/24 144.228.241.130 0 0 0 1239 6762 26615 28309 52931 i
* 177.185.100.0/24 4.69.184.193 0 0 0 3356 3549 16735 52931 1 i
* 177.185.100.0/24 91.209.102.1 0 0 0 39756 3257 3549 16735 52931 1 i
* 177.185.100.0/24 80.91.255.62 0 0 0 1299 3549 16735 52931 1 i
* 177.185.100.0/24 12.0.1.63 0 0 0 7018 6762 26615 28309 52931 i
* 177.185.100.0/24 203.181.248.168 0 0 0 7660 2516 6762 26615 28309 52931 i
* 177.185.100.0/24 202.232.0.3 0 0 0 2497 3356 3549 16735 52931 1 i
* 177.185.100.0/24 147.28.7.1 0 0 0 3130 2914 3549 16735 52931 1 i
* 177.185.100.0/24 147.28.7.2 0 0 0 3130 1239 6762 26615 28309 52931 i

 

This is interesting. We can see two ASNs originating this prefix: AS1 (as we know already) and AS52931. The fun fact is that wherever AS1 appears, the ASN right before it in the AS path is AS52931, i.e. for such routes AS1 sits behind AS52931, which elsewhere originates the same prefix itself. Further, AS52931 has upstream from AS28309 & AS16735. AS1 shows up only on routes that go via AS16735, while the routes via AS28309 are announced directly by AS52931. This gave me an interesting clue, which was later confirmed by replies to my post on the NANOG mailing list.
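The same observation can be pulled out of the dump mechanically. Below is a minimal sketch, assuming each line has the layout shown above (status, prefix, next hop, three numeric columns, the AS path, then the origin code); the helper names are made up for illustration and the sample is just four of the lines from the output.

```python
from collections import defaultdict

# Four lines copied from the dump output above. Assumed layout:
# status, prefix, next hop, metric, local-pref, weight, AS path..., origin code.
SAMPLE = """\
* 177.185.100.0/24 85.114.0.217 0 0 0 8492 9002 16735 52931 1 i
* 177.185.100.0/24 66.185.128.1 556 0 0 1668 6762 26615 28309 52931 i
* 177.185.100.0/24 208.51.134.246 14233 0 0 3549 16735 52931 1 i
* 177.185.100.0/24 195.22.216.188 100 0 0 6762 26615 28309 52931 i
"""


def parse_path(line):
    """Return (prefix, AS path as a list of strings) for one dump line."""
    fields = line.split()
    prefix = fields[1]
    as_path = fields[6:-1]  # skip the six leading columns and the origin code
    return prefix, as_path


def origins_with_neighbours(text):
    """Map each origin AS to the set of ASes seen directly before it."""
    seen = defaultdict(set)
    for line in text.splitlines():
        if not line.startswith("*"):
            continue
        _, as_path = parse_path(line)
        if len(as_path) >= 2:
            seen[as_path[-1]].add(as_path[-2])
    return dict(seen)


result = origins_with_neighbours(SAMPLE)
for origin, neighbours in sorted(result.items()):
    print(f"origin AS{origin}: reached via {sorted(neighbours)}")
```

An origin that is only ever reached through one other AS, which itself originates the same prefix, is the pattern worth a second look.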

 

Basically AS52931 (Netvip) did not hijack AS1 intentionally; it was a case of misconfigured prepending. Netvip has two upstreams and was trying to prepend on its announcements to one of them (AS16735). In prepending, a network simply repeats its own ASN a few times to lengthen the AS path, which makes a route less preferred.

 

Ideally, what they needed was an AS path like this:

XXX XXX XXX 28309 52931 i – Preferred via AS28309 transit

XXX XXX XXX 16735 52931 52931 i – Not preferred via AS16735 transit.

 

Instead of putting their own ASN once in the route map, they put the number “1” in the prepend, which brought AS1 into the global table for these prefixes. I looked around and saw some funny prefixes from AS2, AS3, AS4 etc. as well.
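The mix-up can be modelled in a few lines. This is a simplified sketch, not router configuration: it assumes the ASNs named in the outbound prepend are placed in front of the existing path and the router then adds its own ASN once when advertising to an eBGP neighbour. The function names are hypothetical.

```python
def advertised_path(local_asn, prepend, received_path=()):
    """Simplified model of the AS path an eBGP neighbour receives.

    The ASNs listed in the outbound prepend go in front of the existing
    path, then the advertising router adds its own ASN once.
    """
    return [local_asn, *prepend, *received_path]


def looks_like_bad_prepend(as_path, local_asn):
    """Flag paths where some other ASN sits behind the real originator."""
    return local_asn in as_path and as_path[-1] != local_asn


NETVIP = 52931

intended = advertised_path(NETVIP, prepend=[NETVIP])  # own ASN repeated once
mistaken = advertised_path(NETVIP, prepend=[1])       # the number 1 instead

print("intended:", intended)
print("mistaken:", mistaken)
print("flagged: ", looks_like_bad_prepend(mistaken, NETVIP))
```

With the intended prepend the upstream sees 52931 twice; with the mistaken one it sees 52931 followed by 1, which is what showed up behind AS16735 in the dump.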

 

Anurags-MacBook-Pro:Downloads anurag$ grep -w '2 i' oix-full-snapshot-latest.dat | cut -f 3 -d ' ' | sort -u
128.4.0.0/16
177.129.161.0/24
31.192.64.0/19

 

The last prefix, 31.192.64.0/19, does not belong to AS2 (which is with UDEL-DCN, University of Delaware).

 

route-views>sh ip bgp 31.192.64.0/19 long
BGP table version is 4043628875, local router ID is 128.223.51.103
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
* 31.192.64.0/19 208.74.64.40 0 19214 25973 6830 3.1190 i
* 194.85.102.33 0 3277 3267 9002 6830 3.1190 i
* 4.69.184.193 0 0 3356 6830 3.1190 i
* 154.11.98.225 0 0 852 174 12570 3.1190 2 i
* 207.172.6.20 0 0 6079 6830 3.1190 i
* 193.0.0.56 0 3333 6830 3.1190 i
* 154.11.11.113 0 0 852 174 12570 3.1190 2 i
* 69.31.111.244 0 0 4436 6830 3.1190 i
* 194.85.40.15 0 3267 9002 6830 3.1190 i
* 66.59.190.221 0 6539 577 6830 3.1190 i
* 209.124.176.223 0 101 101 3356 6830 3.1190 i
* 128.223.253.10 0 3582 3701 3356 6830 3.1190 i
* 207.172.6.1 0 0 6079 6830 3.1190 i
* 157.130.10.233 0 701 1299 6830 3.1190 i
* 134.222.87.1 0 286 6830 3.1190 i
* 66.185.128.48 547 0 1668 6830 3.1190 i
* 202.249.2.86 0 7500 2497 6830 3.1190 i
* 216.218.252.164 0 6939 6830 3.1190 i
* 207.46.32.34 0 8075 6830 3.1190 i
* 144.228.241.130 0 1239 3257 8928 12570 3.1190 2 i
* 114.31.199.1 0 0 4826 6939 6830 3.1190 i
* 208.51.134.254 1 0 3549 3356 6830 3.1190 i
* 129.250.0.11 384 0 2914 8928 12570 3.1190 2 i
* 217.75.96.60 0 0 16150 6830 3.1190 i
* 195.66.232.239 0 5459 6830 3.1190 i
*> 164.128.32.11 0 3303 6830 3.1190 i
* 202.232.0.2 0 2497 6830 3.1190 i
* 203.62.252.186 0 1221 4637 6830 3.1190 i
* 203.181.248.168 0 7660 2516 3257 8928 12570 3.1190 2 i
* 66.110.0.86 0 6453 3356 6830 3.1190 i
* 89.149.178.10 10 0 3257 8928 12570 3.1190 2 i
* 12.0.1.63 0 7018 1299 6830 3.1190 i
* 206.24.210.102 0 3561 3356 6830 3.1190 i
route-views>

 

 

This seems even more interesting because of the dotted ASN. 🙂

3.1190 means AS197798 as per the asdot notation described in RFC 5396 (3 × 65536 + 1190 = 197798). So we have AS197798 originating that prefix, as well as AS2 sitting behind AS197798 on some paths; hence another misconfigured prepend case. (Nice tool by Sprint for dot.ASN conversion)
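For reference, the dot conversion is simple arithmetic: the part before the dot is the high 16 bits and the part after it the low 16 bits. A small sketch with hypothetical function names:

```python
def asdot_to_asplain(asdot):
    """Convert an ASN in dotted notation (e.g. '3.1190') to a plain integer."""
    if "." not in asdot:
        return int(asdot)
    high_text, low_text = asdot.split(".")
    high, low = int(high_text), int(low_text)
    if not (0 <= high <= 65535 and 0 <= low <= 65535):
        raise ValueError(f"each half must fit in 16 bits: {asdot!r}")
    return high * 65536 + low


def asplain_to_asdot(asn):
    """Convert a plain ASN to asdot; 16-bit ASNs stay as plain numbers."""
    high, low = divmod(asn, 65536)
    return f"{high}.{low}" if high else str(asn)


print(asdot_to_asplain("3.1190"))
print(asplain_to_asdot(197798))
```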

 

Regarding the original case of AS1, I observed that yesterday at 18:44:13 the RIPE NCC route collectors noticed changes in the BGP announcements for these prefixes.
One of the route changes, seen by Tinet (AS3257): route 3257 3549 16735 52931 1 changed to 3257 3549 16735 52931. By 21:37:59 GMT, Netvip had withdrawn all routes carrying that misconfigured prepend.

 

With hope that you won’t hijack an ASN while prepending, time for me to end this blog post and get back to work!

 

Note: 

I forgot to thank Doug Madory from Renesys for his detailed explanation & Stephen Wilcox from IX Reach for the clue about prepending in my original post.

24 Sep

BSNL-Level3 bad routing case

Quick analysis of BSNL-Level3 bad routing issue

 

I can see BSNL having pretty high latency with most of Europe again. It seems they are using Level3 Communications (AS3356) along with Tata-VSNL for upstream. With the Level3 transit, BSNL has badly screwed up the reverse path, causing very high latency and awful bandwidth.

 

anurag@laptop:~$ ping server7 -c 5
PING server7.anuragbhatia.com (178.238.225.247) 56(84) bytes of data.
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=1 ttl=52 time=320 ms
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=2 ttl=52 time=320 ms
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=3 ttl=52 time=319 ms
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=4 ttl=52 time=327 ms
64 bytes from server7.anuragbhatia.com (178.238.225.247): icmp_req=5 ttl=52 time=320 ms

--- server7.anuragbhatia.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 319.880/321.765/327.384/2.828 ms
anurag@laptop:~$

Latency here should be around 150 ms. A packet should not take more than 150 ms round trip between Radaur, Haryana and a server located in Munich.

 

Quick view at traceroute:

anurag@laptop:~$ traceroute server7
traceroute to server7 (178.238.225.247), 30 hops max, 60 byte packets
1 router.local (192.168.1.1) 1.440 ms 1.954 ms 2.433 ms
2 117.207.48.1 (117.207.48.1) 26.071 ms 29.648 ms 32.078 ms
3 218.248.173.42 (218.248.173.42) 34.528 ms 36.011 ms 39.674 ms
4 218.248.246.130 (218.248.246.130) 70.313 ms 72.635 ms 75.355 ms
5 so-6-0-2.edge1.London1.Level3.net (212.113.12.29) 324.058 ms 350.902 ms 351.340 ms
6 ae-1-51.edge5.London1.Level3.net (4.69.139.75) 349.419 ms 348.461 ms ae-2-52.edge5.London1.Level3.net (4.69.139.107) 348.564 ms
7 Telia (4.68.111.182) 349.354 ms ldn-b5-link.telia.net (213.248.96.37) 294.946 ms 296.696 ms
8 ldn-bb1-link.telia.net (80.91.246.96) 346.667 ms ldn-bb1-link.telia.net (80.91.248.217) 301.921 ms 304.189 ms
9 prs-bb1-link.telia.net (213.155.134.40) 426.722 ms prs-bb1-link.telia.net (80.91.247.34) 315.777 ms 318.168 ms
10 ffm-bb1-link.telia.net (80.91.245.102) 345.072 ms ffm-bb1-link.telia.net (213.155.132.157) 345.609 ms ffm-bb1-link.telia.net (80.91.245.102) 346.060 ms
11 mcn-b2-link.telia.net (80.91.248.29) 347.000 ms 348.939 ms 351.277 ms
12 gigahosting-ic-138043-mcn-b2.c.telia.net (213.248.101.78) 355.053 ms 356.168 ms 324.647 ms
13 server7.anuragbhatia.com (178.238.225.247) 321.058 ms 323.318 ms 332.473 ms
anurag@laptop:~$

Clearly hop 3 is New Delhi (30 ms latency) and hop 4 is Mumbai (again going by latency values). Hop 5 is Level3 in London. It seems BSNL used the Europe India Gateway link here (a submarine cable from Mumbai to London owned by multiple providers including BSNL and Bharti Airtel, along with Global Crossing, which is now owned by Level3). Also, as far as I know Level3 does not have an ISP license in India (see DoT’s list) and thus cannot sell bandwidth in Mumbai. Likely BSNL is using its own ILD license in this case, and thus BSNL is responsible for purchasing bandwidth in London.

Thus, going by that traceroute and the fact that BSNL is the one purchasing transit from Level3 in London, BSNL should have a BGP session in London and should be announcing its routes there in return for the global routing table provided by the transit. Yet latency jumps as soon as we hit London in that traceroute. Clearly the BSNL > Level3 path seems OK while the return path Level3 > BSNL is faulty.
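Spotting where the latency jumps can be automated. The sketch below takes the best (lowest) RTT reported at each hop and finds the hop with the largest increase over the previous one. The sample is hops 3 to 6 of the traceroute above; the parsing assumes the usual Linux traceroute text layout and the function names are made up for illustration.

```python
import re

# Hops 3 to 6 from the traceroute above, one hop per line.
SAMPLE = """\
3 218.248.173.42 (218.248.173.42) 34.528 ms 36.011 ms 39.674 ms
4 218.248.246.130 (218.248.246.130) 70.313 ms 72.635 ms 75.355 ms
5 so-6-0-2.edge1.London1.Level3.net (212.113.12.29) 324.058 ms 350.902 ms 351.340 ms
6 ae-1-51.edge5.London1.Level3.net (4.69.139.75) 349.419 ms 348.461 ms ae-2-52.edge5.London1.Level3.net (4.69.139.107) 348.564 ms
"""

RTT = re.compile(r"(\d+(?:\.\d+)?) ms")
HOP = re.compile(r"\s*(\d+)\s")


def best_rtts(text):
    """Return [(hop number, lowest RTT in ms)] for every hop that replied."""
    hops = []
    for line in text.splitlines():
        hop = HOP.match(line)
        samples = [float(value) for value in RTT.findall(line)]
        if hop and samples:
            hops.append((int(hop.group(1)), min(samples)))
    return hops


def biggest_jump(hops):
    """Return (hop number, increase in ms) for the largest hop-to-hop rise."""
    jumps = [(cur[1] - prev[1], cur[0]) for prev, cur in zip(hops, hops[1:])]
    delta, hop = max(jumps)
    return hop, delta


hop, delta = biggest_jump(best_rtts(SAMPLE))
print(f"largest jump is at hop {hop}: +{delta:.1f} ms")
```

Keep in mind that the RTT shown at a hop includes the return trip from that hop, so a jump like this points at either direction of the path, not only the forward one.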

 

Using Level3’s looking glass, we can have a quick check on traceroute to my IP:

Show Level 3 (London, England) Traceroute to 117.207.48.1
1 ae-51-51.csw1.London1.Level3.net (4.69.139.88) 0 msec
ae-52-52.csw2.London1.Level3.net (4.69.139.120) 0 msec
ae-51-51.csw1.London1.Level3.net (4.69.139.88) 0 msec
2 ae-227-3603.edge3.London1.Level3.net (4.69.166.154) 0 msec
ae-117-3503.edge3.London1.Level3.net (4.69.166.138) 0 msec
ae-226-3602.edge3.London1.Level3.net (4.69.166.150) 32 msec
3 gblx-level3-50g.London1.Level3.net (4.68.110.158) 8 msec 4 msec 0 msec
4 ae6.scr4.LON3.gblx.net (67.17.106.150) [AS3549 {GBLX}] 0 msec 0 msec
ae5.scr3.LON3.gblx.net (67.17.72.22) [AS3549 {GBLX}] 4 msec
5 so5-0-0-2488M.ar1.NYC1.gblx.net (67.17.64.146) [AS3549 {GBLX}] 104 msec
so6-0-0-2488M.ar1.NYC1.gblx.net (67.17.64.154) [AS3549 {GBLX}] 68 msec 68 msec
6 BHARTIBSNL.so-7-0-0.ar1.NYC1.gblx.net (64.210.30.70) [AS3549 {GBLX}] 268 msec 268 msec 264 msec
7 218.248.255.101 [AS9829 {APNIC-AS-3-BLOCK}] 276 msec 272 msec 276 msec
8 117.207.48.1 [AS9829 {APNIC-AS-3-BLOCK}] 272 msec 280 msec 276 msec

 

Hop 3 is Level3, hop 4 is Gblx (which is now owned by Level3), hop 5 is Gblx New York and hop 6 is the BSNL router in New York. The target BSNL IP is covered by 117.207.48.0/20. The interesting thing here is that BSNL uses both Level3 and Gblx for transit. So a return path via Gblx is not an issue in itself, but the path London > New York > India surely is.
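The detour is also visible from router hostnames alone. Here is a rough sketch that pulls the site label out of each name, assuming the pattern seen in the output above (a location code followed by digits, right before Level3.net or gblx.net). That naming convention is only inferred from these few hostnames, and the function names are hypothetical.

```python
import re

# Hostnames taken from hops 3 to 6 of the looking glass traceroute above.
HOPS = [
    "gblx-level3-50g.London1.Level3.net",
    "ae6.scr4.LON3.gblx.net",
    "so5-0-0-2488M.ar1.NYC1.gblx.net",
    "BHARTIBSNL.so-7-0-0.ar1.NYC1.gblx.net",
]

# Assumed convention: "<something>.<SiteCode><digits>.<Level3|gblx>.net"
SITE = re.compile(r"\.([A-Za-z]+)\d+\.(?:Level3|gblx)\.net$")


def site_of(hostname):
    """Return the site label embedded in a router hostname, or None."""
    match = SITE.search(hostname)
    return match.group(1) if match else None


def sites_in_order(hostnames):
    """List the site labels along the path, collapsing consecutive repeats."""
    path = []
    for name in hostnames:
        site = site_of(name)
        if site and (not path or path[-1] != site):
            path.append(site)
    return path


print(" > ".join(sites_in_order(HOPS)))
```

The two networks label London differently (London vs LON), so both labels show up before the path moves on to NYC.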

 

Looking for prefix 117.207.48.0/20 in Level3 London router:

BGP routing table entry for 117.207.48.0/20
Paths: (2 available, best #1)
3549 9829
AS-path translation: { GBLX APNIC-AS-3-BLOCK }
edge3.London1 (metric 20020)
Origin IGP, metric 100000, localpref 88, valid, internal, best
Community: Europe Lclprf_86 United_Kingdom Level3_Peer London 3549:4351 3549:7000 3549:30840
Originator: edge3.London1
3549 9829
AS-path translation: { GBLX APNIC-AS-3-BLOCK }
edge3.London1 (metric 20020)
Origin IGP, metric 100000, localpref 88, valid, internal
Community: Europe Lclprf_86 United_Kingdom Level3_Peer London 3549:4351 3549:7000 3549:30840
Originator: edge3.London1

Only two paths, and both via Gblx. No direct return path. Again, that is not a big issue in itself, since Gblx could have a return path right within London (or somewhere else in Europe).

 

Let’s check a Gblx Europe router for an entry for 117.207.48.0/20:

 

route-server.ams2>show ip bgp 117.207.48.0/20 long
BGP table version is 176033437, local router ID is 67.17.81.187
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, x best-external
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*>i117.207.48.0/20 67.16.147.121 300 0 9829 i
* i 67.16.147.121 300 0 9829 i
route-server.ams2>

Just one path. Doing a traceroute to see the actual path (since I don’t know where that next hop is located!) 🙂

 

route-server.ams2>traceroute 117.207.48.1
Type escape sequence to abort.
Tracing the route to 117.207.48.1
1 67.16.147.121 0 msec 72 msec 4 msec
2 BHARTIBSNL.so-7-0-0.ar1.NYC1.gblx.net (64.210.30.70) 380 msec 376 msec 380 msec
3 218.248.255.101 [AS 9829] 376 msec 376 msec 376 msec
4 117.207.48.1 [AS 9829] 384 msec 384 msec 384 msec
route-server.ams2>

 

Clearly, here’s the issue. BSNL is again doing selective BGP announcement of prefixes, at New York only, and that is why Europe to India traffic is being routed via New York. BSNL allows entry into its network from outside India only at New York and a few other selected locations, which seriously hurts latency.

 

Time for me to get back to the work of routing packets! Thanks for reading. 🙂