12 Oct

UKNOF32 – Analysis of F-root placement using RIPE Atlas

Enjoyed ISC’s presentation analysing the F-root server (one of the 13 root DNS servers which power the Internet) and the global anycast performance of 192.5.5.0/24, announced (and anycasted) by AS3557 (ISC). It was presented at UKNOF 32.

Embedded presentation below (or click here to watch on YouTube directly)

 

09 Nov

Using BGP communities to influence routing

Some free time here in Europe and thus time for another quick blog post & to take my mind away from depressing people!

One of the impressive features of major European networks is support for BGP communities. In India it’s almost non-existent. Setting it up isn’t technically hard, but from the capacity management side Indian ISPs are somewhat shy about offering it.

 

Let’s take a case where we have a customer router (R1 with AS1), the upstream of the customer (R2 with AS2), the upstream of the upstream (R3 with AS3) and a peer of the upstream (R4 with AS4). Let’s try to set up communities so that the customer at AS1 can control his BGP announcements, announcing some prefixes to AS3 and some to AS4 selectively to control inbound traffic flow.

 

Screen Shot 2013-11-08 at 3.26.23 pm

All of them are peering over basic, simple BGP sessions. AS1 is announcing 8.8.8.0/24 and 9.9.9.0/24 to R2 and wishes to have 8.8.8.0/24 announced only to R3 and 9.9.9.0/24 only to R4.

 

Now this selective announcement thing will be done at R2 but triggered by R1 based on community tags. 

Here provider R2 will offer, say, the following community values:

3000 – for announcement to R3 only
4000 – for announcement to R4 only

 

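If a route is not tagged with any community, it should be announced to both (the default behavior of BGP/upstream). Keep in mind that a Cisco route-map ends with an implicit deny, so the outbound route-maps on R2 need a final catch-all permit clause for untagged routes to actually go out.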

Before putting any community here’s what we can see on all routers:

 

R1#sh ip bgp
BGP table version is 3, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
*> 8.8.8.0/24 0.0.0.0 0 32768 i
*> 9.9.9.0/24 0.0.0.0 0 32768 i
R1#

R2#sh ip bgp
BGP table version is 26, local router ID is 1.1.1.2
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
*> 8.8.8.0/24 1.1.1.1 0 0 1 i
*> 9.9.9.0/24 1.1.1.1 0 0 1 i
R2#

 

R3>sh ip bgp
BGP table version is 15, local router ID is 1.1.1.6
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
*> 8.8.8.0/24 1.1.1.5 0 2 1 i
*> 9.9.9.0/24 1.1.1.5 0 2 1 i
R3>

 

R4>sh ip bgp
BGP table version is 13, local router ID is 1.1.1.10
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
*> 8.8.8.0/24 1.1.1.9 0 2 1 i
*> 9.9.9.0/24 1.1.1.9 0 2 1 i
R4>

  

So the basic interface config and BGP config seem all good. Now let’s set up the community tagging and the announcement rules:

 

Here’s the simple logic I will put on R1 to tag routes:

  1. R1 will have a route-map “rmap” which matches a given prefix and sets a community based on that match.
  2. The prefix match in step #1 will be done using an IP prefix list.

 

R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#ip pre
R1(config)#ip prefix-list ?
WORD Name of a prefix list
sequence-number Include/exclude sequence numbers in NVGEN

R1(config)#ip prefix-list prefix-list1 ?
R1(config)#ip prefix-list prefix-list1 permit 8.8.8.0/24 

R1(config)#ip prefix-list prefix-list2 permit 9.9.9.0/24
R1(config)#

R1(config)#route-map rmap permit 10
R1(config-route-map)#match ip address prefix-list prefix-list1
R1(config-route-map)#set community 3000
R1(config-route-map)#exit
R1(config)#route-map rmap permit 20
R1(config-route-map)#match ip address prefix-list prefix-list2
R1(config-route-map)#set community 4000
R1(config-route-map)#exit
R1(config)#

R1(config)#router bgp 1
R1(config-router)#neighbor 1.1.1.2 send-community
R1(config-router)#neighbor 1.1.1.2 route-map rmap out
R1(config-router)#end
R1#wr
Building configuration…
[OK]
R1#
*Nov 8 19:04:21.810: %SYS-5-CONFIG_I: Configured from console by console
R1#clear bgp all 2
R1#
*Nov 8 19:04:28.898: %BGP-5-ADJCHANGE: neighbor 1.1.1.2 Down User reset
*Nov 8 19:04:29.486: %BGP-5-ADJCHANGE: neighbor 1.1.1.2 Up
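
For anyone who prefers reading logic over IOS syntax, here is a rough Python sketch of what the route-map on R1 is doing. It is only a toy model: the `TAG_RULES` table and the `tag()` helper are names I made up for illustration, and I am assuming the prefix lists match exactly (no `ge`/`le`), as in the config above.

```python
import ipaddress

# Toy model of route-map "rmap" on R1: (prefix-list entry, community to set).
# TAG_RULES and tag() are illustrative names, not anything from IOS.
TAG_RULES = [
    (ipaddress.ip_network("8.8.8.0/24"), 3000),  # rmap permit 10
    (ipaddress.ip_network("9.9.9.0/24"), 4000),  # rmap permit 20
]

def tag(prefix):
    """Return the community R1 would set on this prefix, or None if the
    prefix matches no clause (implicit deny, so it is not advertised)."""
    net = ipaddress.ip_network(prefix)
    for match, community in TAG_RULES:
        if net == match:  # exact match, like a prefix-list without ge/le
            return community
    return None

print(tag("8.8.8.0/24"))  # 3000
print(tag("9.9.9.0/24"))  # 4000
```

A prefix that matches neither clause comes back as `None` here, which mirrors the route-map dropping it at the implicit deny.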

 

 

Now let’s check what R2 is receiving:

R2#sh ip bgp 8.8.8.0
BGP routing table entry for 8.8.8.0/24, version 56
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
1 2
1
1.1.1.1 from 1.1.1.1 (1.1.1.1)
Origin IGP, metric 0, localpref 100, valid, external, best
Community: 3000
R2#sh ip bgp 9.9.9.0
BGP routing table entry for 9.9.9.0/24, version 55
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Advertised to update-groups:
1
1
1.1.1.1 from 1.1.1.1 (1.1.1.1)
Origin IGP, metric 0, localpref 100, valid, external, best
Community: 4000
R2#

 

All good! 🙂

 

So now R2 is receiving the communities. The next logical step is to set up R2 to announce prefixes with community 3000 to R3 and prefixes with community 4000 to R4.

 

Next logical steps:

  1. Create community lists matching communities 3000 and 4000.
  2. Connect these lists to route-maps.
  3. Add the route-maps to the BGP neighbors.

 

Here we go!

R2#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#ip community-list 1 permit 3000
R2(config)#ip community-list 2 permit 4000
R2(config)#

R2(config)#route-map rmap1 permit 10
R2(config-route-map)#match community 1
R2(config-route-map)#exit
R2(config)#route-map rmap2 permit 10
R2(config-route-map)#match community 2
R2(config-route-map)#exit
R2(config)#

R2(config)#router bgp 2
R2(config-router)#neighbor 1.1.1.6 route-map rmap1 out
R2(config-router)#neighbor 1.1.1.10 route-map rmap2 out
R2(config-router)#end
R2#wr
Building configuration…
[OK]
R2#c
*Nov 8 19:16:44.394: %SYS-5-CONFIG_I: Configured from console by consol
R2#
R2#
R2#clear bgp all 3
R2#clear bgp all 4
R2#

 

 

Checking BGP announcements to each peer now:

R2#sh ip bgp neighbors 1.1.1.6 advertised-routes
BGP table version is 56, local router ID is 1.1.1.2
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
*> 8.8.8.0/24 1.1.1.1 0 0 1 i

Total number of prefixes 1
R2#sh ip bgp neighbors 1.1.1.10 advertised-routes
BGP table version is 56, local router ID is 1.1.1.2
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
*> 9.9.9.0/24 1.1.1.1 0 0 1 i

Total number of prefixes 1
R2#
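
The export decision R2 makes per neighbor can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how IOS is implemented: the dictionaries and the `advertised_routes()` helper are invented names, and the untagged 10.10.10.0/24 is a hypothetical prefix added only to show what the implicit deny does.

```python
# Toy model of R2's outbound policy. This is not Cisco code; the data
# structures and the untagged prefix are made up for illustration.
RECEIVED = {
    "8.8.8.0/24": {3000},
    "9.9.9.0/24": {4000},
    "10.10.10.0/24": set(),   # hypothetical untagged prefix, not in the lab
}

# Outbound route-map per neighbor: the single community it permits.
OUTBOUND = {
    "1.1.1.6": 3000,    # R3, route-map rmap1
    "1.1.1.10": 4000,   # R4, route-map rmap2
}

def advertised_routes(neighbor):
    permitted = OUTBOUND[neighbor]
    # Anything that does not carry the permitted community falls through
    # to the route-map's implicit deny and is not advertised.
    return sorted(p for p, comms in RECEIVED.items() if permitted in comms)

print(advertised_routes("1.1.1.6"))   # ['8.8.8.0/24']
print(advertised_routes("1.1.1.10"))  # ['9.9.9.0/24']
```

Note that the untagged prefix goes to neither neighbor in this model. To get the "untagged goes to everyone" behavior, each route-map would need an extra catch-all permit clause.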

 

 

And cross-checking on both R3 and R4:

 

R3>sh ip bgp
BGP table version is 34, local router ID is 1.1.1.6
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
*> 8.8.8.0/24 1.1.1.5 0 2 1 i
R3>

 

Only 8.8.8.0/24 is visible on R3. Similarly on R4:

R4>sh ip bgp
BGP table version is 44, local router ID is 1.1.1.10
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
*> 9.9.9.0/24 1.1.1.9 0 2 1 i
R4>

 

You can find the configs of R1, R2, R3 and R4 for reference. Also check out the One Step Consulting page listing BGP communities used by some major networks.

Time to end this blog post with a nice motivating poem from a former Prime Minister of India (sung in Hindi by a popular Indian singer).

 

 

01 Jun

BSNL > Softlayer connectivity problem & possible fix

It’s late at night here in India. I have my final 8th semester exams and, as usual, I am really bored!

Though this time we have interesting subjects, the syllabus is still pretty boring, spread across multiple books, notes and PDFs. Anyway, I will be out of college after June, which sounds good.

 

Tonight, I found a routing glitch. Yes a routing glitch!! 🙂

These issues somehow keep my life in orbit and give a good understanding of how routing works over the Internet.

 

 

OK – so the issue

I noticed a really bad (forward) route from my BSNL connection to the hostgator.in website hosted at Softlayer Singapore. Let’s look at the forward path:

anurag:~ anurag$ traceroute -a hostgator.in
traceroute to hostgator.in (216.12.194.67), 64 hops max, 52 byte packets
1 [AS65534] router.home (10.10.0.1) 1.189 ms 0.910 ms 0.810 ms
2 [AS9829] 117.220.160.1 (117.220.160.1) 17.707 ms 21.147 ms 16.925 ms
3 [AS9829] 218.248.169.126 (218.248.169.126) 30.195 ms 29.766 ms 29.976 ms
4 [AS9829] 218.248.250.82 (218.248.250.82) 75.432 ms 77.488 ms 76.761 ms
5 [AS6453] if-11-1-1.mcore3.laa-losangeles.as6453.net (209.58.85.5) 368.104 ms 303.206 ms 309.964 ms
6 [AS6453] if-10-2-0-14.tcore2.lvw-losangeles.as6453.net (216.6.84.6) 309.070 ms 308.725 ms 310.073 ms
7 [AS6453] 216.6.84.66 (216.6.84.66) 317.050 ms 318.714 ms 398.408 ms
8 [AS2914] ae-5.r21.lsanca03.us.bb.gin.ntt.net (129.250.5.85) 305.672 ms * 304.480 ms
9 [AS2914] as-2.r20.osakjp01.jp.bb.gin.ntt.net (129.250.3.202) 414.205 ms
[AS2914] as-1.r21.tokyjp01.jp.bb.gin.ntt.net (129.250.3.146) 485.451 ms
[AS2914] as-2.r20.osakjp01.jp.bb.gin.ntt.net (129.250.3.202) 414.272 ms
10 [AS2914] ae-3.r24.tokyjp05.jp.bb.gin.ntt.net (129.250.6.188) 381.221 ms
[AS2914] ae-1.r23.osakjp01.jp.bb.gin.ntt.net (129.250.2.49) 420.412 ms
[AS2914] ae-3.r25.tokyjp05.jp.bb.gin.ntt.net (129.250.6.192) 372.768 ms
11 [AS2914] ae-7.r25.tokyjp05.jp.bb.gin.ntt.net (129.250.3.223) 394.899 ms
[AS2914] ae-7.r24.tokyjp05.jp.bb.gin.ntt.net (129.250.3.221) 406.922 ms
[AS2914] ae-2.r00.tokyjp03.jp.bb.gin.ntt.net (129.250.2.5) 491.190 ms
12 [AS2914] ae-3.r00.tokyjp03.jp.bb.gin.ntt.net (129.250.4.233) 399.065 ms
[AS2914] xe-0-0-0.bbr01.eq01.tok01.networklayer.com (61.213.145.38) 307.955 ms
[AS2914] ae-2.r00.tokyjp03.jp.bb.gin.ntt.net (129.250.2.5) 392.937 ms
13 [AS2914] xe-0-0-0.bbr01.eq01.tok01.networklayer.com (61.213.145.38) 310.298 ms
[AS36351] ae1.bbr01.eq01.sng02.networklayer.com (50.97.18.165) 306.396 ms
[AS2914] xe-0-0-0.bbr01.eq01.tok01.networklayer.com (61.213.145.38) 407.191 ms
14 [AS36351] ae5.dar01.sr03.sng01.networklayer.com (50.97.18.197) 388.660 ms
[AS36351] ae5.dar02.sr03.sng01.networklayer.com (50.97.18.199) 303.546 ms 409.645 ms
15 [AS36351] po2.fcr01.sr03.sng01.networklayer.com (174.133.118.133) 407.589 ms
[AS36351] ae5.dar02.sr03.sng01.networklayer.com (50.97.18.199) 310.587 ms
[AS36351] po2.fcr01.sr03.sng01.networklayer.com (174.133.118.133) 305.969 ms
16 [AS36351] po2.fcr01.sr03.sng01.networklayer.com (174.133.118.133) 363.405 ms * 309.151 ms
17 * * *
18 * * *

 

BSNL (India) >> IPLC circuit >> Tata AS6453 Los Angeles, California >> NTT (US) >> NTT (Asia) >> NTT (Tokyo) >> Softlayer (Tokyo) >> Softlayer (Singapore)

Wow!

Pretty bad. Ideally the route should be BSNL > upstream (Tata/Reliance/Airtel/Vodafone) > Singapore (that’s it. Over!)
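
As a side note, the AS-level hops in a `traceroute -a` output can be pulled out mechanically. Below is a small Python sketch: `as_path()` is my own helper name, and `SAMPLE` is a hand-trimmed subset of the trace above (one probe per hop, a few hops only). On the full trace, hops where the probes alternate between two ASes would show up as repeated entries, so treat this as a rough aid rather than a real path tool.

```python
import re

def as_path(trace_text):
    """Collect the [ASnnnn] tags from `traceroute -a` output, in order,
    collapsing consecutive repeats of the same AS."""
    path = []
    for asn in re.findall(r"\[AS(\d+)\]", trace_text):
        if not path or path[-1] != int(asn):
            path.append(int(asn))
    return path

# Trimmed lines from the forward trace, one probe per hop.
SAMPLE = """\
1 [AS65534] router.home (10.10.0.1) 1.189 ms
4 [AS9829] 218.248.250.82 (218.248.250.82) 75.432 ms
5 [AS6453] if-11-1-1.mcore3.laa-losangeles.as6453.net (209.58.85.5) 368.104 ms
8 [AS2914] ae-5.r21.lsanca03.us.bb.gin.ntt.net (129.250.5.85) 305.672 ms
14 [AS36351] ae5.dar01.sr03.sng01.networklayer.com (50.97.18.197) 388.660 ms
"""

print(as_path(SAMPLE))  # [65534, 9829, 6453, 2914, 36351]
```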

 

Interestingly, Softlayer operates a nice looking glass, and hence I was able to trace the return path from there to my home router to get an idea of the complete route.

bbr02.eq01.sng02> traceroute 117.220.163.128
HOST: bbr02.eq01.sng02-re0 Loss% Snt Last Avg Best Wrst StDev
1. 63.218.213.173 0.0% 5 0.4 0.5 0.4 0.5 0.0
2. 63.218.228.65 0.0% 5 0.6 0.6 0.5 0.7 0.0 <<< PCCW Global
3. 120.29.215.33 0.0% 5 11.1 7.4 0.6 12.7 5.1 <<< Tata AS6453
4. 120.29.214.13 0.0% 5 0.6 2.4 0.6 9.6 4.0
5. 180.87.12.9 0.0% 5 62.1 61.2 60.7 62.1 0.6
6. 180.87.12.54 0.0% 5 97.7 73.3 60.8 97.7 17.4
7. 180.87.36.33 0.0% 5 103.2 75.0 59.6 103.2 18.1
8. 180.87.38.74 0.0% 5 61.1 74.0 61.1 88.9 12.4 <<< Tata AS6453
9. 115.114.131.138 0.0% 5 91.7 92.6 91.7 96.3 2.0 <<<< VSNL AS4755
10. 218.248.255.101 0.0% 5 95.5 96.5 95.5 99.7 1.8 <<<< Hits BSNL AS9829
11. 218.248.169.117 0.0% 5 106.6 110.4 106.4 126.2 8.8
12. 218.248.169.117 0.0% 5 106.3 107.0 106.3 108.6 1.0
13. ???

 

 

Overall the return path is pretty good and direct. The latency values are also as expected up to hop 12, because the forward route (i.e. BSNL > Softlayer) from the BSNL router at hop 12 is direct, whereas the routers below it take the route via the US. The return-path trace does not show those routers because BSNL is dropping ICMP.

 

Reason for problem:

The forward path is terribly bad here because BSNL lets the usual BGP route selection algorithm deal with it. Basically BSNL is getting multiple routes for that Softlayer prefix: one from its IP port in India with Tata-VSNL AS4755 and the other from its port with Tata in Los Angeles (Tata AS6453) over IPLC.

 

So possible routes as per AS paths are:

AS9829 > AS4755 > AS6453 > AS2914 > AS36351 

AS9829 > AS6453 > AS2914 > AS36351
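
Here is a tiny Python sketch of that comparison. It assumes everything ranked above AS path length in BGP best-path selection (local preference and so on) is equal on both routes, which I cannot verify from outside BSNL. The `CANDIDATES` labels and `pick_shortest()` are names I made up.

```python
# AS paths as listed above (BSNL's own AS9829 included on both).
CANDIDATES = {
    "via Tata-VSNL AS4755 in India": [9829, 4755, 6453, 2914, 36351],
    "via Tata AS6453 in Los Angeles (IPLC)": [9829, 6453, 2914, 36351],
}

def pick_shortest(candidates):
    """Return the label of the route with the fewest ASes in its path."""
    return min(candidates, key=lambda name: len(candidates[name]))

for name, path in CANDIDATES.items():
    print(len(path), name)

print("winner:", pick_shortest(CANDIDATES))
```

One AS hop less is all it takes: BGP has no idea that the shorter path crosses the Pacific twice.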

 

With other attributes such as local preference being equal, BGP by default picks the shorter AS path, i.e. the 2nd one. In case of #1, the BGP session between BSNL AS9829 and Tata-VSNL AS4755 is within India.

For example:

1 [AS65534] router.home (10.10.0.1) 1.709 ms 0.912 ms 0.982 ms
2 [AS9829] 117.220.160.1 (117.220.160.1) 17.451 ms 18.075 ms 19.029 ms
3 [AS9829] 218.248.169.122 (218.248.169.122) 21.843 ms 24.584 ms 22.491 ms
4 [AS4755] 115.114.57.165.static-mumbai.vsnl.net.in (115.114.57.165) 57.399 ms 58.563 ms 57.446 ms

 

Very likely the BGP session here is configured on the usual /30 subnet: one IP on BSNL’s side, one on Tata’s side, a third as the broadcast address and the fourth (the network address) lying useless due to the math game!
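
The /30 math is easy to check with Python’s standard `ipaddress` module. This sketch only assumes that the link really is a /30 (my guess, as said above); `describe_slash30()` is a helper name I made up.

```python
import ipaddress

def describe_slash30(ip):
    """Given one address, describe the /30 it would sit in."""
    net = ipaddress.ip_interface(f"{ip}/30").network
    return {
        "network": str(net.network_address),
        "hosts": [str(h) for h in net.hosts()],   # the two usable IPs
        "broadcast": str(net.broadcast_address),
    }

print(describe_slash30("115.114.57.165"))
# {'network': '115.114.57.164',
#  'hosts': ['115.114.57.165', '115.114.57.166'],
#  'broadcast': '115.114.57.167'}
```

So if 115.114.57.165 is one end of the link, the other end would be 115.114.57.166.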

So 115.114.57.165 is part of that /30. Let’s ping it:

anurag:~ anurag$ ping -c 5 115.114.57.165
PING 115.114.57.165 (115.114.57.165): 56 data bytes
64 bytes from 115.114.57.165: icmp_seq=0 ttl=58 time=63.286 ms
64 bytes from 115.114.57.165: icmp_seq=1 ttl=58 time=66.029 ms
64 bytes from 115.114.57.165: icmp_seq=2 ttl=58 time=59.063 ms
64 bytes from 115.114.57.165: icmp_seq=3 ttl=58 time=59.439 ms
64 bytes from 115.114.57.165: icmp_seq=4 ttl=58 time=61.719 ms

— 115.114.57.165 ping statistics —
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 59.063/61.907/66.029/2.573 ms
anurag:~ anurag$

 

60ms latency – for sure Mumbai and all good here.

 

Now let’s look at IP just next to it:

 

anurag:~ anurag$ ping -c 5 115.114.57.166
PING 115.114.57.166 (115.114.57.166): 56 data bytes
64 bytes from 115.114.57.166: icmp_seq=0 ttl=251 time=28.784 ms
64 bytes from 115.114.57.166: icmp_seq=1 ttl=251 time=25.586 ms
64 bytes from 115.114.57.166: icmp_seq=2 ttl=251 time=28.631 ms
64 bytes from 115.114.57.166: icmp_seq=3 ttl=251 time=26.905 ms
64 bytes from 115.114.57.166: icmp_seq=4 ttl=251 time=26.213 ms

— 115.114.57.166 ping statistics —
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 25.586/27.224/28.784/1.282 ms
anurag:~ anurag$

 

Half the latency, and that’s the BSNL router in Delhi/Noida where they take the drop from Tata. It’s BSNL’s router, but it sits on Tata’s IP for the BGP session. So this clearly tells us that routes from AS9829 to AS4755 Tata-VSNL are between routers within India.

 

Now coming back to bad route between BSNL and Softlayer, in that case first few hops are:

1 [AS65534] router.home (10.10.0.1) 1.189 ms 0.910 ms 0.810 ms
2 [AS9829] 117.220.160.1 (117.220.160.1) 17.707 ms 21.147 ms 16.925 ms
3 [AS9829] 218.248.169.126 (218.248.169.126) 30.195 ms 29.766 ms 29.976 ms
4 [AS9829] 218.248.250.82 (218.248.250.82) 75.432 ms 77.488 ms 76.761 ms
5 [AS6453] if-11-1-1.mcore3.laa-losangeles.as6453.net (209.58.85.5) 368.104 ms 303.206 ms 309.964 ms

 

Hop 5 has a latency of 300ms (usual for India > US routes). Again assuming 209.58.85.5 comes from a /30 and that, as per usual BSNL practice, the next IP in that subnet i.e. 209.58.85.6 is on BSNL’s side, let’s ping 209.58.85.6:

anurag:~ anurag$ ping -c 5 209.58.85.6
PING 209.58.85.6 (209.58.85.6): 56 data bytes
64 bytes from 209.58.85.6: icmp_seq=0 ttl=250 time=373.483 ms
64 bytes from 209.58.85.6: icmp_seq=1 ttl=250 time=395.493 ms
64 bytes from 209.58.85.6: icmp_seq=2 ttl=250 time=419.340 ms
64 bytes from 209.58.85.6: icmp_seq=3 ttl=250 time=305.460 ms
64 bytes from 209.58.85.6: icmp_seq=4 ttl=250 time=362.598 ms

— 209.58.85.6 ping statistics —
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 305.460/371.275/419.340/38.232 ms
anurag:~ anurag$

 

 

Hmm… 300ms latency. Unexpected. I thought this router was in India, but this seems slightly more complex. Likely the BGP session here uses BSNL’s /30 subnet and not Tata Comm’s subnet.

OK – let’s look at the last BSNL IP on that trace – it was 218.248.250.82. Let’s ask the Tata AS6453 Los Angeles LAA router, via the AS6453 Looking Glass, for its BGP table entry:

 

Router: gin-laa-mcore3
Site: US, Los angeles, LAA
Command: show ip bgp 218.248.250.82

BGP routing table entry for 218.248.240.0/20
Bestpath Modifiers: deterministic-med
Paths: (2 available, best #1)
14 16 17 18
9829
ix-3-2.mcore3.LAA-LosAngeles. from ix-3-2.mcore3.LAA-LosAngeles. (218.248.254.99)
Origin IGP, valid, external, best
Community:
9829, (received-only)
ix-3-2.mcore3.LAA-LosAngeles. from ix-3-2.mcore3.LAA-LosAngeles. (218.248.254.99)
Origin IGP, valid, external

 

So BGP route is via – 218.248.254.99

Let’s trace:

traceroute to 218.248.254.99 (218.248.254.99), 64 hops max, 52 byte packets
1 router.home (10.10.0.1) 4.047 ms 0.875 ms 0.958 ms
2 117.220.160.1 (117.220.160.1) 18.779 ms 17.490 ms 19.334 ms
3 218.248.169.126 (218.248.169.126) 44.040 ms 32.802 ms 29.831 ms
4 218.248.250.174 (218.248.250.174) 82.626 ms 87.126 ms 84.243 ms
5 218.248.255.99 (218.248.255.99) 86.061 ms 85.503 ms 83.003 ms

 

Here we go!

So clearly the BSNL router on 218.248.255.99 is located in India and has a BGP session with the Tata AS6453 router in Los Angeles. This runs over an IPLC circuit from Tata Communications.

 

Possible fix…

Following an amazing quote – “Never call it a problem unless you have the solution!”

So the problem here is not really with Tata’s network. They are just selling bandwidth in the form of two products – IP Transit & IPLC. It’s BSNL’s mistake to use the IPLC carelessly. Likely BSNL won’t care or put much effort into fixing it.

There is a possible fix from Softlayer’s side. If they stop their prefix from being announced to BSNL AS9829 via Tata AS6453, BSNL will never pick the IPLC (or even IP) route via Tata. Instead they will just pick a route via any other upstream like Airtel or Reliance Globalcom.

 

Let’s look at the relationship of Tata AS6453 with PCCW Global (upstream for Softlayer):

anurag:~ anurag$ whois -h whois.radb.net as6453 | grep -w AS3491
import: from AS3491 action pref = 100; accept AS-CAIS
export: to AS3491 announce AS-GLOBEINTERNET
import: from AS3491 action pref = 100; accept AS-CAIS
export: to AS3491 announce AS-GLOBEINTERNET
anurag:~ anurag$

 

Clearly both are peering! 

Based on a presentation by Mr Amit Dunga (from Tata Communications) at SANOG, here’s the list of BGP communities used by Tata AS6453:

Screen Shot 2013-06-01 at 12.30.35 AM

 

 

Thus if Softlayer could get its upstream providers (like PCCW in this specific case) to tag the route with 65009:9829, this would ensure that the route learnt by Tata AS6453 from PCCW Global AS3491 is NOT exported to BSNL AS9829. BSNL would then instead get the route via Bharti Airtel AS9498 or Reliance AS18101.
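
To make the idea concrete, here is a toy Python sketch of how such an action community would be evaluated on Tata’s side. I am generalizing from the single value 65009:9829 to "65009:<ASN> means do not export to that ASN", which is my assumption. The `export_targets()` helper is a made-up name, and AS64500 is just a documentation ASN standing in for some other Tata neighbor.

```python
NO_EXPORT_TO = 65009  # assumed meaning: 65009:<ASN> = do not export to <ASN>

def export_targets(communities, neighbors):
    """Return the neighbor ASNs a route would still be exported to,
    given its communities as (high, low) pairs."""
    blocked = {low for (high, low) in communities if high == NO_EXPORT_TO}
    return [asn for asn in neighbors if asn not in blocked]

# Hypothetical neighbor set for illustration: BSNL plus a documentation ASN.
neighbors = [9829, 64500]

print(export_targets(set(), neighbors))            # [9829, 64500]
print(export_targets({(65009, 9829)}, neighbors))  # [64500]
```

With the community attached, BSNL simply never hears the prefix from Tata AS6453 and has to choose among its other upstreams.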

 

I just sent this detailed info by email to Softlayer and BSNL. And oh yes – I don’t know why hostgator.in is hosted in Softlayer Singapore anyway. They provide hosting in India out of Ctrls datacenter. Why they host their own home site in Singapore is beyond my understanding!

 

With hopes that your packets to Singapore are not routing via US, time for me to get back to my “cramming” for exams. 🙂