07 Mar

Confusing traceroutes and more

And here goes my first post for 2017. The start of this year did not go well: I broke my hand in January, and that cost me a lot of time. I have now almost recovered and am in much better condition. I just attended HKNOG 4.0 in Hong Kong followed by APRICOT 2017 in Ho Chi Minh, Vietnam, and I enjoyed both events. Here’s my presentation from APRICOT 2017.

I recently came across some crazy, confusing traceroutes passed along by one of my friends. I cannot share that exact traceroute in this blog post, but I can reproduce the same effect by tracing from a large network, such as Telia’s London PoP, to one of the Indian destinations via their looking glass.

Example traceroute:

Router: London 
Command: traceroute inet 45.64.213.161 as-number-lookup


traceroute to 45.64.213.161 (45.64.213.161), 30 hops max, 40 byte packets
 1  ldn-bb3-link.telia.net (80.91.246.96)  3.186 ms ldn-bb2-link.telia.net (80.91.247.93)  91.337 ms ldn-bb3-link.telia.net (213.155.132.194)  0.512 ms
 2  ldn-b7-link.telia.net (62.115.114.177)  0.541 ms ldn-b7-link.telia.net (62.115.137.189)  0.600 ms ldn-b7-link.telia.net (62.115.141.151)  0.786 ms
 3  flag-ic-310275-ldn-b7.c.telia.net (62.115.47.106)  3.271 ms  0.777 ms  0.702 ms
 4  xe-0-0-1.0-pjr04.mmb004.flagtel.com (85.95.26.158) [AS  15412]  210.160 ms xe-3-0-1.0.pjr04.mmb004.flagtel.com (85.95.25.138) [AS  15412]  207.552 ms  207.699 ms
 5  * * *
 6  * * *
 7  nsg-static-222.29.72.182.airtel.in (182.72.29.222) [AS  9498]  149.335 ms  148.746 ms  148.834 ms
 8  45.64.213.161 (45.64.213.161) [AS  132933]  158.350 ms  155.053 ms  146.613 ms

 

 

Here the trace goes: London (AS1299) > London (AS15412) > Mumbai (AS15412) >>>> somewhere in India (AS9498) > destination (AS132933)

 

So it looks as if traffic enters India via Reliance, is then handed off to Airtel, and reaches the destination. Let’s check the BGP table view from the same PoP for this prefix:

Telia Carrier Looking Glass - show route protocol bgp 45.64.213.161 table inet.0

Router: London 
Command: show route protocol bgp 45.64.213.161 table inet.0



inet.0: 695650 destinations, 1553714 routes (695458 active, 1034 holddown, 554 hidden)
+ = Active Route, - = Last Active, * = Both

45.64.213.0/24     *[BGP/170] 5d 01:25:46, MED 0, localpref 200, from 62.115.128.73
                      AS path: 15412 18101 132933 I, validation-state: unverified
                      to 213.155.132.194 via ae0.0
                      to 213.155.132.196 via ae1.0
                      to 80.91.246.96 via ae15.0
                      to 80.91.246.114 via ae16.0
                      to 213.155.136.74 via ae22.0
                      to 213.155.136.76 via ae23.0
                    > to 80.91.248.217 via ae3.0
                      to 80.91.246.144 via ae4.0
                      to 80.91.247.91 via ae5.0
                      to 80.91.249.181 via ae6.0
                      to 80.91.246.146 via ae7.0
                      to 80.91.247.93 via ae8.0
                    [BGP/170] 5d 01:25:46, MED 0, localpref 200, from 213.248.64.252
                      AS path: 15412 18101 132933 I, validation-state: unverified
                      to 213.155.132.194 via ae0.0
                      to 213.155.132.196 via ae1.0
                      to 80.91.246.96 via ae15.0
                      to 80.91.246.114 via ae16.0
                      to 213.155.136.74 via ae22.0
                      to 213.155.136.76 via ae23.0
                    > to 80.91.248.217 via ae3.0
                      to 80.91.246.144 via ae4.0
                      to 80.91.247.91 via ae5.0
                      to 80.91.249.181 via ae6.0
                      to 80.91.246.146 via ae7.0
                      to 80.91.247.93 via ae8.0
                    [BGP/170] 2w2d 19:39:07, MED 0, localpref 150
                      AS path: 6453 4755 132933 I, validation-state: unverified
                    > to 62.115.9.174 via ae25.0

 

So both of the preferred routes (localpref 200) are 15412 > 18101 > 132933, and the remaining lower-preference route is 6453 > 4755 > 132933. None of them has AS9498 in the path, yet Airtel (AS9498) does appear in the traceroute. The second-last hop in the trace is 182.72.29.222, and that indeed belongs to Airtel.
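The mismatch can be checked mechanically. Here is a minimal sketch in Python that compares the ASNs annotated in the traceroute with the AS paths from the BGP output above. The function name `unexpected_asns` is made up for illustration, and the lists are typed in by hand from the two outputs.

```python
# AS paths copied by hand from the three BGP routes shown above.
bgp_paths = [
    [15412, 18101, 132933],
    [15412, 18101, 132933],
    [6453, 4755, 132933],
]

# ASNs annotated in the traceroute, in hop order (hops 4, 7 and 8).
traceroute_asns = [15412, 9498, 132933]


def unexpected_asns(trace_asns, as_paths):
    """Return traceroute ASNs that appear in none of the BGP AS paths."""
    expected = {asn for path in as_paths for asn in path}
    return [asn for asn in trace_asns if asn not in expected]


print(unexpected_asns(traceroute_asns, bgp_paths))  # [9498]
```

An ASN that shows up only in the trace is a hint, not proof, that a hop replied from an address belonging to another network.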

If we trust the routing table, along with the fact that Airtel and Reliance usually exchange only domestic traffic and that we typically do not see AS15412 pushing traffic via Airtel, then the trace must be wrong, and indeed it is. Before we get to why it’s wrong, let’s try to understand how exactly traceroute works.

 

Working of traceroute

Traceroute works by using the TTL, i.e. Time to Live, on the packets the tool sends out. IP headers carry a TTL to prevent packets from looping forever. For instance, if router R1 sends some traffic to router R2, and R2 is not learning that route from anywhere but has a default route back to R1, then traffic will start looping between R1 and R2. IP routing prevents this with the TTL: IP packets are sent with a certain TTL value, and each router they cross decrements it. When the TTL reaches zero, a router is supposed to drop the packet rather than carry it any further. When a router drops a packet this way, it is supposed to reply with an ICMP “TTL exceeded” (Time Exceeded) error.

Now the way the traceroute tool works is by sending packets with increasing TTL, one after another. It sends the first one with TTL 1. The router directly connected to it gets the packet and decrements the TTL (1 – 1, so it becomes zero). Since the TTL is now zero, it drops the packet instead of forwarding it. As part of dropping it, it replies to the system running the trace with a “TTL exceeded” error, revealing its IP to the tool. Next, another packet is sent with TTL 2; it crosses the 1st router and is dropped by the 2nd router with “TTL exceeded”, revealing that router’s IP.
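To make the TTL logic concrete, here is a small simulation in Python. It is only a sketch: it sends no real packets, the function name `simulate_traceroute` is made up for illustration, and the addresses come from documentation ranges rather than any real network.

```python
def simulate_traceroute(path, max_hops=30):
    """Simulate TTL-based probing along `path`.

    `path` is the ordered list of addresses a probe crosses, ending with
    the destination. Returns a list of (ttl, responder) tuples.
    """
    results = []
    if not path:
        return results
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        responder = None
        for index, hop in enumerate(path):
            if index == len(path) - 1:
                # The probe reached the destination, which answers itself.
                responder = hop
                break
            remaining -= 1  # every router decrements the TTL
            if remaining == 0:
                # TTL hit zero: the router drops the probe and reveals
                # its address in the "TTL exceeded" reply.
                responder = hop
                break
        results.append((ttl, responder))
        if responder == path[-1]:
            break  # destination reached, stop probing
    return results


# Hypothetical three-hop path using documentation addresses.
for ttl, responder in simulate_traceroute(
    ["192.0.2.1", "198.51.100.1", "203.0.113.9"]
):
    print(ttl, responder)
```

Each probe gets one hop further than the previous one, which is why the output lists the path one router at a time.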

 

 

Back to our problem…

So that was how traceroute works. Now, going back to the case I was discussing, think of routing between two networks when the routing is not symmetric. By asymmetric routing, I mean that the forward traffic and the return traffic between a source and a destination may be carried via different paths.

 

Say, for example, A is sending traffic to B via R1-R2, and B is replying to A via R3. Now if A does a trace to B, R1 and R2 may appear fine, but the source IP B uses to convey the TTL exceeded message can confuse things. When packets reach B with TTL 1, B decrements the TTL and drops them. Then, to send that “TTL exceeded” message, B has two options:

  1. B can reply from the IP address on the interface connected to R2. Remember, I am only talking about the source IP B uses for the TTL exceeded error. The actual reply path, of course, is via R3.
  2. B can reply from the IP address of the interface connected to R3, using the usual logic of how packets go out: use the source IP of the interface of the best path installed in the router.

 

Each approach has its own advantages and disadvantages. If B follows #1, i.e. sends the TTL exceeded from the address of the interface connected to R2, then it gives a very logical traceroute output. But if network R3 is filtering packets based on BCP38, it will just drop the traffic from B that carries a source IP from the R2-facing link. If B follows #2, it won’t cause any issues with BCP38, but it will confuse the traceroute output, as suddenly one hop in the trace will appear to be from an entirely different network. That is exactly what has been happening in the trace I shared above. Let’s read the trace again.

 1 ldn-bb3-link.telia.net (80.91.246.96) 3.186 ms ldn-bb2-link.telia.net (80.91.247.93) 91.337 ms ldn-bb3-link.telia.net (213.155.132.194) 0.512 ms
 2 ldn-b7-link.telia.net (62.115.114.177) 0.541 ms ldn-b7-link.telia.net (62.115.137.189) 0.600 ms ldn-b7-link.telia.net (62.115.141.151) 0.786 ms
 3 flag-ic-310275-ldn-b7.c.telia.net (62.115.47.106) 3.271 ms 0.777 ms 0.702 ms
 4 xe-0-0-1.0-pjr04.mmb004.flagtel.com (85.95.26.158) [AS 15412] 210.160 ms xe-3-0-1.0.pjr04.mmb004.flagtel.com (85.95.25.138) [AS 15412] 207.552 ms 207.699 ms
 5 * * *
 6 * * *
 7 nsg-static-222.29.72.182.airtel.in (182.72.29.222) [AS 9498] 149.335 ms 148.746 ms 148.834 ms
 8 45.64.213.161 (45.64.213.161) [AS 132933] 158.350 ms 155.053 ms 146.613 ms

 

Here the router right before the destination, i.e. hop 7, is connected to both Reliance and Airtel. It’s announcing the prefix covering the destination to Reliance, and Reliance is bringing the traffic in, but it’s using Airtel to send traffic back out to Telia’s London router. While replying with “TTL exceeded”, router 7 uses its Airtel-facing source IP, and thus we see the PTR record pointing to Airtel. This can be referred to as “random factoid” behaviour in traceroute. It comes from RFC 1812, which suggests that the “ICMP source must be from the egress iface”, and Richard Steenbergen puts it very nicely in his presentation at NANOG here.
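The two source-address choices for router B discussed earlier can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Interface` class, the `icmp_source` and `passes_bcp38` helpers, the documentation-range addresses, and the idea that R3 only accepts sources from its own 198.51.100.0/24.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Interface:
    neighbor: str  # who sits on the other end of the link
    address: str   # this router's own address on that link


def icmp_source(ingress: Interface, egress: Interface, policy: str) -> str:
    """Pick the source address for the TTL exceeded reply."""
    if policy == "ingress":  # option 1: interface the probe arrived on
        return ingress.address
    if policy == "egress":   # option 2: interface the reply leaves from
        return egress.address
    raise ValueError(f"unknown policy: {policy}")


def passes_bcp38(source: str, allowed_prefixes: list[str]) -> bool:
    """True if the upstream's source-address filter would accept `source`."""
    address = ipaddress.ip_address(source)
    return any(address in ipaddress.ip_network(p) for p in allowed_prefixes)


# Hypothetical addressing: B's link to R2 and B's link to R3.
to_r2 = Interface(neighbor="R2", address="192.0.2.2")
to_r3 = Interface(neighbor="R3", address="198.51.100.2")
r3_allowed = ["198.51.100.0/24"]  # assumed BCP38 filter on R3

for policy in ("ingress", "egress"):
    source = icmp_source(ingress=to_r2, egress=to_r3, policy=policy)
    print(policy, source, "accepted by R3:", passes_bcp38(source, r3_allowed))
```

With these assumed filters, the ingress choice gives a tidy trace but the reply is dropped by R3, while the egress choice gets through and shows an address from the “wrong” network.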


So that’s all about it for now!

25 Nov

Peering with content networks in India


One of the frequent emails and contact form messages I get via my blog is about which content networks are available in India and where one can peer with them. There are a number of content networks in India, and of course most of them have an open peering policy and are usually happy with direct interconnection (which we call “peering”) with ISP networks (often referred to as “eyeball networks”). Some of these networks have a backbone that connects back to their key datacenter locations on their own circuits via Singapore/Europe, while others have simply placed caching servers where cache fill happens over IP transit.

 

Based on information publicly known across the community, and of course PeeringDB, the following content players are available in India and known to be open for peering:

  1. Google
  2. Microsoft
  3. Amazon
  4. Limelight

 

Here is a quick list of these with datacenter names and locations, as taken from the PeeringDB records of these networks.

Organisation | ASN | City | Datacenter | Location
Amazon | 16509 | Mumbai | GPX Mumbai | Unit A-001, Boomerang Chandivali Farm Road, Near Chandivali Studio, Andheri East Mumbai, Mumbai, 400 051
Amazon | 16509 | Noida | Sify Greenfort – Noida | B7, Block A, Sector 132, Noida Expressway, Noida, UP 201304
Amazon | 16509 | Mumbai | Tata Mumbai IDC | LVSB, Opposite Kirti College, 6th floor, Prabhadevi, Mumbai, MH, 400 028
Google | 15169 | Chennai | Bharti Airtel Santhome | Bharti Towers, 101 Santhome High Road, Chennai, 600 028
Google | 15169 | Mumbai | GPX Mumbai | Unit A-001, Boomerang Chandivali Farm Road, Near Chandivali Studio, Andheri East Mumbai, Mumbai, 400 051
Google | 15169 | Noida | Sify Greenfort – Noida | B7, Block A, Sector 132, Noida Expressway, Noida, UP 201304
Google | 15169 | Chennai | TATA Communications Ltd | 14th floor, 2nd block, 4, Swami Sivanand Salai, Chennai, TN 600 002
Google | 15169 | Delhi | Tata Delhi | VSB, Bangla Sahib Road, New Delhi 110001
Google | 15169 | Mumbai | Tata Mumbai IDC | LVSB, Opposite Kirti College, 6th floor, Prabhadevi, Mumbai, MH, 400 028
Limelight | 55439 / 22822 | Chennai | Bharti Airtel Santhome | Bharti Towers, 101 Santhome High Road, Chennai, 600 028
Limelight | 55439 / 22822 | Mumbai | Netmagic Vikhroli | Mehra Industrial Estate, LBS Marg, Vikhroli, Mumbai, 400 079
Microsoft | 8075 | Mumbai | Bharti Airtel Mumbai | Plot No, TPS-2, 14/3, 2nd floor, Dattatray Road, Linking Road Extension, Mumbai, 400054
Microsoft | 8075 | Chennai | Bharti Airtel Santhome | Bharti Towers, 101 Santhome High Road, Chennai, 600 028
Microsoft | 8075 | Chennai | TATA Communications Ltd | 14th floor, 2nd block, 4, Swami Sivanand Salai, Chennai, TN 600 002
Microsoft | 8075 | Delhi | Tata Communications Ltd – GK1 | Greater Kailash-1, New Delhi, 110048
Microsoft | 8075 | Mumbai | Tata Mumbai IDC | LVSB, Opposite Kirti College, 6th floor, Prabhadevi, Mumbai, MH, 400 028

 

Besides these, Google also has the option of GGC, Akamai has the option of an Akamai caching server, Facebook has the option of a caching server hosted inside the ISP’s network, and Netflix has the option of OCAs. Apart from these networks, there are known nodes of Verizon’s Edgecast in Delhi, Mumbai & Chennai (as per this map), Cloudflare has nodes in Delhi, Mumbai & Chennai (as per this map), PCH & the K-root server have a node with Web Werks available on the MCH peering fabric, and Dyn has a node in Mumbai (as per this map).

Go ahead and peer; after all, it all starts with a handshake. 🙂

06 Oct

K root server – Noida anycast and updates

K root in Noida seems to have not been getting enough traffic for quite some time, and connectivity does seem a bit broken. This blog post follows up on Dyn’s excellent and detailed post about how TIC leaked the world-famous 193.0.14.0/24 address space used by AS25152. It was also good to read this post from RIPE NCC written by my friend Emile (and thanks to him for crediting me for signalling about traffic hitting outside!)

 

The route leak…

TIC AS48159 was supposed to keep the route within its IGP, but it leaked it to Omantel AS8529, a large international backbone, which propagated the leaked route further into the global table. It was a mistake by both players, but primarily by TIC for leaking the route.

 

If we look at the IPv4 route propagation graph of Omantel AS8529 on the Hurricane Electric BGP toolkit, it shows two important ASNs:

 

Omantel IPv4 routing

 

 

These are AS9498 (Bharti Airtel) and AS6453 (Tata Communications). Both are extremely important networks and two of the large international and domestic IP transit providers in India. Very likely Omantel is a customer of Bharti Airtel, as we can see if we look at the IRR record of Airtel published in their PeeringDB record: AS9498:AS-BHARTI-IN

 

Anurags-MacBook-Pro:~ anurag$ whois -h whois.apnic.net AS9498:AS-BHARTI-IN |grep -w AS8529
members: AS38476,AS45219,AS45264,AS45283,AS45514,AS45451,AS37662,AS45491,AS7642,AS45517,AS45514:AS-TELEMEDIA-SMB,AS45609,AS38740,As131210,AS45335,AS23937,AS132045,AS8529,AS132486,AS8164,AS133967,AS37048
Anurags-MacBook-Pro:~ anurag$

 

This confirms the same. Airtel did pick up this route, and since it was a customer route, it had a higher local preference than the peering route Airtel learnt from its NIXI Noida peering with K root. For now the route leak is fixed, and Airtel seems to have good routing to the K root anycast instance in Noida.
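Why the leaked route won can be shown with a simplified best-path comparison in Python. This is only a sketch: the local preference values (200 for customer, 100 for peering) are hypothetical and not Airtel’s actual policy, the AS paths are illustrative, and real BGP has more tie-breakers than the two modelled here.

```python
# Two candidate routes for 193.0.14.0/24 as Airtel might have seen them.
# local_pref values are hypothetical; the AS paths are illustrative only.
routes = [
    {
        "learned_from": "peering (NIXI Noida)",
        "as_path": [25152],
        "local_pref": 100,
    },
    {
        "learned_from": "customer (Omantel)",
        "as_path": [8529, 48159, 25152],
        "local_pref": 200,
    },
]


def best_route(candidates):
    """Highest local preference wins; shorter AS path only breaks ties."""
    return max(candidates, key=lambda r: (r["local_pref"], -len(r["as_path"])))


print(best_route(routes)["learned_from"])  # customer (Omantel)
```

Since local preference is compared before AS path length, the longer leaked path still beats the one-hop peering path.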

 

Current status

Tata Communications is still not picking up the announcement of the K root anycast instance in Noida, since their peering session at NIXI Noida has been down for a long time. NIXI moved from STPI to Netmagic Sector 63 Noida in August (see the heavy drop of traffic in the NIXI Noida graphs here). From that time onwards, the peering session of Tata’s domestic backbone AS4755 seems to be down.

NIXI Looking Glass - show ip bgp summary

Router: NIXI Delhi (Noida)

Command: show ip bgp summary


BGP router identifier 218.100.48.1, local AS number 24029
BGP table version is 541676, main routing table version 541676
10616 network entries using 1528704 bytes of memory
13657 path entries using 1092560 bytes of memory
1546/1197 BGP path/bestpath attribute entries using 210256 bytes of memory
1275 BGP AS-PATH entries using 40472 bytes of memory
566 BGP community entries using 22196 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 2894188 total bytes of memory
BGP activity 523875/512278 prefixes, 1016379/1001610 paths, scan interval 60 secs

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
  218.100.48.6    4        25152   35502  102431   541675    0    0 3w3d              1
  218.100.48.10   4        10029    8285   15774   541675    0    0 2d16h           194
  218.100.48.12   4         9583    4750    9899   541675    0    0 2d16h          1969
  218.100.48.13   4        17439  109297  191050   541675    0    0 9w5d             32
  218.100.48.15   4         9829     713    2669   541675    0    0 11:04:52        857
  218.100.48.17   4        17426    1205    3995   541675    0    0 19:57:29         17
  218.100.48.20   4         9498  190999  159646   541675    0    0 3w3d           7254
  218.100.48.21   4         4637   63761  141723   541675    0    0 6w2d              5
  218.100.48.23   4        63829   30808   80566   541675    0    0 2w5d              5
  218.100.48.25   4        17754   20071   50107   541675    0    0 1w5d            102
  218.100.48.26   4        18101   14641   29277   541675    0    0 5d00h           190
  218.100.48.27   4        17488   22887   58026   541675    0    0 2w0d            354
  218.100.48.28   4        55410   58592  107852   541675    0    0 2w2d           2637
  218.100.48.29   4        10201       0       0        1    0    0 2d08h    Active
  218.100.48.31   4        55836    9164   23591   541675    0    0 6d08h             7
  218.100.48.34   4        45528   38354  107593   541675    0    0 3w5d             18
  218.100.48.36   4       132215   27000   56646   541675    0    0 1w2d             15
  218.100.48.40   4       132453       0       0        1    0    0 2d07h    Idle

 

As per NIXI’s connected parties page, Tata Comm’s IP is 218.100.48.30. From NIXI’s looking glass, there seems to be no peer on that IP!

NIXI Looking Glass - show ip bgp neighbors 218.100.48.30 routes

Router: NIXI Delhi (Noida)

Command: show ip bgp neighbors 218.100.48.30 routes


% No such neighbor or address family

 

Hence, for now, Tata Comm isn’t getting the route from the Noida instance at all, and that explains the bad outbound path.
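The same check can be scripted against the summary output. Below is a minimal sketch in Python that parses a few rows copied from the "show ip bgp summary" output above. The helper name `parse_neighbors` is made up for illustration, and it assumes the ten-column layout shown in that output.

```python
SUMMARY = """\
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
  218.100.48.6    4        25152   35502  102431   541675    0    0 3w3d              1
  218.100.48.20   4         9498  190999  159646   541675    0    0 3w3d           7254
  218.100.48.26   4        18101   14641   29277   541675    0    0 5d00h           190
  218.100.48.29   4        10201       0       0        1    0    0 2d08h    Active
"""


def parse_neighbors(text):
    """Map neighbor IP -> ASN and whether the session is established."""
    neighbors = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a ten-column peer row.
        if len(fields) != 10 or not fields[2].isdigit():
            continue
        ip, _version, asn, *_counters, state = fields
        neighbors[ip] = {
            "asn": int(asn),
            # A numeric last column is a prefix count, so the session is up;
            # a word such as Active or Idle means it is not established.
            "established": state.isdigit(),
        }
    return neighbors


peers = parse_neighbors(SUMMARY)
print("218.100.48.30" in peers)             # False
print(peers["218.100.48.29"]["established"])  # False
```

A peer address that is missing from the table altogether, as with 218.100.48.30 here, means no session is configured towards it on this route server view.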

 

Example of trace from Tata Comm to K root:

## AS4755/TATACOMM-AS - TATA Communications formerly VSNL is Leading ISP (2.7% of browser users in IN)
#prb:15840 dst:193.0.14.129
1 () 192.168.34.1 [0.344, 0.426, 17.445]
2 err:{u'x': u'*'}
3 (AS4755) 115.114.137.158.static-pune.vsnl.net.in [2.73, 2.916, 2.921] |Pune,Maharashtra,IN|
4 () 172.29.250.33 [5.659, 5.789, 6.274]
5 (AS6453) ix-0-100.tcore1.mlv-mumbai.as6453.net [5.143, 5.168, 5.755]
6 (AS6453) if-9-5.tcore1.wyn-marseille.as6453.net [125.474, 125.554, 125.596] |Marseille,Provence-Alpes-Côte d'Azur,FR|
7 (AS6453) if-2-2.tcore2.wyn-marseille.as6453.net [125.723, 125.739, 126.525] |Marseille,Provence-Alpes-Côte d'Azur,FR|
8 (AS6453) if-7-2.tcore2.fnm-frankfurt.as6453.net [126.535, 126.788, 127.22]
9 (AS6453) if-12-2.tcore1.fnm-frankfurt.as6453.net [125.75, 125.828, 125.871]
10 (AS6453) 195.219.156.146 [262.957, 265.3, 266.39]
11 (AS20485) spb03.transtelecom.net [297.919, 297.954, 302.452] |Saint-Petersburg,St.-Petersburg,RU|
12 (AS20485) selectel-gw.transtelecom.net [288.789, 296.574, 298.442]
13 (AS25152) k.root-servers.net [296.981, 297.042, 297.118]

 

The same holds even for its downstream customers who have outbound via TCL:

## AS45528/TDN - Tikona Digital Networks Pvt Ltd. (1.4% of browser users in IN)
#prb:22793 dst:193.0.14.129
1 () 10.135.150.254 [0.521, 0.539, 0.814]
2 (AS45528) 1.22.55.185 [5.774, 7.721, 8.195]
3 (AS4755) 115.113.133.125.static-mumbai.vsnl.net.in [7.282, 14.754, 48.013] |Mumbai,Maharashtra,IN|
4 (AS6453) if-2-590.tcore2.l78-london.as6453.net [121.089, 122.755, 124.416] |London,England,GB|
5 (AS6453) if-2-2.tcore1.l78-london.as6453.net [121.828, 122.077, 123.869] |London,England,GB|
6 (AS6453) if-17-2.tcore1.ldn-london.as6453.net [120.716, 122.008, 122.768] |London,England,GB|
7 (AS6453) 195.219.83.10 [122.039, 123.532, 125.424]
8 (AS8468) te2-2.interxion.core.enta.net [125.262, 126.587, 127.04]
9 (AS8468) 188-39-11-66.static.enta.net [122.424, 123.028, 123.163]
10 (AS5459) ge0-1-101.tr1.linx.net [121.656, 124.826, 125.182] |London,England,GB|
11 (AS5459) fe3-0.tr4.linx.net [120.654, 120.721, 138.858] |London,England,GB|
12 (AS5459) g00.router.linx.k.ripe.net [123.306, 123.536, 125.486] |London,England,GB|
13 (AS25152) k.root-servers.net [121.285, 122.653, 122.942]

 

 

Another issue causing serious trouble around K root is what appears to be a broken IP transit pipe for K root Noida. Due to the way NIXI works, K root must have an IP transit pipe. I pointed out long ago the broken connectivity of root DNS servers due to return path problems. After that, both K root and i root got transit, but it seems that after NIXI moved, IP transit has been broken for the current setup in Netmagic.

 

Why does a “local node” of a root server need IP transit?

It needs transit because:

    1. NIXI has a weird “x-y” pricing model where the requester pays, and this leads to quite a high settlement amount for a network with high inbound traffic (an eyeball network), even a few times that of transit (paying 5Rs/GB!). This leads to a scenario where networks do “partial prefix announcement” to keep their traffic balanced (or slightly in the outbound direction) to avoid a high settlement cost. Hence most such eyeball networks announce only their regional routes rather than all their routes, while they still learn K root’s route and inject it into their IGP. As a result, K root’s 193.0.14.0/24 is learnt by networks in West and South India, and hence there’s a forward path from customers >>> K root Noida node. Now, since these networks aren’t announcing their West or South Indian routes at NIXI Noida, there’s no return path for packets. Thus, for root DNS servers to stay operationally stable (which they should, since they are critical), they must have transit / a default route to return packets as a last resort to IPs which aren’t visible via peering.
    2. A similar case arises with other randomly leaked routes. E.g. if a large ISP decides to learn K root’s route and announce it in its customers’ table, this leads to a Customer > Large network > K root Noida path, while that customer’s route is not announced at NIXI, resulting in no return path.
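The missing return path can be illustrated with a tiny route lookup in Python. This is a sketch under stated assumptions: the prefixes are documentation ranges standing in for an ISP’s North Indian routes (announced at NIXI Noida) and West Indian routes (not announced), and `reply_path` is a made-up helper, not anything the K root node actually runs.

```python
import ipaddress


def reply_path(client_ip, peering_routes, has_transit_default):
    """Return how the node can send a reply back to `client_ip`."""
    address = ipaddress.ip_address(client_ip)
    for prefix in peering_routes:
        if address in ipaddress.ip_network(prefix):
            return "peering"
    # No specific route learnt over peering: only a default via transit helps.
    return "transit" if has_transit_default else None


# Hypothetical: the ISP announces only its northern prefix at NIXI Noida.
learnt_via_peering = ["192.0.2.0/24"]
north_client = "192.0.2.10"     # covered by the announced prefix
west_client = "198.51.100.10"   # from a prefix that is not announced

print(reply_path(north_client, learnt_via_peering, has_transit_default=False))
print(reply_path(west_client, learnt_via_peering, has_transit_default=False))
print(reply_path(west_client, learnt_via_peering, has_transit_default=True))
```

Without the default route, the query from the western client arrives but the answer has nowhere to go, which is the failure described in both points above.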

 

 

So in short: it does need transit, but just for outbound traffic, not for announcing routes on the transit.

I have informed RIPE NCC of the broken connectivity issue, and their team is actively working on the fix. Hopefully it will be fixed very soon!

 

With the hope that your DNS is not getting resolved from the other side of the world, good night! 🙂

 

Disclaimer: As usual – thoughts & comments are completely personal.