# Confusing traceroutes and more

And here goes my first post for 2017. The year did not start well: I broke my hand in January and lost a lot of time to it. I am now almost recovered and in much better condition. I just attended HKNOG 4.0 in Hong Kong, followed by APRICOT 2017 in Ho Chi Minh City, Vietnam, and I enjoyed both events. Here’s my presentation from APRICOT 2017.

I recently came across some crazy, confusing traceroutes passed along by one of my friends. I cannot share that exact traceroute in this post, but I can reproduce the same effect by tracing from a large network, such as Telia's London PoP, to an Indian destination via their looking glass.

Example traceroute:

Router: London
Command: traceroute inet 45.64.213.161 as-number-lookup
traceroute to 45.64.213.161 (45.64.213.161), 30 hops max, 40 byte packets
3  flag-ic-310275-ldn-b7.c.telia.net (62.115.47.106)  3.271 ms  0.777 ms  0.702 ms
4  xe-0-0-1.0-pjr04.mmb004.flagtel.com (85.95.26.158) [AS  15412]  210.160 ms xe-3-0-1.0.pjr04.mmb004.flagtel.com (85.95.25.138) [AS  15412]  207.552 ms  207.699 ms
5  * * *
6  * * *
7  nsg-static-222.29.72.182.airtel.in (182.72.29.222) [AS  9498]  149.335 ms  148.746 ms  148.834 ms
8  45.64.213.161 (45.64.213.161) [AS  132933]  158.350 ms  155.053 ms  146.613 ms


The trace reads as London (AS1299) > London (AS15412) > Mumbai (AS15412) »» somewhere in India (AS9498) > destination (AS132933). So it looks as if traffic enters India via Reliance, is then handed off to Airtel, and finally reaches the destination.

Let’s check the BGP table view from the same PoP for this prefix:

Telia Carrier Looking Glass - show route protocol bgp 45.64.213.161 table inet.0
Router: London
Command: show route protocol bgp 45.64.213.161 table inet.0
inet.0: 695650 destinations, 1553714 routes (695458 active, 1034 holddown, 554 hidden)
+ = Active Route, - = Last Active, * = Both
45.64.213.0/24     *[BGP/170] 5d 01:25:46, MED 0, localpref 200, from 62.115.128.73
AS path: 15412 18101 132933 I, validation-state: unverified
to 213.155.132.194 via ae0.0
to 213.155.132.196 via ae1.0
to 80.91.246.96 via ae15.0
to 80.91.246.114 via ae16.0
to 213.155.136.74 via ae22.0
to 213.155.136.76 via ae23.0
> to 80.91.248.217 via ae3.0
to 80.91.246.144 via ae4.0
to 80.91.247.91 via ae5.0
to 80.91.249.181 via ae6.0
to 80.91.246.146 via ae7.0
to 80.91.247.93 via ae8.0
[BGP/170] 5d 01:25:46, MED 0, localpref 200, from 213.248.64.252
AS path: 15412 18101 132933 I, validation-state: unverified
to 213.155.132.194 via ae0.0
to 213.155.132.196 via ae1.0
to 80.91.246.96 via ae15.0
to 80.91.246.114 via ae16.0
to 213.155.136.74 via ae22.0
to 213.155.136.76 via ae23.0
> to 80.91.248.217 via ae3.0
to 80.91.246.144 via ae4.0
to 80.91.247.91 via ae5.0
to 80.91.249.181 via ae6.0
to 80.91.246.146 via ae7.0
to 80.91.247.93 via ae8.0
[BGP/170] 2w2d 19:39:07, MED 0, localpref 150
AS path: 6453 4755 132933 I, validation-state: unverified
> to 62.115.9.174 via ae25.0


Both preferred routes (localpref 200) have the AS path 15412 > 18101 > 132933, and the remaining lower-preference route goes via 6453 > 4755. AS9498 is in none of them, yet Airtel (AS9498) does appear in the traceroute. The second-last hop in the trace is 182.72.29.222, and that address indeed belongs to Airtel. Airtel and Reliance usually exchange only domestic traffic, and we typically do not see AS15412 pushing traffic via Airtel. If we trust the routing table along with that fact, the trace must be wrong, and it indeed is. Before we get to why it is wrong, let’s try to understand how exactly traceroute works.
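The comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `unexpected_asns` is made up for this post, and the AS numbers are typed in by hand from the trace and the BGP output above (AS1299 for the first hops is my reading of the Telia hostnames, not something the looking glass printed).

```python
def unexpected_asns(trace_asns, bgp_as_path, local_asn):
    """Return ASNs seen in a traceroute that the BGP AS path does not explain."""
    # The local network plus every AS on the selected BGP path is "expected".
    expected = {local_asn, *bgp_as_path}
    surprises = []
    for asn in trace_asns:
        # Keep first-seen order and skip duplicates.
        if asn not in expected and asn not in surprises:
            surprises.append(asn)
    return surprises


# ASNs in the order they show up in the trace from the London PoP.
trace = [1299, 15412, 9498, 132933]
# AS path of the active route for 45.64.213.0/24.
as_path = [15412, 18101, 132933]

print(unexpected_asns(trace, as_path, local_asn=1299))
```

Any ASN this prints is a hop that the routing table cannot account for, which is a hint that the hop's reply address, rather than the forwarding path, is what is misleading.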

#### Working of traceroute

Traceroute works by using the TTL, i.e. Time to Live, on the packets the tool sends out. IP headers carry a TTL to prevent packets from looping forever. For instance, if router R1 sends some traffic to router R2, and R2 is not learning that route from anywhere but has a default route back to R1, then traffic will start looping between R1 and R2. IP routing prevents this with the TTL: packets are sent with a certain TTL value, and every router they cross decreases it by one. When the TTL reaches zero, a router is supposed to drop the packet and not carry it any further. When a router drops a packet for this reason, it is supposed to reply with a “TTL exceeded” error (an ICMP Time Exceeded message).

The traceroute tool works by sending packets with increasing TTL, one after the other. It sends the first one with TTL 1. The router directly connected to it gets the packet and reduces the TTL (1 - 1, so it becomes zero), and since the TTL is now zero, it drops the packet instead of sending it further. As part of dropping it, the router replies to the system running the trace with the “TTL exceeded” error, revealing its IP to the tool. Next, another packet is sent with TTL 2; it crosses the first router and is dropped by the second router, which replies with “TTL exceeded” and reveals its IP in turn.
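To make the mechanics concrete, here is a small simulation of that probing loop. It is a sketch and not a real traceroute: no packets are sent, the path is a plain list of made-up documentation addresses, and the function name `simulate_traceroute` is mine.

```python
def simulate_traceroute(path, max_hops=30):
    """Simulate TTL-based probing along `path` (routers first, destination last)."""
    results = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl  # TTL the probe starts with
        for index, address in enumerate(path):
            if index == len(path) - 1:
                # The probe arrived at the destination itself, so the trace ends.
                results.append((ttl, address, "destination reached"))
                return results
            remaining -= 1  # every router decrements the TTL by one
            if remaining == 0:
                # TTL hit zero: this router drops the probe and reveals its IP.
                results.append((ttl, address, "time exceeded"))
                break
    return results


# Hypothetical path: two routers, then the destination.
for hop in simulate_traceroute(["10.0.0.1", "10.0.1.1", "192.0.2.10"]):
    print(hop)
```

Each probe gets exactly one router further than the previous one, which is why the replies line up as hop 1, hop 2 and so on. Note that the address shown for each hop is simply whatever source address the replying router chose to use.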

#### Back to our problem…

So that was how traceroute works. Now, back to the case I was discussing. Think of routing between two networks when routing is not symmetric. By asymmetric routing, I mean that the forward traffic (source to destination) and the return traffic (destination to source) may be carried via different paths.

Say, for example, A is sending traffic to B via R1-R2, and B is replying to A via R3. Now if A does a trace to B, R1 and R2 may appear fine, but the source IP that B uses for its “TTL exceeded” message can confuse things. When packets reach B with TTL 1, B decrements the TTL and drops them. To send the “TTL exceeded” message, B then has two options:

1. B can reply from the IP address of the interface connected to R2. Remember, I am only talking about the source IP B uses for the TTL exceeded error. The actual reply path, of course, is via R3.
2. B can reply from the IP address of the interface connected to R3, using the usual logic of how packets go out: use the source IP of the interface on the best path installed in the router.

Each approach has its own advantages and disadvantages. If B follows #1, i.e. sends the TTL exceeded message from the address of the interface connected to R2, the traceroute output will look very logical. But if network R3 filters packets based on BCP38, it will simply drop B's replies, because they are sourced from an address that belongs to R2's network. If B follows #2 instead, there are no issues with BCP38, but the traceroute output gets confusing, as suddenly one hop in the trace appears to come from an entirely different network. That is exactly what has been happening in the trace I shared above. Let’s read the trace again.
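The trade-off between the two options can be modelled with a toy example. Everything here is assumed for illustration: the addresses are documentation prefixes, the `/30` that R3 accepts from B is made up, and the policy names `"ingress"` and `"egress"` are just my labels for options #1 and #2.

```python
import ipaddress

# Hypothetical addressing on router B.
INGRESS_ADDRESS = ipaddress.ip_address("198.51.100.2")  # interface facing R2
EGRESS_ADDRESS = ipaddress.ip_address("203.0.113.2")    # interface facing R3

# Sources R3 accepts from B under a BCP38-style ingress filter (assumed).
R3_ALLOWED_SOURCES = [ipaddress.ip_network("203.0.113.0/30")]


def pick_source(policy):
    """Choose the source IP for the TTL exceeded reply."""
    if policy == "ingress":   # option 1: address of the interface the probe came in on
        return INGRESS_ADDRESS
    if policy == "egress":    # option 2: address of the interface the reply leaves from
        return EGRESS_ADDRESS
    raise ValueError(f"unknown policy: {policy}")


def passes_bcp38(source, allowed_networks):
    """True if the source address falls inside a prefix the filter permits."""
    return any(source in network for network in allowed_networks)


def reply_outcome(policy):
    source = pick_source(policy)
    return {
        "source": str(source),
        # The reply always leaves via R3, so R3's filter decides delivery.
        "delivered": passes_bcp38(source, R3_ALLOWED_SOURCES),
        # Only the ingress address matches the path the probe actually took.
        "hop_matches_forward_path": policy == "ingress",
    }


for policy in ("ingress", "egress"):
    print(policy, reply_outcome(policy))
```

In this model option #1 gives an honest-looking hop that never arrives when R3 filters, while option #2 always arrives but shows an address from the return-path network.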

 1 ldn-bb3-link.telia.net (80.91.246.96) 3.186 ms ldn-bb2-link.telia.net (80.91.247.93) 91.337 ms ldn-bb3-link.telia.net (213.155.132.194) 0.512 ms