07 Mar

Confusing traceroutes and more

And here goes my first post for 2017. The start of this year did not go well: I broke my hand in January, and that cost me a lot of time. Now I am almost recovered and in much better condition. I just attended HKNOG 4.0 in Hong Kong, followed by APRICOT 2017 in Ho Chi Minh City, Vietnam, and I enjoyed both events. Here’s my presentation from APRICOT 2017.

I recently came across some crazy, confusing traceroutes passed on by one of my friends. I cannot share that exact traceroute in this blog post, but I can reproduce the same effect by tracing from a large network, such as Telia’s London PoP, to an Indian destination via their looking glass.

Example traceroute:

 

 

Here the trace goes London (AS1299) > London (AS15412) > Mumbai (AS15412) >>>> somewhere in India (AS9498) > destination (AS132933).

 

So traffic enters India via Reliance, is then handed off to Airtel, and reaches the destination. Let’s check the BGP table view of the same PoP for this prefix:

 

Both available routes are 15412 > 18101 > 132933, and neither contains AS9498, yet Airtel (AS9498) does appear in the traceroute. The second-to-last hop in the trace is 182.72.29.222, and that address indeed belongs to Airtel.
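The mismatch can also be checked mechanically. Here is a minimal sketch in Python, assuming the AS numbers have already been read off the trace and the BGP table by hand; the function name is my own and not part of any tool.

```python
def asns_missing_from_bgp(trace_asns, bgp_paths, local_asn):
    """Return ASNs seen in the trace that no BGP path (plus our own AS) explains."""
    expected = {local_asn}
    for path in bgp_paths:
        expected.update(path)
    unexplained = []
    for asn in trace_asns:
        if asn not in expected and asn not in unexplained:
            unexplained.append(asn)
    return unexplained


# AS numbers as they appear in the trace, hop by hop (hand-copied).
trace_asns = [1299, 15412, 15412, 9498, 132933]
# Both routes in the BGP table carry the same AS path.
bgp_paths = [[15412, 18101, 132933], [15412, 18101, 132933]]

print(asns_missing_from_bgp(trace_asns, bgp_paths, local_asn=1299))  # [9498]
```

Any AS that shows up here deserves a second look before the trace is taken at face value.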

We can trust the routing table, and we also know that Airtel and Reliance usually exchange only domestic traffic, so we typically do not see AS15412 pushing traffic via Airtel. This means the trace is wrong, and indeed it is. Before we get to why it’s wrong, let’s try to understand how exactly traceroute works.

 

Working of traceroute

Traceroute works by using the TTL, i.e. Time to Live, on the packets the tool sends out. IP headers carry a TTL to prevent packets from looping forever. For instance, if router R1 sends some traffic to router R2, and R2 is not learning that route from anywhere but has a default route back to R1, then traffic will start looping between R1 and R2. IP routing prevents this with the TTL: packets are sent with a certain TTL value, and each router they cross decreases it by one. When the TTL reaches zero, a router is supposed to drop the packet and not carry it any further. When a router drops a packet this way, it is supposed to reply with an ICMP “TTL exceeded” error.

Now the traceroute tool works by sending packets with increasing TTL, one after another. It sends the first one with TTL 1. The router directly connected to it gets the packet and reduces the TTL (1 – 1, so it becomes zero), and since the TTL is now zero it drops the packet instead of sending it further. As part of dropping it, the router replies to the system running the trace with a “TTL exceeded” error, revealing its IP to the tool. Next, another packet is sent with TTL 2; it crosses the first router and is dropped at the second router, which replies with “TTL exceeded” and reveals its IP.
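The mechanism described above can be sketched as a tiny simulation in Python. This is only a model, not a real traceroute: no packets are sent, the path is a hypothetical list of router names ending with the destination, and the function names are my own.

```python
def probe(path, ttl):
    """Send one simulated probe along `path` (routers first, destination last).

    Every router decrements the TTL; the one that brings it to zero drops the
    probe and answers with "TTL exceeded". If the TTL survives all routers,
    the destination itself answers.
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return router
    return path[-1]


def traceroute(path, max_hops=30):
    """Probe with TTL 1, 2, 3, ... and collect whoever answers each probe."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder = probe(path, ttl)
        hops.append(responder)
        if responder == path[-1]:
            break  # destination reached, stop probing
    return hops


print(traceroute(["R1", "R2", "R3", "B"]))  # ['R1', 'R2', 'R3', 'B']
```

Each entry in the result is simply the node that answered the probe with that TTL, which is all the real tool ever learns about the path.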

 

 

Back to our problem…

So that was how traceroute works. Now back to the case I was discussing. Think of routing between two networks when the routing is not symmetric. By asymmetric routing, I mean that traffic from source to destination and the replies back may be carried via different paths.

 

Say, for example, A is sending traffic to B via R1-R2, and B is replying to A via R3. Now if A does a trace to B, R1 and R2 may appear fine, but the source IP B uses for its “TTL exceeded” message can confuse things. When packets reach B with TTL 1, B decrements the TTL and drops them. To send that “TTL exceeded” message, B then has two options:

  1. B can reply from the IP address of the interface connected to R2. Remember, I am only talking about which source IP B uses for the “TTL exceeded” error. The actual reply path, of course, is via R3.
  2. B can reply from the IP address of the interface connected to R3, using the usual logic for outgoing packets: use the source IP of the interface on the best path installed in the router.

 

Each choice has its own advantages and disadvantages. If B follows #1, i.e. sends “TTL exceeded” with the source IP of the interface connected to R2, it gives a very logical traceroute output. But if network R3 filters packets based on BCP38, it will simply drop that traffic, since it comes from B with a source IP out of R2’s address space. If B follows #2, it won’t cause any issues with BCP38, but it will confuse the traceroute output, as suddenly one hop in the trace appears to come from an entirely different network. That is exactly what has been happening in the trace I shared above. Let’s read the trace again.
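The trade-off between the two options can be sketched in Python. All addresses below are hypothetical documentation prefixes, and the BCP38 filter is reduced to a single prefix check, so treat this as an illustration of the reasoning rather than how any real router is configured.

```python
import ipaddress

# Hypothetical addressing for router B (documentation prefixes only).
INGRESS_IP = ipaddress.ip_address("192.0.2.2")        # B's interface facing R2
EGRESS_IP = ipaddress.ip_address("198.51.100.2")      # B's interface facing R3
R3_ALLOWED = ipaddress.ip_network("198.51.100.0/24")  # sources R3 accepts from B


def ttl_exceeded_source(policy):
    """Pick the source IP B puts on its "TTL exceeded" reply."""
    if policy == "ingress":   # option 1: interface the probe arrived on
        return INGRESS_IP
    if policy == "egress":    # option 2: interface the reply leaves from
        return EGRESS_IP
    raise ValueError(f"unknown policy: {policy}")


def hop_seen_by_a(policy):
    """What A's traceroute shows for B when R3 applies BCP38-style filtering."""
    src = ttl_exceeded_source(policy)
    # The reply always leaves via R3; R3 drops it if the source is not B's.
    return str(src) if src in R3_ALLOWED else "*"


print(hop_seen_by_a("ingress"))  # '*': reply filtered, hop shows as a timeout
print(hop_seen_by_a("egress"))   # '198.51.100.2': an address from R3's side
```

With option 1 the hop goes missing once filtering is in place; with option 2 it survives but carries an address from a network the forward path never touched.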

 

Here the router right before the destination, i.e. hop 7, is connected to both Reliance and Airtel. It announces the prefix covering the destination to Reliance, so Reliance brings the traffic in, but it uses Airtel to send traffic back out to Telia’s London router. When replying with “TTL exceeded”, router 7 uses its Airtel-side source IP, and thus we see the PTR record pointing to Airtel. This can be referred to as “random factoid” behaviour in traceroute. It comes from RFC 1812, which suggests “ICMP source must be from the egress iface”, and Richard Steenbergen puts it very nicely in his presentation at NANOG here.


So that’s all about it for now!

26 Sep

Should Google pay to Airtel for “data interconnection” charges?

Yesterday I had a discussion with a friend from Airtel after a long time. For some strange reason the discussion turned to old statements from Bharti Airtel’s executives that companies like Google, Facebook, Yahoo etc. should pay ISPs like Airtel for “data interconnection”. The argument is aimed at Google more than any other company. Statements from Airtel can be found here and here.

 

So what’s the real argument?

Companies like Airtel, who have built “physical infrastructure”, feel that companies like Google should pay them since they put so much traffic on their networks. Airtel feels that services like YouTube take a significant amount of bandwidth and thus require infrastructure from the core and middle mile to the edge of the network, and all of that needs significant investment. Similarly, there was another argument from Mr Sunil Mittal that Facebook is riding on top of infrastructure which ISPs like Airtel have created.

 

At this point one can ask: whom does Google pay, who provides “internet” to Google, and how exactly does the Internet function in this respect?

Well, based on what I have heard so far from my US and Europe based ISP friends, Google is considered a tier 1 ISP in the ecosystem and does not pay anyone for transit. That said, one has to understand what “they do not pay anyone for transit” means: there is no financial exchange involved at layer 3 (remember the old college class on the OSI model?). However, to really get into a position where Google can work fine without paying anyone at layer 3, it needs lots and lots of peering. And to achieve that, one needs Points of Presence (PoPs) at lots and lots of locations. Given the nature of the applications Google runs (like Google Search, YouTube, Gmail etc.), all these PoPs need to be connected by very high capacity pipes. And this answers the question: Google does pay a lot at layer 2, to ISPs and other fiber infrastructure providers, for optical transport. This enables them to have their own backbone with very high capacity pipes and eventually to be present at exchanges local to most ISPs to promote peering. Most tier 2 and tier 3 ISPs in the US, Europe and Asia are very much interested in peering with Google, because if they don’t, they end up reaching Google’s AS 15169 via their upstream transit connections. To save on those costs, it is always a good deal for small to mid-size ISPs (in fact, just about everyone) to peer with Google.

As of now, Google seems to be publicly connected to 163 networks (and quite a few more in private). The list can be seen using the Hurricane Electric tool here. Google is present at over a dozen exchanges to facilitate peering. One can find the list in the PeeringDB entry for Google’s AS 15169 here.

 

So should Google pay for “bandwidth”?

Well, it seems Google is already paying for bandwidth! It’s just that it pays at layer 2 rather than layer 3. Apart from that, a very strong counter-argument from Google/Facebook could be that they are the ones who make the “dumb pipes” usable. Their services are what end users pay ISPs for, and thus they also play a very important role in the current ecosystem of the internet. Moreover, they are in a very strong position here. Say tomorrow Airtel breaks its peering session with Google and “demands” charges for interconnection, and Google refuses. Since Airtel is not a tier 1 ISP, packets from Airtel (AS9498) will still be routed to Google (AS15169) via Airtel’s upstreams such as Level3, Qwest, Telia etc. The return path will also go via Airtel’s upstreams, because Google is tier 1 and Airtel is a customer of quite a few tier 1 ISPs, so the path will run entirely via other big ISPs. And who will lose here? Airtel! They would be paying a ton for upstream bandwidth in the West, along with consuming expensive submarine bandwidth, when Google itself was willing to bring content to Airtel in the local region.

 

How come Google seems in a stronger position than big ISPs?

The answer is simple: Google’s popularity among end users and the network design they chose. Say it is not Airtel but a tier 1 ISP like Level 3 that breaks its link to Google, demanding “data interconnection” charges. Since this is a disconnect between two tier 1 ISPs, the Internet will behave pretty much as if partitioned. Neither of them pays anyone for upstream bandwidth, and since neither is a “customer” of any other ISP, they will be virtually disconnected. Packets from Level 3 will not reach Google, and the same goes for the return path. And since other big backbones like AT&T, Verizon etc. have a peering (rather than transit) relationship with both, they could not help either, and packets won’t have a path via them at all. Who will complain to the ISP? The customer! And what is easier to replace: the ISP connection or the search/video/mail service? 🙂
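Both depeering scenarios can be sketched with a toy model in Python. It assumes the usual “valley-free” simplification of BGP policy: a route may climb from customer to provider any number of times, cross at most one peering link, and then only descend from provider to customer. The relationships below are a heavily reduced version of the ones named in the text, and the function name is my own.

```python
from collections import deque


def reachable(src, dst, providers, peers):
    """True if `dst` can be reached from `src` over a valley-free path.

    providers: dict mapping a network to the set of its transit providers.
    peers: set of frozenset pairs with a settlement-free peering.
    """
    customers = {}
    for cust, provs in providers.items():
        for prov in provs:
            customers.setdefault(prov, set()).add(cust)

    # Phase "up": may still climb to a provider or cross one peering link.
    # Phase "down": may only descend to customers.
    queue = deque([(src, "up")])
    seen = {(src, "up")}
    while queue:
        node, phase = queue.popleft()
        if node == dst:
            return True
        nxt = [(c, "down") for c in customers.get(node, ())]
        if phase == "up":
            nxt += [(p, "up") for p in providers.get(node, ())]
            nxt += [(o, "down") for pair in peers if node in pair
                    for o in pair - {node}]
        for state in nxt:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


providers = {"Airtel": {"Level3", "Telia"}}  # Airtel buys transit
peers = {frozenset(p) for p in [
    ("Google", "Airtel"), ("Google", "Level3"), ("Google", "Telia"),
    ("Google", "ATT"), ("Level3", "ATT"), ("Level3", "Telia"),
]}

# Scenario 1: Airtel drops its peering with Google.
no_airtel = peers - {frozenset(("Google", "Airtel"))}
print(reachable("Airtel", "Google", providers, no_airtel))  # True

# Scenario 2: Level3 drops its peering with Google.
no_level3 = peers - {frozenset(("Google", "Level3"))}
print(reachable("Level3", "Google", providers, no_level3))  # False
```

In the first case Airtel still reaches Google by paying its upstreams; in the second, the only remaining paths would need two peering links in a row, which no peer carries for free.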

The other side of this is that since Google peers with a lot of small to mid-sized ISPs, big ISPs become less relevant: Google depends on them very little to reach users, other than those who are directly on those big networks. Google’s peering policy promotes active peering. The policy can be found here. Google actively peers with networks exchanging as little as 100 Mbps of traffic, apart from bilateral BGP sessions at exchanges for over 100 Mbps of aggregated traffic. Thus Airtel is not really paying for Google’s traffic. Airtel is paying to keep its own infrastructure up for its own end users, who do pay Airtel, and that is what we call “business”! 🙂

 

Fun fact: Airtel itself is in a consortium with Google for the submarine cable Unity, which goes from Los Angeles, California to Tokyo, Japan. A couple of simple traceroutes show who is carrying the traffic:

 

Trace from Airtel, New Delhi router to one of IPs of Google.com:

Wed Sep 26 11:07:03 GMT+05:30 2012
traceroute 173.194.36.35

Type escape sequence to abort.
Tracing the route to bom04s02-in-f3.1e100.net (173.194.36.35)

1 *
182.79.254.238 [MPLS: Label 485601 Exp 0] 0 msec *
2 182.79.254.245 0 msec
182.79.254.249 0 msec
182.79.254.245 0 msec
3 72.14.223.210 44 msec 44 msec 40 msec
4 66.249.95.106 [AS 15169] 44 msec 44 msec 40 msec
5 66.249.94.244 [AS 15169] 48 msec 48 msec 48 msec
6 209.85.241.189 [AS 15169] 44 msec 44 msec 40 msec
7 bom04s02-in-f3.1e100.net (173.194.36.35) [AS 15169] 44 msec 40 msec 40 msec
DEL-ISP-MPL-ACC-RTR-9#

 

Clearly Airtel carried the traffic till South India, and after that we see AS15169 routers taking care of the rest. Does Google have all the content required for “google.com” operation in India? Well, no. They keep syncing servers and updating data, and all of that goes over their own private portion of the net. For a provider like Airtel, all it needs here is the effort to route packets from North India to South India, where Google’s routers are placed. Similarly, let’s check a trace from Airtel’s London router to google.com’s IP 173.194.35.3:

Wed Sep 26 11:14:18 GMT+05:30 2012
traceroute 173.194.35.3
traceroute to 173.194.35.3 (173.194.35.3), 30 hops max, 40 byte packets
1 64.211.193.85 (64.211.193.85) 0.556 ms 11.731 ms 0.418 ms
2 74.125.51.189 (74.125.51.189) 3.062 ms 0.389 ms 72.14.198.173 (72.14.198.173) 10.230 ms
3 209.85.252.188 (209.85.252.188) 0.490 ms 0.368 ms 0.422 ms
4 209.85.253.92 (209.85.253.92) 6.573 ms 0.943 ms 0.859 ms
MPLS Label=358083 CoS=4 TTL=1 S=1
5 209.85.243.33 (209.85.243.33) 11.933 ms 41.853 ms 7.227 ms
MPLS Label=600800 CoS=4 TTL=1 S=1
6 209.85.241.229 (209.85.241.229) 22.056 ms 15.454 ms 15.309 ms
MPLS Label=605209 CoS=4 TTL=1 S=1
7 72.14.232.79 (72.14.232.79) 25.005 ms 24.959 ms 24.413 ms
8 209.85.241.65 (209.85.241.65) 44.321 ms 37.930 ms 24.719 ms
9 mil01s16-in-f3.1e100.net (173.194.35.3) 24.426 ms 24.399 ms 24.690 ms

{master}
lookingglass@LON-ISP-IGW-RTR-35-RE1>

 

Clearly traffic goes from AS9498 to AS15169 at the 2nd hop, in less than 5 ms. Thus again, Google’s router seems to be present in the same facility as (or very, very close to) Airtel’s London router.

 

One interesting point here is that Google uses a DNS-based CDN. The logic is that DNS runs on anycast, so you always hit a local DNS server, while the IP of the application server it returns is (in general) local and can be changed as per predefined algorithms. Thus one does not hit Google.com’s Indian node from India because of anycasting, but because Google’s DNS servers in India carry A records pointing to Indian instances. Thus, you will get a different IP when you hit Google.com from the US, Europe or Asia.

So the IP I get for Google.com is basically an “Indian IP”, and most of these seem to come from the BGP announcement for 173.194.36.0/24. Now since I know this is an Indian IP, let’s look at the University of Oregon Route Views routing table entries for this prefix:
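The idea of a DNS-based CDN can be sketched in a few lines of Python. This is only an illustration: the region names and the mapping are my own assumptions, and the two addresses are simply the ones traced earlier in this post, not a statement of how Google’s DNS actually decides.

```python
# Illustrative mapping from client region to the A record handed out.
A_RECORDS = {
    "india": "173.194.36.35",   # bom04s02-in-f3.1e100.net, traced from New Delhi
    "europe": "173.194.35.3",   # mil01s16-in-f3.1e100.net, traced from London
}


def resolve(client_region, default_region="europe"):
    """Return the A record a regional DNS server would answer with."""
    return A_RECORDS.get(client_region, A_RECORDS[default_region])


print(resolve("india"))   # 173.194.36.35
print(resolve("europe"))  # 173.194.35.3
```

The point is that the steering happens in the DNS answer: the application server addresses are ordinary unicast IPs, and only the name server a client reaches depends on anycast.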

 

route-views>show ip bgp 173.194.36.0/24 long
BGP table version is 3507053534, local router ID is 128.223.51.103
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale
Origin codes: i – IGP, e – EGP, ? – incomplete

Network Next Hop Metric LocPrf Weight Path
* 173.194.36.0/24 194.85.102.33 0 3277 15169 i
* 206.24.210.102 0 3561 3356 15169 i
* 4.69.184.193 0 0 3356 15169 i
* 129.250.0.11 5 0 2914 1299 15169 i
* 208.74.64.40 0 19214 12989 15169 i
* 209.124.176.223 0 101 101 15169 i
* 216.218.252.164 0 6939 15169 i
* 144.228.241.130 0 1239 15169 i
* 203.62.252.186 0 1221 15169 i
* 157.130.10.233 0 701 15169 i
* 203.181.248.168 0 7660 15169 i
* 194.85.40.15 0 3267 15169 i
* 193.0.0.56 0 3333 3356 15169 i
* 154.11.11.113 0 0 852 15169 i
* 66.110.0.86 0 6453 15169 i
* 154.11.98.225 0 0 852 15169 i
* 207.172.6.20 0 0 6079 15169 i
* 89.149.178.10 10 0 3257 15169 i
* 66.185.128.48 6 0 1668 15169 i
* 217.75.96.60 0 0 16150 15169 i
* 207.46.32.34 0 8075 15169 i
* 202.249.2.86 0 7500 2497 15169 i
* 195.66.232.239 0 5459 15169 i
* 66.59.190.221 0 6539 15169 i
* 69.31.111.244 0 0 4436 15169 i
* 208.51.134.254 1 0 3549 15169 i
* 164.128.32.11 0 3303 15169 i
* 134.222.87.1 0 286 15169 i
* 207.172.6.1 0 0 6079 15169 i
* 202.232.0.2 0 2497 15169 i
* 12.0.1.63 0 7018 15169 i
* 128.223.253.10 0 3582 3701 15169 i
*> 114.31.199.1 0 0 4826 15169 i
route-views>

 

Most paths are a single hop, and we see that AS15169 itself originates the prefix, not any other network. Also, this does not look like a network hanging off 2-3 upstreams within India, or else the paths would be via AS9498 > AS15169, AS6453 > AS15169 and AS15412 > AS15169 (Airtel, Tata and Reliance), but that is not really the case here. We also see a path via HE’s AS6939 > AS15169. Since HE is not present in India, it must be handing traffic to Google outside India, and this supports the point that Google itself takes care of international routing of packets.
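The same reading of the table can be done in a few lines of Python. This is a rough sketch on a hand-copied subset of the AS paths above (not the full table, and not a parser for the router output); the variable names are my own.

```python
# A hand-copied subset of the AS paths from the Route Views output above.
as_paths = [
    [3277, 15169],
    [3561, 3356, 15169],
    [2914, 1299, 15169],
    [6939, 15169],
    [1239, 15169],
    [701, 15169],
    [7018, 15169],
    [4826, 15169],
]

# The origin AS is the last ASN in each path.
origins = {path[-1] for path in as_paths}

# A two-entry path means the collector's peer connects to Google directly.
direct = [path[0] for path in as_paths if len(path) == 2]

print(origins)                     # {15169}
print(len(direct), len(as_paths))  # 6 8
```

In this subset every path ends at AS15169, and six of the eight peers reach it without any intermediate network.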

 

Also, a traceroute from Airtel’s London PoP to Google.com Indian IP: 

Wed Sep 26 11:16:46 GMT+05:30 2012
traceroute 173.194.36.35
traceroute to 173.194.36.35 (173.194.36.35), 30 hops max, 40 byte packets
1 64.211.193.85 (64.211.193.85) 1.650 ms 0.591 ms 0.425 ms
2 72.14.198.173 (72.14.198.173) 8.859 ms 0.358 ms 74.125.51.189 (74.125.51.189) 0.573 ms
3 209.85.240.63 (209.85.240.63) 22.330 ms 209.85.240.61 (209.85.240.61) 10.032 ms 0.512 ms
4 216.239.43.6 (216.239.43.6) 76.116 ms 216.239.46.224 (216.239.46.224) 78.707 ms 216.239.43.6 (216.239.43.6) 106.906 ms
MPLS Label=434462 CoS=4 TTL=1 S=1
5 72.14.236.146 (72.14.236.146) 77.000 ms 76.169 ms 72.14.236.152 (72.14.236.152) 76.076 ms
MPLS Label=7041 CoS=4 TTL=1 S=1
6 72.14.235.12 (72.14.235.12) 83.610 ms 84.370 ms 83.786 ms
MPLS Label=6576 CoS=4 TTL=1 S=1
7 72.14.239.64 (72.14.239.64) 96.803 ms 72.14.239.66 (72.14.239.66) 100.290 ms 91.948 ms
MPLS Label=5652 CoS=4 TTL=1 S=1
8 72.14.239.83 (72.14.239.83) 157.260 ms 72.14.239.81 (72.14.239.81) 138.627 ms 72.14.239.83 (72.14.239.83) 138.597 ms
MPLS Label=379763 CoS=4 TTL=1 S=1
9 64.233.174.177 (64.233.174.177) 242.499 ms 243.070 ms 242.210 ms
MPLS Label=374201 CoS=4 TTL=1 S=1
10 209.85.255.37 (209.85.255.37) 285.752 ms 244.542 ms 242.481 ms
MPLS Label=652013 CoS=4 TTL=1 S=1
11 64.233.175.2 (64.233.175.2) 246.304 ms 64.233.175.0 (64.233.175.0) 246.886 ms 64.233.175.2 (64.233.175.2) 263.090 ms
MPLS Label=389588 CoS=4 TTL=1 S=1
12 66.249.94.93 (66.249.94.93) 311.500 ms 66.249.94.105 (66.249.94.105) 355.443 ms 356.515 ms
13 66.249.94.73 (66.249.94.73) 342.597 ms 343.927 ms 72.14.239.23 (72.14.239.23) 340.597 ms
MPLS Label=726194 CoS=4 TTL=1 S=1
14 72.14.232.92 (72.14.232.92) 367.021 ms 72.14.232.101 (72.14.232.101) 368.278 ms 72.14.232.92 (72.14.232.92) 377.651 ms
15 209.85.241.189 (209.85.241.189) 367.582 ms 369.910 ms 369.931 ms
16 bom04s02-in-f3.1e100.net (173.194.36.35) 369.245 ms 367.977 ms 368.935 ms

{master}
lookingglass@LON-ISP-IGW-RTR-35-RE1>

 

Since Google does not put rDNS pointers on these IPs that give any clue about location, I am doing some guesswork here based on latency values. The path looks like London (hop 3) > New York (hop 4) > Palo Alto (hop 8) > Singapore or Japan (hop 9) > Chennai/Mumbai (hop 12).
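This kind of guesswork can be given a rough upper bound. Here is a small Python sketch, assuming light in fiber covers roughly 200 km per millisecond (about two thirds of the speed of light in vacuum) and ignoring queuing, routing detours and return-path asymmetry; the RTT values are taken from the trace above and the function name is my own.

```python
FIBER_KM_PER_MS = 200.0  # approximate one-way distance light covers in fiber per ms


def max_one_way_km(rtt_ms):
    """Upper bound on the distance to a hop, given its round-trip time."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


# One RTT sample per hop, as read from the London trace.
samples = {3: 0.512, 4: 76.116, 8: 138.597, 9: 242.210, 12: 311.500}
for hop, rtt in samples.items():
    print(f"hop {hop}: at most {max_one_way_km(rtt):.0f} km from London")
```

The bound only says how far a hop could be, not where it is, so it is useful for ruling locations out rather than confirming them.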

How much of this part is covered by Airtel? Well, none! From London itself it’s all on Google’s MPLS and Airtel is not paying for it.  

 

Here is another example, this time from AT&T. Let’s look at the path from an AT&T router in the US to Google.com’s Indian IP:

route-server>traceroute 173.194.36.35

Type escape sequence to abort.
Tracing the route to bom04s02-in-f3.1e100.net (173.194.36.35)

1 gateway.cbbtier3.att.net (12.0.1.202) [AS 7018] 424 msec 28 msec 4 msec
2 n54ny401me3-cbbtier3.ip.att.net (12.89.5.13) [AS 7018] 4 msec 4 msec 0 msec
3 cr2.n54ny.ip.att.net (12.122.115.74) [MPLS: Label 17130 Exp 1] 8 msec
cr1.n54ny.ip.att.net (12.122.131.170) [MPLS: Label 17017 Exp 1] 8 msec 4 msec
4 gar1.chsct.ip.att.net (12.122.105.117) 8 msec
gar1.chsct.ip.att.net (12.122.105.57) 4 msec
gar1.chsct.ip.att.net (12.122.105.117) 8 msec
5 12.249.88.6 [AS 7018] 4 msec 4 msec 4 msec
6 72.14.239.46 [AS 15169] 4 msec 4 msec 4 msec
7 209.85.251.88 [AS 15169] [MPLS: Label 368292 Exp 4] 0 msec
209.85.252.2 [AS 15169] [MPLS: Label 345131 Exp 4] 8 msec 8 msec
8 72.14.239.93 [AS 15169] [MPLS: Label 1186 Exp 4] 12 msec 8 msec 12 msec
9 72.14.235.12 [AS 15169] [MPLS: Label 9836 Exp 4] 16 msec 16 msec 16 msec
10 72.14.239.66 [AS 15169] [MPLS: Label 15897 Exp 4] 24 msec 24 msec 24 msec
11 72.14.239.83 [AS 15169] [MPLS: Label 423811 Exp 4] 76 msec 72 msec
72.14.239.81 [AS 15169] [MPLS: Label 437959 Exp 4] 72 msec
12 64.233.174.177 [AS 15169] [MPLS: Label 740776 Exp 4] 168 msec
64.233.174.179 [AS 15169] [MPLS: Label 494505 Exp 4] 168 msec
64.233.174.177 [AS 15169] [MPLS: Label 740776 Exp 4] 184 msec
13 209.85.255.35 [AS 15169] [MPLS: Label 473847 Exp 4] 168 msec
209.85.255.59 [AS 15169] [MPLS: Label 496983 Exp 4] 172 msec
209.85.255.35 [AS 15169] [MPLS: Label 473847 Exp 4] 168 msec
14 64.233.175.0 [AS 15169] [MPLS: Label 477363 Exp 4] 176 msec
64.233.175.2 [AS 15169] [MPLS: Label 747299 Exp 4] 176 msec
64.233.175.0 [AS 15169] [MPLS: Label 477363 Exp 4] 176 msec
15 66.249.94.105 [AS 15169] 236 msec 240 msec
66.249.94.93 [AS 15169] 240 msec
16 66.249.94.73 [AS 15169] [MPLS: Label 733058 Exp 4] 276 msec
72.14.239.23 [AS 15169] [MPLS: Label 728930 Exp 4] 268 msec
66.249.94.73 [AS 15169] [MPLS: Label 732850 Exp 4] 268 msec
17 72.14.232.92 [AS 15169] 296 msec 292 msec 316 msec
18 209.85.241.189 [AS 15169] 476 msec 404 msec 408 msec
19 bom04s02-in-f3.1e100.net (173.194.36.35) [AS 15169] 400 msec 504 msec 408 msec
route-server>

 

From hop 6 to hop 19, it’s all Google’s network. AT&T is handing off packets to Google very close to the source, going by the latency of 4 ms here. One very important fact here is that except for Google and a few others, almost all content companies PAY ISPs, and ISPs get money from both ends, which they do deserve when it’s all their network from origin to end point. Companies like Netflix, Twitter and Vimeo pay a significant amount to ISPs at layer 3, directly in the form of upstream transit.

 

With the hope that Airtel will keep building good infrastructure for us to connect to AS 15169, it’s time for me to start my day here in India! 🙂