Analysis: Inconsistent latency between two end points

An interesting evening here in the village. Sessional tests started at college today, and so do my blog posts (to keep myself full of positive energy!) ;)

 

Learned something new while troubleshooting. :)

I am used to getting latency of ~350ms with my server in Europe as I have mentioned in my past blog posts.

My connection > Server goes direct, but the return path goes via the US, and this is what increases latency. Today, all of a sudden, I saw a latency of 200ms to my server. 150ms less - that’s significant.

Immediately I got the idea that BSNL had changed its BGP announcements and was likely announcing prefixes in the EU to get a direct return path. To confirm that, I connected to my server and ran a traceroute. It gave me a strange result: latency over 350ms, as it has always been. I pinged my home router from the server and latency was still 350ms, while from the other side, i.e. my home connection > server, ping was 200ms. Very, very strange!

Remember: packets CAN take a different path in each direction, but whatever paths they follow, the round trip stays the same. So if home > server is 200ms (for SURE not via the US), then how come server > home is via the US and 350ms? Based on my understanding, even if the forward and return routes are different, the round trip latency is the same whichever end starts the ping.
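
To make that reasoning concrete, here is a minimal sketch. The one-way delays are made-up numbers (not measurements from this post), chosen only to show that for a fixed pair of paths the round trip comes out the same from either end:

```python
def ping_rtt(outbound_ms, inbound_ms):
    """A ping round trip is the outbound one-way delay plus the inbound one."""
    return outbound_ms + inbound_ms

# Hypothetical one-way delays, for illustration only.
home_to_server_ms = 100.0   # e.g. a direct path
server_to_home_ms = 250.0   # e.g. a longer path via the US

# Ping started at home: request goes home > server, reply goes server > home.
rtt_from_home = ping_rtt(home_to_server_ms, server_to_home_ms)

# Ping started at server: request goes server > home, reply goes home > server.
rtt_from_server = ping_rtt(server_to_home_ms, home_to_server_ms)

print(rtt_from_home, rtt_from_server)
```

Both values are the same sum, so a difference between the two directions means the two pings are not actually using the same pair of paths.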

I decided to collect some more data to confirm my observation, so I ran a 1000-packet ping from both points - home to server and server to home.


Results: 

Home > Server ping:

1000 packets transmitted, 1000 received, 0% packet loss, time 999739ms
rtt min/avg/max/mdev = 197.128/201.314/294.035/11.347 ms

Server > Home ping:

1000 packets transmitted, 960 received, 4% packet loss, time 999710ms
rtt min/avg/max/mdev = 319.060/377.464/499.294/18.203 ms

Clearly Home > Server is about 200ms while Server > Home is at least 320ms (377ms on average). What is going on here?
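
For anyone who wants to compare such runs without eyeballing them, here is a small sketch that parses the two `rtt` summary lines above. The regular expression assumes the Linux `ping` summary format shown in this post; the function and variable names are my own.

```python
import re

# Matches the summary line printed by Linux ping, as pasted above.
SUMMARY = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)

def parse_rtt(line):
    """Return min/avg/max/mdev in milliseconds from a ping summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not a ping rtt summary line")
    lo, avg, hi, mdev = (float(x) for x in match.groups())
    return {"min": lo, "avg": avg, "max": hi, "mdev": mdev}

home_to_server = parse_rtt(
    "rtt min/avg/max/mdev = 197.128/201.314/294.035/11.347 ms"
)
server_to_home = parse_rtt(
    "rtt min/avg/max/mdev = 319.060/377.464/499.294/18.203 ms"
)

gap_ms = server_to_home["avg"] - home_to_server["avg"]
print(f"average gap: about {gap_ms:.0f} ms")
```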

I ran a trace from both end points to compare latency at each hop, though clearly the return path is via the US.


Home to server trace:

anurag@laptop:~$ traceroute 178.238.225.247 -A
traceroute to 178.238.225.247 (178.238.225.247), 30 hops max, 60 byte packets
1 router.local (10.0.0.1) [AS1] 3.578 ms 3.894 ms 4.234 ms
2 117.200.48.1 (117.200.48.1) [AS9829] 30.015 ms 32.443 ms 35.071 ms
3 218.248.173.46 (218.248.173.46) [AS9829] 37.402 ms 39.812 ms 43.527 ms
4 115.254.1.138 (115.254.1.138) [AS18101] 50.175 ms 52.586 ms 55.038 ms
5 115.255.252.57 (115.255.252.57) [AS18101] 80.995 ms 84.568 ms 92.177 ms
6 62.216.147.101 (62.216.147.101) [AS15412] 321.376 ms 287.380 ms 299.725 ms
7 xe-8-3-0.0.pjr04.mmb004.flagtel.com (85.95.26.69) [AS15412] 74.802 ms 76.024 ms 77.518 ms
8 xe-0-0-0.0.pjr04.ldn001.flagtel.com (85.95.25.186) [AS15412] 362.197 ms 364.614 ms 367.255 ms
9 xe-11-0-0.edge5.London1.Level3.net (212.187.138.53) [AS3356/AS9057] 365.825 ms 368.374 ms 370.728 ms
10 ae-52-52.csw2.London1.Level3.net (4.69.139.120) [AS3356] 383.136 ms 386.773 ms 387.971 ms
11 ae-57-222.ebr2.London1.Level3.net (4.69.153.133) [AS3356] 391.669 ms ae-59-224.ebr2.London1.Level3.net (4.69.153.141) [AS3356] 384.353 ms ae-58-223.ebr2.London1.Level3.net (4.69.153.137) [AS3356] 390.391 ms
12 ae-24-24.ebr2.Frankfurt1.Level3.net (4.69.148.198) [AS3356] 367.976 ms 368.210 ms ae-23-23.ebr2.Frankfurt1.Level3.net (4.69.148.194) [AS3356] 374.705 ms
13 ae-82-82.csw3.Frankfurt1.Level3.net (4.69.140.26) [AS3356] 368.563 ms ae-62-62.csw1.Frankfurt1.Level3.net (4.69.140.18) [AS3356] 371.392 ms ae-82-82.csw3.Frankfurt1.Level3.net (4.69.140.26) [AS3356] 380.242 ms
14 ae-71-71.ebr1.Frankfurt1.Level3.net (4.69.140.5) [AS3356] 366.064 ms ae-81-81.ebr1.Frankfurt1.Level3.net (4.69.140.9) [AS3356] 367.639 ms ae-61-61.ebr1.Frankfurt1.Level3.net (4.69.140.1) [AS3356] 375.044 ms
15 ae-1-19.bar1.Munich1.Level3.net (4.69.153.245) [AS3356] 388.552 ms 390.943 ms 393.417 ms
16 GIGA-HOSTIN.bar1.Munich1.Level3.net (62.140.24.126) [AS9057/AS3356] 222.769 ms 225.136 ms 227.559 ms
17 server7 (178.238.225.247) [AS51167] 231.750 ms 234.424 ms 236.893 ms

(Clearly latency drops between the 15th and 16th hops, giving a clue of a different return path.)
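
A rough way to spot such drops automatically is sketched below. It uses the first RTT sample of each hop from the trace above; the 100ms threshold is an arbitrary assumption of mine, and the function name is made up for this example.

```python
def find_latency_drops(hop_rtts_ms, threshold_ms=100.0):
    """Return 1-based hop numbers whose RTT is lower than the previous
    hop's RTT by more than threshold_ms."""
    drops = []
    for i in range(1, len(hop_rtts_ms)):
        if hop_rtts_ms[i - 1] - hop_rtts_ms[i] > threshold_ms:
            drops.append(i + 1)
    return drops

# First RTT sample per hop, hops 1..17 of the home > server trace.
hops = [
    3.578, 30.015, 37.402, 50.175, 80.995, 321.376, 74.802, 362.197,
    365.825, 383.136, 391.669, 367.976, 368.563, 366.064, 388.552,
    222.769, 231.750,
]

print(find_latency_drops(hops))
```

With this threshold it flags hop 16, and also hop 7, where the RTT falls back after the spike at hop 6. A later hop answering faster than an earlier one is only a hint: each hop's reply travels its own return path, so the numbers are not simply cumulative.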


Server > Home trace:

root@server7:~# traceroute 117.200.61.87
traceroute to 117.200.61.87 (117.200.61.87), 30 hops max, 60 byte packets
1 gw.giga-dns.com (91.194.90.1) 0.725 ms 0.719 ms 0.705 ms
2 host-93-104-204-33.customer.m-online.net (93.104.204.33) 0.694 ms 0.678 ms 0.669 ms
3 xe-1-1-0.rt-decix-2.m-online.net (82.135.16.102) 7.859 ms 7.855 ms 7.853 ms
4 xe-1-1-0.rt-decix-2.m-online.net (82.135.16.102) 7.592 ms 7.597 ms 7.591 ms
5 213.198.72.237 (213.198.72.237) 8.048 ms 8.048 ms 8.271 ms
6 ae-5.r21.frnkge03.de.bb.gin.ntt.net (129.250.4.162) 7.776 ms ae-2.r20.frnkge04.de.bb.gin.ntt.net (129.250.5.217) 8.073 ms ae-5.r21.frnkge03.de.bb.gin.ntt.net (129.250.4.162) 7.820 ms
7 ae-0.r20.frnkge04.de.bb.gin.ntt.net (129.250.2.13) 8.067 ms ae-1.r21.asbnva02.us.bb.gin.ntt.net (129.250.3.20) 98.758 ms ae-0.r20.frnkge04.de.bb.gin.ntt.net (129.250.2.13) 8.042 ms
8 ae-0.r20.asbnva02.us.bb.gin.ntt.net (129.250.4.4) 93.975 ms ae-1.r21.asbnva02.us.bb.gin.ntt.net (129.250.3.20) 113.841 ms ae-0.r20.asbnva02.us.bb.gin.ntt.net (129.250.4.4) 93.847 ms
9 ae-0.r20.asbnva02.us.bb.gin.ntt.net (129.250.4.4) 99.587 ms 93.837 ms 93.829 ms
10 ae-2.r04.lsanca03.us.bb.gin.ntt.net (129.250.5.70) 168.760 ms 168.761 ms 167.983 ms
11 ae-2.r04.lsanca03.us.bb.gin.ntt.net (129.250.5.70) 167.236 ms xe-0-1-0-10.r04.lsanca03.us.ce.gin.ntt.net (198.172.90.222) 159.727 ms ae-2.r04.lsanca03.us.bb.gin.ntt.net (129.250.5.70) 160.751 ms
12 xe-0-1-0-10.r04.lsanca03.us.ce.gin.ntt.net (198.172.90.222) 163.235 ms 160.475 ms 165.715 ms
13 115.254.1.137 (115.254.1.137) 340.748 ms 344.682 ms 338.766 ms
14 115.254.1.137 (115.254.1.137) 336.735 ms 218.248.255.101 (218.248.255.101) 335.243 ms 115.254.1.137 (115.254.1.137) 338.721 ms
15 218.248.173.41 (218.248.173.41) 345.967 ms 218.248.255.101 (218.248.255.101) 335.266 ms 342.750 ms
16 218.248.173.41 (218.248.173.41) 342.260 ms 345.005 ms 343.014 ms
17 218.248.173.41 (218.248.173.41) 347.736 ms * 353.451 ms
18 * * *
19 * * *
20 * * *

This brings me back to the question - if the return path is via the US, how come latency is 200ms when pinging from home?

All of a sudden I thought of the multiple subnets on the server! (YES YES YES!)

There are multiple subnets configured, and they have IPs belonging to completely different ranges and in fact different ASNs and BGP announcements (here we go!). Let’s call them IP1 and IP2, where IP1 is the default on the server and the gateway from the IP1 range is the server’s default route, while IP2 is secondary. So far I had been pinging IP2. Let’s ping IP1 from home:

5 packets transmitted, 5 received, 0% packet loss, time 4000ms
rtt min/avg/max/mdev = 396.381/399.734/403.037/2.617 ms

Clearly expected latency!! :)


Explanation:

Here’s what was happening - the server has two IPs, IP1 and IP2. IP1 is the default and is covered by the BGP announcement of M-Online (a German ISP), while IP2 is secondary and is covered by the BGP announcement of the datacenter itself (they recently got an ASN and run their own autonomous network). M-Online is a relatively big ISP and has transit from multiple providers including Level3 and Telia, a lot of peering, etc., while the datacenter’s network has almost no peering, only transit from Level3 and Telia. Thus IP1 (primary) is from M-Online and the server’s default route also points to the M-Online gateway.

But since IP2 is from a different subnet covered by a different BGP announcement, it follows the gateway of the datacenter itself. IP1 was preferring a route via Level3, which goes: Level3 (EU) > Level3 (US) > Reliance-FLAG (US) > BSNL (India) - hence a trip via the US with high latency. But for IP2, which is from the datacenter’s network, they seem to be preferring Telia (rather than Level3). Telia seems to have a better relationship with Tata AS6453, which is also one of BSNL’s upstream transit providers besides Reliance. Also, Tata carries BSNL’s prefixes on its border routers in London, and hence the path followed for IP2 is: Datacenter > Telia (Europe) > Tata (Europe) > Tata (India) > Tata-VSNL (India) > BSNL. This is a better path and gives relatively low latency.
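
One way to picture this is as a lookup keyed on the source address of the reply. The sketch below is only a toy model: the post does not give IP1 or the subnet sizes, so the two networks are placeholders from the documentation ranges, and the gateway labels are just names.

```python
import ipaddress

# Placeholder subnets (documentation ranges), NOT the real ones.
POLICY = [
    (ipaddress.ip_network("192.0.2.0/24"), "M-Online gateway"),       # stands in for IP1's subnet
    (ipaddress.ip_network("198.51.100.0/24"), "datacenter gateway"),  # stands in for IP2's subnet
]
DEFAULT_GATEWAY = "M-Online gateway"  # the server's default route

def reply_gateway(source_ip):
    """Pick the gateway a reply leaves through, based on its source address."""
    addr = ipaddress.ip_address(source_ip)
    for network, gateway in POLICY:
        if addr in network:
            return gateway
    return DEFAULT_GATEWAY

print(reply_gateway("192.0.2.10"))     # reply sent from "IP1"
print(reply_gateway("198.51.100.10"))  # reply sent from "IP2"
```

A ping to IP2 is answered from IP2, so the reply leaves via the datacenter side. A ping started on the server itself uses the default source address, IP1, and leaves via M-Online.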

TeliaSonera Looking Glass - traceroute inet 117.200.61.87 as-number-lookup

Router: Munich
Command: traceroute inet 117.200.61.87 as-number-lookup

traceroute to 117.200.61.87 (117.200.61.87), 30 hops max, 40 byte packets
1 ffm-bb1-link.telia.net (213.155.134.12) 7.760 ms 8.963 ms ffm-bb1-link.telia.net (80.91.248.30) 9.625 ms
2 ffm-b2-link.telia.net (80.91.252.168) 8.028 ms ffm-b2-link.telia.net (80.91.246.225) 8.049 ms ffm-b2-link.telia.net (80.91.252.168) 18.213 ms
3 teleglobe-122701-ffm-b2.telia.net (213.248.69.38) 8.105 ms 8.014 ms 8.120 ms
4 if-3-2.tcore1.PVU-Paris.as6453.net (80.231.153.53) [AS 6453] 135.317 ms if-5-2.tcore1.PVU-Paris.as6453.net (80.231.153.121) [AS 6453] 137.277 ms if-3-2.tcore1.PVU-Paris.as6453.net (80.231.153.53) [AS 6453] 153.583 ms
MPLS Label=332640 CoS=0 TTL=1 S=1
5 if-2-2.tcore1.PYE-Paris.as6453.net (80.231.154.18) [AS 6453] 145.658 ms if-12-2.tcore1.PYE-Paris.as6453.net (80.231.154.69) [AS 6453] 135.219 ms if-2-2.tcore1.PYE-Paris.as6453.net (80.231.154.18) [AS 6453] 131.225 ms
MPLS Label=522146 CoS=0 TTL=1 S=1
6 if-8-1600.tcore1.WYN-Marseille.as6453.net (80.231.217.5) [AS 6453] 135.192 ms 149.243 ms 135.519 ms
MPLS Label=646803 CoS=0 TTL=1 S=1
7 if-9-5.tcore1.MLV-Mumbai.as6453.net (80.231.217.18) [AS 6453] 152.803 ms 138.552 ms 171.287 ms
8 180.87.38.74 (180.87.38.74) [AS 6453] 133.772 ms 134.308 ms 134.257 ms
9 115.114.59.170.static-Mumbai.vsnl.net.in (115.114.59.170) [AS 4755] 179.789 ms 179.780 ms 179.786 ms
10 218.248.255.101 (218.248.255.101) [AS 9829] 186.435 ms 176.686 ms 176.350 ms
11 218.248.173.41 (218.248.173.41) [AS 9829] 184.978 ms 185.500 ms 194.190 ms
12 218.248.173.41 (218.248.173.41) [AS 9829] 185.008 ms 184.857 ms 187.702 ms
13 * * *
14 * * *

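This is why home > server vs server > home latency is different: pings from home to IP2 are answered over the datacenter’s Telia/Tata path, while pings started on the server use IP1 and the M-Online path via the US. Case closed! :)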

Heart of issue:

The core cause of all these strange results is BSNL itself. On one hand they purchase transit in India from Indian ISPs like Tata. This works well: since they purchase transit, they announce their routes to Tata-VSNL and get the global table from Tata-VSNL. Tata-VSNL then announces all these prefixes to Tata Comm’s main Tier 1 backbone AS6453, which announces them consistently across its border routers in the US, Europe and Asia. At the same time, BSNL makes crazy purchases of IPLCs (International Private Leased Circuits) and connects to ISPs’ routers outside India over a layer 2 leased circuit. Here BSNL’s router in India establishes a BGP session with the upstream’s router outside India. This is OK BUT should be done everywhere. Since BSNL uses the IPLC setup only with the US, this creates a situation where there are few paths into BSNL’s network, and most of them are via the New York or Los Angeles routers of their upstreams. Either BSNL has to stop using IPLCs and simply purchase bandwidth in India, or it has to set up BGP sessions in Europe and Asia as well.


Comparison of setup

Let’s pick a transit provider from whom BSNL purchases within India - Tata. The clear pattern here is that BSNL connects to Tata-VSNL AS4755, and AS4755 in turn connects to Tata AS6453. AS4755 is used only within India and AS6453 is used everywhere else. In India, AS6453 generally connects only to AS4755. So AS4755 > AS9829 happens only in India. :)

So here is a lookup of the routing table from Tata’s routers in Mumbai, London, New York and Singapore:

Router: gin-mlv-core1
Site: IN, Mumbai - MLV, VSNL
Command: show ip bgp 117.200.61.87

BGP routing table entry for 117.200.48.0/20
Bestpath Modifiers: deterministic-med
Paths: (3 available, best #3)
Multipath: eBGP
4755 9829
mlv-tcore1. (metric 1) from cxr-tcore1. (66.110.10.113)
Origin IGP, valid, internal
Community:
Originator: 66.110.10.202
4755 9829
mlv-tcore1. (metric 1) from mlv-tcore2. (66.110.10.215)
Origin IGP, valid, internal
Community:
Originator: 66.110.10.202
4755 9829
mlv-tcore1. (metric 1) from mlv-tcore1. (66.110.10.202)
Origin IGP, valid, internal, best
Community:

Router: gin-lhx-core1
Site: GB, London - LHX, TATA COMM. HARBOR EXCHANGE
Command: show ip bgp 117.200.61.87

BGP routing table entry for 117.200.48.0/20
Bestpath Modifiers: deterministic-med
Paths: (2 available, best #1)
Multipath: eBGP
4755 9829
mlv-tcore1. (metric 3636) from ldn-mcore3. (ldn-mcore3.)
Origin IGP, valid, internal, best
Community:
Originator: 66.110.10.202
4755 9829
mlv-tcore1. (metric 3636) from l78-tcore1. (66.110.10.237)
Origin IGP, valid, internal
Community:

Router: gin-nyt-core1
Site: US, New York - NYT, TELEHOUSE
Command: show ip bgp 117.200.61.87

BGP routing table entry for 117.200.48.0/20
Bestpath Modifiers: deterministic-med
Paths: (1 available, best #1)
4755 9829
tv2-tcore2. (metric 13282) from nyy-tcore1. (66.110.10.204)
Origin IGP, valid, internal, best
Community:
Originator: 66.110.10.202

Router: gin-s9r-core1
Site: SG, Singapore - S9R, GLOBAL SW F6-SC6
Command: show ip bgp 117.200.61.87

BGP routing table entry for 117.200.48.0/20
Bestpath Modifiers: deterministic-med
Paths: (8 available, best #6)
Multipath: eBGP
21 23 24 25 26
4755 9829
mlv-tcore1. (metric 3176) from tv2-core1. (tv2-core1.)
Origin IGP, valid, internal
Community:
Originator: 66.110.10.202
4755 9829
mlv-tcore1. (metric 3176) from hk2-core3. (hk2-core3.)
Origin IGP, valid, internal
Community:
Originator: 66.110.10.202
4755 9829
mlv-tcore1. (metric 3176) from klt-tcore1. (66.110.11.12)
Origin IGP, valid, internal
Community:
Originator: 66.110.10.202
4755 9829
mlv-tcore1. (metric 3176) from pye-core1. (pye-core1.)
Origin IGP, valid, internal
Community:
Originator: 66.110.10.202
4755 9829
mlv-tcore1. (metric 3176) from l78-mcore3. (Loopback5.mcore3.L78-London.)
Origin IGP, valid, internal
Community:
Originator: 66.110.10.202
4755 9829
mlv-tcore1. (metric 3176) from mlv-tcore1. (66.110.10.202)
Origin IGP, valid, internal, best
Community:
4755 9829
mlv-tcore1. (metric 3176) from rsd-core1. (rsd-core1.)
Origin IGP, valid, internal
Community:
Originator: 66.110.10.202
4755 9829
mlv-tcore1. (metric 3176) from jsd-core1. (jsd-core1.)
Origin IGP, valid, internal
Community:
Originator: 66.110.10.202

Clearly all paths go directly to Tata-VSNL AS4755 in India (not via the US) and then on to BSNL. This is not true in the case of their IPLC bandwidth purchases, where the entry point to the network will be a router of BSNL’s upstream in New York or Los Angeles only.
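
As a quick sanity check of that reading, the AS paths from the four outputs above can be tallied as below. The dictionary is transcribed by hand from those outputs (3, 2, 1 and 8 paths), and the helper name is my own. Note that it only checks the AS path, not the geography.

```python
# AS paths as listed in the four "show ip bgp" outputs above.
as_paths = {
    "Mumbai": ["4755 9829"] * 3,
    "London": ["4755 9829"] * 2,
    "New York": ["4755 9829"],
    "Singapore": ["4755 9829"] * 8,
}

def enters_via(path, neighbor_as="4755", origin_as="9829"):
    """True if the path starts at neighbor_as and ends at origin_as."""
    hops = path.split()
    return hops[0] == neighbor_as and hops[-1] == origin_as

all_via_vsnl = all(
    enters_via(path) for paths in as_paths.values() for path in paths
)
print(all_via_vsnl)
```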

This is how BSNL screws itself up. Anyone listening?

Oh and btw, I was “made to write” a 2-page answer to a question in today’s sessional test at college. The question was: what is TCP/IP? I wish the teacher would ignore the crap I wrote there and look at this blog post. THIS IS TCP/IP! :)