26 Aug

Multiple IPs on Linux servers

One question people often asked me in the past was how to have multiple IPs on a Linux machine under various circumstances. I know there are tons of blog posts about this, but very few explain how it works and what the options are for different use cases.

I will share both the router-side and the server-side config, with a focus on how it should be done on the server end. The server-side config assumes Ubuntu/Debian; the same concepts apply to CentOS.

Say you have a router on IP 10.10.10.1 and a server on IP 10.10.10.2 in a /24 (255.255.255.0) subnet, and assume that the entire 10.10.10.0/24 is available for the server's connectivity. The setup would look like this:
(Figure: R1 - Server01 connectivity)
The configuration so far is super simple: 10.10.10.1 is on R1's interface g1/0, which connects to server01, and server01 has 10.10.10.2.

R1#sh run int g1/0
Building configuration...
Current configuration : 127 bytes
!
interface GigabitEthernet1/0
 description ***Link to server01***
 ip address 10.10.10.1 255.255.255.0
 negotiation auto
end
R1#

 
And the server's config (/etc/network/interfaces) is:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
#iface eth0 inet dhcp
#iface eth0 inet6 auto
iface eth0 inet static
address 10.10.10.2
netmask 255.255.255.0
gateway 10.10.10.1

 
Now let's say you want to add more IPs to the server. There are a few ways:

  1. You may add more IPs from the same pool, i.e. unused IPs from within 10.10.10.0/24.
  2. You may add IPs from an altogether different pool, say 192.168.1.0/24.

When adding IPs to a server, you must understand that they are added either at layer 2 (i.e. those IPs will generate ARP packets on the interface connected to the router) or at layer 3 (i.e. they are routed "behind" an existing working IP, 10.10.10.2 in this case). Another case is additional IPs that are eventually NATted behind public IPs, which I will also discuss at the end.
 

Layer 2 based addition

When IPs are added at layer 2, they must be present on the interface so that they show up in ARP; machines on the LAN then resolve IP to MAC and deliver packets destined for those IPs. The connected interface here is eth0, so the new IPs must go on eth0 as well. You can add them directly on the single eth0 interface, or by creating so-called "alias interfaces": eth0 can have eth0:1, eth0:2, etc. as aliases.
Since the entire pool is available for use between R1 and server01, nothing needs to change on R1. On the server end, we can add the IPs as follows.
Temporary addition (will get removed after reboot):

anurag@user-1:~$ sudo ip -4 addr add 10.10.10.3/24 dev eth0
[sudo] password for anurag:
anurag@user-1:~$
anurag@user-1:~$
anurag@user-1:~$ sudo ip -4 addr add 10.10.10.4/24 dev eth0
anurag@user-1:~$

 
So there we go: two IPs added on eth0 itself.

anurag@user-1:~$ sudo ip -4 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.10.10.2/24 brd 10.10.10.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.10.10.3/24 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet 10.10.10.4/24 scope global secondary eth0
       valid_lft forever preferred_lft forever
anurag@user-1:~$

 
 
Let's try to ping them from the router:

R1#ping 10.10.10.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.10.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/43/48 ms
R1#ping 10.10.10.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.10.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 32/42/48 ms
R1#

 
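And so it works. Now, if we examine the ARP table on R1's g1/0 interface (which connects to server01), we find the server IPs we just pinged, 10.10.10.2 and 10.10.10.3, both mapped to the same MAC address (0800.2787.d567). 10.10.10.4 is missing only because we have not sent it any traffic yet.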

R1#show arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.10.10.2             14   0800.2787.d567  ARPA   GigabitEthernet1/0
Internet  10.10.10.3              1   0800.2787.d567  ARPA   GigabitEthernet1/0
Internet  10.10.10.1              -   ca01.349f.001c  ARPA   GigabitEthernet1/0
R1#
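Why did neither new address need any change on R1? A router resolves a destination via ARP when it falls inside one of its connected subnets, and both new IPs sit inside 10.10.10.0/24. The minimal sketch below makes that check explicit with Python's standard ipaddress module. It assumes R1's only connected subnet towards server01 is the one from this example, and needs_router_change is a hypothetical helper name, not part of any tool.

```python
import ipaddress

# Assumption: R1's only connected subnet towards server01 is the one from the
# example (10.10.10.1/24 on GigabitEthernet1/0).
R1_G1_0 = ipaddress.ip_interface("10.10.10.1/24")


def needs_router_change(server_ip: str, router_iface=R1_G1_0) -> bool:
    """Hypothetical helper: True when server_ip lies outside the router's
    connected subnet, i.e. a plain layer 2 alias on the server is not enough
    and R1 needs either a secondary IP (layer 2) or a route (layer 3)."""
    return ipaddress.ip_address(server_ip) not in router_iface.network


for ip in ["10.10.10.3", "10.10.10.4", "192.168.1.2"]:
    if needs_router_change(ip):
        print(ip, "-> needs a change on R1")
    else:
        print(ip, "-> works via ARP as-is")
```

The 192.168.1.2 case is exactly the "different pool" situation handled next with a secondary IP on R1.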

 
Another way of doing the same thing is to create alias interfaces and put the IPs on them, by adding the following to /etc/network/interfaces:

auto eth0:1
iface eth0:1 inet static
address 10.10.10.3
netmask 255.255.255.0
auto eth0:2
iface eth0:2 inet static
address 10.10.10.4
netmask 255.255.255.0

 
Bring those interfaces up using ifup eth0:1 and ifup eth0:2. A common and confusing question is where to put the gateway. Keep in mind that so far all IPs come from the same single device, R1, whose IP is 10.10.10.1; hence a single gateway line in the eth0 config is enough to ensure that traffic to any IP outside 10.10.10.0/24 is routed via 10.10.10.1.

Now let's say you want to add an IP from a completely different pool (for some reason), like 192.168.1.0/24, on the server. You can do this at layer 2 by first defining an IP from that pool as a secondary address on R1 and then adding an IP as an alias on the server.

R1#sh run int g1/0
Building configuration...
Current configuration : 175 bytes
!
interface GigabitEthernet1/0
 description ***Link to server01***
 ip address 192.168.1.1 255.255.255.0 secondary
 ip address 10.10.10.1 255.255.255.0
 negotiation auto
end
R1#

 
On the server01 end:

auto eth0:1
iface eth0:1 inet static
address 192.168.1.2
netmask 255.255.255.0

 
This simply puts R1 and server01 in the same subnet, 192.168.1.0/24 with broadcast address 192.168.1.255, so they can talk to each other directly. Again, on the router end, R1 learns the IP via ARP, and that tells it how to reach it: the ARP table maps IP to MAC address, and packets are then forwarded based on the MAC (the MAC table maps MAC to interface).

R1#show arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.10.10.2              0   0800.2787.d567  ARPA   GigabitEthernet1/0
Internet  10.10.10.1              -   ca01.349f.001c  ARPA   GigabitEthernet1/0
Internet  192.168.1.1             -   ca01.349f.001c  ARPA   GigabitEthernet1/0
Internet  192.168.1.2             0   0800.2787.d567  ARPA   GigabitEthernet1/0
R1#
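The subnet arithmetic behind this can be double-checked with Python's standard ipaddress module. This is only an illustration; the addresses are the ones from the configs above.

```python
import ipaddress

# Addresses from the example: R1's secondary on g1/0 and server01's eth0:1.
r1 = ipaddress.ip_interface("192.168.1.1/24")
server = ipaddress.ip_interface("192.168.1.2/24")
primary = ipaddress.ip_network("10.10.10.0/24")

print("Network:  ", r1.network)                    # 192.168.1.0/24
print("Broadcast:", r1.network.broadcast_address)  # 192.168.1.255
print("Same subnet as server01?", r1.network == server.network)  # True
# The new subnet does not overlap the primary one; both simply share the wire,
# and R1 keeps a separate connected route for each.
print("Overlaps 10.10.10.0/24?", r1.network.overlaps(primary))   # False
```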

 
Another layer 2 option is to patch an unused extra port and run a separate network on it (separate IP/subnet mask). You can also send a tagged VLAN all the way to the server and untag it there. I will write a blog post about that later.

Layer 3 based addition

Due to scalability concerns and IPv4 address scarcity, the layer 2 method isn't the best one for adding IPs. A layer 3 setup is simply one where additional IPs are "routed" behind a single working public IP.
It would be better to use a /30 for the point-to-point link (in fact, a /31!), but let's keep the same case going. We have 10.10.10.1 on R1 and 10.10.10.2 on server01, both in a /24. Now, to allocate, say, 10.10.10.3 to the server, we can route this IP behind 10.10.10.2.

So the setup on R1:

R1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R1(config)#ip routing
R1(config)#ip route 10.10.10.3 255.255.255.255 10.10.10.2
R1(config)#end
R1#wr
Building configuration...
*Aug 21 19:21:53.495: %SYS-5-CONFIG_I: Configured from console by console[OK]
R1#

 
This ensures that all packets going to 10.10.10.3/32 (a single IP) are routed to 10.10.10.2, which is on server01. Next, we can add this IP on the existing loopback interface lo, or on a new loopback alias, lo:1. For a temporary addition (removed after reboot):

sudo ip -4 addr add 10.10.10.3/32 dev lo

and for a permanent one, add the following to /etc/network/interfaces:

auto lo:1
iface lo:1 inet static
address 10.10.10.3
netmask 255.255.255.255

Now ping it from R1:
R1#ping 10.10.10.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.10.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/43/48 ms
R1#

 
So how exactly does this work? It's important to understand, as it explains the key difference between IPs added on an interface and IPs that are routed. Let's look at the "sh ip route" output for 10.10.10.2 and 10.10.10.3:

R1#sh ip route 10.10.10.2
Routing entry for 10.10.10.0/24
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via GigabitEthernet1/0
      Route metric is 0, traffic share count is 1
R1#sh ip route 10.10.10.3
Routing entry for 10.10.10.3/32
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
  * 10.10.10.2
      Route metric is 0, traffic share count is 1
R1#

 
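Clearly, 10.10.10.2 is reached via the directly connected route for 10.10.10.0/24, while 10.10.10.3 matches a static /32 route towards 10.10.10.2. Because the /32 is more specific than the connected /24, it wins for 10.10.10.3: R1 only ARPs for 10.10.10.2 and sends packets for 10.10.10.3 to that MAC address.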
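That lookup can be modelled in a few lines of Python as a longest-prefix match. This is a toy sketch, assuming a table that holds only the two entries shown above; a real router also compares administrative distance, but only among routes with the same prefix length.

```python
import ipaddress

# Toy model of R1's table from the example: the connected /24 on g1/0 and
# the static /32 for 10.10.10.3 pointing at server01 (10.10.10.2).
ROUTES = [
    (ipaddress.ip_network("10.10.10.0/24"), "connected", "GigabitEthernet1/0"),
    (ipaddress.ip_network("10.10.10.3/32"), "static", "10.10.10.2"),
]


def lookup(dst: str):
    """Return the most specific route containing dst, or None if no route matches."""
    addr = ipaddress.ip_address(dst)
    matches = [route for route in ROUTES if addr in route[0]]
    if not matches:
        return None
    # Longest prefix match: the largest prefix length wins.
    return max(matches, key=lambda route: route[0].prefixlen)


for dst in ["10.10.10.2", "10.10.10.3"]:
    net, source, via = lookup(dst)
    print(f"{dst}: {source} route {net} via {via}")
```

For the connected match the router ARPs for the destination itself; for the static match it ARPs only for the next hop, which is why routed IPs add no ARP entries.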
 
Some key points comparing the layer 2 and layer 3 setups:

  1. With the layer 3 method you can put exactly as many IPs as you want on the server, without worrying about CIDR boundaries. With layer 2, if you want to add an entirely new pool to the server, you need at least 2 IPs (a /31); if you want just 3 IPs, you need a /29 (consuming 8 IPs), and so on. That wastes a lot of IPs, which becomes critical now that we are almost out of IPv4. In IPv6 that's no issue at all.
  2. With layer 3, adding IPs doesn't create any layer 2 noise (ARP packets). For example, you can use just 10.10.10.0/31 for the link and route the entire 192.168.1.0/24 behind the server. The server can then use 192.168.1.0/24 without generating any ARP for it, and the router has just a single routing table entry for that entire /24. ARP is needed only for the single IP used to connect R1 with the server.
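To put numbers on the first point, here is a small Python helper (smallest_l2_subnet is a hypothetical name) that computes the smallest subnet a layer 2 setup needs for a given number of server IPs. It assumes one address goes to the router, that a /31 link follows RFC 3021 (both addresses usable), and that every larger subnet loses its network and broadcast addresses. The routed approach, by contrast, uses one /32 per extra IP.

```python
def smallest_l2_subnet(server_ips: int) -> int:
    """Hypothetical helper: prefix length of the smallest IPv4 subnet that can
    hold the router's IP plus `server_ips` addresses on the server."""
    if server_ips < 1:
        raise ValueError("need at least one server IP")
    hosts = server_ips + 1  # +1 for the router's own IP on the segment
    if hosts == 2:
        return 31  # RFC 3021 point-to-point: both addresses usable
    # Smallest n with 2**n - 2 >= hosts (network and broadcast are reserved).
    n = (hosts + 1).bit_length()
    return 32 - n


for wanted in (1, 3, 5):
    prefix = smallest_l2_subnet(wanted)
    size = 2 ** (32 - prefix)
    print(f"{wanted} server IP(s): layer 2 needs a /{prefix} ({size} addresses), "
          f"layer 3 uses {wanted} routed /32(s)")
```

For 3 server IPs this gives the /29 mentioned above: 8 addresses consumed to deliver 3 usable ones.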

 
 
I hope this helps!

17 Aug

BSNL AS9829 – A rotten IP backbone

Today I met a good friend who has recently moved back to Rohtak (like me!), and he was complaining about BSNL's issues. His DSL is unstable because of the last mile, and I told him that even if the last mile works well, BSNL still has tons of issues with its IP backbone.

It's late Sunday night here in India, and I am getting really pathetic connectivity to just about everywhere except Google. With Google, the only key difference I noticed is that my TCP sessions to Google's services now terminate in Mumbai rather than Delhi.
First and foremost, I ran a trace to spectranet.in (the last company I worked for) to check my latency to the server hosting it:

mtr -wrc 10 spectranet.in
HOST: Anurags-Macbook-Pro.home             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- router.home                           0.0%    10    1.2   1.9   1.0   7.0   1.8
  2.|-- bsnl-wan-uplink.home                  0.0%    10    2.4   4.2   1.8  13.3   3.7
  3.|-- 117.220.160.1                         0.0%    10   29.9  30.4  23.3  51.6   8.8
  4.|-- 218.248.169.118                       0.0%    10   75.6  97.7  30.7 217.1  57.6
  5.|-- 218.248.235.130                       0.0%    10   32.5  35.8  32.3  48.2   5.0
  6.|-- 218.100.48.10                         0.0%    10  244.8 248.4 244.4 256.2   4.1
  7.|-- 203.122.61.147.reverse.spectranet.in  0.0%    10  256.9 250.0 246.1 257.3   4.1
  8.|-- 119.82.69.34.reverse.spectranet.in   10.0%    10  251.7 253.0 248.9 262.7   4.2
  9.|-- jane.spectranet.in                   10.0%    10  259.4 252.1 248.0 259.4   4.0

This clearly goes via NIXI, but as soon as the trace hits the NIXI IP (hop 6, 218.100.48.10, configured on the destination network), average latency jumps from about 36 ms to about 248 ms. That is a clear sign of a bad return path. Since I no longer have access to the AS10029 network and none of my ex-colleagues would be awake at this hour, I cannot easily get a return trace.
My IP (assigned via DHCP) is 117.220.162.110, from 117.220.160.0/20, originated by BSNL AS9829. Let's look for this IP at NIXI:

NIXI Looking Glass - show ip bgp 117.220.162.110
Router: NIXI Delhi (Noida)
Command: show ip bgp 117.220.162.110
% Network not in table
NIXI Looking Glass - show ip bgp 117.220.162.110
Router: NIXI Mumbai
Command: show ip bgp 117.220.162.110
% Network not in table
NIXI Looking Glass - show ip bgp 117.220.162.110
Router: NIXI Chennai
Command: show ip bgp 117.220.162.110
show ip bgp 117.220.162.110
BGP4 : None of the BGP4 routes match the display condition

 
Clearly BSNL isn't announcing this prefix at any of the NIXIs. This is bad, and it becomes worse because BSNL doesn't peer with any other network except Google. It just buys transit from Tata, Airtel, etc. inside India, and that's pretty much it.
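As a quick sanity check that my address really sits inside the /20 mentioned above, Python's standard ipaddress module gives the range that prefix covers. Both values are taken from this post; nothing here queries a live looking glass.

```python
import ipaddress

# Values from the post: my DHCP-assigned IP and the prefix BSNL originates.
my_ip = ipaddress.ip_address("117.220.162.110")
prefix = ipaddress.ip_network("117.220.160.0/20")

print(my_ip in prefix)                                        # True
print(prefix.network_address, "-", prefix.broadcast_address)  # 117.220.160.0 - 117.220.175.255
print(prefix.num_addresses)                                   # 4096
```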
Let's look at whom BSNL is announcing this route to, as seen on the Oregon Route Views server:

route-views>sh ip bgp 117.220.160.0/20 long
BGP table version is 40782466, local router ID is 128.223.51.103
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network          Next Hop            Metric LocPrf Weight Path
 *   117.220.160.0/20 217.192.89.50                          0 3303 6762 9829 i
 *                    207.172.6.1              0             0 6079 3356 6762 9829 i
 *                    154.11.98.225            0             0 852 6453 9829 i
 *                    195.208.112.161                        0 3277 39710 9002 6453 9829 i
 *                    194.85.40.15                           0 3267 2603 6762 9829 i
 *                    114.31.199.1                           0 4826 2828 6453 9829 i
 *                    202.232.0.2                            0 2497 6453 9829 i
 *                    208.51.134.254        2523             0 3549 6453 9829 i
 *                    203.181.248.168                        0 7660 2516 6762 9829 i
 *                    128.223.253.10                         0 3582 3701 3356 6453 9829 i
 *                    144.228.241.130                        0 1239 6453 9829 i
 *                    203.62.252.83                          0 1221 4637 6453 9829 i
 *>                   66.110.0.86                            0 6453 9829 i
 *                    162.250.137.254                        0 4901 174 6453 9829 i
 *                    208.74.64.40                           0 19214 4436 6453 9829 i
     Network          Next Hop            Metric LocPrf Weight Path
 *                    66.59.190.221                          0 6539 577 6762 9829 i
 *                    4.69.184.193             0             0 3356 6453 9829 i
 *                    69.31.111.244            3             0 4436 6762 9829 i
 *                    89.149.178.10           10             0 3257 6453 9829 i
 *                    129.250.0.11             8             0 2914 6453 9829 i
 *                    195.66.232.239                         0 5459 6453 9829 i
 *                    207.172.6.20             0             0 6079 3356 6453 9829 i
 *                    209.124.176.223                        0 101 101 2914 6453 9829 i
 *                    104.192.216.1                          0 46450 174 6453 9829 i
 *                    103.247.3.45                           0 58511 174 6453 9829 i
 *                    216.218.252.164                        0 6939 6762 9829 i
 *                    134.222.87.1           650             0 286 6762 9829 i
 *                    12.0.1.63                              0 7018 6453 9829 i
 *                    173.205.57.234                         0 53364 3257 6453 9829 i
 *                    95.85.0.2                              0 200130 1299 6453 9829 i
 *                    206.24.210.80                          0 3561 6453 9829 i
 *                    5.101.110.2                            0 202018 1299 6762 9829 i
 *                    192.241.164.4                          0 62567 6453 9829 i
 *                    162.243.188.2                          0 393406 1299 6453 9829 i
 *                    66.185.128.48            7             0 1668 6453 9829 i
 *                    193.0.0.56                             0 3333 6762 9829 i
route-views>

We can see AS6453, which is Tata Communications' international ASN, and AS6762 (Telecom Italia).
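Reading the neighbour of the origin AS off each path is mechanical, so here is a small Python sketch that does it. It works on a subset of AS paths copied verbatim from the route-views output above (not the full table), and upstreams is a hypothetical helper name.

```python
# Subset of AS paths copied from the route-views output above. The origin
# (AS9829) is the last element; the AS right before it is the network BSNL
# announced the prefix to.
PATHS = [
    "3303 6762 9829",
    "6079 3356 6762 9829",
    "852 6453 9829",
    "101 101 2914 6453 9829",
    "6453 9829",
    "6939 6762 9829",
    "202018 1299 6762 9829",
    "7018 6453 9829",
]


def upstreams(paths, origin="9829"):
    """Return the set of ASNs seen directly adjacent to the origin AS."""
    found = set()
    for path in paths:
        hops = path.split()
        # Collapse AS-path prepending of the origin (e.g. "... 6453 9829 9829").
        while len(hops) >= 2 and hops[-1] == origin and hops[-2] == origin:
            hops.pop()
        if len(hops) >= 2 and hops[-1] == origin:
            found.add(hops[-2])
    return found


print(sorted(upstreams(PATHS)))  # ['6453', '6762']
```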

Some interesting facts:
  1. BSNL isn't peering with any networks in India except Google (as far as I can see). That means no large content networks and not even large telcos. Yes, it does have local Akamai nodes, but that's pretty much it.
  2. BSNL is currently announcing very few prefixes at the NIXIs, and the prefix my IP comes from, 117.220.160.0/20, doesn't seem to be announced at any NIXI at all.
  3. BSNL is announcing 117.220.160.0/20 only to AS6453 (Tata) and AS6762 (Telecom Italia).
  4. Tata Communications usually does not sell any Indian capacity or Indian routing table via AS6453, so AS6453 is used for buying transit outside India, while AS4755 (VSNL) is used for domestic transit.
  5. Telecom Italia transit is also bought by BSNL outside India and transported over to India.

 
There's nothing wrong with #4 and #5 on their own, but an IP backbone design combining all of the above is quite bad and leads to a very degraded experience. As of now, all non-Google traffic reaches BSNL from outside India!
This includes traffic from within India as well. So yes, India-to-India traffic is getting routed via outside India. Here are some traces that show this.
Trace from my friend's ISP in Gujarat, which takes upstream from Tata:
(Image: traceroute to BSNL)

So clearly, packets are getting routed from Gujarat to Haryana via New York!

Let's look at traces from Airtel's PoPs in New Delhi and Mumbai via their looking glass:

Mon Aug 17 02:53:51 GMT+05:30 2015
traceroute 117.220.162.110
Mon Aug 17 02:54:05.849 IST
Type escape sequence to abort.
Tracing the route to 117.220.162.110
 1   *
    203.101.95.146 240 msec  266 msec
 2  182.79.224.70 [MPLS: Label 339794 Exp 0] 281 msec  272 msec  202 msec
 3  182.79.224.177 [MPLS: Label 591824 Exp 0] 114 msec
    182.79.247.50 66 msec
    182.79.224.185 150 msec
 4  182.79.222.81 131 msec
    182.79.245.161 134 msec
    182.79.245.225 145 msec
 5  213.242.116.157 153 msec
    213.242.116.165 184 msec  141 msec
 6  ae-7-3101.bar1.Marseille1.Level3.net (4.69.141.190) 165 msec  *
    ae-8-3201.bar1.Marseille1.Level3.net (4.69.141.198) 143 msec
 7  ae-7-3101.bar1.Marseille1.Level3.net (4.69.141.190) 229 msec
    ae-8-3201.bar1.Marseille1.Level3.net (4.69.141.198) 230 msec  *
 8  ix-5-1-2-0.tcore1.WYN-Marseille.as6453.net (80.231.217.81) 223 msec  222 msec  196 msec
 9  if-8-1600.tcore1.PYE-Paris.as6453.net (80.231.217.6) [MPLS: Label 388018 Exp 0] 279 msec  263 msec  278 msec
 10 if-3-6.tcore1.L78-London.as6453.net (80.231.130.85) [MPLS: Label 552881 Exp 0] 245 msec  265 msec  284 msec
 11 if-2-2.tcore2.L78-London.as6453.net (80.231.131.1) [MPLS: Label 300080 Exp 0] 267 msec  228 msec  227 msec
 12 if-20-2.tcore2.NYY-New-York.as6453.net (216.6.99.13) [MPLS: Label 713905 Exp 0] 231 msec  230 msec  228 msec
 13 if-9-0-0-19.mcore4.NYY-New-York.as6453.net (209.58.60.149) 223 msec  222 msec  225 msec
 14 ix-0-0-0.mcore4.NYY-New-York.as6453.net (209.58.60.6) 268 msec  269 msec  265 msec
 15 218.248.235.129 267 msec  302 msec  301 msec
 16 218.248.169.117 279 msec  251 msec  248 msec
 17 218.248.169.117 251 msec  251 msec  250 msec
 18 117.220.162.110 276 msec  275 msec  276 msec
 19 117.220.162.110 273 msec  271 msec  273 msec
RP/0/8/CPU0:DEL-ISP-MPL-ACC-RTR-9#

 
 
Trace from Airtel Mumbai to BSNL Haryana:

Mon Aug 17 02:56:09 GMT+05:30 2015
 traceroute 117.220.162.110
traceroute to 117.220.162.110 (117.220.162.110), 30 hops max, 40 byte packets
 1  182.79.245.134 (182.79.245.134)  105.600 ms 182.79.245.161 (182.79.245.161)  95.984 ms 182.79.222.74 (182.79.222.74)  95.944 ms
 2  213.242.116.161 (213.242.116.161)  121.413 ms  121.727 ms  122.907 ms
 3  ae-8-3201.bar1.Marseille1.Level3.net (4.69.141.198)  114.783 ms  120.834 ms  113.081 ms
 4  ae-8-3201.bar1.Marseille1.Level3.net (4.69.141.198)  113.883 ms  120.905 ms  152.539 ms
 5  ix-5-1-2-0.tcore1.WYN-Marseille.as6453.net (80.231.217.81)  122.523 ms  120.899 ms  120.186 ms
 6  if-8-1600.tcore1.PYE-Paris.as6453.net (80.231.217.6)  223.372 ms  193.584 ms  203.268 ms
     MPLS Label=388018 CoS=0 TTL=1 S=1
 7  if-3-6.tcore1.L78-London.as6453.net (80.231.130.85)  183.382 ms  197.114 ms  184.079 ms
     MPLS Label=552881 CoS=0 TTL=1 S=1
 8  if-2-2.tcore2.L78-London.as6453.net (80.231.131.1)  194.520 ms  193.295 ms  193.164 ms
     MPLS Label=300080 CoS=0 TTL=1 S=1
 9  if-20-2.tcore2.NYY-New-York.as6453.net (216.6.99.13)  200.612 ms  185.092 ms  192.285 ms
     MPLS Label=713905 CoS=0 TTL=1 S=1
10  if-9-0-0-19.mcore4.NYY-New-York.as6453.net (209.58.60.149)  192.508 ms if-15-0-0-20.mcore4.NYY-New-York.as6453.net (209.58.60.133)  195.585 ms  192.936 ms
11  ix-0-0-0.mcore4.NYY-New-York.as6453.net (209.58.60.6)  243.632 ms  244.137 ms  242.869 ms
12  218.248.235.129 (218.248.235.129)  244.654 ms  245.075 ms  244.035 ms
13  218.248.169.121 (218.248.169.121)  221.043 ms  219.843 ms  220.626 ms
14  218.248.169.121 (218.248.169.121)  221.508 ms  222.072 ms  223.851 ms
15  117.220.162.110 (117.220.162.110)  244.912 ms  244.448 ms  243.947 ms
16  117.220.162.110 (117.220.162.110)  246.776 ms  245.829 ms  243.037 ms
{master}
lookingglass@MUM-SC-ISP-IGW-RTR-116>

Clearly, the traffic is coming in from outside India, entering BSNL (218.248.x.x) right after Tata's AS6453 hops in New York.

Some possible fixes for this issue:
  1. BSNL announces its routes at NIXI.
  2. BSNL announces its routes to domestic transit providers and not just to international ones.
  3. A better, open IXP model in India that removes the burden of the "x-y" pricing followed by NIXI on an inbound-heavy network like BSNL's.
  4. BSNL is likely having capacity issues at NIXI Noida, since NIXI just moved to a new location and BSNL is still building out transport to it. Even once that works, the trouble would remain for Western India, Southern India, etc.

 
I have pretty much lost all hope that BSNL will ever work. Hoping my new leased-line circuit will be ready in the coming days, it's time for me to get some sleep and prepare for another day of high-latency internet!

Disclaimer: This blog post (and the blog as a whole) is written in my personal capacity and has nothing to do with my employer. It does not necessarily reflect my employer's views. And to be honest, this post is written as a frustrated BSNL customer!