26 Aug

Multiple IPs on Linux servers

One thing people have often asked me about in the past is how to have multiple IPs on a Linux machine under various circumstances. I know there are a ton of blog posts about this, but very few explain how it works and which options suit which use cases.
I will share both router side and server side config, with a focus on how it should be done from the server end. The server side config assumes Ubuntu/Debian; a similar concept applies to CentOS.
Say you have a router and a server, each with one IP on the same /24 subnet, and assume that the entire /24 is available for the server’s connectivity. The setup would look like:
R1 - Server 01 connectivity
The configuration so far is super simple. One IP from the /24 is placed on R1’s interface (g1/0), which connects to server01, and server01 has another IP from the same /24.

R1#sh run int g1/0
Building configuration...
Current configuration : 127 bytes
interface GigabitEthernet1/0
 description ***Link to server01***
 ip address
 negotiation auto

and the server’s config is:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
#iface eth0 inet dhcp
#iface eth0 inet6 auto
iface eth0 inet static

Now let’s say you want to add additional IPs to the server. There are a few ways:

  1. You may add more IPs from this same pool, i.e. unused IPs from within the existing /24.
  2. You may add more IPs from an altogether different pool.

When adding additional IPs to a server, you must understand that they arrive in one of two ways: via layer 2 (i.e. those IPs will generate ARP packets on the interface connected to the router) or via layer 3, i.e. routed IPs which are routed “behind” an existing working IP (like the server’s current IP in this case). Another case is additional IPs which are eventually NATed behind public IPs, which I will also discuss at the end.

Layer 2 based addition

When IPs are added at layer 2, they are supposed to be present on the interface so that the server answers ARP for them; machines on the LAN can then do the IP to MAC conversion and deliver packets destined for those IPs. The connected interface here is eth0, hence the new IPs should go on eth0 only. You can add them by creating a so-called “alias interface”: eth0 can have eth0:1, eth0:2, etc. as aliases. IPs can also be added directly on the same single eth0 interface.
Since the entire pool is available for use between R1 and server01, this doesn’t need any change at the R1 end. On the server end, we can add the IPs as follows.
Temporary addition (will be removed after reboot):

anurag@user-1:~$ sudo ip -4 addr add dev eth0
[sudo] password for anurag:
anurag@user-1:~$ sudo ip -4 addr add dev eth0

So there we go: two IPs added on eth0 itself.

anurag@user-1:~$ sudo ip -4 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    inet scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet brd scope global eth0
       valid_lft forever preferred_lft forever
    inet scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet scope global secondary eth0
       valid_lft forever preferred_lft forever

Let’s try to ping them from the router:

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to, timeout is 2 seconds:
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/43/48 ms
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to, timeout is 2 seconds:
Success rate is 100 percent (5/5), round-trip min/avg/max = 32/42/48 ms

And so it works. Now, if we examine the ARP table for the g1/0 interface of the router (which connects to server01), we will find all three of these IPs which are in use by the server.

R1#show arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet             14   0800.2787.d567  ARPA   GigabitEthernet1/0
Internet              1   0800.2787.d567  ARPA   GigabitEthernet1/0
Internet              -   ca01.349f.001c  ARPA   GigabitEthernet1/0

Another way of doing the same thing is to create alias interfaces and add the IPs on them. We can add the following to /etc/network/interfaces:

auto eth0:1
iface eth0:1 inet static
auto eth0:2
iface eth0:2 inet static

Bring those interfaces up using: ifup eth0:1 and ifup eth0:2. A logical question that often comes up and confuses people is where to put the gateway. Keep in mind that, as of now, all IPs are served by the same single device, R1, so the single gateway in the eth0 config (R1’s IP) is enough to ensure that traffic to any IP outside the pool is routed via R1.
Now let’s say you want to add an IP from a completely different pool (for some reason) on the server. Here you can do it via layer 2 by first defining an IP from that pool as secondary on R1 and then adding another IP from the same pool as an alias on the server.

R1#sh run int g1/0
Building configuration...
Current configuration : 175 bytes
interface GigabitEthernet1/0
 description ***Link to server01***
 ip address secondary
 ip address
 negotiation auto

On Server01 end:

auto eth0:1
iface eth0:1 inet static

This simply ensures that both R1 and Server01 sit in a single broadcast domain for the new pool and hence can speak to each other. In this case as well, the router learns the server’s new IP via ARP, and that tells it how to reach the IP: the ARP table gives the IP to MAC address conversion, and packets are then forwarded based on the MAC (MAC table: MAC >> interface conversion).

R1#show arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet              0   0800.2787.d567  ARPA   GigabitEthernet1/0
Internet              -   ca01.349f.001c  ARPA   GigabitEthernet1/0
Internet             -   ca01.349f.001c  ARPA   GigabitEthernet1/0
Internet             0   0800.2787.d567  ARPA   GigabitEthernet1/0

Another way of doing a layer 2 setup is to patch an unused extra port and run a separate network on it (separate IP / subnet mask). You can also have a setup where you send a tagged VLAN all the way to the server and untag it on the server. I will write a blog post about that later on.

Layer 3 based addition

Because of scalability concerns as well as the scarcity of IPv4 addresses, the layer 2 based method isn’t the best one when it comes to adding additional IPs. A layer 3 setup is simply one where additional IPs are “routed” behind a single working public IP.
It’s better to use a /30 for a point-to-point link (in fact a /31!), but let’s keep the same case going: R1 and Server01 each have an IP in the same /24. Now, to allocate one more IP to the server, we can route that IP behind the server’s existing IP.
So the setup on R1:

R1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R1(config)#ip routing
R1(config)#ip route
Building configuration...
*Aug 21 19:21:53.495: %SYS-5-CONFIG_I: Configured from console by console[OK]

This will ensure that all packets going towards the routed IP (a single IP) are sent to the server’s existing IP on server01. Next, we can add this IP on the existing loopback interface lo, or on a new alias of the loopback such as lo:1.
Use ip -4 addr add dev lo for a temporary addition (removed after reboot), and for a persistent one add this to /etc/network/interfaces:
auto lo:1
iface lo:1 inet static

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to, timeout is 2 seconds:
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/43/48 ms

So how exactly does this work? It’s important to understand, as it explains the key difference between IPs added on an interface and IPs that are routed. Let’s see the “sh ip route” output for the server’s interface IP and for the routed IP:

R1#sh ip route
Routing entry for
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via GigabitEthernet1/0
      Route metric is 0, traffic share count is 1
R1#sh ip route
Routing entry for
  Known via "static", distance 1, metric 0
  Routing Descriptor Blocks:
      Route metric is 0, traffic share count is 1

Here the first IP clearly has a “directly connected” route, while for the second there’s a static route towards the server’s interface IP.
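To make the difference between the two entries concrete, here is a minimal Python sketch of a longest-prefix-match lookup. The addresses in the post were lost, so every prefix and next hop below is a placeholder from the documentation ranges (an assumption, not the real config), and the table is a toy model of R1 rather than how IOS actually stores routes.

```python
import ipaddress

# Hypothetical routing table for R1. All prefixes and the next hop are
# documentation-range placeholders, not the addresses used in the post.
ROUTES = [
    ("192.0.2.0/24", "connected", "GigabitEthernet1/0"),  # link to server01
    ("198.51.100.7/32", "static", "192.0.2.2"),           # routed behind the server
]


def lookup(dest):
    """Return (prefix, kind, target) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, kind, target in ROUTES:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, kind, target))
    if not matches:
        return None
    # Longest prefix wins, as in any IP routing table.
    net, kind, target = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), kind, target


print(lookup("192.0.2.2"))     # on-link: the router ARPs for it on g1/0
print(lookup("198.51.100.7"))  # routed: forwarded to the next hop, no ARP for this IP
```

For the connected route the router has to resolve the destination itself with ARP; for the static route it only needs ARP for the next hop.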
Some key points of comparison between layer 2 and layer 3 based setups:

  1. With the layer 3 method you can have as many IPs as you want on the server without getting into CIDR cuts. With layer 2, by contrast, adding an entirely new pool to the server takes at least 2 IPs (a /31), and if you want just 3 IPs you need a /29 (consuming 8 IPs), and so on. The layer 2 approach wastes a lot of IPs, and that becomes critical now that we are almost out of IPv4. In IPv6 that’s no issue at all.
  2. With layer 3, adding IPs doesn’t create any layer 2 noise (ARP packets). For example, you can use just a single IP on the link and then route an entire /24 behind the server. The server can then use that /24 without generating any ARP for it, and the router will have just one routing table entry for that entire /24. ARP would be needed only for the single IP which connects R1 with the server.
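The CIDR arithmetic in the first point can be sketched in Python as below. It assumes that a new layer 2 pool needs one extra address for the router side, and that anything larger than a /31 also loses a network and a broadcast address. The function name is made up for illustration.

```python
def layer2_block(server_ips):
    """Return (prefix_length, addresses_consumed) for a new layer 2 pool.

    Assumptions: one address goes to the router side of the link, and
    blocks larger than a /31 reserve network and broadcast addresses.
    """
    if server_ips < 1:
        raise ValueError("need at least one server IP")
    hosts = server_ips + 1              # server IPs plus the router's IP
    if hosts == 2:
        return 31, 2                    # point-to-point /31
    total = hosts + 2                   # add network and broadcast
    bits = (total - 1).bit_length()     # round up to the next power of two
    return 32 - bits, 2 ** bits


for wanted in (1, 3, 6):
    prefix, consumed = layer2_block(wanted)
    print(f"{wanted} server IP(s) -> /{prefix}, {consumed} addresses consumed")
```

With the layer 3 method the same server IPs can be routed as individual /32s, so nothing beyond the addresses actually used is consumed.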

I hope this helps!

17 Aug

BSNL AS9829 – A rotten IP backbone

Today I met a good friend who has recently moved back to Rohtak (like me!) and was complaining about BSNL’s issues. He has an unstable DSL connection because of the last mile, and I told him that even when the last mile works well, BSNL still has a ton of issues with its IP backbone.
It’s late Sunday night out here in India and I am having really pathetic connectivity to just about everywhere except Google. With Google, the only key difference I noted is that my TCP sessions to Google’s services now terminate at Mumbai and not Delhi anymore.
First and foremost, I ran a trace to spectranet.in (the last company I worked for) to see what my latency is to the server hosting it:

mtr -wrc 10 spectranet.in
HOST: Anurags-Macbook-Pro.home             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- router.home                           0.0%    10    1.2   1.9   1.0   7.0   1.8
  2.|-- bsnl-wan-uplink.home                  0.0%    10    2.4   4.2   1.8  13.3   3.7
  3.|--                         0.0%    10   29.9  30.4  23.3  51.6   8.8
  4.|--                       0.0%    10   75.6  97.7  30.7 217.1  57.6
  5.|--                       0.0%    10   32.5  35.8  32.3  48.2   5.0
  6.|--                         0.0%    10  244.8 248.4 244.4 256.2   4.1
  7.|--  0.0%    10  256.9 250.0 246.1 257.3   4.1
  8.|--   10.0%    10  251.7 253.0 248.9 262.7   4.2
  9.|-- jane.spectranet.in                   10.0%    10  259.4 252.1 248.0 259.4   4.0

This clearly seems to be going via NIXI, but as soon as I hit the NIXI IP (configured at the destination network), the latency jumps. That is a clear sign of a bad return path. Since I no longer have access to the AS10029 network and none of my ex-colleagues would be awake at this time, I cannot easily see the return trace.
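The “latency jumps and stays up” reading of the mtr report can be sketched in Python. The averages are copied from the Avg column above; the 100 ms threshold is an arbitrary assumption, and the function name is made up for illustration.

```python
# Avg column (ms) for hops 1-9 from the mtr report above.
AVG_MS = [1.9, 4.2, 30.4, 97.7, 35.8, 248.4, 250.0, 253.0, 252.1]


def sustained_jump(avgs, threshold_ms=100.0):
    """Return the 1-based hop where latency rises by more than threshold_ms
    over the previous hop and stays that high through the last hop, else None.
    """
    for i in range(1, len(avgs)):
        before = avgs[i - 1]
        if all(a - before > threshold_ms for a in avgs[i:]):
            return i + 1
    return None


print(sustained_jump(AVG_MS))
```

Requiring the increase to persist to the final hop is what separates a path problem from a single router that is merely slow to answer probes, like the spike at hop 4 that is gone again at hop 5.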
My IP (coming via DHCP) is originated by BSNL AS9829. Let’s look for this IP at NIXI:

NIXI Looking Glass - show ip bgp
Router: NIXI Delhi (Noida)
Command: show ip bgp
% Network not in table
NIXI Looking Glass - show ip bgp
Router: NIXI Mumbai
Command: show ip bgp
% Network not in table
NIXI Looking Glass - show ip bgp
Router: NIXI Chennai
Command: show ip bgp
show ip bgp
BGP4 : None of the BGP4 routes match the display condition

Clearly BSNL isn’t announcing this IP at any of the NIXI locations. This is bad, and it becomes worse because BSNL doesn’t peer with any other network except Google. It just buys transit from Tata, Airtel, etc. inside India, and that’s pretty much it.
Let’s look at whom BSNL is announcing this route to, as seen from Oregon route views:

route-views>sh ip bgp long
BGP table version is 40782466, local router ID is
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network          Next Hop            Metric LocPrf Weight Path
 *                          0 3303 6762 9829 i
 *                        0             0 6079 3356 6762 9829 i
 *                      0             0 852 6453 9829 i
 *                                  0 3277 39710 9002 6453 9829 i
 *                                     0 3267 2603 6762 9829 i
 *                                     0 4826 2828 6453 9829 i
 *                                      0 2497 6453 9829 i
 *                  2523             0 3549 6453 9829 i
 *                                  0 7660 2516 6762 9829 i
 *                                   0 3582 3701 3356 6453 9829 i
 *                                  0 1239 6453 9829 i
 *                                    0 1221 4637 6453 9829 i
 *>                                     0 6453 9829 i
 *                                  0 4901 174 6453 9829 i
 *                                     0 19214 4436 6453 9829 i
     Network          Next Hop            Metric LocPrf Weight Path
 *                                    0 6539 577 6762 9829 i
 *                       0             0 3356 6453 9829 i
 *                      3             0 4436 6762 9829 i
 *                     10             0 3257 6453 9829 i
 *                       8             0 2914 6453 9829 i
 *                                   0 5459 6453 9829 i
 *                       0             0 6079 3356 6453 9829 i
 *                                  0 101 101 2914 6453 9829 i
 *                                    0 46450 174 6453 9829 i
 *                                     0 58511 174 6453 9829 i
 *                                  0 6939 6762 9829 i
 *                     650             0 286 6762 9829 i
 *                                        0 7018 6453 9829 i
 *                                   0 53364 3257 6453 9829 i
 *                                        0 200130 1299 6453 9829 i
 *                                    0 3561 6453 9829 i
 *                                      0 202018 1299 6762 9829 i
 *                                    0 62567 6453 9829 i
 *                                    0 393406 1299 6453 9829 i
 *                      7             0 1668 6453 9829 i
 *                                       0 3333 6762 9829 i

We can see AS6453, which is Tata Communications’ international ASN, and AS6762, which is Telecom Italia.
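One way to read the upstreams straight off the AS paths is to count the ASN that sits immediately before the origin. The Python sketch below does that for the first eight paths of the table above (a sample, not the full output); the function name is made up for illustration.

```python
from collections import Counter

# First eight AS paths from the route-views output above (a sample only).
SAMPLE_PATHS = [
    "3303 6762 9829",
    "6079 3356 6762 9829",
    "852 6453 9829",
    "3277 39710 9002 6453 9829",
    "3267 2603 6762 9829",
    "4826 2828 6453 9829",
    "2497 6453 9829",
    "3549 6453 9829",
]


def upstreams_of(origin, paths):
    """Count the ASN directly before `origin` in each AS path ending in it."""
    counts = Counter()
    for path in paths:
        asns = [int(a) for a in path.split()]
        if not asns or asns[-1] != origin:
            continue
        i = len(asns) - 1
        while i >= 0 and asns[i] == origin:  # skip origin prepending, if any
            i -= 1
        if i >= 0:
            counts[asns[i]] += 1
    return counts


print(upstreams_of(9829, SAMPLE_PATHS))
```

Only two ASNs ever appear next to AS9829 in this sample, which is what you would expect from a network that reaches these collectors purely through two transit providers.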

Some interesting facts:
  1. BSNL isn’t peering with any networks in India except Google (as far as I can see). That means no large content networks and not even large telcos. Yes, it does have local Akamai nodes, but that’s pretty much it.
  2. BSNL is currently announcing very limited prefixes at all NIXI locations, and the prefix my IP comes from doesn’t seem to be announced at any of them at all.
  3. BSNL is announcing this prefix just to AS6453 (Tata) and AS6762 (Telecom Italia).
  4. Tata Communications usually does not sell any Indian capacity / Indian routing table via AS6453, so AS6453 is used for buying transit outside India while AS4755 (VSNL) is used for domestic transit.
  5. Telecom Italia transit is likewise something BSNL buys outside India and transports back to India.

There’s nothing wrong with #4 and #5 on their own, but an IP backbone design combining all of the above is quite bad and leads to a very degraded experience. As of now, all non-Google traffic is getting routed to BSNL from outside India!
This includes traffic from India as well. So yes, India to India traffic is getting routed from outside India. Here are some traces to show that:
Trace from my friend’s ISP in Gujarat taking upstream from Tata:
So clearly packets are getting routed from Gujarat to Haryana via New York!
Let’s look at traces from Airtel’s PoPs in New Delhi and Mumbai via their looking glass:

Mon Aug 17 02:53:51 GMT+05:30 2015
Mon Aug 17 02:54:05.849 IST
Type escape sequence to abort.
Tracing the route to
 1   * 240 msec  266 msec
 2 [MPLS: Label 339794 Exp 0] 281 msec  272 msec  202 msec
 3 [MPLS: Label 591824 Exp 0] 114 msec 66 msec 150 msec
 4 131 msec 134 msec 145 msec
 5 153 msec 184 msec  141 msec
 6  ae-7-3101.bar1.Marseille1.Level3.net ( 165 msec  *
    ae-8-3201.bar1.Marseille1.Level3.net ( 143 msec
 7  ae-7-3101.bar1.Marseille1.Level3.net ( 229 msec
    ae-8-3201.bar1.Marseille1.Level3.net ( 230 msec  *
 8  ix-5-1-2-0.tcore1.WYN-Marseille.as6453.net ( 223 msec  222 msec  196 msec
 9  if-8-1600.tcore1.PYE-Paris.as6453.net ( [MPLS: Label 388018 Exp 0] 279 msec  263 msec  278 msec
 10 if-3-6.tcore1.L78-London.as6453.net ( [MPLS: Label 552881 Exp 0] 245 msec  265 msec  284 msec
 11 if-2-2.tcore2.L78-London.as6453.net ( [MPLS: Label 300080 Exp 0] 267 msec  228 msec  227 msec
 12 if-20-2.tcore2.NYY-New-York.as6453.net ( [MPLS: Label 713905 Exp 0] 231 msec  230 msec  228 msec
 13 if-9-0-0-19.mcore4.NYY-New-York.as6453.net ( 223 msec  222 msec  225 msec
 14 ix-0-0-0.mcore4.NYY-New-York.as6453.net ( 268 msec  269 msec  265 msec
 15 267 msec  302 msec  301 msec
 16 279 msec  251 msec  248 msec
 17 251 msec  251 msec  250 msec
 18 276 msec  275 msec  276 msec
 19 273 msec  271 msec  273 msec

trace from Airtel Mumbai to BSNL Haryana

Mon Aug 17 02:56:09 GMT+05:30 2015
traceroute to (, 30 hops max, 40 byte packets
 1 (  105.600 ms (  95.984 ms (  95.944 ms
 2 (  121.413 ms  121.727 ms  122.907 ms
 3  ae-8-3201.bar1.Marseille1.Level3.net (  114.783 ms  120.834 ms  113.081 ms
 4  ae-8-3201.bar1.Marseille1.Level3.net (  113.883 ms  120.905 ms  152.539 ms
 5  ix-5-1-2-0.tcore1.WYN-Marseille.as6453.net (  122.523 ms  120.899 ms  120.186 ms
 6  if-8-1600.tcore1.PYE-Paris.as6453.net (  223.372 ms  193.584 ms  203.268 ms
     MPLS Label=388018 CoS=0 TTL=1 S=1
 7  if-3-6.tcore1.L78-London.as6453.net (  183.382 ms  197.114 ms  184.079 ms
     MPLS Label=552881 CoS=0 TTL=1 S=1
 8  if-2-2.tcore2.L78-London.as6453.net (  194.520 ms  193.295 ms  193.164 ms
     MPLS Label=300080 CoS=0 TTL=1 S=1
 9  if-20-2.tcore2.NYY-New-York.as6453.net (  200.612 ms  185.092 ms  192.285 ms
     MPLS Label=713905 CoS=0 TTL=1 S=1
10  if-9-0-0-19.mcore4.NYY-New-York.as6453.net (  192.508 ms if-15-0-0-20.mcore4.NYY-New-York.as6453.net (  195.585 ms  192.936 ms
11  ix-0-0-0.mcore4.NYY-New-York.as6453.net (  243.632 ms  244.137 ms  242.869 ms
12 (  244.654 ms  245.075 ms  244.035 ms
13 (  221.043 ms  219.843 ms  220.626 ms
14 (  221.508 ms  222.072 ms  223.851 ms
15 (  244.912 ms  244.448 ms  243.947 ms
16 (  246.776 ms  245.829 ms  243.037 ms

Clearly the traffic is coming in from outside India.

Some fixes for this issue:
  1. BSNL should keep announcing its routes at NIXI.
  2. BSNL should announce its routes to domestic transit, not just to international transit.
  3. A better and more open IXP model in India, which removes the burden of the “x-y” pricing followed by NIXI on an inbound-heavy network like BSNL’s.
  4. BSNL is likely having capacity issues at NIXI Noida, since NIXI just moved to a new location and BSNL is still building out transport to it. Even once that works, the trouble would remain for Western India, Southern India, etc.

I have pretty much lost all hope that BSNL will ever work. With hope that my new leased line circuit will be ready in the coming days, it’s time for me to get some sleep and get prepared for another day of high latency internet!
Disclaimer: This blog post (and the blog as a whole) is written in my personal capacity and has nothing to do with my employer. It does not necessarily reflect the views of my employer. And to be honest, this is my post as a frustrated customer of BSNL!

13 Aug

What is BCP38 and why it is important?

BCP38, also known as “Network Ingress Filtering”, is the concept of filtering incoming packets from end customers and allowing packets ONLY from the IPs assigned to them.
Before going into BCP38, let’s first understand how packet forwarding works:
Here User 1 is connected to User 2 via a series of routers R1, R2 and R3. R1 and R3 are the ISP’s edge routers, while R2 is a core router. In the typical way a network is set up, the entire effort goes into the logic of the routing table: for packets to reach User 2 from User 1, we need to ensure that User 1 has a default route towards R1, and that R1 knows User-2’s IP is behind R3, which is reachable via R2. So the path User 1 > R1 > R2 > R3 > User 2 comes up, and likewise User 2 > R3 > R2 > R1 > User 1 as the return path.
Now, for example, User-1 and User-2 each have their own IP pool, and each is using one IP out of it.
[Figure: end-to-end traces between User 1 and User 2]
This is pretty much how most network setups are. Now say User-1 turns malicious and tries spoofing packets. User-1 can add an IP that is not allocated to him on his loopback interface:
ip -4 addr add dev lo0 and done!
Now User 1 tries to ping User-2 with the spoofed IP as the source:
ping -I -c 2
User 1 will not get any packets in reply, since the return packets never come back to him, but the forward packets do go out. Let’s run tcpdump -i eth0 -n 'icmp' on User-2’s machine:
Clearly User 1 is able to spoof packets and send them to User-2. User 1 may not get replies back, but this one-way communication is in itself very dangerous. It opens the door to many security issues, such as triggering a DoS attack with invalid source IPs, or DDoS attacks using DNS or NTP amplification.


So what is the way to deal with it? That’s BCP38, i.e. filtering at the edge. It’s important to filter at the edge because this cannot be done once packets have left it. For example, R2 might not know which IPs R1 has allocated to whom, and hence R2 cannot prevent spoofing among the IPs allocated behind R1 (it can, however, deny all IPs that do not belong behind R1). As we go deeper into core networks, IP filtering becomes harder and more of a problem than a solution. Hence, networks are expected to filter at their edge, and large networks to filter the small networks connecting to them.
So, for example, in this case R1 can put a filter on its link to User-1 that allows only packets whose source is in User-1’s allocated pool and nothing else (such as the spoofed address).
Create an access list permitting that pool and apply it inbound on the interface:

access-list 1 permit
interface GigabitEthernet1/0
 description Link to User-1
 ip address
 ip access-group 1 in
 negotiation auto

This completely prevents any spoofed packets from entering R1 from User 1. I’ll end this post with an interesting slide from Martin Levy of Cloudflare on the extent of heavy amplification attacks done via spoofing.
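The decision the edge router makes for each incoming packet can be sketched in a few lines of Python. The customer name, the allocated prefix and the test addresses are placeholders from the documentation ranges (assumptions, since the real addresses are not shown here), and the function only models the check; it is not how the router implements the access list.

```python
import ipaddress

# Hypothetical allocation table on the edge router: customer -> prefixes
# assigned to that customer. The prefix is a documentation-range placeholder.
ALLOCATED = {
    "User-1": [ipaddress.ip_network("192.0.2.0/29")],
}


def permit_ingress(customer, source_ip):
    """BCP38-style check: accept a packet from `customer` only if its
    source address falls inside a prefix allocated to that customer."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOCATED.get(customer, []))


print(permit_ingress("User-1", "192.0.2.2"))     # legitimate source
print(permit_ingress("User-1", "198.51.100.9"))  # spoofed source
```

The check only works where the allocation table is known, which is why it belongs on the edge router facing the customer and not in the core.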

Have fun and stay safe!