20 Sep

IPv6 allocation to downstream machines with just one /64

IPv6

One of my friends went for a VM with a German hosting provider. He got a single IPv4 address (quite common) and a /64 of IPv6. A /64 per VM/end server used to be OK until a few years back, but these days running applications inside LXC containers (OS-level virtualization) makes more sense, since it gives the option of maintaining a separate hosting environment for each application. I personally do that a lot, and in fact the blog you are reading right now is itself on an LXC container.

anurag@server7:~$ sudo lxc-ls -f |grep websrv1
[sudo] password for anurag:
websrv1.server7.core.anuragbhatia.com RUNNING 1 - 10.20.70.3 2402:b580:1:4:1:1:1:1, 2402:b580:1:4::abcd
anurag@server7:~$


So my friend tried to do a similar setup, but it got tricky for him because of that one single /64 from upstream. In my case I have a /32 and originate a /48 from this location, giving me over 65k /64s of IPv6 for testing and random fun application deployments.

The challenge in his setup was the following:

  1. One can use the 18 quintillion IPv6 addresses available in the /64 by bridging the internal container interface with the upstream one. That is OK for IPv6, but fails terribly for IPv4: most people do not need a dedicated IPv4 address per container, while for IPv6 it is fun to have one and gives a lot of flexibility. For IPv4 a custom setup makes more sense, with specific DST NAT rules and a reverse proxy for port 80 and port 443 traffic (see the sketch right after this list).
  2. For NATing IPv4, a separate virtual interface (veth) makes sense so that one can run private IPv4 addressing on it. Now, subnetting the /64 sounds stupid and weird in the first place, but even if one does that it won't work, because the main /64 allocation is done on layer 2 and is not a routed pool. Read further on why this doesn't work.
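
To make the IPv4 part of point 1 concrete, here is a minimal sketch assuming the containers sit on a private 10.20.70.0/24 behind the host and eth0 faces the upstream (addresses and ports are purely illustrative):

# SRC NAT: masquerade outbound IPv4 traffic from the container subnet
iptables -t nat -A POSTROUTING -s 10.20.70.0/24 -o eth0 -j MASQUERADE

# DST NAT: expose SSH of one container on a chosen host port
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 2222 -j DNAT --to-destination 10.20.70.3:22

# Port 80/443 traffic is best terminated by a reverse proxy (e.g. nginx)
# on the host, which picks the right container from the Host header/SNI.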


Workaround

So after our discussion my friend decided to use a /112 for the containers (ugly, I know, but the datacenter provider quoted 4-5 Euro/month for an additional /64!). A /112 out of the 128-bit IPv6 address space leaves 16 host bits, i.e. 2^16 = ~65k IPv6 addresses to use on containers, which is a good number, but it comes with a few limitations:

  1. Many things support /64 only. For instance, use of IPv6 in OpenVPN sticks to /64 due to the Linux kernel implementation.
  2. IPv6 autoconfiguration (SLAAC) heavily depends on it. In my own personal setup I have a dedicated /64 for the container interfaces, and radvd takes care of autoconfiguration via router advertisements (a sample config follows right after this list). With anything smaller than a /64 that's not possible.
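
For reference, a minimal radvd config for such a dedicated /64 on the container bridge could look like this (interface name and prefix are just examples):

# /etc/radvd.conf
interface lxcbr0
{
    AdvSendAdvert on;            # emit router advertisements on the bridge
    prefix 2001:db8:1:4::/64     # dedicated /64 for the containers
    {
        AdvOnLink on;
        AdvAutonomous on;        # containers self-configure via SLAAC
    };
};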


So we broke the allocated /64 into a /112, put the first IP out of it on the veth-based interface, and used the 2nd IP on a container. IPv4 was working fine on the container with SRC NAT, but IPv6 connectivity wasn't: the container was able to reach the host machine but nothing beyond that. I realised the issue was the layer 2 based allocation, which relies on IPv6 NDP. The container's IPv6 had internal reachability with the host machine, but whenever a packet came in from the internet, the L3 device of the VM provider wasn't able to forward it because of a missing entry for that IP in its NDP table. The problem wasn't just with the container's IPv6 but with any IPv6 address used on any interface of the VM (whether the virtual veth or even loopback). Adding an IPv6 address on eth0 (which was connected to the upstream device) made IPv6 work, but such an address cannot be used further down on a device like a container.

The datacenter provider offered to split the /64 into /65s and route the 2nd /65 for a monthly charge (ugly!!!). So we ended up with a nasty workaround: proxy NDP. This is very similar in concept to proxy ARP in IPv4. It required enabling proxy NDP in sysctl.conf and then adding a proxy NDP entry for each specific IPv6 address using: ip -6 neigh add proxy xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx dev eth0
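
Roughly, the host-side configuration looked like this (the prefix here is from the documentation range 2001:db8::/32; substitute the real /64):

# enable IPv6 forwarding and proxy NDP on the upstream-facing interface
sysctl -w net.ipv6.conf.all.forwarding=1
sysctl -w net.ipv6.conf.eth0.proxy_ndp=1   # persist both via /etc/sysctl.conf

# answer neighbor solicitations on eth0 on behalf of the container's IPv6
ip -6 neigh add proxy 2001:db8:1:4::2 dev eth0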


[Image: Hurricane Electric datacenter]

This works, with the extra step of adding a proxy NDP entry for each IPv6 address in use. In general a routed pool is way better, and if I had to make the choice on behalf of his datacenter provider, I would have used the /64 for point-to-point connectivity along with a routed /48, as sketched below. At Hurricane Electric (the company I work for) we offer IPv6 free of charge so that networks can grow without having to worry about address space or resort to nasty workarounds like the one I described above. 😉
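
A minimal sketch of that routed alternative, with purely illustrative prefixes (the provider-side command is shown as Linux iproute2 for simplicity; a real provider router would use its own syntax):

# on the provider's router: keep a /64 on the point-to-point link and
# route a /48 towards the customer's side of the link
ip -6 route add 2001:db8:100::/48 via 2001:db8:0:1::2

# on the VM, any /64 out of that /48 can then be placed on a bridge or
# veth towards the containers, with no proxy NDP entries needed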

Haven’t deployed IPv6 on your application yet? Do it now!

Time to get back to work and do more IPv6 🙂

09 Jun

Updates from life, blog and more

Some updates from personal life…

I have joined the Fremont-based IP backbone & colocation provider Hurricane Electric and will be working on some cool things at AS6939. 🙂


Updates on blog…

I have changed the theme and the entire look of the blog, re-designing it with new plugins, more tweaking etc. The blog now has a cleaner white theme which gives more space for posting, improved security with some ACLs, and forced HTTPS to prevent telcos from injecting iframes for readers on 3G networks (which is very bad and worrying). Also, with the help of a bunch of plugins, I am now hosting all static media content on AWS S3, avoiding local storage on the server and its backup. Running it on AWS S3 with geo replication + CloudFront for CDN/efficient delivery made much more sense. It is sad, though, that there's no easy way to integrate Google Cloud Storage with WordPress; S3, being the more mature product, makes it easier.


Have fun and keep commenting & sending me direct messages.

31 Mar

Dark spot in Global IPv6 routing


Fest time at college: good, since I get a lot of free time to spend looking at routing tables. It's always interesting; last week was full of some major submarine cable cuts which had a huge impact on Indian networks.

Anyway, an interesting issue to post about today concerning global IPv6 routing. There are “dark spots” in global IPv6 routing because of a peering dispute between two tier 1 ISPs: Hurricane Electric (AS6939) and Cogent Communications (AS174). What's happening here is that the two tier 1 providers failed to reach an agreement to keep their IPv6 peering up. This has resulted in parts of the global IPv6 internet where packets from one network (and its downstreams) can't reach the other network or its downstream single-homed networks.

The only publicly known information about the de-peering of Cogent from HE is Mr Mike Leber's email to the NANOG mailing list here. Overall, Hurricane Electric seems pretty open in its peering and the networking community knows this well, so it is not hard to believe Mr Leber's mail. In fact they even baked a cake to cheer Cogent up at NANOG 47 at Dearborn, Michigan in 2009.


Why is the IPv6 internet broken when just two providers de-peered?

The answer lies in the fundamental property of a tier 1 network, i.e. a “transit free” network. Hurricane Electric is the world's biggest IPv6 backbone in terms of number of interconnections, while Cogent Communications is a big ISP in the US and Europe with significant last mile fiber in many areas of the US; it is a popular choice for cheap datacenter upstream transit.

Now, since both ISPs are tier 1, i.e. transit free, in the IPv6 internet, they simply do not pay anyone (on layer 3) to reach any network. Packets from HE can't get to Cogent simply because there's no transit provider above HE in IPv6 (in fact HE is the transit provider for a lot of networks!). At the same time Cogent also has no transit provider in IPv6. Transit matters here because there are many networks in the world which are not directly connected. E.g. India's BSNL doesn't connect to Hurricane Electric, and Tulip Telecom doesn't connect to AT&T directly, but packets can still be routed because in both cases they buy transit from an upstream network which eventually connects to, or peers with, AT&T.


Looking for Cogent's IPv6 prefix 2001:0550::/32, announced by AS174, on Hurricane Electric's route server:


route-server> show bgp ipv6 2001:0550::/32
% Network not in table
route-server>


There is no public route server from Cogent, thus I am using their looking glass to ping the IPv6 address of he.net to test connectivity:

PING he.net(he.net) 56 data bytes
From 2001:550:1:31f::1 icmp_seq=2 Destination unreachable: No route
From 2001:550:1:31f::1 icmp_seq=3 Destination unreachable: No route

--- he.net ping statistics ---
5 packets transmitted, 0 received, +2 errors, 100% packet loss, time 14003ms


Is the dark spot only in IPv6? What about IPv4?

Yes, this problem is IPv6 specific. HE and Cogent do not peer in IPv4 either, but since HE is not a tier 1 in IPv4, it has a couple of transit providers which do have a relationship with Cogent.

Looking up Cogent's IPv4 address 38.100.128.10 (part of 38.0.0.0/8) from HE's route server:

route-server> show ip bgp 38.0.0.0/8 long
BGP table version is 0, local router ID is 64.62.142.154
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, R Removed
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
* i38.0.0.0 213.248.92.33 48 70 0 1299 174 i
* i 213.248.92.33 60 70 0 1299 174 i
* i 216.218.252.174 48 70 0 1299 174 i
* i 213.248.86.53 48 70 0 1299 174 i
* i 213.248.93.81 48 70 0 1299 174 i
* i 213.248.93.81 48 70 0 1299 174 i
* i 213.248.67.105 48 70 0 1299 174 i
* i 213.248.96.177 48 70 0 1299 174 i
* i 213.248.67.125 48 70 0 1299 174 i
* i 213.248.70.37 48 70 0 1299 174 i
* i 213.248.92.33 48 70 0 1299 174 i
* i 213.248.101.145 48 70 0 1299 174 i

(truncated view of a long output)


So clearly HE is using AS1299, which is Telia's global network, one of the IPv4 tier 1 ISPs, to reach Cogent; I can guess it is a transit provider for HE. At the same time I can see a route from Cogent to HE in IPv4 via Global Crossing:

traceroute to 216.218.186.2 (216.218.186.2), 30 hops max, 60 byte packets
1 vl99.mag01.ord01.atlas.cogentco.com (66.250.250.89) 0.497 ms 0.444 ms
2 te0-5-0-3.ccr21.ord01.atlas.cogentco.com (154.54.45.193) 0.437 ms 0.569 ms
3 te0-5-0-5.ccr22.ord03.atlas.cogentco.com (154.54.44.162) 0.647 ms te0-5-0-1.ccr22.ord03.atlas.cogentco.com (154.54.43.230) 0.821 ms
4 Tenge4-4-10000M.ar3.CHI2.gblx.net (64.212.107.73) 0.554 ms 0.562 ms
5 Hurrican-Electric-LLC.Port-channel100.ar3.SJC2.gblx.net (64.214.174.246) 54.313 ms 54.016 ms
6 10gigabitethernet1-1.core1.fmt1.he.net (72.52.92.109) 54.792 ms 55.231 ms
7 * *
8 * *


So clearly the networks have connectivity in IPv4 via HE's upstreams Global Crossing (which is now Level 3) & Telia. In IPv6, HE simply does not have a customer relationship with Gblx or Telia, and so the dark spot remains.


Another fact confirming that Telia is a transit provider for HE in IPv4 comes from the RADB records of AS1299:


Anurags-MacBook-Pro:~ anurag$ whois -h whois.radb.net as1299 | grep 6939
import: from AS6939 action pref=50; accept AS-HURRICANE
export: to AS6939 announce ANY
mp-import: afi ipv6 from AS6939 accept AS-HURRICANE
mp-export: afi ipv6 to AS6939 announce AS-TELIANET-V6
Anurags-MacBook-Pro:~ anurag$


Clearly AS1299 is announcing ANY, i.e. the full routing table, to HE in IPv4, while for IPv6 it is announcing only AS-TELIANET-V6. In other words: transit in IPv4, peering in IPv6.


With the hope that this issue gets resolved in the near future, it's time for me to get some sleep! 🙂


Disclaimer: The focus of this blog post is not who is responsible for the de-peering and for creating such a situation, but rather a technical analysis of what happens when big tier 1 ISPs de-peer.

Comments are personal and have nothing to do with my employer. I know most of the people mentioned in this post personally, and that fact has nothing to do with this blog post either!