Understanding headscale/tailscale ACL
Over the last few days, I spent some time playing around with various VPN options. For a couple of years, I have been using tailscale (with a self-hosted coordination server - headscale) across my devices. Besides that, I played around a bit with performance on native wireguard, netbird, etc.
Impressively, on a 10G link between two dedicated servers I get 2.43 Gbps on wireguard and 4.98 Gbps on tailscale. As someone pointed out on Twitter, this post documents the performance tweaks in tailscale.
Understanding Tailscale ACLs
ACLs in tailscale as well as Netbird work with the idea that:
- Everything is denied by default
- There is no way to “deny” in a specific rule.
- Rules are stateful (if user1 can connect to server1 on port 22, the return traffic from the server's port 22 to the user's source port is permitted)
Thus one has to open ports by matching on specific source/destination, protocol, port etc. It's easy to wrap your head around the default action being DENY, but harder to accept that no individual rule can deny. E.g. if one wants to open all ports on a server from everywhere except SSH port 22, which should stay limited to a trusted source, in a typical firewall like iptables one would do:
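# -A appends, so the rules are evaluated in the order written (-I would insert each at the top and reverse them)
iptables -A INPUT -s <TRUSTED_SOURCE_ADDR> -p tcp -m tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp -m tcp --dport 22 -j DROP
iptables -A INPUT -j ACCEPT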
Here rule 1 permits SSH from the trusted IP, rule 2 denies SSH from everywhere else, and rule 3 opens the rest of the ports. By design, SSH traffic from a non-trusted IP hits rule 2 and gets dropped. In the tailscale design there is no way to deny/drop/reject, so rule 2 is simply not possible, and an open rule at the end would open everything including port 22, which makes the whole firewalling useless. The workaround is better design: don't open all ports in the first place. So assuming the server runs a web service on ports 80/443, in tailscale's HuJSON (human-readable JSON) one would do:
// Allow SSH from devices tagged trusted
{
  "action": "accept",
  "src": ["tag:trusted"],
  "proto": "tcp",
  "dst": ["*:22"]
},
// Allow TCP port 80 and 443 from everywhere
{
  "action": "accept",
  "src": ["*"],
  "proto": "tcp",
  "dst": ["*:80,443"]
}
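The difference between the two models can be sketched in a few lines of Python. This is only a toy model for illustration: the `Rule` class, the helper names and the default-deny fallback in `first_match` are my own assumptions, not how iptables or tailscale are actually implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str           # "accept" or "drop"
    src: str              # "*" or a label such as "trusted"
    port: Optional[int]   # None means any port


def matches(rule, src, port):
    return rule.src in ("*", src) and rule.port in (None, port)


def first_match(rules, src, port):
    """iptables-like: the first matching rule decides (toy default: deny)."""
    for rule in rules:
        if matches(rule, src, port):
            return rule.action == "accept"
    return False


def allow_only(rules, src, port):
    """tailscale-like: allowed if any rule matches; there is no drop action."""
    return any(matches(rule, src, port) for rule in rules)


firewall = [Rule("accept", "trusted", 22), Rule("drop", "*", 22), Rule("accept", "*", None)]
naive_acl = [Rule("accept", "trusted", 22), Rule("accept", "*", None)]
scoped_acl = [Rule("accept", "trusted", 22), Rule("accept", "*", 80), Rule("accept", "*", 443)]

print(first_match(firewall, "stranger", 22))   # False: the drop rule wins
print(allow_only(naive_acl, "stranger", 22))   # True: the catch-all opens SSH too
print(allow_only(scoped_acl, "stranger", 22))  # False: only 80/443 are open to all
```

The last line is the "better design" case: since nothing can be subtracted afterwards, the only safe approach is to never add the catch-all.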
Tags and Groups
To make ACLs more dynamic and easier to use, tailscale makes use of tags. These tags are applied to the participating machines. A machine can have multiple tags, and different policies can make use of them. One can tag machines based on trust, service, or both, e.g. tag:trusted, used above to allow SSH from trusted machines.
Groups, on the other hand, are used to group users together, so one can associate multiple users with a group. E.g. I can put myself and a user named john in an admin group:
"groups": {
  "group:admin": ["anurag", "john"]
}
And next, use this group in a firewall policy to allow SSH from this group:
// Allow SSH from users in group:admin
{
  "action": "accept",
  "src": ["group:admin"],
  "proto": "tcp",
  "dst": ["*:22"]
}
One tricky thing with a headscale setup in the past was giving access only to the internet via exit nodes. E.g. let's say I want my family to be able to use my VMs as exit nodes for VPN, with zero access to everything else, i.e. services running on the servers themselves. A traditional firewall would DENY RFC1918 and a bunch of other known IPs, and allow everything else. But again, with no way to DENY, it's tricky. In the past I made it work using this overly complicated rule, which essentially allows all IPs except private IPs:
// Family can access internet via exit nodes but not private IPs
{
  "action": "accept",
  "src": ["group:family"],
  "dst": [
    "0.0.0.0/8:*","2.0.0.0/8:*","3.0.0.0/8:*","4.0.0.0/6:*","8.0.0.0/7:*","11.0.0.0/8:*",
    "12.0.0.0/6:*","16.0.0.0/4:*","32.0.0.0/3:*","64.0.0.0/2:*","128.0.0.0/3:*",
    "160.0.0.0/5:*","168.0.0.0/6:*","172.0.0.0/12:*","172.32.0.0/11:*","172.64.0.0/10:*",
    "172.128.0.0/9:*","173.0.0.0/8:*","174.0.0.0/7:*","176.0.0.0/4:*","192.0.0.0/9:*","192.128.0.0/11:*",
    "192.160.0.0/13:*","192.169.0.0/16:*","192.170.0.0/15:*","192.172.0.0/14:*","192.176.0.0/12:*","192.192.0.0/10:*",
    "193.0.0.0/8:*","194.0.0.0/7:*","196.0.0.0/6:*","200.0.0.0/5:*","208.0.0.0/4:*"
  ]
}
But headscale now supports autogroup:internet, as documented here. Thus the above rule can be replaced with the much simpler:
// Family can access the internet via exit nodes but not private IPs
{
  "action": "accept",
  "src": ["group:family"],
  "dst": ["autogroup:internet:*"]
}
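For setups that still need the long form, an "everything except these ranges" list does not have to be typed by hand. Below is a minimal Python sketch using the standard ipaddress module. The function name `exclude_ranges` is made up for illustration, and excluding only the three RFC1918 blocks is an assumption, so the output will not match the hand-written list (which carves out other ranges as well) entry for entry.

```python
import ipaddress


def exclude_ranges(excluded):
    """Return the IPv4 networks covering everything except `excluded`."""
    remaining = [ipaddress.ip_network("0.0.0.0/0")]
    for cidr in excluded:
        block = ipaddress.ip_network(cidr)
        next_remaining = []
        for net in remaining:
            if block.subnet_of(net):
                # Split net into the pieces that do not contain block
                next_remaining.extend(net.address_exclude(block))
            elif net.subnet_of(block):
                continue  # net is entirely inside the excluded block
            else:
                next_remaining.append(net)  # disjoint, keep as-is
        remaining = next_remaining
    return sorted(ipaddress.collapse_addresses(remaining))


# Assumption: only the RFC1918 private ranges are excluded here
RFC1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

# Format each network as an ACL destination entry with all ports
dst = [f"{net}:*" for net in exclude_ranges(RFC1918)]
print(dst)
```

Adding more ranges to the exclusion list (for example the CGNAT block the VPN addresses live in) only means extending `RFC1918`; the splitting logic stays the same.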
The use of wildcards, and when to avoid them
Another interesting part: these firewall rules are applied inside the tailscale clients on both source & destination, and if there is no rule that lets two devices talk, then even their VPN public keys are not exchanged. This is useful when you want to hide VPN members from each other. I do workshops on routing, DNS, and automation across NOGs, and there are times when I want to make all participating (untrusted) VMs part of the VPN network but don't want to expose my servers.
Let's say I have an ACL which allows SSH from me (group:admin) to everyone, including tag:trusted (my own machines) and tag:untrusted (attendees' VMs).
// Allow ICMP from everyone
{
  "action": "accept",
  "src": ["*"],
  "proto": "icmp",
  "dst": ["*:*"]
},
// Allow SSH
{
  "action": "accept",
  "src": ["group:admin"],
  "proto": "tcp",
  "dst": ["*:22"]
}
These rules will share everyone's keys with everyone just to facilitate ICMP, while SSH would be allowed from all machines in group:admin. Instead, let's tweak it to:
// Allow ICMP only between trusted devices
{
  "action": "accept",
  "src": ["tag:trusted"],
  "proto": "icmp",
  "dst": ["tag:trusted:*"]
},
// Allow SSH
{
  "action": "accept",
  "src": ["tag:control"],
  "proto": "tcp",
  "dst": ["*:22"]
}
This simply allows ICMP from trusted to trusted, removing the possibility of sharing keys outside tag:trusted. The next rule allows SSH only from a specific control node.
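To get a feel for how much a wildcard exposes, the "keys are exchanged only where a rule connects two devices" behaviour can be approximated in Python. This is a simplified model under my own assumptions: the node names, the `Node` class and both helper functions are hypothetical, ports and protocols are ignored, and real clients decide visibility inside the coordination server rather than like this.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Node:
    name: str
    user: str
    tags: frozenset = frozenset()


def selects(selector, node, groups):
    """True if a selector ("*", "tag:x", "group:x" or a username) covers node."""
    if selector == "*":
        return True
    if selector.startswith("tag:"):
        return selector in node.tags
    if selector.startswith("group:"):
        return node.user in groups.get(selector, [])
    return selector == node.user


def visible_pairs(nodes, rules, groups):
    """Pairs of nodes connected by at least one rule, in either direction."""
    def connects(a, b, src, dst):
        return (any(selects(s, a, groups) for s in src)
                and any(selects(d, b, groups) for d in dst))

    pairs = set()
    for a, b in combinations(nodes, 2):
        if any(connects(a, b, s, d) or connects(b, a, s, d) for s, d in rules):
            pairs.add(frozenset((a.name, b.name)))
    return pairs


groups = {"group:admin": ["anurag"]}
nodes = [
    Node("server1", "anurag", frozenset({"tag:trusted"})),
    Node("server2", "anurag", frozenset({"tag:trusted"})),
    Node("control", "anurag", frozenset({"tag:control"})),
    Node("vm1", "attendee", frozenset({"tag:untrusted"})),
    Node("vm2", "attendee", frozenset({"tag:untrusted"})),
]

# Rules as (src selectors, dst selectors) with the port part stripped
wide = [(["*"], ["*"]), (["group:admin"], ["*"])]
narrow = [(["tag:trusted"], ["tag:trusted"]), (["tag:control"], ["*"])]

print(len(visible_pairs(nodes, wide, groups)))    # 10: every pair sees each other
print(len(visible_pairs(nodes, narrow, groups)))  # 5: trusted pair + control to all
```

In the narrow variant the attendee VMs see only the control node, and no longer see each other or the trusted servers.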
What if we want all workshop attendees to see each other? Let's say it's a SANOG workshop. I can create a user “sanog” and put all attendee machines under that user.
Next, add a rule:
{
  "action": "accept",
  "src": ["sanog"],
  "dst": ["sanog:*"]
}
This will result in key exchange only within user sanog while hiding the rest of my existing nodes.
Note
- Headscale does not apply ACL changes until restarted. The documentation claims kill -HUP works, but in my experience it did not. Anyway, restarting headscale is super fast and has zero impact on already connected peers.
- It makes sense to use both username/group and tag-based filtering. With only username-based filtering, one would have to put untrusted nodes under different usernames.
- tag: applies only to traffic from/towards the tailscale IP and not to any other LAN IP / loopback IP etc. For those one has to have a specific rule covering them. I use a loopback IP on each server instead of the tailscale IP for communication. That lets me easily replace the VPN without bothering about interface IPs, and ensures containers bound to the loopback IP come up even if the VPN fails.