16 Oct

Ultra fast automated DDoS detection & mitigation

 

A few weeks back an Indian ISP contacted me via the contact form on my blog. That ISP had been struggling with a targeted DDoS attack. For privacy reasons, as well as for the stability of their network, I will not mention their name or AS number. The attack on that ISP was much larger than their bandwidth. Their upstream did not really share the volume of the attack, but I could tell from the screenshots they shared that it was a distributed volumetric attack choking their upstream bandwidth.

I suggested that the ISP get a blackholing option from their upstream (the preferred way), or buy a cheap server/VM somewhere outside India with BGP (and BGP blackholing), and manually blackhole traffic when an attack comes. They were able to get blackholing enabled by their upstream and it did work. They started blackholing traffic, and it helped to manually drop traffic going towards the IPs which were being attacked. It’s important to have BGP blackholing because it lets a network signal its upstream ISP about the pools which are under attack, so that the upstream drops traffic towards them. ISPs further signal the same to their own upstreams, and larger networks typically drop that traffic on all their edge routers, i.e. closer to the entry point of the attack.

 

The next problem which hit that ISP was that it was a pain to manually find the IPs which were under attack and quickly drop them. I suggested they try fastnetmon. I had heard of fastnetmon from Job Snijders’ presentation at NLNOG (slides here & video here). Fastnetmon is developed and maintained by Pavel Odintsov (who works at Cloudflare AS13335).

 

Here’s how CLI looks

Fastnetmon is an excellent open source package which offers a daemon running on a Linux server. It supports a number of capture engines such as port mirroring, NetFlow, sFlow and IPFIX to feed it information about incoming traffic. It can detect an attack on specific IPs in the network based on bandwidth, packets-per-second count or flow count. One can define and tweak these thresholds based on the attack profile. The next part is to signal the router to drop that traffic, together with rules in the export BGP policy that tag the specific route with the upstream’s blackhole community. Fastnetmon offers options to define how long an IP stays blocked and when it can be allowed again. That ISP wasn’t familiar with Linux, and I ended up helping them with the setup of fastnetmon to detect and mitigate.

 

 

Feeding the fastnetmon

We initially used NetFlow with their Mikrotik CCR router and the experience was not very good. It took almost 2 minutes (literally 120+ seconds) to detect an attack on an IP, and that was dead slow considering that the attacks were choking all available upstream bandwidth of that ISP. From Job’s presentation, it was clear that port mirroring makes sense, so we ended up setting up a Cisco switch where all transits landed and were tagged with specific VLANs. The Cisco switch started mirroring all these VLANs to the fastnetmon server. This saved ports on the server and fed all transits to the fastnetmon server’s network card via a single port. This solution worked very well for detection. It was as fast as 2-3 seconds, which is nice.

 

Detecting attack

In terms of detection, the rules were straightforward. One can drop traffic based on bandwidth, flow count or pps. We ended up using only bandwidth limits, and only for traffic coming from outside to inside (i.e. incoming traffic, not outgoing traffic).
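To make the bandwidth-only rule concrete, here is a minimal sketch of per-IP threshold detection. It is not fastnetmon's code or its API; the function name, the sample tuple layout and the threshold value are all assumptions made purely for illustration.

```python
from collections import defaultdict


def find_attacked_hosts(samples, threshold_mbps, interval_s=1.0):
    """Return {ip: mbps} for hosts whose incoming rate exceeds the threshold.

    samples is an iterable of (dst_ip, direction, size_bytes) tuples observed
    during interval_s seconds. This layout is hypothetical, not a fastnetmon
    data structure.
    """
    bytes_in = defaultdict(int)
    for dst_ip, direction, size_bytes in samples:
        # Only traffic coming from outside to inside is counted.
        if direction == "incoming":
            bytes_in[dst_ip] += size_bytes

    attacked = {}
    for ip, total_bytes in bytes_in.items():
        mbps = total_bytes * 8 / 1_000_000 / interval_s
        if mbps > threshold_mbps:
            attacked[ip] = mbps
    return attacked


if __name__ == "__main__":
    one_second_of_traffic = [
        ("203.0.113.10", "incoming", 125_000_000),  # 1000 Mbps inbound
        ("203.0.113.20", "incoming", 1_250_000),    # 10 Mbps inbound
        ("203.0.113.20", "outgoing", 500_000_000),  # ignored: outgoing
    ]
    print(find_attacked_hosts(one_second_of_traffic, threshold_mbps=100))
```

Only the first host crosses the (made-up) 100 Mbps limit here; the heavy outgoing traffic of the second host is ignored, which matches the incoming-only choice described above.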

 

Blocking attack

To block an attack, one can use standard BGP-based signalling, i.e. fastnetmon maintains a BGP session with a router and originates the /32s which are under attack to that router. This solution uses exabgp and can work as a standard option since all routers support BGP. I tried it and it worked after a bit of effort. I didn’t like it much compared with the other solution, a dedicated plugin for Mikrotik which uses a PHP-based API to make changes on the router. It simply adds (and removes) the blackhole route. Here’s the plugin in the official GitHub project. Apart from being a single packaged solution, the plugin has a great feature: it adds a comment on the blackhole route saying that fastnetmon added it.
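For the BGP-based option, the idea of originating a /32 can be sketched as below. This is not fastnetmon's own integration. It only builds the kind of text line that, as far as I understand, ExaBGP accepts from a helper process on its text API; the exact syntax should be checked against the ExaBGP version in use. The function name, next-hop and community values are illustrative assumptions.

```python
import ipaddress


def blackhole_command(ip, next_hop, community=None, withdraw=False):
    """Build an ExaBGP-style text command for a host route.

    The command grammar is an assumption based on ExaBGP's text API;
    verify it against your installed version before relying on it.
    """
    host = ipaddress.ip_address(ip)
    length = 32 if host.version == 4 else 128
    if withdraw:
        # Remove the blackhole route once the ban expires.
        return f"withdraw route {host}/{length} next-hop {next_hop}"
    command = f"announce route {host}/{length} next-hop {next_hop}"
    if community:
        # Tag the route so the edge export policy can recognise it.
        command += f" community [{community}]"
    return command


if __name__ == "__main__":
    # 65000:666 stands in for the ASN:666 style community; values are made up.
    print(blackhole_command("203.0.113.10", "192.0.2.1", "65000:666"))
    print(blackhole_command("203.0.113.10", "192.0.2.1", withdraw=True))
```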

 

Some Misc notes:

  1. This way of protecting against DDoS (or DoS) attacks isn’t seamless. It simply blackholes the IPs which are being attacked, so an outage occurs for customers on those IPs. But it stops the bandwidth-choking effect during the attack, and the impact on other IP pools is cut from very high to just the 7-8 seconds it takes to detect, block and make the blackhole BGP announcement.
  2. My experience with the exabgp integration was poor. I would suggest using a plugin if your router is supported by the project. Check the project’s GitHub page for that. If not, go for exabgp as the last option.
  3. Prefer port mirroring over flow as long as you can support it. It’s easier at smaller levels, very hard at higher levels. Plus, try using a switch which can mirror multiple VLANs to same physical port.
  4. The impact of the attack wasn’t zero during an attack, since once an IP pool is blackholed, fastnetmon needs to un-blackhole it to see whether the attack has stopped. It can do that at fixed intervals. In this case, I set it to 5 minutes. So after every 300 seconds, i.e. at the 301st second, it removes the blackhole, and that is immediately propagated to the upstream. By the 303rd or 304th second traffic comes back (along with the attack, if any), and it takes another 7-8 seconds to detect and mitigate. This means an impact of 7-8 seconds in every 5 minutes. It is visible, but much better than no system at all or manual blackholing.
  5. While different upstreams offer different BGP communities, I set up a standard one for the ISP, i.e. ASN:666, and did the translation on the export BGP session at the edge. These filtering policies simply look for routes tagged with ASN:666 and swap the tag for the upstream’s blackhole community. This ensures that the target IP is blackholed on all upstreams with minimal config.
  6. We used the Ubuntu 16.04 LTS release for this setup. Compiling the package was fairly straightforward on it.
  7. Fastnetmon monitors and takes action for the IPs listed in /etc/networks_list, and one can also whitelist certain IPs in /etc/networks_whitelist. The rest of the daemon’s config is in /etc/fastnetmon.conf, and the daemon log is in /var/log/fastnetmon.log. It also offers an option to log details of IPs which are under attack. If enabled, they are saved in /var/log/fastnetmon_attacks/ with IP & timestamp. This can be quite useful for analysis after the attack.
  8. One can view live status of fastnetmon daemon from /opt/fastnetmon/fastnetmon_client (location may vary based on installation).
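The arithmetic behind note 4 can be checked with a tiny calculation. This is a rough sketch that assumes a continuous attack, the 300-second ban and the upper 8-second re-detection window from the notes; the function name is made up.

```python
def exposure_fraction(ban_s, redetect_s):
    """Fraction of time attack traffic gets through during a continuous attack.

    One cycle is the ban period followed by the window in which the
    blackhole is lifted and the attack has to be detected again.
    """
    cycle_s = ban_s + redetect_s
    return redetect_s / cycle_s


if __name__ == "__main__":
    fraction = exposure_fraction(ban_s=300, redetect_s=8)
    print(f"exposed for {fraction:.1%} of the time")
    print(f"about {fraction * 3600:.0f} seconds per hour")
```

A longer ban time lowers that fraction, at the cost of keeping the attacked IP blackholed for longer after the attack has actually stopped.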

DDoS and Hanuman Chalisa 🙂

In one of my past jobs, a vendor once showed up offering DDoS-protected bandwidth. The head of Network dug for further details and it all boiled down to questions like “What if I get a 1Gbps attack?”. The sales guy replied: “Fine sir, it can bear that!”. The network head increased the intensity and asked: “What if 5Gbps?” and the reply was “Yes, it would be ok”. Next, the network head asked: “What if the attack is 10Gbps and all capacity in my transit is choked?”
The local sales guy replied in Hindi: Sir phir toh router se fiber nikal dena, aagar baati jalana and hanuman chalisa padna!  (Sir, in that case, pull the fibre out of the router, light a joss stick and recite the Hanuman Chalisa).  🙂

What he was saying was actually correct. No appliance can help you if the attack chokes your upstream bandwidth. Also, blackholing may or may not work depending on the application. If the objective is to keep a server live in all conditions, blackholing will simply defeat that objective. Hence none of these, whether an appliance, scrubbing or blackholing, is a “one size fits all” solution. It totally depends on the application, the level of SLA, the cost which the end user is ready to pay for protection, etc.

So that’s all about it. Do write back or comment if you have faced DDoS in the past, and about the solutions you used.

06 Apr

Route filter generation for Mikrotik RouterOS via IRR

A while back I posted about routing filter generation via bgpq3 for Cisco (IOS and XR) and Juniper JunOS based routers. I have received a number of emails in the last few months about automated filter generation for Mikrotik RouterOS, since Mikrotik’s CCRs are getting quite popular across small to mid-sized ISPs.

So this blog post is about ways for generating filter config for a given ASN via IRR. One can use such logic with some kind of remote login mechanism like rancid (look for mtlogin here).

I tried building around bgpq3 but it seems easier with another popular tool in this domain called IRR Power Tools. Once IRR Power Tools (IRRPT) is set up, it allows us to fetch prefixes from Internet Routing Registries and also aggregates them.

 

So, for instance, let’s pick AS54456:

 

So now we have got prefixes and this includes both basic route objects as well as aggregates.

 

It offers a nice interface for generation of config for Cisco, Juniper, Extreme, Foundry and Force10. Example:

 

So I put a routeros instance in a VM to test and create config from their CLI. Config looks something like this:

This seems logical and can be scripted. One can have a script read the aggregate file: if the aggregate is a /24, put it directly in the filter; otherwise allow prefixes up to /24 within whatever range the pool covers. Similar logic applies to IPv6.
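The logic just described can be sketched roughly as follows. This is not the script from this post; it is a minimal illustration. The RouterOS v6 style "/routing filter add" syntax, the chain name, the /48 limit for IPv6 and the final discard rule are all my assumptions and should be checked against a real router before use.

```python
import ipaddress

# Longest prefix accepted per address family.
# /24 for IPv4 comes from the text; /48 for IPv6 is an assumption.
MAX_LEN = {4: 24, 6: 48}


def routeros_filter_rules(prefixes, chain):
    """Turn aggregated prefixes into RouterOS-style filter lines (assumed syntax)."""
    rules = []
    for text in prefixes:
        net = ipaddress.ip_network(text.strip(), strict=False)
        limit = MAX_LEN[net.version]
        if net.prefixlen > limit:
            continue  # more specific than the limit, so never accepted
        if net.prefixlen == limit:
            length = str(limit)  # e.g. a plain /24 goes in as-is
        else:
            length = f"{net.prefixlen}-{limit}"  # allow up to the limit
        rules.append(
            f"/routing filter add chain={chain} prefix={net} "
            f"prefix-length={length} action=accept"
        )
    # Drop everything not explicitly accepted (my addition, an assumption).
    rules.append(f"/routing filter add chain={chain} action=discard")
    return rules


if __name__ == "__main__":
    # Documentation and benchmark prefixes, not real IRR data.
    sample = ["192.0.2.0/24", "198.18.0.0/22", "2001:db8::/32"]
    for line in routeros_filter_rules(sample, chain="AS64500-in"):
        print(line)
```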

So here’s the script:

 

So the script works, except for a small bug in IPv6 aggregation, which is an issue with IRRPT. I have reported it on their GitHub project page here.

 

An example of the script in progress for Cloudaccess AS54456:

 

Here’s another example of it in action with NPCI’s AS132351

 

 

 

 

Thinking to automate? 

The config between ***start*** and ***end*** can be pasted directly into the Mikrotik CLI. I would not recommend using it for manual filtering of any larger network. Automated filtering, where filters are regenerated regularly, makes sense, but manual filtering without automation can be damaging. One can use a script like this for connecting to smaller networks. Also, IRRPT offers diff management via CVS (I hope they come up with git for that part), and it comes with an option to trigger an email update so network admins know when to update manually. I would prefer that for non-commit based platforms, since with Cisco IOS or Mikrotik RouterOS it can be tricky to auto-update a prefix list. If one does a no ip prefix-list before triggering the update, it will cause a major noticeable impact. So the ideal way to manage that on non-commit based devices is to maintain the list of prefixes separately in a plain text file or a database, diff it against the old one and push only the changes. It is doable, and should be preferred over deleting and re-adding the whole list while automating.
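The diff-and-push idea can be sketched in a few lines. This is only an illustration with made-up prefixes; how the additions and removals are then pushed to the router (mtlogin, API, etc.) is left out.

```python
def prefix_list_changes(old, new):
    """Return (to_add, to_remove) between two prefix lists.

    Only these differences need to be pushed to the router, so prefixes
    that did not change are never removed and re-added.
    """
    old_set, new_set = set(old), set(new)
    to_add = sorted(new_set - old_set)
    to_remove = sorted(old_set - new_set)
    return to_add, to_remove


if __name__ == "__main__":
    # Hypothetical contents of yesterday's and today's prefix files.
    previous = ["192.0.2.0/24", "198.51.100.0/24"]
    current = ["192.0.2.0/24", "203.0.113.0/24"]
    to_add, to_remove = prefix_list_changes(previous, current)
    print("add:", to_add)
    print("remove:", to_remove)
```

Pushing the additions before the removals keeps the filter from briefly rejecting prefixes that are still valid.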

 

Time to get back to work! 🙂