16 Oct

Ultra fast automated DDoS detection & mitigation

 

A few weeks back an Indian ISP contacted me via the contact form on my blog. That ISP had been struggling with a targeted DDoS attack. For reasons of privacy as well as the stability of their network, I will not share their name or AS number. The attack on that ISP was much larger than their bandwidth. Their upstream did not really share the volume of the attack, but I could tell from the screenshots they shared that it was a distributed volumetric attack choking their upstream bandwidth.

I suggested that the ISP get a blackholing option from its upstream (the preferred way) or buy a cheap server/VM somewhere outside India with BGP (and BGP blackholing) and manually blackhole traffic when an attack comes. They were able to get blackholing enabled by their upstream and it did work. They started blackholing traffic, which let them manually drop traffic going towards the IPs that were being attacked. It’s important to have BGP blackholing because it lets a network signal its upstream ISP about the pools which are under attack so that the upstream drops traffic towards them. ISPs further signal the same to their own upstreams, and larger networks typically drop that traffic on all their edge routers, i.e. closer to the entry point of the attack.

 

Next, the problem which hit that ISP was that it was a pain to manually find the IPs under attack and quickly drop them. I suggested they try fastnetmon. I had heard of fastnetmon from the presentation of Job Snijders at NLNOG (slides here & video here). Fastnetmon is developed and maintained by Pavel Odintsov (who works at Cloudflare AS13335).

 

Here’s how the CLI looks

Fastnetmon is an excellent open source package which runs as a daemon on a Linux server. It supports a number of capture engines like port mirroring, NetFlow, sFlow, IPFIX etc. to feed it info about incoming traffic. It can detect an attack on specific IPs in the network based on bandwidth, packets per second or flow count. One can define and tweak these thresholds based on the attack profile. The next part is to signal the router to drop that traffic, together with rules in the export BGP policy that tag that specific route with the upstream’s blackhole community. Fastnetmon offers options to define how long an IP stays blocked and when it can be allowed again. That ISP wasn’t familiar with Linux, and I ended up helping them set up fastnetmon to detect and mitigate.

 

 

Feeding the fastnetmon

We initially used NetFlow with their Mikrotik CCR router and the experience was not very good. It took almost 2 minutes (literally 120+ seconds) to detect an attack on an IP, and that was dead slow considering that the attacks were choking all available upstream bandwidth of that ISP. From Job’s presentation, it was clear that port mirroring made sense, and we ended up setting up a Cisco switch where all transits were terminated and tagged with specific VLANs. The Cisco switch started mirroring all these VLANs to the fastnetmon server. This saved ports on the server and let us feed all transits to the fastnetmon server via a single port. This solution worked very well for detection: it was as fast as 2-3 seconds, which is nice.

 

Detecting attack

In terms of detection, the rules were straightforward. One can drop traffic based on bandwidth, flow sessions or pps. We ended up using only bandwidth limits, and that too only for traffic coming from outside to inside (i.e. incoming traffic and not outgoing traffic).
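To make the bandwidth-only rule concrete, here is a minimal Python sketch of the idea. It is not fastnetmon's code: the function name, the monitored pool and the 100 Mbps threshold are all made up for illustration, and it assumes we already have (destination IP, packet size) pairs from the mirror port for one measurement interval.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical values for illustration; not taken from the ISP's setup.
MONITORED_NETWORKS = [ip_network("203.0.113.0/24")]
THRESHOLD_MBPS = 100


def find_attacked_hosts(packets, interval_seconds=1.0):
    """Return {destination IP: Mbps} for monitored hosts above the threshold.

    `packets` is an iterable of (dst_ip, size_in_bytes) tuples seen during
    one measurement interval.
    """
    bytes_per_host = defaultdict(int)
    for dst_ip, size in packets:
        addr = ip_address(dst_ip)
        # Count only incoming traffic, i.e. packets destined to our own pools.
        if any(addr in net for net in MONITORED_NETWORKS):
            bytes_per_host[dst_ip] += size

    attacked = {}
    for host, total_bytes in bytes_per_host.items():
        mbps = total_bytes * 8 / 1_000_000 / interval_seconds
        if mbps > THRESHOLD_MBPS:
            attacked[host] = mbps
    return attacked
```

Any host returned by this function would be a candidate for blackholing. Outgoing traffic never matches a monitored destination, so it is ignored, which mirrors the incoming-only choice described above.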

 

Blocking attack

To block an attack one can use standard BGP based signalling, i.e. fastnetmon maintains a BGP session with a router and originates the /32s which are under attack to that router. This solution uses exabgp and can work as a standard approach since all routers support BGP. I tried it and it worked after a bit of effort, but I didn’t like it much. The other solution was a dedicated plugin for Mikrotik which uses a PHP based API to make changes in the router. It simply adds (and removes) blackhole routes. Here’s the plugin in the official GitHub project. Apart from being a single packaged solution, the plugin has a great feature: it adds a comment to the blackhole route noting that fastnetmon added it.
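For the BGP based option, the signalling boils down to announcing and later withdrawing a /32. The Python sketch below builds command strings in the style of ExaBGP's text API (`announce route ... next-hop ... community [...]`). The helper name, next-hop and community value are hypothetical, and how the string reaches ExaBGP (a pipe, a helper process) depends on the setup, so treat this as an illustration rather than what fastnetmon does internally.

```python
from ipaddress import IPv4Address


def blackhole_command(action, host, next_hop, community):
    """Build an ExaBGP-style text command for a single attacked IPv4 host.

    `action` is "announce" (start blackholing) or "withdraw" (stop).
    """
    if action not in ("announce", "withdraw"):
        raise ValueError(f"unsupported action: {action}")

    # IPv4Address() raises ValueError on malformed input, so a typo never
    # turns into a bogus route.
    prefix = f"{IPv4Address(host)}/32"

    if action == "announce":
        return (
            f"announce route {prefix} next-hop {next_hop} "
            f"community [{community}]"
        )
    return f"withdraw route {prefix} next-hop {next_hop}"


# Example with documentation addresses and a made-up ASN:666 style community.
print(blackhole_command("announce", "203.0.113.10", "10.0.0.1", "65001:666"))
print(blackhole_command("withdraw", "203.0.113.10", "10.0.0.1", "65001:666"))
```

The router receiving this /32 still needs an export policy that recognises the community and passes the route to the upstream as a blackhole request.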

 

Some Misc notes:

  1. This way of protecting against DDoS (or DoS) attacks isn’t seamless. It simply blackholes the IPs which are being attacked, so an outage occurs for customers on those IPs. But it prevents the bandwidth choking effect during an attack, and the impact on other IP pools is cut from very high to just the 7-8 seconds it takes to detect, block and make the blackhole BGP announcement.
  2. My experience with the exabgp integration was poor. I would suggest using a router-specific plugin if one is supported by the project. Check the project’s GitHub page for support. If not, go for exabgp as the last option.
  3. Prefer port mirroring over flow as long as you can support it. It’s easier at smaller scale, very hard at larger scale. Plus, try using a switch which can mirror multiple VLANs to the same physical port.
  4. The impact of the attack wasn’t reduced to zero, since once an IP is blackholed, fastnetmon needs to un-blackhole it to see whether the attack has stopped. It can do that at fixed intervals. In this case, I set it to 5 minutes. So after every 300 seconds, i.e. at the 301st second, it removes the blackhole and the same is immediately updated to the upstream. By the 303rd or 304th second traffic comes back (along with the attack, if any) and it takes another 7-8 seconds to detect and mitigate. This means an impact of 7-8 seconds in every 5 minutes. It is visible, but much better than no system at all or manual blackholing.
  5. While different upstreams offer different BGP communities, I set up a standard one for the ISP, i.e. ASN:666, and did the translation on the export BGP sessions on the edge. These filtering policies simply look for routes tagged with ASN:666 and swap the tag with the upstream’s blackhole community. This ensures that the target IP is blackholed on all upstreams with minimal config.
  6. We used the Ubuntu 16.04 LTS release for this setup. Compiling the package was fairly straightforward on it.
  7. Fastnetmon monitors and takes action for IPs which are listed in /etc/networks_list, and one can also whitelist certain IPs in /etc/networks_whitelist. The rest of the daemon’s config is in /etc/fastnetmon.conf, and the daemon log is in /var/log/fastnetmon.log. It also offers an option to log details of IPs which are under attack. If enabled, they are saved in /var/log/fastnetmon_attacks/ with IP & timestamp. This can be quite useful for analysis after the attack.
  8. One can view the live status of the fastnetmon daemon with /opt/fastnetmon/fastnetmon_client (location may vary based on installation).
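The arithmetic in note 4 can be worked through with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are mine, and it assumes a sustained attack in which every cycle is one 300 second blackhole followed by roughly 7-8 seconds of exposure before the attack is detected and blocked again.

```python
def exposed_fraction(ban_seconds, exposure_seconds):
    """Fraction of time the network is exposed during a sustained attack."""
    cycle_seconds = ban_seconds + exposure_seconds
    return exposure_seconds / cycle_seconds


def exposed_seconds_per_hour(ban_seconds, exposure_seconds):
    """Seconds per hour in which attack traffic reaches the network."""
    return 3600 * exposed_fraction(ban_seconds, exposure_seconds)


# Values from note 4: 300 s blackhole, 7-8 s to re-detect and mitigate.
for exposure in (7, 8):
    fraction = exposed_fraction(300, exposure)
    per_hour = exposed_seconds_per_hour(300, exposure)
    print(f"{exposure}s exposure: {fraction:.1%} of the time, "
          f"about {per_hour:.0f}s per hour")
```

With these numbers the network is exposed for a little over 2% of the time, which is why the impact is visible but far smaller than an unmitigated attack. A longer blackhole interval lowers that share further, at the cost of keeping the attacked IP offline longer after the attack has actually ended.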

DDoS and Hanuman Chalisa 🙂

In one of my past jobs, a vendor once showed up offering DDoS protected bandwidth. The head of Network dug for further details and it all boiled down to questions like “What if I get a 1Gbps attack?”. The sales guy replied: “Fine sir, it can bear that!”. The network head increased the intensity and asked: “What if 5Gbps?” and the reply was “Yes, it would be ok”. Next, the network head asked: “What if the attack is 10Gbps and all capacity in my transit is choked?”
The local sales guy replied in Hindi: Sir phir toh router se fiber nikal dena, aagar baati jalana and hanuman chalisa padna!  (Sir, in that case, pull the fibre out of the router, light a joss stick and recite the Hanuman Chalisa).  🙂

What he was saying was actually correct. No appliance can help you if the attack chokes your upstream bandwidth. Also, blackholing may or may not work depending on the application. If the objective is to keep a server live in all conditions, blackholing will simply defeat that objective. Hence none of them, whether it’s an appliance, scrubbing or blackholing, is a “one size fits all” solution. It totally depends on the application, the level of SLA, the cost which the end user is ready to pay for protection etc.

So that’s all about it. Do write back or comment if you have faced DDoS in the past, and tell me which solutions you used.

3 thoughts on “Ultra fast automated DDoS detection & mitigation”

  1. Anurag, we were bombarded with crazy attack in between Aug 15 till Aug 21 like every night from 11pm till 2am or so to particular IP of our radius server or ookla server etc, and all I did was, stopped the internet access to those IP for few hours and then it was getting calm.
    Horrible experience, I was so frustrated.

    Whole network was dancing and going down every second as the core router CPU was at 100%

  2. Fun read as always!

    Our network has a lot of DDoS problems but with inverted perspective. A lot of compromised hosts sending out 100s of megs of UDP floods or TCP SYN floods.

    These get rate limited at subscriber QoS and mostly get dropped at the stateful firewall but it puts a lot of strain on the firewall. Even connection tracking on CCR16 gets choked in worst cases.

    Currently there is no automatic DDoS mitigation in our network and this is a good reminder and guide to get it up soon.

    It’ll be fun doing it for the sending side of DoS!
