How to protect a website using Citrix NetScaler?
Well, it seems to be easy. A nonsense question. We may use AppQoE (Application level Quality of Experience), a feature introduced with NetScaler version 10, so it’s quite an old feature. Let’s start.
My first starting point was E-Docs. Let’s be honest: the guy in charge did not really know what he was talking about. No breakthrough from there. Citrix Blogs! Unfortunately AppQoE is not their favourite child. I found just one single article about this subject (telling me AppQoE is great). Raghu Varma Tirumalaraju did not even mention this feature in his book. Marius Sandbu did, however he did not even go as far as E-Docs.
So there seems to be no information on a feature everyone needs?
That’s a bit of a surprise, as DDOS attacks have been very prominent in the media. Think of Twitter, Spotify and some more being down last summer.
So let’s start the other way round: I will (D)DOS a test website. I googled for tools and found LOIC. LOIC is widely used by groups like Anonymous to DDOS servers. It is quite a simple tool, sending simple HTTP requests to a server of choice. I loaded LOIC (don’t even think of doing so in the UK, they’ll send you to prison immediately; most countries would allow you to use it against your own servers) and fired some thousand requests at my test server, but the server was still responsive. In fact, it was hardly affected. I found some thousand messages like this one in the log:
Mar 2 13:28:06 <local0.info> 10.255.0.250 03/02/2017:13:28:06 GMT 82e6de130138 0-PPE-0 : default APPFW AF_400_RESP 49586 0 : 172.19.54.5 559567-PPE0 - test.lab http:/// Bad request headers.No host header <blocked>
Yes! NetScaler blocked all of LOIC’s requests; they didn’t pass through. It was my WAF (Citrix NetScaler Web Application Firewall) protecting my web server.
Damn good news, thanks, WAF!
Protecting from LOIC is an easy one; you could also protect your web server using Citrix NetScaler responder policies on Standard Edition. Simply do a drop if
HTTP.REQ.HEADER("Host").EXISTS.NOT. There may be some other headers missing, like User-Agent, Accept, Accept-Language, Accept-Encoding. So you might end up with an expression like
HTTP.REQ.HEADER("Host").EXISTS.NOT || HTTP.REQ.HEADER("User-Agent").EXISTS.NOT || HTTP.REQ.HEADER("Accept-Language").EXISTS.NOT. In the end, you would check whether incoming requests comply with RFC 2616.
Checking for compliance with RFC 2616 may also be done by using the built-in HTTP profile “nshttp_default_strict_validation”.
Just a, maybe stupid, question: Why do they use a simple tool like that? It’s easy to use, free, and can be triggered via IRC. It’s not highly sophisticated, but they don’t care, they just hack some more machines (or IoT devices). The web server does not need to crash; it’s enough to overload the ISP. Mastercard, Visa, PayPal and some more might have survived the Operation Avenge Assange attack in 2010, but who cares if their internet connections suffer from endless congestion queues causing all packets to be dropped? So you see: even a great thing like a NetScaler would not have been of any help.
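To make the logic of that expression a bit more tangible, here is a minimal Python sketch of the same check. It is not NetScaler code; the function name should_drop and the sample requests are made up for illustration.

```python
# Hypothetical helper mirroring the responder expression above.
# It models the decision only; it is not a NetScaler API.
REQUIRED_HEADERS = ("Host", "User-Agent", "Accept-Language")


def should_drop(headers):
    """Return True if at least one required header is missing.

    HTTP header names are case-insensitive, so compare in lower case.
    """
    present = {name.lower() for name in headers}
    return any(required.lower() not in present for required in REQUIRED_HEADERS)


# A request without a Host header (like the ones in my log) gets dropped.
print(should_drop({"Accept": "*/*"}))  # True
# A browser-like request passes.
print(should_drop({"Host": "test.lab", "User-Agent": "Mozilla/5.0",
                   "Accept-Language": "en"}))  # False
```

The more headers you require, the higher the risk of dropping unusual but legitimate clients, so I would start with Host only and extend the list carefully.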
Anyway, LOIC was not my tool. I changed to HULK. HULK is just some smart lines of Python 2.7 code. The requests look pretty much like legitimate requests, so they pass my WAF easily and kill my server. They can do even more: you specify a target URL. HULK is launched like that:
c:\hulk.py http://test.training.lab/search/showallitems.php. It’s quite easy to change the code, if you need to do so. HULK also adds random parameters and uses random headers, so successive requests don’t look the same. It would be hard for a NetScaler to defend a website against a HULK attack by simply using responder policies, especially if someone randomized the URL in use. This would be just some more lines of code in HULK, not a big deal, even for an inexperienced Python programmer like me. (My approach would be a responder policy using the Citrix NetScaler Rate Limiting feature to find the most frequently used source IPs or URLs and block them.)
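The rate limiting idea can be sketched in a few lines of Python. This is only a model of the principle (count requests per key in a time window, block above a threshold), not the NetScaler feature itself; the class name, the threshold and the window are assumptions for illustration.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `threshold` requests per key within `window` seconds."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key, now):
        timestamps = self.hits[key]
        # Forget requests that have left the time window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.threshold:
            return False  # over the limit: this request would be dropped
        timestamps.append(now)
        return True


limiter = RateLimiter(threshold=3, window=1.0)
# Key on the source IP: HULK randomizes parameters, but not its address.
print([limiter.allow("172.19.54.5", t) for t in (0.0, 0.1, 0.2, 0.3, 1.05)])
# [True, True, True, False, True]
```

Keying on the source IP rather than on the URL is a deliberate choice here, as a randomized URL would give every request its own counter.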
So let’s go into Citrix NetScaler AppQoE.
Citrix NetScaler AppQoE in a nutshell
AppQoE means Application level Quality of Experience. It’s well hidden: you’d expect it to be under Security, however it’s in App Expert.
In AppQoE Citrix blended some features like priority queuing. Queues are an important part of AppQoE. It simply uses two queues: one for requests not known to be legitimate, one for requests known to be legitimate. Policies are used to assign users to certain queues. There are four priority levels: high, medium, low and lowest.
During regular operation queues are not used; all requests get treated equally. If the NetScaler considers itself to be under attack, it starts using priority queues. So we have to tell NetScaler how much traffic we can (or are willing to) handle. We turn on AppQoE if traffic exceeds these limits. Maybe it’s the most difficult thing! Be careful not to cut down your Christmas business!
If we are under attack, we have to find out whether a request is legitimate or not. It’s a bit tricky, as some of these bad guys out there are quite smart. We have two methods: one is transparent to the user (a JavaScript is run on the client side), however it leaves a chance for the attacker; the other one presents a captcha to the user, so the user has to prove he has got a brain. This is a huge obstacle to overcome. The method is defined inside of policy actions. If the user is a legitimate one, we start a session for this user and assign it to a privileged queue.
If you click on the AppQoE feature in the left pane you see Configure AppQoE Parameters.
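The decision flow described above can be written down as a tiny Python function. It is my simplified reading of the behaviour, not NetScaler’s actual implementation; the function name, its parameters and the return values are made up.

```python
def route_request(waiting_clients, threshold, has_valid_session):
    """Simplified model of the flow described above.

    waiting_clients   -- number of clients currently queued up
    threshold         -- how many may queue before we call it an attack
    has_valid_session -- True if the client already proved to be legitimate
    """
    if waiting_clients <= threshold:
        return "forward"           # regular operation: no queues involved
    if has_valid_session:
        return "privileged queue"  # known legitimate user
    return "challenge"             # JavaScript or captcha decides


print(route_request(10, 2000, False))    # forward
print(route_request(5000, 2000, True))   # privileged queue
print(route_request(5000, 2000, False))  # challenge
```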
AppQoE general Parameters
What do these parameters mean?
- Session Life (secs): We will remember this user as a legitimate user for this many seconds. It defaults to 300 s (5 minutes). Don’t set the session life parameter too low. Setting it lower than a minute may result in endless loops of testing whether this client is legitimate or not: the session times out during testing, so the results are no longer valid. I tend not to change it. Never go below 60. On the other hand, if the value is too high, an attacker would just need to solve the captcha once per bot.
- Average Waiting Client: How many client requests may wait until we consider this website to be under attack? It defaults to 1000000. The queue would not grow beyond a very low number in normal use: your web servers should be fast enough to handle all legitimate traffic. That’s why I tend to lower this value. Sizing depends very much on website usage.
- Alternate Response Bandwidth Limit (Mbps): How much bandwidth all responses may use (default 100 Mbps). We send alternate responses as soon as we exceed this value.
- DOS Attack Threshold: The average number of clients that can queue up without triggering DOS protection. It defaults to 2000. If your website won’t need to support more than 50, you would enter 50 (don’t forget to add some spare users for Christmas business!). Don’t set it too tight; exceeding this value may influence user experience.
Like any policy, an AppQoE policy has an expression and an action. The expression may either be a simple true value or, if you need to be a bit more granular, be based on type of content or URL. Be sure not to exclude content on your server. This would allow a successful attack!
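Why a very short Session Life causes trouble is easier to see with a small model. The following Python sketch is an assumption about the principle only (a verified client stays trusted for a fixed number of seconds); the class and method names are hypothetical.

```python
class SessionTable:
    """Remember verified clients for `session_life` seconds."""

    def __init__(self, session_life=300):
        self.session_life = session_life
        self.verified_at = {}  # client id -> time of successful verification

    def mark_verified(self, client, now):
        self.verified_at[client] = now

    def is_legitimate(self, client, now):
        verified = self.verified_at.get(client)
        return verified is not None and now - verified < self.session_life


table = SessionTable(session_life=300)
table.mark_verified("client-1", now=0)
print(table.is_legitimate("client-1", now=299))  # True
print(table.is_legitimate("client-1", now=300))  # False: has to prove itself again

# With a very short session life the trust is gone before the next request.
short = SessionTable(session_life=5)
short.mark_verified("client-1", now=0)
print(short.is_legitimate("client-1", now=8))    # False
```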
There are three types of actions, and you can’t change the type after creating the policy, so spend a moment to think about it:
- None means no action is taken when a threshold is reached.
- ACS means there is an alternate content source (another LB vServer), so NetScaler won’t handle that traffic. This is what my screenshot shows.
- NS means traffic is handled by NetScaler. I will go with this one.
We may use a different TCP profile than usual in case of an attack. Its advanced TCP features can help to mitigate damage.
The Priority (high, medium, low, lowest) may be used to split up traffic and handle more important traffic before less important traffic. So, for example, we could create a policy handling movies (
HTTP.REQ.URL.ENDSWITH("AVI")) and put them into low priority, as we are under attack and suffer from a lack of bandwidth, and more important data into medium or high. Keep away from lowest, as lowest is also used for malicious requests.
When the policy queue size, that means the number of requests queued for this policy, reaches the Policy Queue Depth threshold value, subsequent requests are dropped to the lowest priority level. That means they don’t respect your priority settings any more. In contrast to Queue Depth, this one is about a policy, not about a vServer as a whole.
The Queue Depth threshold value is per priority level. If the queue size (number of requests in the queue of that particular priority) on the virtual server increases to the specified queue depth value, subsequent requests are dropped to the lowest priority level. That means they don’t respect your priority settings any more. In contrast to Policy Queue Depth, this one is about a vServer, not about this specific policy.
If the Maximum Connections number is exceeded we place even legitimate requests into lowest priority queue. Either this one or Delay has to be specified.
If the Delay (microseconds) is exceeded we place even legitimate requests into lowest priority queue. Either this one or maximum connections has to be specified.
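The demotion rules above boil down to a few comparisons. Here is a rough Python sketch under my own assumptions: the names are invented, only Maximum Connections is modelled (Delay would work the same way), and the real NetScaler logic may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    policy_queue_depth: int    # per policy
    priority_queue_depth: int  # per priority level on the vServer
    max_connections: int


@dataclass
class Load:
    policy_queue_size: int
    priority_queue_size: int
    connections: int


def effective_priority(configured, load, limits):
    """Return the priority a request really gets under the current load."""
    demote = (
        load.policy_queue_size >= limits.policy_queue_depth
        or load.priority_queue_size >= limits.priority_queue_depth
        or load.connections > limits.max_connections
    )
    return "lowest" if demote else configured


limits = Limits(policy_queue_depth=100, priority_queue_depth=500, max_connections=1000)
print(effective_priority("high", Load(10, 20, 300), limits))   # high
print(effective_priority("high", Load(100, 20, 300), limits))  # lowest
```

So a carefully chosen priority is only honoured as long as none of the limits is hit.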
The Alternate Content Server Name* is the name of a vServer if you use ACS. Requests will get sent to this server instead of using the original one. It’s a kind of sorry server. The Alternate Content Path* is the path on this server, for example
The Custom File is a file displayed to users considered to be illegitimate. It’s an optional parameter. You don’t specify it here; use the menu item under AppQoE on the left side to import a file.
The DOS Action can be a SimpleResponse, or a HICResponse:
- SimpleResponse is a response containing a JavaScript. The client needs to have JavaScript enabled. This script does the maths and returns the result to the NetScaler. The NetScaler will treat this user as a legitimate one if the session has not already timed out. Advantage here: the user won’t see what’s going on. Disadvantage: a bot may also be able to run the script.
- HICResponse is a response containing a rather simple captcha. The user has to solve the problem. It’s quite difficult to trick this one.
There is a prerequisite for HICResponse. You need to open up a PuTTY session, then:
> shell
Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
root@82e6def331a4# nsapimgr -ys appqos_captcha=1
Changing cfg_appqoe_captcha_enable from 0 to 1 ... Done.
root@82e6def331a4# echo "nsapimgr -ys appqos_captcha=1" >> /flash/nsconfig/rc.netscaler
root@82e6def331a4# > _
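Coming back to SimpleResponse for a moment: the idea of "the script does the maths" can be modelled as a toy challenge in Python. I don’t know which calculation NetScaler really uses, so the addition below, like all function names, is purely an assumption for illustration.

```python
import secrets


def issue_challenge():
    """Server side: pick two random numbers and remember the expected answer."""
    a = secrets.randbelow(1000)
    b = secrets.randbelow(1000)
    return (a, b), a + b


def client_solves(challenge):
    """Client side: what the script in the browser would compute."""
    a, b = challenge
    return a + b


def is_legitimate(expected, answer):
    """Server side: compare the returned answer with the expected one."""
    return answer == expected


challenge, expected = issue_challenge()
print(is_legitimate(expected, client_solves(challenge)))  # True
```

The sketch also shows the weak spot: anything able to run the script, including a well-written bot, passes the test.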
I did not read anything about logging. That’s a pity. In fact, I could not find out whether my NetScaler supposes itself to be under attack or not. Wait, there is one thing:
stat appqoe policy <policyname>
Output may look like that:
> stat appqoe policy AppQoE_pol_2delete

AppQoE Cumulative Policy Statistics Summary

Cumulative Policy Stats:
Counter                           Rate(/s)       Total
Server TCP connections                   0           0
Client TCP connections                   0           0
Requests received                        0           0
Requests bytes                           0           0
Responses received                       0           0
Response bytes                           0           0
ThroughPut(Kbps)                         0           0
Alternate responses sent                 0           0
Alternate responses bytes sent           0           0

Timing Stats:
Counter                           Rate(/s)       Total
Average Client TTLB                      0           0
Average Server TTLB                      0           0
Average Server TTFB                      0           0
 Done
>
> stat appqoe policy AppQoE_pol_2delete

AppQoE Cumulative Policy Statistics Summary

Cumulative Policy Stats:
Counter                           Rate(/s)       Total
Server TCP connections                   4        2051
Client TCP connections                   4        2154
Requests received                        4        2051
Requests bytes                        1251      733826
Responses received                       4        1916
Response bytes                        9489     5048057
ThroughPut(Kbps)                        84         587
Alternate responses sent               113      576640
Alternate responses bytes sent      414706  1610897508

Timing Stats:
Counter                           Rate(/s)       Total
Average Client TTLB                      0           0
Average Server TTLB                      0           0
Average Server TTFB                      0           0
 Done
>
The second output means: under attack. The “Alternate responses sent” counter gives it away. Same policy, same amount of traffic, but different parameters.
Unfortunately we are not yet finished; continue reading with part 2.
I hope this helps a little bit. And I hope Citrix will do a better job in future of explaining this rather important feature!
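As there is no log entry to watch, one could poll this command and look at the “Alternate responses sent” counter. Here is a minimal Python sketch that parses the output shown above. How you fetch the text (SSH, for example) is left out, and the function names are my own invention.

```python
import re

# Matches "Alternate responses sent <rate> <total>" but not the
# "Alternate responses bytes sent" line.
ALT_RESPONSES = re.compile(r"Alternate responses sent\s+(\d+)\s+(\d+)")


def alternate_responses(stat_output):
    """Return (rate per second, total) of alternate responses sent."""
    match = ALT_RESPONSES.search(stat_output)
    if match is None:
        raise ValueError("counter 'Alternate responses sent' not found")
    return int(match.group(1)), int(match.group(2))


def looks_under_attack(stat_output):
    """Assumption: a non-zero rate means AppQoE is currently answering itself."""
    rate, _total = alternate_responses(stat_output)
    return rate > 0


sample = """Alternate responses sent 113 576640
Alternate responses bytes sent 414706 1610897508"""
print(alternate_responses(sample))  # (113, 576640)
print(looks_under_attack(sample))   # True
```

I look at the rate rather than the total, because the total keeps its value long after an attack is over.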