(a promising but partly failed attempt)
Complex web applications can lead to complex NetScaler configurations, and administrators can easily get lost troubleshooting such websites, especially sites that use content switching.
Here is a real-world example. The portal page is assembled from several independent web applications, and each application is hosted on its own group of load balanced servers. Some rewrite policies replace content on a specific website, while others are bound globally (along with responders, URL transformation, FEO optimization, App Firewall, caching, …). Some of the global and some of the server-specific content was not replaced as desired, while other content was. The current configuration confuses the admins, and it confused me too.
Main problem here: I can’t look into the traffic between a content switching vServer and a load balancing vServer, so I can’t see what’s actually going on. Second problem: there are 800 rewrite policies in total. That’s too many for me to keep track of; I simply don’t remember what each one is good for and where it is bound!
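With hundreds of rewrite policies, a small script that reads a saved configuration (ns.conf) and reports where each rewrite policy is bound can at least answer the “where did I bind it?” question. The sketch below is not a NetScaler tool: it assumes the saved-config line shapes listed in its docstring, it does not follow policy labels, and all policy and vServer names in the sample are hypothetical.

```python
import shlex
from collections import defaultdict


def audit_rewrite_bindings(config_text: str) -> tuple[dict[str, list[str]], list[str]]:
    """Report where each rewrite policy is bound in saved ns.conf text.

    Assumed line shapes (check them against your own saved config):
      add rewrite policy <name> <rule> <action>
      bind lb vserver <vserver> -policyName <name> ...
      bind cs vserver <vserver> -policyName <name> ...
      bind rewrite global <name> <priority> ...
    Bindings made through policy labels are not followed.
    """
    policies: list[str] = []
    targets: dict[str, list[str]] = defaultdict(list)
    for line in config_text.splitlines():
        try:
            tokens = shlex.split(line)  # keeps quoted policy expressions intact
        except ValueError:
            continue  # skip lines with unbalanced quotes
        if len(tokens) < 4:
            continue
        if tokens[:3] == ["add", "rewrite", "policy"]:
            policies.append(tokens[3])
        elif tokens[:3] == ["bind", "rewrite", "global"]:
            targets[tokens[3]].append("rewrite global")
        elif (tokens[0] == "bind" and tokens[1] in ("lb", "cs")
              and tokens[2] == "vserver" and "-policyName" in tokens[:-1]):
            name = tokens[tokens.index("-policyName") + 1]
            targets[name].append(f"{tokens[1]} vserver {tokens[3]}")
    # Responder and other policies are bound with the same syntax, so keep
    # only names that were declared with "add rewrite policy".
    bound = {p: targets[p] for p in policies if p in targets}
    unbound = [p for p in policies if p not in targets]
    return bound, unbound


# Hypothetical sample; rs_redirect stands in for a non-rewrite policy.
sample = r'''
add rewrite policy rw_app1_hdr "HTTP.REQ.URL.CONTAINS(\"/app1\")" act_app1_hdr
add rewrite policy rw_unused true NOREWRITE
bind lb vserver lb_app1 -policyName rw_app1_hdr -priority 100 -gotoPriorityExpression END -type REQUEST
bind rewrite global rw_app1_hdr 200 END -type REQ_DEFAULT
bind lb vserver lb_app1 -policyName rs_redirect -priority 110 -gotoPriorityExpression END -type REQUEST
'''
bound, unbound = audit_rewrite_bindings(sample)
for policy, where in bound.items():
    print(policy, "->", ", ".join(where))
print("never bound:", unbound)
```

Listing policies that are bound both globally and on a vServer is one quick way to spot overlapping rewrites like the ones described above.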
The current solution also used NetScaler MAC based forwarding, but MAC based forwarding had partly undesired effects on some of the load balancing vServers, and on the NetScaler as a whole, as it blows up the TCP connection tables (by adding MAC addresses to them).
That’s where admin partitions came into focus!
Admin partitions arrived in NetScaler 11 (10.5e): a way to split one NetScaler into several “virtual” ones. That’s great. I made up my mind to put each load balancing vServer into its own admin partition while leaving the content switching vServer in the default (root) partition.
This is a sketch of the solution I wanted:
The first big problem I faced: two partitions could not be connected to the same subnet. That was a must-have, because I could not have changed the existing network and routing configuration of a 10,000+ server data centre without an excessive change process lasting several months. So we stopped there, almost a year ago.
The new version 11.1 offers a feature called partition shared vLan, which seemed to be the solution! So I tried to set up vLan 1 as a partition shared vLan. This was impossible. My guess is that vLan 1 is not a real vLan at all and is not comparable to the other vLans, but I don’t actually know.
I could, however, create a new vLan, make it a partition shared vLan, and bind it to an interface:
add vLan 1000 -sharing ENABLED -aliasName PartitionShared_vLan
(This adds vLan 1000 with partition sharing enabled. The alias name is optional, but I always like to add some documentation.)
bind vlan 1000 -ifnum 1/2
(This binds the vLan to the designated interface.)
Next step: let’s create the partitions.
add partition WebServerApp1
(This partition will host the web server of App1, so I call it WebServerApp1.)
In the GUI, open this partition, scroll down to Network Isolation, click Add Binding, click VLANS, and bind vLan 1000. On the command line:
bind partition WebServerApp1 -vlan 1000
Currently you can’t unbind vLan 1 from the partition.
I repeat these steps for every admin partition I need. Now I can put each of my load balancing vServers into its own dedicated admin partition.
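Because the same commands are repeated for every partition, a tiny generator can print the whole list for pasting into the CLI. This is a hypothetical helper, not part of NetScaler: WebServerApp2 and WebServerApp3 are made-up names, and the command spellings are copied from the lines above (Citrix documentation writes the partition commands as “add ns partition” and “bind ns partition”, so check which form your build accepts).

```python
def partition_commands(partitions: list[str], vlan_id: int = 1000,
                       ifnum: str = "1/2") -> list[str]:
    """Build CLI lines for one partition shared vLan plus one admin
    partition per application, using the command forms from this post."""
    # vLan 1 could not be made a partition shared vLan, and 802.1Q IDs end at 4094.
    if not 2 <= vlan_id <= 4094:
        raise ValueError("use a vLan ID between 2 and 4094")
    commands = [
        f"add vlan {vlan_id} -sharing ENABLED -aliasName PartitionShared_vLan",
        f"bind vlan {vlan_id} -ifnum {ifnum}",
    ]
    for name in partitions:
        commands.append(f"add partition {name}")
        commands.append(f"bind partition {name} -vlan {vlan_id}")
    return commands


# Hypothetical partition names, one per application.
for line in partition_commands(["WebServerApp1", "WebServerApp2", "WebServerApp3"]):
    print(line)
```

Generating the commands from one list keeps the partition names and the vLan binding consistent, which is exactly where a typo (such as binding the wrong partition name) is easy to make by hand.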
Currently there are several restrictions on NetScaler basic and advanced features in admin partitions:
Restrictions on admin partitions in NetScaler 11.1 build 48.10
| Default partition | Admin partition |
|---|---|
| SSL Offloading | SSL Offloading |
| Load Balancing | Load Balancing |
| Authentication, Authorization, Auditing | |
| HTTP compression | HTTP compression |
| Content Switch | Content Switch |
| Integrated Caching | Integrated Caching |
| Web Logging | Web Logging |
| RIP Routing | RIP Routing |
| IPv6 Protocol Translation | IPv6 Protocol Translation |
| EdgeSight Monitoring (HTML Injection) | EdgeSight Monitoring (HTML Injection) |
| ISIS Routing | ISIS Routing |
| Content Accelerator | Content Accelerator |
| HTTP DoS Protection | |
| Global Server Load Balancing | |
| OSPF Routing | OSPF Routing |
| BGP Routing | BGP Routing |
| NetScaler Push | NetScaler Push |
| Front End Optimization | (missing in GUI) |
| Large Scale NAT | Large Scale NAT |
| RDP Proxy | RDP Proxy |
A comparison of features can be found here. (Thanks to Balaji for providing this link.)
So some important features are currently missing in admin partitions! I highlighted the ones I was interested in. The ones I miss most are App Firewall and Front End Optimization: I would have put them into the admin partitions, since they are configured on a per-application basis. I don’t miss Surge Protection, HTTP DoS Protection, and Priority Queuing, since these are applied at connection time on the content switching vServer.
This project does not use NetScaler Gateway, so its absence from admin partitions is no problem here. However, I have missed the chance to isolate NetScaler Gateway in many other projects. NetScaler Gateway is usually governed by other departments, so it should live in a separate admin partition. If we can’t isolate it completely, our beloved NetScaler will degenerate into a battleground between the application delivery and network groups.
Then I suddenly faced a strange problem: why did it not work?
Simply put: I could not communicate from the default partition to the WebServerApp1 admin partition at all. I tried to send ICMP packets from the default partition to WebServerApp1, without success. Even ARP didn’t work.
I started monitoring, both on the NetScaler using nstrace and on the switch (another restriction here: inside an admin partition, nstrace is only available from the command line; it does not exist in the GUI).
I set up the switch for monitoring. Pinging from the default partition to 10.0.1.10 (the vServer inside the admin partition), I saw ARP requests going out of the NetScaler, but no ARP replies coming back from the admin partition. The same happened the other way round. However, I could ping all IPs in both partitions from an external server (10.0.1.100) and vice versa. So my networking problems seem to be internal to the NetScaler.
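The symptom fits how IPv4 reaches an on-link neighbour: assuming a /24 mask, 10.0.1.10 and 10.0.1.100 sit in the same subnet as the default partition’s address (10.0.1.1 in the static ARP test below), so every sender must resolve the destination itself via ARP before any ICMP packet leaves, and an unanswered ARP request stops all traffic. The sketch below uses Python’s standard ipaddress module to model only that decision; the gateway address is hypothetical, and it does not model NetScaler’s internal forwarding between partitions.

```python
import ipaddress
from typing import Optional


def arp_target(local_if: str, dst_ip: str, gateway: Optional[str] = None) -> str:
    """Return the IP address a sender has to resolve via ARP to reach dst_ip."""
    local = ipaddress.ip_interface(local_if)
    dst = ipaddress.ip_address(dst_ip)
    if dst in local.network:
        # Same subnet: the sender ARPs for the destination itself, so an
        # unanswered ARP request means no other packet is ever sent.
        return str(dst)
    if gateway is None:
        raise ValueError(f"{dst} is not on {local.network} and no gateway is set")
    # Different subnet: the sender only needs the gateway's MAC address.
    return gateway


# Addresses from the post; the /24 mask is an assumption.
print(arp_target("10.0.1.1/24", "10.0.1.10"))    # default partition to vServer
print(arp_target("10.0.1.100/24", "10.0.1.10"))  # external server to vServer
```

Both calls ask for 10.0.1.10 directly, which is why the external server succeeds (its ARP request is answered) while the default partition fails at the very first step.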
I added a static ARP entry for 10.0.1.10 in the default partition and one for 10.0.1.1 in the WebServerApp1 partition, then tried again. No success.
Sending packets between admin partitions is currently not possible!
I also added virtual MAC addresses to the partition. No success either. Something spooky is going on inside the NetScaler’s internal networking logic that makes partition-to-partition traffic impossible.
My current workaround is a router VM based on VyOS. It has fixed all of my problems so far, and I love my deployment, but I hate this tiny little VM: it simply should not be there!
Comments (and a possible solution) are highly welcome …