Eager readers want to know if EqualLogic array firmware upgrades are non-disruptive, so we tested it in our mighty remote environment!
Bottom line: The firmware update process halts I/Os during the cut-over from one controller to the other. That pause lasted roughly 27 seconds in our test. As most operating systems have a disk timeout value of 30 seconds or longer, no one really notices.
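If you want to check what your guests will actually tolerate before you reboot an array underneath them, the timeout is easy to inspect. These are generic guest OS locations, nothing EqualLogic-specific, and the Linux device name is just an example:

    (Windows guest: disk timeout in seconds; VMware Tools typically raises it to 60)
    reg query HKLM\SYSTEM\CurrentControlSet\Services\Disk /v TimeOutValue

    (Linux guest: per-device SCSI timeout in seconds, commonly 30 by default)
    cat /sys/block/sda/device/timeout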
Some background and how we tested:
If you have more than one array, and at least the largest array's worth of free space, the official way to do a firmware update is to use a "Maintenance Pool". When you assign a member array to the Maintenance Pool, the volumes on that array migrate to the other member(s) of your production pool. Depending on how much data has to move, this can take a few hours or run overnight, but no one notices the moves. Once the member is evacuated, update it, reboot it, put it back into the production pool, and move on to the next array. We do this in production and there is indeed no interruption whatsoever; the maintenance pool approach works well.
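For reference, the same pool shuffle can be done from the group CLI. The commands below are from memory, "maintenance" is just a pool name we picked rather than a keyword, and member01 stands in for your array's member name, so check the Group Manager CLI Reference for your firmware level before leaning on the exact syntax:

    (create the temporary pool, then move the member into it; its volumes migrate off)
    pool create maintenance
    member select member01 pool maintenance

    (after the update, move the member back into the production pool)
    member select member01 pool default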
In a single-member pool, or if you are short on free space, a maintenance pool is not an option. We were curious how bad the interruption would be, so we tested it. In our test vCenter environment we have one ESXi host connected to one dual-controller PS6000 EqualLogic array, with VMs living on datastores provided by the array. This is a minimalistic all-your-eggs-in-one-basket configuration.
We followed the usual EqualLogic firmware update procedure. (EqualLogic docs are locked behind their support site, which is unfortunate. If you have an active support contract you can get the docs and firmware there.) We logged into the support site and downloaded the firmware.
Next we started an I/O Analyzer disk benchmark run to put some I/O load on the array.
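I/O Analyzer is VMware's load-generator appliance (it drives Iometer under the covers). If you don't have it handy, any steady I/O generator running in a VM that lives on the array will do for this sort of test; here is a rough fio equivalent you could run inside a Linux VM, with a made-up file name and sizes to tune to taste:

    fio --name=steady-load --filename=/var/tmp/fio-test.dat --size=4g \
        --rw=randrw --bs=8k --iodepth=16 --ioengine=libaio --direct=1 \
        --time_based --runtime=1800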
Then we ran the update procedure from the Group Manager GUI. This is basically a two-part process.
The first part just uploads and stages the firmware. This is not at all disruptive. I/O Analyzer didn’t notice anything.
Then to activate the staged firmware, you have to reboot the array. We did this from the command prompt.
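If you would rather do the whole thing from the CLI, as we recall it the procedure boils down to copying the kit to the array over FTP and then running update and restart from the group CLI. The IP address and kit file name below are placeholders, and the prompts vary a bit by firmware release, so follow the instructions that ship with the firmware:

    (from an admin workstation: stage the kit on the array, logging in as grpadmin)
    ftp 10.0.0.10
    ftp> binary
    ftp> put firmware-kit.tgz
    ftp> quit

    (from an ssh session to the array, also as grpadmin)
    update
    restart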
The reboot happens intelligently. First the array reboots the (non-active) secondary controller and applies the update, so the was-secondary, soon-to-be-primary controller comes back up running the new firmware.
Once the updated secondary comes up, the array fails over to it. This is where a small hiccup is noticeable. I/Os stopped for about 27 seconds during the cut-over (26929 ms). This is short enough that no actual grief ensued.
The now-secondary, was-primary, controller then reboots and applies the firmware.
Both the I/O Analyzer VM and the vCenter Windows VM were active during the update, and both continued to chug along happily. Neither seemed to notice the hiccup, and neither logged anything related to disk issues.
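If you want to verify the same thing in your own environment, the obvious places to look are the Windows System event log for disk errors or resets, and the ESXi vmkernel log for iSCSI path complaints around the cut-over. Rough examples of the sort of thing we would run (the filters and paths are generic, not EqualLogic-specific):

    (inside a Windows guest: recent entries from the disk driver in the System log)
    wevtutil qe System /q:"*[System[Provider[@Name='disk']]]" /c:20 /rd:true /f:text

    (on the ESXi host: iSCSI chatter around the cut-over; older ESXi builds log
    vmkernel messages to /var/log/messages instead)
    grep -i iscsi /var/log/vmkernel.log | tail -n 50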
All in all the update process was pretty painless, even with the array under load.