Thursday, July 24, 2014

iLO 4 causes PSOD

A majority of the environments I work on these days are HP, Cisco UCS, or Dell servers. Not in any particular order; that just seems to be what I see the most of. Lately I have been seeing a lot of activity around an issue I have not personally run into, but HP has released a tech bulletin on it. The issue is related to HP's iLO 4 firmware:

"HP iLO 4 may experience Intermittent Non-Maskable Interrupt (NMI) Events on Proliant G8 Servers with HP iLO 4 firmware versions 1.30, 1.32, 1.40 and 1.50."

If you experience these NMI events, it is possible for bad things to happen to your OS:

  • VMware ESXi hosts may experience Purple Screen of Death (PSOD)
  • Linux operating systems will display a message indicating an NMI occurred
  • Microsoft Windows will experience a Blue Screen of Death (BSOD)

HP's tech bulletin concerning this issue can be located here
HP's solution to this issue is to update to iLO 4 firmware version 1.51 or later; 1.51 can be found here
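If you're not sure which firmware a host is running, the version shows on the iLO web interface's Overview page. As a rough sketch, you can also pull it from a shell using iLO's xmldata endpoint (assuming anonymous XML data hasn't been disabled on your iLO; the hostname below is a placeholder):

  # Query the iLO 4 directly and look for the firmware revision (FWRI) field
  curl -k https://my-ilo-hostname/xmldata?item=All | grep -i FWRI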

Friday, July 18, 2014

VTUG New England Summer Slammer 2014!!

Big thanks to everyone for coming to the Focus booth at VTUG yesterday. Also a special thanks to those who sat in on my Troubleshooting tips for View breakout session.

-Brad

Tuesday, April 15, 2014

Heartbleed issues in virtual environments

I've had a lot of customers email in asking about OpenSSL Heartbleed-related concerns after the announcement last week. There is an informational website set up here to explain the issue in detail.

First thing I want to say here is that this is information I've been able to track down, but it is in no way a complete list of everything affected. It is merely a list of items I currently work with that happen to be a part of a lot of my customers' solutions. This issue is widespread and evolving quickly, so there will obviously be changes to what is listed in this article.

OpenSSL released a security advisory on this issue:



"OpenSSL Security Advisory [07 Apr 2014]
========================================

TLS heartbeat read overrun (CVE-2014-0160)
==========================================

A missing bounds check in the handling of the TLS heartbeat extension can be
used to reveal up to 64k of memory to a connected client or server.

Only 1.0.1 and 1.0.2-beta releases of OpenSSL are affected including
1.0.1f and 1.0.2-beta1.

Thanks for Neel Mehta of Google Security for discovering this bug and to
Adam Langley <agl@chromium.org> and Bodo Moeller <bmoeller@acm.org> for
preparing the fix.

Affected users should upgrade to OpenSSL 1.0.1g. Users unable to immediately
upgrade can alternatively recompile OpenSSL with -DOPENSSL_NO_HEARTBEATS.

1.0.2 will be fixed in 1.0.2-beta2."


Basically, this issue allows somebody to grab 64k chunks of data out of memory on a server utilizing OpenSSL. That data could be used to figure out the private key associated with the certificates used to secure content, and the private key could then be used to decrypt a data stream and view what is supposed to be a secure, encrypted transmission between two endpoints. Most of the manufacturers are scrambling to figure out a solution to this issue. I will provide some info and useful links for the vendors we commonly work with to supply our customers with top-of-the-line solutions.
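Before digging into vendor statements, it's worth checking your own Linux-based systems. A minimal first pass (assuming you have shell access; note that many distributions backported the fix without changing the reported version, so confirm against your distro's advisory):

  # 1.0.1 through 1.0.1f and 1.0.2-beta1 are the vulnerable OpenSSL releases
  openssl version
  # On RPM-based distros, the package changelog shows whether CVE-2014-0160 was patched
  # (adjust for your package manager)
  rpm -q --changelog openssl | grep CVE-2014-0160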

VMware

VMware has released information on this issue, including which products are affected by the security vulnerability. This official VMware blog post has some info; however, you'll want to view this KB article to see the affected products and details.

Products Affected include



EMC

EMC has posted an advisory here; you'll need Powerlink/support credentials to view it. The vast majority of the products are NOT affected. The list of affected products is



Cisco

Cisco has also released a statement on this issue. They have a preliminary list of devices that are affected, seen below, but follow the link above to get to the statement. The list will likely change, as they have not gone through the entire portfolio yet. Most notably, the Cisco UCS platform seems to be in the clear.



Hewlett-Packard

HP has also released a statement. It doesn't go into much detail, but they have ruled out some of the product line. The statement can be found here.

Teradici

Anyone using VMware View and zero clients should note that the PCoIP Management Console versions 1.9.0 through 1.10.0 are affected. An upgrade will fix this. More info here.

Microsoft

Last, but not least, this time is Microsoft. Not surprisingly, Microsoft products seem to be unaffected because they typically don't use OpenSSL for anything. IIS, among other secured products in their portfolio, does not use OpenSSL and is therefore unaffected.


In closing, I would recommend following the manufacturer's advice to resolve the issue. Also, if you have affected management devices that sit on a private VLAN, don't worry as much about them, because you have physical control over who is accessing them. Start with your most public-facing devices and work your way back into the network.




Wednesday, March 19, 2014

VMware View 5.3 with NVIDIA Grid Technologies with vSGA

I recently had the experience of working with the NVIDIA Grid technologies within VMware View 5.3 using the now-production vSGA technology. vSGA is basically the ability to share GPUs and VRAM (I know everyone hates this term from the licensing debacle VMware had, but now VRAM means video RAM; forget about the old use of the term). It took a bit of research to figure out the basic steps to make the Grid cards useful within the ESXi host.



Surprisingly, enabling the 3D graphics support was really easy. There's an option on the VM to enable 3D graphics support, and then you need to enable it on the View desktop pool that you want to use it with. Before any of this gets done, we have to prep the ESXi host to be able to use the adapter. You can read through the deployment guide to get all the details, but I'll give you the quick and dirty version:

First we need to build our host and get the driver loaded; the driver can be downloaded from NVIDIA.
  1. Install ESXi 5.5
  2. Put the host into maintenance mode
  3. Load the NVIDIA Grid VIB
    1. esxcli software vib install --maintenance-mode -d /vmfs/volumes/VNX_SAS_ISO/NVIDIA-VMware-x86_64-304.59-bundle.zip (replace the file path if necessary)
  4. Reboot the host
  5. SSH to the host
  6. Check to see if the "Xorg" service started (a quick check is sketched below)
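Here's a minimal way to verify that from the ESXi shell (assuming the stock init script name on ESXi 5.5; adjust if your build differs):

  # Check whether the X.Org service is running on the host
  /etc/init.d/xorg status
  # If it isn't running, start it and watch the output for errors
  /etc/init.d/xorg start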

Next, check that the driver associates with the correct card.
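A couple of quick checks from the ESXi shell (a sketch; command availability can vary by build):

  # Confirm the NVIDIA kernel module is loaded
  vmkload_mod -l | grep nvidia
  # Confirm the GRID card shows up as an NVIDIA display device
  lspci | grep -i nvidia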


Next we want to check that the host sees the card and that the driver is loaded and managing it.
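The nvidia-smi utility that ships with the GRID driver is the easiest way I know of to confirm this (a sketch; the binary should be available in the shell after the VIB install, but that's an assumption worth verifying on your build):

  # Lists the GPUs the NVIDIA driver is managing, along with driver version and memory
  nvidia-smi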



This process has to be done for each host, and basically at this point you can deploy your parent VM (be sure to take your snapshot and confirm that the View Agent is loaded). Enable 3D graphics on the virtual video card of the VM and enable 3D graphics on your desktop pool. After all that is done, you can check to see if utilization goes up and down on the GPUs in the card. The Volatile GPU-Util percentage should fluctuate as GPU resources are needed.
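A simple way to watch that from the host while a View session is running 3D workloads (just a sketch; a plain loop avoids depending on tools like watch being present in the ESXi shell):

  # Print GPU utilization every five seconds; Ctrl-C to stop
  while true; do nvidia-smi; sleep 5; done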



I've found this makes a significant improvement in graphics performance for a View session.

Hope this is helpful

-Brad





Saturday, February 1, 2014

Ever wonder if you could change an EMC VNX Unified to a VNX Block?

I want to start by saying I know this is kind of crazy and doesn't sound like it's even supported. It also probably leaves you wondering why anyone would do this. Well, the situation was interesting, and the quick answer is that normally you wouldn't. That said, I recently had a customer with a VNX Unified who was re-purposing the unit for something other than what they originally bought it for. That new use case had no need for the unified components, which also consumed about four additional rack units. The rack units mattered in this case because the device is in a colocation facility, and space costs lots of money.

I figured we should start by looking at what makes up a Unified VNX 5300, the model that I worked with. Below is an image that outlines the base components involved with a VNX 5300.


Basically, what we were looking to do was remove all the components in the blue box. The control stations and DME take up four rack units and also consume additional PDU outlets. The problem is that if we just unplug these devices, Unisphere will alert us that the Data Movers are down and the control stations are unreachable. Also, with a VNX Unified we have additional options that aren't required for block functionality. Bottom line: just unplugging the components would not be feasible.

Now let's get into the process I used to solve this issue. First, I knew that if we wanted to get the array to come up as block-only, we were going to have to reset the device back to factory defaults. This was really the key step to getting the unified parts removed. The process for re-imaging a VNX is documented and typically used when you have had some sort of problem with the configuration coming from the factory. It is technically a field-service task that can be done. Several of our engineers have had to do this before, which is how I got the idea to do it in the first place. If you can't get the instructions or required files for re-imaging, you should probably contact your EMC service provider (hopefully Focus) and get some help. The key thing to remember with the re-imaging is to disconnect your DME and CS1/CS2 before doing the re-image. This way, when the re-image is complete, it'll never know the unified parts were there.

After your re-image is done, you'll have to run the Unisphere storage system initialization utility to perform the initial install as if it was a new array, because it thinks it is! After your initialization, you'll likely have to update the OE to the latest version. Also be aware that you'll need to have all of your enablers on hand, because the re-image wipes those out, so they'll need to be re-installed (a quick way to check what's installed is sketched below). That's pretty much it for the hard stuff. You'll still have to set up your cache and storage groups, but you're basically at square one.
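If you want to confirm which packages and enablers ended up on the array after the re-image, naviseccli can list them (a sketch; add -User, -Password, and -Scope if you haven't saved a security file, and the SP address below is a placeholder):

  # List the software packages/enablers currently installed on the array
  naviseccli -h <SP_A_IP> ndu -list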

One other big deal here, especially if you're trying to save rack space like I was: you'll need to order replacement rails for the DPE and SPS. The reason is that on a Unified, the main system components use an integrated rail system that doesn't break into pieces.

-Brad


Tuesday, January 14, 2014

VTUG Winter Warmer

Hope everyone is enjoying the warm weather up here in the Boston area. Just wanted to post a reminder that the Virtualization Technical Users Group event is coming up on Thursday. I hope to see you all there; please stop by our Focus Technologies booth. I will also be doing a breakout session in the afternoon. My breakout will be on the integration of the EMC Storage Analytic Suite for VNX block, complete with a demo. I will also have some supporting slide materials and screenshots on VMAX and VNX Unified.

-Brad