What Would Brad Do: 2012

Wednesday, August 15, 2012

PowerCLI for the rest of us.....

If you're not using it already, PowerShell can be a very powerful and useful utility. It can save you many hours of digging through lists and trying to export data out of graphical interfaces, and it can be used to make configuration changes to a large group of objects that you can't normally select all together. I'm going to give a few basic examples of useful PowerShell commands, but first we'll cover which tools you should get so that writing and running your own scripts is as easy as possible.

The very first thing you need to do is download the latest version of VMware PowerCLI, which can be found at the VMware site in the downloads section.

PowerCLI Download

The second thing I would recommend is bookmarking the PowerCLI Command Reference guide

PowerCLI Command Reference guide

The third thing is optional, but it's something I personally do: get a script editor that has some integration with the PowerCLI PowerShell snap-ins. The one I use is open source and called PowerGUI. It will let you "tab complete" commands, among other things.

PowerGUI

So let's take a look at some of the basics. One of the better definitions I've seen comes from the PowerShell article on Wikipedia;

Windows PowerShell is Microsoft's task automation framework, consisting of a command-line shell and associated scripting language built on top of, and integrated with the .NET Framework. PowerShell provides full access to COM and WMI, enabling administrators to perform administrative tasks on both local and remote Windows systems.

In PowerShell, administrative tasks are generally performed by cmdlets (pronounced command-lets), specialized .NET classes implementing a particular operation. Sets of cmdlets may be combined together in scripts, executables (which are standalone applications), or by instantiating regular .NET classes (or WMI/COM Objects). These work by accessing data in different data stores, like the filesystem or registry, which are made available to the PowerShell runtime via Windows PowerShell providers.

Basically, you can use this object-oriented platform to make calls to all sorts of different things and use them to perform all kinds of tasks. As far as vSphere is concerned, this can be very useful for SRM command integration or for simple tasks such as figuring out which VM's have a snapshot and how old it is. For the purposes of this post we're going to look at some basic commands and cmdlets, with a few simple examples.

Before we can do anything, we'll need to open a PowerCLI window, which can be done by going to Start > All Programs > VMware > VMware vSphere PowerCLI > VMware vSphere PowerCLI. This brings up what looks like a normal command window. It can already run regular Windows PowerShell commands, but you will also see a message at the top of the screen as depicted below

This is normal; the shortcut is just programmed to add the VMware PowerShell snap-in every time, in case it's not loaded. There are a few settings we should change to make sure the tasks we perform run smoothly;

Set-ExecutionPolicy unrestricted -Confirm:$false

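The Set-ExecutionPolicy command allows us to run scripts that aren't "trusted" (unsigned), such as the ones we create on our own. Changing the machine-wide policy requires running the PowerCLI window as Administrator. See Microsoft's documentation for more detail.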

Set-PowerCLIConfiguration -DefaultVIServerMode Multiple -Confirm:$false

The Set-PowerCLIConfiguration command above allows us to connect to multiple vCenter servers or ESXi hosts at the same time.

We can use these same commands in PowerGUI as well. When you open PowerGUI you'll see three panes: one for the script you are constructing, one that shows the commands being run (and where you can run and test commands), and one with some syntax information.

Next we need to connect to something to run commands against. You can connect to either a vCenter or an ESX(i) host directly by using the following command

connect-viserver <ip of host> -User <username> -Password <password>

Now that we are connected, the first basic task is collecting some objects to do something with. This type of cmdlet usually starts with "get-". There are a variety of these (see the PowerCLI cmdlet reference for the full list), but let's do something simple like "get-vm", which returns a list of VM's.

You can see the full list of VM's; this is what's referred to as the default view, meaning these are the properties displayed by default: Name, PowerState, NumCPU, and Memory. Let's say we wanted to see something different about the VM's. You can use the command "get-vm | get-member": the "get-vm" part gets the list of VM's, and the "|" (pipe) passes that list to the "get-member" command. PowerShell will then list all the valid methods and properties of the objects that "get-vm" returns.

Now we can change the command to make our own custom list. Let's say we want the names of the VM's and the esxi host they reside on;

get-vm | select-object Name,VMHost

This gives you a different view of the same list as before. Notice that the columns are different: the only values displayed are the ones I asked for with "select-object".

Let's say I wanted to share this list with some co-workers. PowerShell can export this output to something easy to work with, such as a CSV file. To save it in the reports folder on my C drive (the folder has to exist already), issue the following command.

get-vm | select-object Name,VMHost | export-csv "C:\reports\vm-host.csv" -NoTypeInformation

Open C:\reports\vm-host.csv and you'll see two columns: one with the VM names and the other with the hosts they run on.

Now let's take a look at our list of VM's and figure out how much memory is allocated to all of the VM's we have built. We can use a cmdlet called "measure-object", which performs statistical calculations on the numbers that are returned. It always reports the count of objects, and we can also ask for the "-sum", the "-average", or the "-minimum"/"-maximum" values.

get-vm | measure-object MemoryMB -sum

This default view shows how many objects there were (Count) and what the sum was (Sum).
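If the numbers end up in a CSV on a Linux box instead of in a PowerCLI window, the same Count/Sum/Average can be produced with awk. This is a minimal sketch assuming a file written by something like `get-vm | select-object Name,MemoryMB | export-csv vm-mem.csv -NoTypeInformation`, with no commas inside field values; `sum_column` is a hypothetical helper name, not part of PowerCLI.

```shell
# Sketch: Count/Sum/Average of one named column in a CSV written by export-csv.
# Assumes no field contains a comma. sum_column is a hypothetical helper.
sum_column() {
  awk -F',' -v want="$2" '
    { sub(/\r$/, "") }   # drop Windows (CRLF) line endings
    /^#TYPE/ { next }    # type line export-csv writes when -NoTypeInformation is omitted
    !seen {              # first real line is the header: find the wanted column
      seen = 1
      for (i = 1; i <= NF; i++) { h = $i; gsub(/"/, "", h); if (h == want) col = i }
      next
    }
    col { v = $col; gsub(/"/, "", v); n++; s += v }
    END { printf "Count   : %d\nSum     : %.0f\nAverage : %.2f\n", n, s, (n ? s / n : 0) }
  ' "$1"
}
```

Usage: `sum_column vm-mem.csv MemoryMB`. Looking the column up by its header name rather than its position means the helper keeps working if you add more properties to the select-object list.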

There are many great ways to use this data, and many more advanced functions you can perform. Apply any scripting knowledge you have, and it can help accelerate tasks you would normally have to perform by hand. Below are a few other one-liners I've found helpful;

Get a list of VM's that have snapshots, showing the VM, its power state, the snapshot name, when the snapshot was created, and its size in MB;
Get-VM | Get-Snapshot | Select-Object VM,PowerState,Name,Created,SizeMB

Get a list of VM's whose CD-ROM drives are connected;
Get-VM | Where-Object { $_ | Get-CDDrive | Where-Object { $_.ConnectionState.Connected } } | Select-Object Name

Set the DNS servers on all ESXi hosts;
Get-VMHost | Get-VMHostNetwork | Set-VMHostNetwork -DnsAddress <DNS1>,<DNS2>

Tuesday, August 7, 2012

Teradici PCoIP Firmware 4.x GoDaddy.com Certificate issues

I came across an interesting issue with a GoDaddy.com certificate. As described in previous posts, you need to upload your root and intermediate certificates to the PCoIP devices connecting to VIEW. If you do not, you'll see an error message when connecting to your connection servers saying "the certificate is not rooted". Typically, when you see this message, all you have to do is locate the intermediate and root CA that signed your broker certificate and upload them; however, people are seeing issues with some intermediate/root CA's. I think these messages in the thin client log are the key to the problem;

08/06/2012, 16:15:25> LVL:1 RC:-510 X509_UTIL :get_issuer() failed!

08/06/2012, 16:15:25> LVL:1 RC:-510 MGMT_CERT :ERROR: tera_x509_util_get_tree failed for certificate 1

08/06/2012, 16:15:25> LVL:1 RC:-510 MGMT_CERT :ERROR: certificate is not valid (tera_mgmt_cert_add_certificate_by_index)

08/06/2012, 16:15:25> LVL:1 RC:-510 MGMT_CMI :ERROR: tera_mgmt_cert_add_certificate failed!

08/06/2012, 16:15:26> LVL:2 RC: 0 GSOAP :SOAP 1.2 fault: SOAP-ENV:Sender [no subcode]

08/06/2012, 16:15:26> LVL:2 RC: 0 GSOAP :"Failed to add certificate to certificate store" Detail: [no detail]

08/06/2012, 16:15:26> LVL:0 RC: 12 MGMT_CMI :Error serving SOAP request!

It appears that the Teradici firmware expects content in certain fields of the certificate, and GoDaddy does not provide it in this case. In fact, when you upload the GoDaddy certificate to the Teradici management appliance, it looks incomplete compared to a VeriSign certificate, as you can see in the image below

After a bit of research I found that this is a known issue with the 4.x release of the PCoIP firmware. You can find the KB article here. The issue is not limited to GoDaddy certificates; a VMware Community thread found here shows others having this issue with other certificate vendors. The community thread also contains the fix, which is opening a ticket with Teradici Support. Apparently the only way to resolve this is to use a release candidate of the next firmware revision.
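To check whether a given device is hitting this, one option is to save a copy of its event log to a text file and search it for the certificate-store failures quoted above. A minimal sketch: the patterns are taken straight from those log lines, the log file name is whatever you saved it as, and `pcoip_cert_errors` is a hypothetical helper name.

```shell
# Sketch: list the certificate-store failures from a saved PCoIP device log.
# Patterns come from the log excerpt above; pcoip_cert_errors is hypothetical.
pcoip_cert_errors() {
  grep -E 'get_issuer\(\) failed|MGMT_CERT *:ERROR|Failed to add certificate' "$1"
}
```

Usage: `pcoip_cert_errors p20-eventlog.txt` (placeholder file name). No output means none of those particular errors were logged.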

Wednesday, June 13, 2012

Teradici Firmware version 4 and Certificates

After upgrading to VMware VIEW 5.1, it has become apparent that certificate configuration is a huge part of getting the environment functioning at all. The VIEW installation guides are good resources; just read them carefully. This post is relevant to Teradici-based thin clients like the Wyse P20 or Samsung all-in-one monitors. I have seen a lot of posts in the communities on this topic, asking which certificates need to be stored on Teradici-based devices.

First we need to understand how certificates work. The commercially signed certificate we get from GoDaddy, GeoTrust, VeriSign, etc. is usually signed by an intermediate certificate, which in turn is signed by a root certificate. This is commonly referred to as the certificate "chain". By default in Windows we are blind to this, because the major commercial root certificates are pre-populated in the Windows trusted certificate store; Microsoft decided that we should trust those CA's, so they pre-populate them for us. Trusting somebody's SSL certificate is all about validating the chain, and VIEW is no exception.
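To see chain validation in action, here is a minimal sketch using openssl with throwaway certificates: it builds a root, an intermediate and a server certificate, then verifies the server certificate while trusting only the root. The names (Demo Root CA, Demo Intermediate CA, view.example.com) and the helpers `make_demo_chain` and `check_chain` are hypothetical; this illustrates the concept and is not how the Teradici firmware is implemented.

```shell
# Sketch: root -> intermediate -> server chain built with openssl (made-up names).
make_demo_chain() (
  cd "$1" || exit 1
  printf 'basicConstraints=critical,CA:TRUE\nkeyUsage=critical,keyCertSign,cRLSign\n' > ca.ext
  printf 'basicConstraints=CA:FALSE\n' > server.ext
  for name in root inter server; do
    openssl genrsa -out "$name.key" 2048 2>/dev/null || exit 1
  done
  # Self-signed root CA
  openssl req -new -key root.key -subj "/CN=Demo Root CA" -out root.csr || exit 1
  openssl x509 -req -in root.csr -signkey root.key -days 30 \
    -extfile ca.ext -out root.pem 2>/dev/null || exit 1
  # Intermediate CA, signed by the root
  openssl req -new -key inter.key -subj "/CN=Demo Intermediate CA" -out inter.csr || exit 1
  openssl x509 -req -in inter.csr -CA root.pem -CAkey root.key -set_serial 2 \
    -days 30 -extfile ca.ext -out inter.pem 2>/dev/null || exit 1
  # Server certificate (think: the connection broker), signed by the intermediate
  openssl req -new -key server.key -subj "/CN=view.example.com" -out server.csr || exit 1
  openssl x509 -req -in server.csr -CA inter.pem -CAkey inter.key -set_serial 3 \
    -days 30 -extfile server.ext -out server.pem 2>/dev/null || exit 1
)

# Trust only root.pem; pass "with-intermediate" to also supply the intermediate.
check_chain() (
  cd "$1" || exit 1
  if [ "$2" = "with-intermediate" ]; then
    openssl verify -CAfile root.pem -untrusted inter.pem server.pem
  else
    openssl verify -CAfile root.pem server.pem
  fi
)
```

Usage: `d=$(mktemp -d); make_demo_chain "$d"; check_chain "$d" with-intermediate` reports `server.pem: OK`, while `check_chain "$d"` fails with "unable to get local issuer certificate", roughly the same situation a thin client reports as "not rooted" when the intermediate is missing.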

The Teradici firmware takes the view that we, the administrators, will decide whom to trust. As a result, it ships with no pre-populated root or other certificates. If you do not add them yourself, you may receive an error message that says something to the effect of "the certificate is not rooted in the local device's certificate store" on the PCoIP thin client. We need to put whatever certificates we want to trust onto the device. This can be done in two ways;

Through the device directly: log in to the web interface of the device and select Upload > Certificate at the top of the screen.

This will bring us to a selection screen to upload the certificates of our choice

Second we could do the same thing in the PCoIP management console by importing the certificates into a profile

At this point you may be thinking, this is great but where do I find these certificates? And here is my answer. It depends. I'll give what I think is going to be the scenario you'll find in most VMware VIEW deployments but there are a variety of ways to obtain the Root, Intermediate and client cert. By the time you get to this point you will more than likely have upgraded your view environment. If this is the case you'll have imported the Commercially Signed Certificate into your connection brokers. This is a great place to get this information. Visit the URL for your connection server, as if you were connecting to it to be provisioned a desktop.

Click the little lock in the address bar. Its location varies by browser; I'm using Chrome.
Next you'll see a certificate information link and you'll want to click that, which brings up a familiar box containing certificate information
Now we want to grab the two certificates listed on the top two lines. These are the intermediate and the root certificate, which is the whole certificate chain. You'll do this by selecting each one, one at a time and choosing view certificate
Choose the Details tab and select copy to file
This launches a wizard that lets you export the certificate. Choose the correct file type (Base-64 encoded X.509 (.CER)), like in the picture below.
The file is saved with a .cer extension; simply rename it to .pem, because it needs to be .pem for the Teradici appliances to understand it. Follow the instructions above to upload it and you should be good to go.
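The rename works because a Base-64 export is already PEM text. If a certificate was exported in the binary DER format instead, renaming is not enough and it has to be converted. A minimal sketch handling both cases with openssl; `cer_to_pem` is a hypothetical helper name.

```shell
# Sketch: produce a .pem next to an exported .cer (cer_to_pem is hypothetical).
# Base-64 exports are already PEM, so they are just copied to the new name;
# DER (binary) exports are converted with openssl.
cer_to_pem() {
  local out="${1%.*}.pem"
  if grep -q -e '-----BEGIN CERTIFICATE-----' "$1"; then
    cp "$1" "$out" || return 1
  else
    openssl x509 -inform der -in "$1" -out "$out" || return 1
  fi
  echo "$out"
}
```

Usage: `cer_to_pem intermediate.cer` prints the name of the .pem file it wrote (here `intermediate.pem`), which is the file to upload.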

Monday, April 9, 2012

VMware VIEW connection broker sizing

I've been working with VMware VIEW since it was initially released, and as you can imagine the system requirements have changed considerably over the years. I remember when, in version 1.0, a 1 vCPU VM with 2GB of memory was just fine. Upon reviewing the documentation for VIEW 5.0, it turns out the requirements are much higher. If sizing is not done properly, it can actually cause application issues that require reinstalling the application, even if you later correct the vCPU and memory counts on the VM. The current broker recommendation in VIEW 5 is 4 vCPU's and 10GB of memory for any VIEW connection server servicing over 50 VM's. The big question I had was why the requirement is so high. I was recently alerted to the problem you can encounter while working with a customer on an upgrade.

The documentation doesn't say why 10GB is the number they chose, but I think we stumbled across the reason later in the documentation, around page 60 or 65 of the VIEW Installation Guide. It appears that if you start out small, with your memory at say 4GB when you install the VIEW connection server, some Java components get installed and configured with a glaring disadvantage: something called the Java heap size. This is the amount of memory that can be allocated to the JVM for processing transactions related to VIEW. If the Connection Server has less than 10GB of memory, the heap size is 512MB; if it has 10GB or more, the heap size is 2GB. That is a huge difference. It appears the only supported way to fix this is to uninstall and re-install the VIEW connection server software (after raising the VM's memory to 10GB), which shouldn't be an issue as long as you are not in a single connection server environment. Since all configuration data is stored in the ADAM database, there shouldn't be much to change after the re-install. The only thing I can think of is Kiosk mode; just make sure you re-enable it after re-installation. See the KB article below for more information.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2009877

Screen lag issue with PCoIP on dual monitor

I ran into an issue with a customer that I hadn't seen before. Big thanks to Karl for helping me out with the solution to this one. Basically, when you are using a multi-monitor system and drag a windowed application from one monitor to the other, the screen "lags", meaning the window is jumpy or glitchy as you drag it from screen 1 to screen 2. This issue can occur when you have;

Windows 7 (32 or 64 bit)
Virtual Machine hardware 8
VIEW agent 5
A multi-display client

The resolution is simply to add an entry to the end of the VM's *.vmx file. Below is the entry.

mks.poll.headlessRates="1000 100 2"

You can find more details in the VMware KB article located here

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2010359
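If you have many desktops to fix, the edit can be scripted. A minimal sketch that appends the entry only when it is not already present, assuming the VM is powered off and you can reach its .vmx from a shell; `add_headless_rates` is a hypothetical helper and the path in the usage line is a placeholder.

```shell
# Sketch: append mks.poll.headlessRates to a .vmx once (hypothetical helper).
# Assumes the VM is powered off while the file is edited.
add_headless_rates() {
  vmx=$1
  if grep -q '^mks\.poll\.headlessRates' "$vmx"; then
    echo "already set in $vmx"
    return 0
  fi
  # make sure the new entry starts on its own line
  [ -n "$(tail -c 1 "$vmx")" ] && printf '\n' >> "$vmx"
  printf '%s\n' 'mks.poll.headlessRates="1000 100 2"' >> "$vmx"
  echo "added to $vmx"
}
```

Usage: `add_headless_rates /vmfs/volumes/datastore1/win7-desktop/win7-desktop.vmx` (placeholder path). Running it twice leaves a single entry.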

Friday, March 30, 2012

VMware SRM vSphere replication VRMS issues

I have started to do some testing with the new vSphere replication technology that is now part of VMware Site Recovery Manager. I began working on this in our lab, and encountered what I believe is a bug that many people will run into in their production environments.

The new version of SRM has a feature called vSphere Replication, which gives you the ability to replicate on a per-VM basis. This is very beneficial if you do not have similar storage arrays or array-based replication software at both ends. It can also come in handy if you don't want to do per-volume replication and fail over all VM's on a volume.

vSphere Replication is done using a vSphere Replication Management Server (VRMS) at each site, which communicates with vCenter. At each site you will also need several vSphere Replication Servers (VR's) to facilitate the actual replication. The problems I encountered occur in the communication between the VRMS and vCenter.

Problem number one: the first issue I encountered was that the certificate generated by my vCenter installation a few years ago had expired (by default the installer's certificate is only valid for two years). I looked up in VMware's KB how to attack this problem, came across the following article, performed the tasks in it, generated a nice new self-signed certificate, and all was good.

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1009092

When I began deploying my VRMS, the OVF file uploaded perfectly and the OS started just fine. I logged into the VRMS, configured a database connection and told it where my vCenter was. When I selected "Save and Restart Service" it told me there was something wrong with the certificate, which is pretty normal for a self-signed one, so I accepted it. I went back to the vSphere Client, clicked on the SRM button and selected the vSphere Replication section. Then I clicked "Configure VRMS Connection" on the right side, which popped up this message;

I did not read the certificate error message closely enough. When things didn't work, I had to start digging through the logs to find out what was wrong. Below is a link to the VMware KB article describing the issue

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2013087

****************************************************************************
Disclaimer: The following procedure is likely not supported by VMware and is a documented process of my own experience to fix an issue that I have personally encountered. Please use this information at your own risk and be careful doing so in production environments.
****************************************************************************

The documentation on how the VRMS works is a little lacking; I'm sure that's because the technology is brand new. I had to log in to the Linux console of the appliance and do some poking around. Open a console and log in as root with the <password> you specified during the OVF deployment. To save you some work, run this command after logging in to the appliance

less /opt/vmware/logs/hms/hms.log

Press the End key to jump to the end of the log, then use the up arrow to scroll back through it.
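As an alternative to paging through the whole file (`less +G` also opens it already at the end), you can filter it. A minimal sketch: the search terms are a guess at what is useful for this problem, not a list of documented VRMS error strings, and `vrms_cert_lines` is a hypothetical helper name.

```shell
# Sketch: show the most recent log lines mentioning MD5, certificates or SSL,
# with line numbers. vrms_cert_lines is hypothetical; default path from above.
vrms_cert_lines() {
  log=${1:-/opt/vmware/logs/hms/hms.log}
  grep -n -i -E 'md5|certificat|ssl' "$log" | tail -n "${2:-20}"
}
```

Usage: `vrms_cert_lines` on the appliance shows the last 20 matches; `vrms_cert_lines /opt/vmware/logs/hms/hms.log 50` shows more. The line numbers let you jump to the surrounding context in less.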

We noticed a variety of "MD5 hash" errors in the logs. If you go through the SRM release notes, you will find that SRM does not support an RSA/MD5-signed vCenter certificate; it has to be RSA/SHA-1. And in the KB article I linked above on generating a new self-signed certificate, guess which hashing algorithm the commands they give you use for the new certificate... yup, MD5. Below is the excerpt

openssl.exe req -new -x509 -days 3650 -md5 -nodes -key rui.key -out rui.crt -subj "fqdn_of_VC"

What you need to do is go back and generate a new SSL certificate with the command below (note -sha1 instead of -md5; also, openssl expects -subj in the form "/CN=...")

openssl.exe req -new -x509 -days 3650 -sha1 -nodes -key rui.key -out rui.crt -subj "/CN=fqdn_of_VC"
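Before going back to the VRMS, it is worth confirming what the new certificate was actually signed with. A minimal sketch that reads the signature algorithm from the certificate and rejects MD5; `cert_sig_alg` and `sig_alg_ok` are hypothetical helper names.

```shell
# Sketch: report a certificate's signature algorithm and reject MD5.
# cert_sig_alg and sig_alg_ok are hypothetical helpers.
cert_sig_alg() {
  # the first "Signature Algorithm:" line in the text dump is the cert's own
  openssl x509 -in "$1" -noout -text | awk '/Signature Algorithm:/ { print $3; exit }'
}

sig_alg_ok() {
  case $1 in
    md5*) echo "REJECT: $1 (MD5-signed, not supported by SRM)"; return 1 ;;
    "")   echo "REJECT: could not read the signature algorithm"; return 1 ;;
    *)    echo "OK: $1" ;;
  esac
}
```

Usage: `sig_alg_ok "$(cert_sig_alg rui.crt)"`. A certificate made with the -sha1 command reports sha1WithRSAEncryption; one made with the KB's -md5 command reports md5WithRSAEncryption and is rejected.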

Once this is done, log in to the web interface of the VRMS, select the Config tab, and unregister from vCenter. Then register with vCenter again. The new certificate will pop up; select Accept.

That takes care of one problem. On to problem two

I went back into the SRM console thinking I had resolved the issue and tried the "Configure VRMS Connection" link again: same error message. I went back into the same log and found the different error messages below.

So if I interpreted these errors correctly, the VRMS is having trouble validating the certificate chain of the vCenter, but why? I don't have the box checked that requires all certificates to be trusted by a CA.

We tried a bunch of different things here, from generating new certificates using new keys to changing types of certificates; nothing worked. We took a break, and I started digging around to see if I could find where the certificates were being checked. These looked like Java error messages to me, so we started looking for Java keystores (JKS files). It just so happens that if you look in the following path

/opt/vmware/hms/security

run ls there and you should see these files

hms-keystore.jks
hms-truststore.jks

Apparently hms-truststore.jks is where the certificates that the VRMS appliance trusts are stored. Basically, what you need to do at this point is add the vCenter certificate to this truststore.

1. Download and install an SCP client like WinSCP. This will allow you to copy the rui.crt file from the vCenter server to the appliance.

2. Modify the appliance to allow remote root login: edit /etc/ssh/sshd_config (for example with vi) and change the PermitRootLogin line, seen below, to say yes.

3. Restart the SSH service by running this command

service sshd restart

4. Connect to the appliance by IP with WinSCP and upload the rui.crt file from each of your vCenter servers (production and DR) into the /opt/vmware/hms/security folder. The file can be found in its default location of "C:\ProgramData\VMware\VMware VirtualCenter\SSL". Since both files have the same name, rename them (hq-rui.crt and dr-rui.crt are used below) so one doesn't overwrite the other.

5. Import the certificate into the hms-truststore.jks using the below commands

cd /opt/vmware/hms/security
keytool -import -trustcacerts -alias hq -file hq-rui.crt -keystore hms-truststore.jks
keytool -import -trustcacerts -alias dr -file dr-rui.crt -keystore hms-truststore.jks

It will ask for the keystore password, which is "vmware" (without the quotes).

Then it will ask if you're sure you want to trust the certificate; type "yes" (without the quotes). You should see something like this

6. Log back into the web interface of your VRMS, select the Config tab and unregister, then re-register your VRMS with vCenter.

After a few minutes you should see

Check the install certificate box and click Ignore. After a few minutes you should see the changes below in the console

This indicates the VRMS is partnered with the vCenter. The build number and the status "VRMS Servers are not paired" mean they are talking; before, the status said Disconnected.
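For step 2 of the list above, the sshd_config change can also be made with sed instead of by hand in vi. A minimal sketch, assuming GNU sed (for `sed -i`) as on the appliance; it handles a PermitRootLogin line that says no, is commented out, or is missing, and keeps a .bak copy. `enable_root_ssh` is a hypothetical helper name.

```shell
# Sketch: set PermitRootLogin to yes, keeping a .bak copy (hypothetical helper).
# Assumes GNU sed for in-place editing.
enable_root_ssh() {
  cfg=${1:-/etc/ssh/sshd_config}
  cp "$cfg" "$cfg.bak" || return 1
  if grep -q '^[#[:space:]]*PermitRootLogin' "$cfg"; then
    # rewrite an existing (possibly commented-out) line
    sed -i 's/^[#[:space:]]*PermitRootLogin.*/PermitRootLogin yes/' "$cfg"
  else
    # no such line yet: add one
    printf '%s\n' 'PermitRootLogin yes' >> "$cfg"
  fi
}
```

Usage: run `enable_root_ssh` as root, then `service sshd restart` as in step 3. Restoring the .bak copy puts the original setting back.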

I am now stuck trying to actually pair the VRMS servers: when I select "Configure VRMS Connection" I get the popup message below;

Below is a log excerpt from the SRM logs on the vCenter

2012-03-30T16:48:43.831-04:00 [11884 verbose 'HbrProvider'] Dr::Replication::HbrProviderImpl::SetRemoteInfoFailed: Unable to get remote HMS server info, error=
--> (dr.hbrProvider.fault.HmsServersNotPaired) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> localHmsUuid = "d2f5461c-59e4-43fa-b6a0-8fa5f198e9a3",
--> localHmsName = "HQ-VRMS",
--> remoteHmsUuid = "5521a563-205b-4a77-adff-920ead24d7ad",
--> remoteHmsName = "DR-VRMS",
--> msg = "",
--> }
2012-03-30T16:48:46.174-04:00 [08380 verbose 'SessionManager'] Logging out remote site 'site-1024' for session '523a6'
2012-03-30T16:48:46.174-04:00 [08380 verbose 'RemoteSite'] Logging out remote site 'Site Recovery for dr-vc.ftsi.lab', session '523a6'
2012-03-30T16:48:46.174-04:00 [08380 info 'RemoteSite'] Logged out from remote site 'Site Recovery for dr-vc.ftsi.lab', session '523a6'
2012-03-30T16:48:46.174-04:00 [08380 info 'SessionManager'] Remote site 'site-1024' successfully logged out for session '523a6'.
2012-03-30T16:48:46.174-04:00 [08876 verbose 'SessionManager'] Removing session ID for session '52889', remote site 'site-1024'
2012-03-30T16:48:46.174-04:00 [08876 verbose 'RemoteSite'] Removing session ID for remote site 'Site Recovery for dr-vc.ftsi.lab', session '52889'
2012-03-30T16:48:46.174-04:00 [08876 verbose 'RemoteVC'] Shutting down connection
2012-03-30T16:48:46.174-04:00 [08876 verbose 'RemoteVC'] [PCM] Stopping...
2012-03-30T16:48:46.174-04:00 [08876 warning 'VixVcDomain'] VIX connection already logged out
2012-03-30T16:48:46.174-04:00 [08876 info 'RoleRegistry'] Shutting down...
2012-03-30T16:48:46.174-04:00 [08876 error 'RemoteVC'] [PM] Cannot unregister callback for filter token '1' because PropertyMonitor is stopped
2012-03-30T16:48:46.174-04:00 [08876 verbose 'RemoteDR'] Shutting down connection
2012-03-30T16:48:46.174-04:00 [08876 verbose 'RemoteDR'] [PCM] Stopping...
2012-03-30T16:48:46.190-04:00 [08876 info 'RemoteSite'] Removed session ID for remote site 'Site Recovery for dr-vc.ftsi.lab', session '52889'
2012-03-30T16:48:46.190-04:00 [08876 info 'SessionManager'] Remove session ID for session '52889', remote site 'site-1024' is successful
2012-03-30T16:48:46.190-04:00 [09152 info 'vmomi.soapStub[17]'] Resetting stub adapter for server TCP:dr-vc.ftsi.lab:80 : Closed
2012-03-30T16:48:46.206-04:00 [10140 verbose 'SessionManager'] Logging out user 'FTSI\administrator', session '52889'
2012-03-30T16:48:46.206-04:00 [10140 verbose 'Default'] CloseSession called for session id=52889fa2-40dd-b53d-b8b3-0556a7871eb4
2012-03-30T16:48:46.206-04:00 [10140 verbose 'SessionManager'] Closing session '52889'
2012-03-30T16:48:46.206-04:00 [10140 verbose 'LocalVC'] Shutting down connection
2012-03-30T16:48:46.206-04:00 [10140 verbose 'LocalVC'] [PCM] Stopping...
2012-03-30T16:48:46.206-04:00 [10140 warning 'VixVcDomain'] VIX connection already logged out
2012-03-30T16:48:46.206-04:00 [10140 info 'RoleRegistry'] Shutting down...
2012-03-30T16:48:46.206-04:00 [10140 error 'LocalVC'] [PM] Cannot unregister callback for filter token '1' because PropertyMonitor is stopped
2012-03-30T16:48:46.206-04:00 [10140 info 'SessionManager'] Closed session '52889'
2012-03-30T16:48:46.206-04:00 [06880 info 'vmomi.soapStub[19]'] Resetting stub adapter for server TCP:hq-vc.ftsi.lab:80 : Closed
2012-03-30T16:48:48.753-04:00 [05572 verbose 'SessionManager'] Logging out user 'FTSI\administrator', session '523a6'
2012-03-30T16:48:48.753-04:00 [05572 verbose 'Default'] CloseSession called for session id=523a6f71-17e6-c0bb-820c-37ea5e081477
2012-03-30T16:48:48.753-04:00 [05572 verbose 'SessionManager'] Closing session '523a6'
2012-03-30T16:48:48.753-04:00 [05572 verbose 'LocalVC'] Shutting down connection
2012-03-30T16:48:48.753-04:00 [05572 verbose 'LocalVC'] [PCM] Stopping...
2012-03-30T16:48:48.753-04:00 [05572 warning 'VixVcDomain'] VIX connection already logged out
2012-03-30T16:48:48.753-04:00 [05572 info 'RoleRegistry'] Shutting down...
2012-03-30T16:48:48.753-04:00 [05572 error 'LocalVC'] [PM] Cannot unregister callback for filter token '1' because PropertyMonitor is stopped
2012-03-30T16:48:48.753-04:00 [05572 info 'SessionManager'] Closed session '523a6'
2012-03-30T16:48:48.753-04:00 [03668 info 'vmomi.soapStub[1]'] Resetting stub adapter for server TCP:hq-vc.ftsi.lab:80 : Closed
2012-03-30T16:48:48.862-04:00 [05320 verbose 'HbrProvider'] Dr::Replication::HbrProviderImpl::SetRemoteInfoFailed: Unable to get remote HMS server info, error=
--> (dr.hbrProvider.fault.HmsServersNotPaired) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> localHmsUuid = "d2f5461c-59e4-43fa-b6a0-8fa5f198e9a3",
--> localHmsName = "HQ-VRMS",
--> remoteHmsUuid = "5521a563-205b-4a77-adff-920ead24d7ad",
--> remoteHmsName = "DR-VRMS",
--> msg = "",
--> }
2012-03-30T16:48:53.878-04:00 [15636 verbose 'HbrProvider'] Dr::Replication::HbrProviderImpl::SetRemoteInfoFailed: Unable to get remote HMS server info, error=
--> (dr.hbrProvider.fault.HmsServersNotPaired) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> localHmsUuid = "d2f5461c-59e4-43fa-b6a0-8fa5f198e9a3",
--> localHmsName = "HQ-VRMS",
--> remoteHmsUuid = "5521a563-205b-4a77-adff-920ead24d7ad",
--> remoteHmsName = "DR-VRMS",

VMware has not confirmed with me yet that this is in fact a bug. I will update the post when I have a resolution.

*****************************************************************************
UPDATE With Resolution
*****************************************************************************

I don't know if you would necessarily call this a bug or just something of an installation nuance, but here it is.

VMware Engineering had the opportunity to review this issue, and it ultimately turned out to be related to the vCenter certificate. In addition to the MD5 hash issue described above, you also need to make sure your certificate (trusted-CA-signed or self-signed) is generated with a 2048-bit RSA key. Earlier installations of vCenter use MD5-signed certificates with 1024-bit RSA keys. If you encounter this issue, the remediation is simple. Uninstall SRM altogether, because SRM imports the vCenter certificate and uses it for site pairing and other tasks; be sure to unregister your VRMS server before uninstalling SRM Server. The next step is to generate new certificates. I used openssl; here are the commands;

Generate new key and CSR (CSR is needed if requesting a trusted CA Cert):

openssl req -out CSR.csr -new -newkey rsa:2048 -nodes -keyout rui.key

Generate the certificate files
openssl req -new -x509 -days 3650 -sha1 -nodes -key rui.key -out rui.crt -subj "/C=US/ST=NH/L=Seabrook/CN=hq-vc.ftsi.lab"

openssl pkcs12 -export -in rui.crt -inkey rui.key -name rui -passout pass:testpassword -out rui.pfx

This generates four files (CSR.csr, rui.key, rui.crt and rui.pfx) that you'll want to copy to C:\ProgramData\VMware\VMware VirtualCenter\SSL (the default location).

Make sure you save your old certificates, just in case. Also, when it asks for a key password, the default is "testpassword" (without the quotes).
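Before (or right after) copying the files, a quick sanity check can confirm that the new certificate meets the requirements described in this update: a key of at least 2048 bits, a signature that is not MD5, and a certificate that actually belongs to rui.key. A minimal sketch, run in the folder holding rui.crt and rui.key; `check_rui` is a hypothetical helper name.

```shell
# Sketch: sanity-check rui.crt/rui.key in a folder (hypothetical helper).
check_rui() (
  cd "${1:-.}" || exit 1
  bits=$(openssl x509 -in rui.crt -noout -text | sed -n 's/.*Public-Key: (\([0-9]*\) bit).*/\1/p')
  alg=$(openssl x509 -in rui.crt -noout -text | awk '/Signature Algorithm:/ { print $3; exit }')
  cert_mod=$(openssl x509 -in rui.crt -noout -modulus)
  key_mod=$(openssl rsa -in rui.key -noout -modulus 2>/dev/null)
  echo "key size: $bits bits, signature: $alg"
  [ "$bits" -ge 2048 ] 2>/dev/null || { echo "key is smaller than 2048 bits"; exit 1; }
  case $alg in md5*) echo "certificate is MD5-signed"; exit 1 ;; esac
  # same RSA modulus means the certificate was issued for this private key
  [ "$cert_mod" = "$key_mod" ] || { echo "rui.crt does not match rui.key"; exit 1; }
  echo "rui.crt/rui.key look OK"
)
```

Usage: `check_rui .` in the folder where you generated the files. A non-zero exit status means one of the three checks failed, and the printed message says which.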

After you have replaced the vCenter certificates, go through the SRM install as we did above and you'll be all set. There's no need to log in to the VRMS's and add certificates to the keystores; it just works. Do the tasks in this order;

1. Setup order is VERY specific.

a. Deploy Site “A” VRMS server from VI Client to Site “A”.

b. Open web browser to IP of VRMS server “A”.

c. Generate new SSL Certificate and install.

d. Add settings for VC / DB / etc, BY IP address ONLY.

e. Hit “Save and Restart Service” when setup.

f. On Site “A” VI Client, open “vCenter Solutions Manager”

i. Click VR Management. You should soon be prompted to accept a certificate. Do so, and click the Ignore button.

g. Open SRM. Click vSphere Replication for the appropriate site.

i. Wait for a certificate prompt. Accept it.

ii. In about a minute, you should see the VRMS server log in to SRM.

h. Close VI Client for site “A”, open VI Client to site “B”, and repeat prior steps on Site “B”.

i. Connect VRMS servers together.

j. Deploy VR servers on required systems.

k. Add VR servers.

Tuesday, March 27, 2012

Protect yourself against VMware HA Outages

Something I see out in the field a lot, or at least more than I should, is clusters left with all the default settings. This is a common cause of unexpected outages. HA was developed to help automate the recovery of VM's when they become unavailable. What we have to define is what "unavailable" means. Technically speaking, if something is isolated on the network, it is unavailable, so VMware built host isolation detection into HA. I'm not going to dig too deep on this, because I wouldn't do nearly as good a job as Duncan Epping does in the HA Deepdive section of his blog, yellow-bricks.com;

http://www.yellow-bricks.com/vmware-high-availability-deepdiv/

One of the most underutilized options in HA is the ability to control what is used to define host isolation. For this first part, let's assume that we're talking about vSphere 4.x and earlier. By default, host isolation is determined simply by the host's ability to ping the default gateway from a management interface. The image below shows the default settings you get if you just check the HA box on the cluster:

You'll notice that the default isolation response at the top of the page is to "shutdown" the VMs on the host. The reason this is the most common cause of accidental outages is that the default gateway is commonly not controlled by the guys who manage the VMware environment. Consider the image below:

Imagine that the network team asked you to use the core switch as the default gateway on your ESXi hosts. Well, if the network admin reboots the core switch for a code upgrade or some other maintenance, guess what happens? The ESXi hosts all think they are isolated and start shutting down VMs. So what should have been a four- or five-minute outage while the switch reboots and initializes the new code turns into an hour or more of making sure all the VMs come back up and are powered on in the correct order, etc. This is obviously something we want to avoid.

How do we fix this? The easiest way is to identify a secondary and/or tertiary device to use as a host isolation address. What device should we use? Well, in the example above we have a few options. I would probably use the management interface on the top-of-rack switch, the firewall address, or something else in the same rack (like a pair of load balancers). This ensures that if the rack becomes isolated due to an upstream link failure, your VMs don't just shut down; if the whole rack is isolated, there's no point in shutting down the VMs. Having three such devices configured as isolation addresses provides the best level of protection we can get. If you plan to reboot the top-of-rack switch, there's no getting around an isolation response, and you should disable HA during that maintenance.
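To make the reasoning concrete, here is a simplified Python model of the decision, not VMware's actual code. It assumes the host has already stopped receiving HA heartbeats from the other hosts (the point at which, in vSphere 4.x, it falls back to pinging its isolation addresses) and that it only declares itself isolated when none of those addresses answer. The IP addresses and function names are hypothetical.

```python
# Hypothetical addresses, for illustration only.
CORE_SWITCH = "10.0.0.1"  # default gateway on the core switch
TOR_SWITCH = "10.0.0.2"   # top-of-rack switch management interface
FIREWALL = "10.0.0.3"     # firewall interface in the same rack


def is_isolated(ping_results):
    """Simplified isolation check.

    ping_results maps each configured isolation address to True if the host
    can still ping it. The host is only isolated when none of them respond.
    """
    if not ping_results:
        raise ValueError("at least one isolation address is required")
    return not any(ping_results.values())


def isolation_outcome(ping_results, isolation_response="shutdown"):
    """What happens to the VMs on the host under this simplified model."""
    return isolation_response if is_isolated(ping_results) else "keep running"


# The core switch reboots for a code upgrade, so only it stops answering.
default_gateway_only = {CORE_SWITCH: False}
three_addresses = {CORE_SWITCH: False, TOR_SWITCH: True, FIREWALL: True}

print(isolation_outcome(default_gateway_only))  # shutdown
print(isolation_outcome(three_addresses))       # keep running
```

With only the default gateway configured, one switch reboot is enough to trigger the isolation response; with the extra in-rack addresses, the host keeps its VMs running unless it loses all of them.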

Implementing HA advanced options is quite easy, and that is how we would remedy this issue. We would use a couple of HA options in this scenario:

1. das.usedefaultisolationaddress - we'll set this to false so the default gateway is no longer used automatically; we can still list it as one of our isolation addresses if we decide we want it.

2. das.isolationaddress<X> - where <X> is the number of the isolation address entry, so if we had three isolation addresses we'd use das.isolationaddress1, das.isolationaddress2, and das.isolationaddress3.

To set these values, open the VMware Virtual Infrastructure Client, right-click your HA cluster, choose Edit Settings, select the HA section, and click the Advanced Options button at the bottom right of the window. Then fill it out as depicted in the image below:

This sets your cluster to use these addresses to determine host isolation. Keep in mind that if any of these devices' IP addresses change, your cluster settings need to be updated. Again, this applies mostly to vSphere 4.x and earlier; there are a few more protection mechanisms in vSphere 5 that I will cover in a later post.
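Because these entries have to be kept in sync with your devices' IP addresses, it can help to generate the name/value pairs from a single list. Below is a minimal Python sketch that builds the same option names described above; the function name and addresses are hypothetical, and you would still enter the resulting pairs in the Advanced Options dialog yourself.

```python
def ha_isolation_options(addresses):
    """Build HA advanced options (name -> value) for a list of isolation addresses.

    das.usedefaultisolationaddress is set to "false" so only the listed
    addresses are used; include the default gateway in the list if you
    still want it checked.
    """
    if not addresses:
        raise ValueError("at least one isolation address is required")
    options = {"das.usedefaultisolationaddress": "false"}
    # Entries are numbered from 1: das.isolationaddress1, das.isolationaddress2, ...
    for number, address in enumerate(addresses, start=1):
        options[f"das.isolationaddress{number}"] = address
    return options


# Hypothetical addresses: core switch gateway, top-of-rack switch, firewall.
options = ha_isolation_options(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for name, value in options.items():
    print(f"{name} = {value}")
```

This prints das.usedefaultisolationaddress = false followed by das.isolationaddress1 through das.isolationaddress3 with their addresses, in that order.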

-Brad

VMware VIEW PCoIP optimization

I recently attended VMware Partner Exchange, which, as the name suggests, is an event geared towards partners. We get a lot of great information out of the event and get to interact with a lot of the guys who work on the development teams for a variety of VMware products. I had the pleasure of attending a discussion with the VMware End User Computing (EUC) group. This was one of the more informative sessions I attended while out there. One of the Senior Consultants from that group, Chuck Hirstius, did a particularly great job presenting a bunch of information on PCoIP. If you don't already follow Chuck's blog, you can find it at the link below:

http://mindfluxinc.net/

Chuck has developed a few tools that are worth checking out for troubleshooting and managing PCoIP. They are not officially endorsed by VMware, but in my experience they work great.

First up is the PCoIP Log Viewer - http://mindfluxinc.net/?p=195

Basically, this is a Java-based tool that you can download to your PC or a server in the environment.

Note that when you use this tool to monitor a PCoIP session, you point it at the machine running the PCoIP Server process, which is typically the device you are connecting to, not the thin client or end user device.

The tool will ultimately allow you to watch a PCoIP session and see what is consuming bandwidth on that session, how much latency the session sees, etc. This gives you clues about what might need to be changed, if anything, to optimize PCoIP. As Chuck calls out in his blog, most people's first reaction is to ask, "Well, isn't PCoIP adaptive?" The answer is yes; however, by changing some of the defaults we can help the protocol adapt more fluidly. From a consultant's perspective, this tool has come in handy when trying to fit a bunch of PCoIP sessions down a small pipe as efficiently as possible.

To be completely honest, I haven't yet used the log parser that Chuck lists in the post above. The idea of the log parser is to take PCoIP logs and collect them so they can be read by the log viewer. The live session capabilities are fantastic, and I can't wait to dig into the parser, because I'm sure it'll be excellent for longer-term troubleshooting, or for identifying the root cause of something that has already happened.

Check out Chuck's video on how to use the tool:

To give a real-world example, by default audio in a PCoIP session is allowed to spike to 500 kbps. For a frame of reference, a phone call over IP typically uses 64-80 kbps. Obviously we may not want mono voice quality, but in most use cases audio certainly doesn't need 500 kbps. In a project I recently worked on, we had deployed PCoIP over a relatively small WAN link. We had the log viewer up and running, watching a test user session with the PCoIP defaults selected. We noticed that when a user picked something in error in an application, the Windows error sound played and bandwidth promptly spiked, consuming 350 kbps to shove that audio down the link as quickly as possible. One of Chuck's first recommendations, which I agree with, is to drop the Max Audio Settings to 100 kbps.
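A little arithmetic shows why the audio cap matters on a small link. The Python sketch below is a deliberately simplified model: it treats the audio setting as a hard per-session ceiling and assumes the worst case where every session plays audio at once. The 500, 350, and 100 kbps figures come from the example above; the 10-session branch office is hypothetical.

```python
DEFAULT_AUDIO_LIMIT_KBPS = 500      # default audio spike limit from the post
RECOMMENDED_AUDIO_LIMIT_KBPS = 100  # recommended Max Audio setting
# Note: 100 kbps is still above the 64-80 kbps of a typical VoIP call.


def audio_burst_kbps(demand_kbps, limit_kbps):
    """Bandwidth an audio burst can use once the limit is applied (simplified)."""
    return min(demand_kbps, limit_kbps)


def worst_case_audio_kbps(sessions, limit_kbps):
    """Upper bound if every session on the link plays audio at the same time."""
    return sessions * limit_kbps


# The Windows error sound in the example tried to use about 350 kbps.
print(audio_burst_kbps(350, DEFAULT_AUDIO_LIMIT_KBPS))      # 350
print(audio_burst_kbps(350, RECOMMENDED_AUDIO_LIMIT_KBPS))  # 100

# Hypothetical branch office with 10 concurrent sessions.
print(worst_case_audio_kbps(10, DEFAULT_AUDIO_LIMIT_KBPS))      # 5000
print(worst_case_audio_kbps(10, RECOMMENDED_AUDIO_LIMIT_KBPS))  # 1000
```

Real PCoIP traffic is adaptive and rarely hits these bounds all at once, but the comparison shows how much headroom a lower audio ceiling can free up for display traffic.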

Enter Chuck's second tool, the PCoIP configuration tool (beta) - http://mindfluxinc.net/?p=338

This tool allows you to change PCoIP settings on a particular device using a graphical interface. This is incredibly handy for testing out new settings prior to deployment, and it makes it very easy to tweak them before rolling them out to end users. Chuck has even included some defaults, so you can use his "WAN" profile as a starting point for testing.

Note: You do have to disconnect and reconnect for these settings to take effect, however a reboot of the VM is not required.

After reading the information above, you're probably wondering what settings to change. There is definitely an art to getting this right. There is no chart that says "if you see this happen, change this setting." A lot of what we deal with in VIEW, or any virtual desktop technology, is related to end user experience. There are things we can change to solve bandwidth or latency constraints, but those changes do have an effect on the end user experience. Our goal as engineers/consultants is to make setting changes that have as minimal an impact as possible. Below is a link to the VMware KB that describes each PCoIP setting and what it does:

VMware KB for PCoIP

Lastly, we need to know how to change these settings. It is possible to set them in the registry or with Chuck's tool, but from an administrative perspective that is not the recommended approach; instead, this can be done through Active Directory GPOs. If you don't know this already, there are ADM templates available that can be imported into your AD Group Policy Management Console for deploying PCoIP settings through a GPO. They are part of the default installation of a VIEW Connection Server and can be found in C:\Program Files\VMware\VMware View\Server\extras\GroupPolicyFiles. You'll notice other ADM templates here, and all are useful, but the one we want is "pcoip.adm". After you've imported the ADM template, you can create a GPO and change the settings for any of the objects listed in the VMware KB article above. This allows you to change the PCoIP settings per desktop pool if you'd like, as long as you put each desktop pool in its own AD OU.

As discussed above, there are a great many settings we can change in PCoIP. The challenge is to figure out which ones are appropriate to change. I believe that the tools discussed above will help us make informed decisions about modifying these settings, and hopefully provide our users with a better end user experience.

-Brad

Monday, March 26, 2012

VMware VIEW - Optimizing Windows 7

A question I often get when doing a VMware VIEW implementation is "What should I do to my Windows 7 image to optimize it for use over the LAN/WAN?" My response is usually to follow the Windows 7 optimization guide for VMware VIEW. Before we get into it too much, below is a link to the document:

http://www.vmware.com/files/pdf/VMware-View-OptimizationGuideWindows7-EN.pdf

I must admit that the first time I used this document, I skipped right over the "About this Guide" section to try and get right into the meat of things. Please don't make the same mistake I did; take a look at that section, because there is invaluable information in it. It lays out a process for installing the VIEW agents and implementing the scripts.

When you open the link above, you may be wondering where to find the script files. Many of us use an alternative browser like Google Chrome, which has a built-in PDF viewer of sorts. The downside is that when we open a guide like this one, we may not see the PDF's attachments. For this reason, be sure to download the guide and open it with Adobe Reader. You should see something similar to the image below, with an attachments window in the sidebar.

You'll notice there are three scripts. The difference between them relates to VMware VIEW profile management; below is each script's function and how it relates to profile management:

1. CommandsDesktopsReadyForPersonaManagement.txt - Use this script for any parent image you have previously run the "CommandsNoPersonaManagement.txt" script on, but would now like to use with Persona Management.

2. CommandsNoPersonaManagement.txt - Use this script for any parent image you would like to use WITHOUT VIEW Persona Management

3. CommandsPersonaManagement.txt - Use this script for any parent image you would like to use with VIEW Persona Management
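The choice between the three scripts boils down to two questions, which this small Python sketch encodes. The function name and its parameters are hypothetical; the returned file names are the attachment names listed above.

```python
def choose_optimization_script(use_persona_management, ran_no_persona_script_before=False):
    """Pick which optimization script from the guide to run on a parent image."""
    if not use_persona_management:
        return "CommandsNoPersonaManagement.txt"
    if ran_no_persona_script_before:
        # The image was optimized without Persona Management and now needs it.
        return "CommandsDesktopsReadyForPersonaManagement.txt"
    return "CommandsPersonaManagement.txt"


# A parent image that was optimized without Persona Management but now needs it:
print(choose_optimization_script(True, ran_no_persona_script_before=True))
```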

The big difference between these scripts really comes down to two services, which are left on if you are using Persona Management and turned off if you are not planning to use it.

Basically, what you want to do here is save the PDF's attachments to your hard disk and rename them to *.bat. This turns them into valid batch files for execution on your parent machines. One thing I would highly recommend is to run them in a command window that you open as administrator. If you just double-click them, the commands will run and then the command window will close. If you instead open a command window, browse to the directory where the batch files are located, and run the appropriate one, the script will execute but the window will remain open, so you can see which commands succeeded and which did not. This allows for better troubleshooting.
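If you prefer not to rename the attachments by hand, here is a minimal Python sketch of the .txt-to-.bat step. It assumes the three attachments were saved into one folder under their original names; the function names are hypothetical, and it only renames the files. You still run the appropriate .bat yourself from an elevated command window as described above.

```python
import tempfile
from pathlib import Path

# The three attachments named in the optimization guide.
SCRIPT_NAMES = [
    "CommandsDesktopsReadyForPersonaManagement.txt",
    "CommandsNoPersonaManagement.txt",
    "CommandsPersonaManagement.txt",
]


def convert_to_batch(folder):
    """Rename any saved .txt attachments in folder to .bat and return the new paths."""
    folder = Path(folder)
    converted = []
    for name in SCRIPT_NAMES:
        source = folder / name
        if source.exists():
            target = source.with_suffix(".bat")
            source.rename(target)  # on Windows this raises if the .bat already exists
            converted.append(target)
    return converted


def demo():
    """Try the rename on throwaway files instead of the real attachments."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in SCRIPT_NAMES:
            (Path(tmp) / name).write_text("rem placeholder\n")
        convert_to_batch(tmp)
        return sorted(p.name for p in Path(tmp).iterdir())


print(demo())
```

Renaming rather than copying mirrors the step in the post; copy the files first if you want to keep the original .txt attachments around.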

-Brad