ESXi 5.5, Cisco B440 M2 with 1TB+, and Cisco VIC 1280 Bug

The other day we noticed some odd behavior from a few blades in a fairly large environment.  The environment is a mix of B200 M3 and B440 M2 blades.  The oddity was random, intermittent disconnections from storage and/or complete unresponsiveness in vCenter.  What caught my eye was that it was specific to the B440 blades in the environment.  It was also spread across multiple sites, which led us to believe the problem was related to ESXi or the B440 blades themselves.  We ruled out storage and networking because the effects would have been far more widespread if either were the cause.

Being the VMware guy, I proceeded to look at it from my side of the fence.  Jumping into the logs, I found numerous odd errors on the B440 blades in vmkernel.log.  Here is a snippet of those errors:

Read More…

vCNS Manager Users and vCenter Group Authentication

Figured I would post this since it didn’t seem to be documented anywhere.  This is in regards to using a vCenter Group for authentication into the vCNS Manager UI.  Most of the time, with AD credentials, we tend to use the short name; for instance, VSENTIAL\james or VSENTIAL\vCNS Admins is what we would assume is the proper format.  When adding a vCenter User to the vCNS Manager Users, the short name works.  Alas, when using the same format for a vCenter Group, nobody in that group can log in.  Come to find out, you need to use the FQDN in the group string; for example, vsential.lab\vCNS Admins.
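
Purely to illustrate the format difference, here is a tiny Python sketch that rewrites a short-name group string into the FQDN form the Manager expects.  The NetBIOS-to-DNS mapping below is a made-up stand-in for this lab; substitute your own:

def to_fqdn_group(group, netbios_to_fqdn):
    # Turn 'VSENTIAL\vCNS Admins' into 'vsential.lab\vCNS Admins'.
    domain, _, name = group.partition("\\")
    return "{0}\\{1}".format(netbios_to_fqdn.get(domain.upper(), domain), name)

# Hypothetical mapping for the lab above.
mapping = {"VSENTIAL": "vsential.lab"}
print(to_fqdn_group("VSENTIAL\\vCNS Admins", mapping))   # -> vsential.lab\vCNS Admins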

Just figured I would document this since it isn’t stated in the vCNS 5.5 documentation or anywhere else I could find.  Hope this saves someone some frustration and wasted time!  Enjoy!

SSO High-Availability Single Site and vCenter Linked-Mode

I came across an interesting bug today when deploying linked-mode vCenter with SSO in High-Availability mode.  The installation of SSO, the Inventory Service, and vCenter all went as planned…initially.  During my deployment I decided to reboot the secondary vCenter server.  Once the server came back online, the VMware VirtualCenter service failed to start, along with the Management Webservices service.  OMG!  Are you serious????

So, being the good engineer, I went and looked at the logs to find out why.  Looking at the vpxd.log file, I found the following:

2013-11-15T14:16:04.657-08:00 [05048 error 'HttpConnectionPool-000001'] [ConnectComplete] Connect failed to ; cnx: (null), error: class Vmacore::Ssl::SSLVerifyException(SSL Exception: Verification parameters:
--> PeerThumbprint: 12:03:EF:EE:17:10:29:2B:A7:14:20:8E:4E:F6:D3:88:A7:09:5F:19
--> ExpectedThumbprint:
--> ExpectedPeerName: dcamgmtvc.diginsite.net
--> The remote host certificate has these problems:
-->
--> * A certificate in the host's chain is based on an untrusted root.
-->
--> * self signed certificate in certificate chain)
2013-11-15T14:16:04.657-08:00 [01956 error '[SSO][SsoFactory_CreateFacade]'] Unable to create SSO facade: SSL Exception: Verification parameters:
--> PeerThumbprint: 12:03:EF:EE:17:10:29:2B:A7:14:20:8E:4E:F6:D3:88:A7:09:5F:19
--> ExpectedThumbprint:
--> ExpectedPeerName: dcamgmtvc.diginsite.net
--> The remote host certificate has these problems:
-->
--> * A certificate in the host's chain is based on an untrusted root.
-->
--> * self signed certificate in certificate chain.
2013-11-15T14:16:04.657-08:00 [01956 error 'vpxdvpxdMain'] [Vpxd::ServerApp::Init] Init failed: Vpx::Common::Sso::SsoFactory_CreateFacade(sslContext, ssoFacadeConstPtr)
--> Backtrace:
--> backtrace[00] rip 000000018018cd7a
--> backtrace[01] rip 0000000180106c48
--> backtrace[02] rip 000000018010803e
--> backtrace[03] rip 00000001800907f8
--> backtrace[04] rip 00000000006f5bac
--> backtrace[05] rip 0000000000716722
--> backtrace[06] rip 000007f6c0cbddfa
--> backtrace[07] rip 000007f6c0cb795c
--> backtrace[08] rip 000007f6c0ee80ab
--> backtrace[09] rip 000007fb6f3cbaa1
--> backtrace[10] rip 000007fb6f0e1832
--> backtrace[11] rip 000007fb6fb2d609
-->
2013-11-15T14:16:04.658-08:00 [01956 error 'Default'] Failed to intialize VMware VirtualCenter. Shutting down...

As we can see above, it is apparent that there is some type of certificate failure.  Before going further, I decided to verify a couple of the standard things for certificate troubleshooting.  Have a look at your vpxd.cfg and check the SSO section of the config file; make sure it is pointing to the appropriate primary SSO server.  Also make sure that your DNS resolution is working, both forward and reverse.  Once you have verified this, do the following (a rough script of these steps appears after the list):

  1. Go to C:\ProgramData\VMware\SSL
  2. Rename the ca-certificates.crt to ca-certificates.crt.old
  3. Copy the ca-certificates.crt file from the primary SSO server to this folder.  (You will find the file in the same location on your primary SSO server’s filesystem.)
  4. Restart the Inventory Service
  5. Attempt to start the VMware VirtualCenter and VMware VirtualCenter Management Webservices services
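
For those who would rather script it, here is a rough Python sketch of the DNS check and the steps above, meant to run on the secondary vCenter server.  The primary SSO server name, the UNC path, and the service display names are assumptions for this example; adjust them for your environment (and note that the reverse lookup needs a PTR record in place):

import socket
import shutil
import subprocess

PRIMARY_SSO = "sso01.vsential.lab"                      # hypothetical primary SSO server
LOCAL_CRT   = r"C:\ProgramData\VMware\SSL\ca-certificates.crt"
PRIMARY_CRT = r"\\sso01.vsential.lab\c$\ProgramData\VMware\SSL\ca-certificates.crt"

# Verify forward and reverse DNS resolution for the primary SSO server.
ip = socket.gethostbyname(PRIMARY_SSO)
print("forward:", PRIMARY_SSO, "->", ip)
print("reverse:", ip, "->", socket.gethostbyaddr(ip)[0])

# Steps 1-3: keep the old bundle and pull the one from the primary SSO server.
shutil.move(LOCAL_CRT, LOCAL_CRT + ".old")
shutil.copy2(PRIMARY_CRT, LOCAL_CRT)

# Step 4: restart the Inventory Service (display names may differ by version).
subprocess.call(["net", "stop", "VMware Inventory Service"])
subprocess.call(["net", "start", "VMware Inventory Service"])

# Step 5: start vCenter and the Management Webservices.
subprocess.call(["net", "start", "VMware VirtualCenter Server"])
subprocess.call(["net", "start", "VMware VirtualCenter Management Webservices"])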

If everything was done correctly, you should see your services come back online.  Come to find out, this is a known bug with vCenter 5.5 SSO High-Availability deployments.  Basically, the certificate bundle that ends up in that directory is not the one from the primary SSO server, where it should have come from.  Hope this helps out!

SSO Issue when using Windows Server 2012 AD Identity Source

One of the cool things about doing new installations is picking up on some of the latest gotchas.  During a recent greenfield deployment I came across an oddity when adding the SSO Identity Source for the AD infrastructure.  I went through the normal motions of adding the identity source and everything seemed fine, then…BAM…I tried to log in with AD credentials and suddenly received an error that the client was not authenticated to the VMware Inventory Service.

That definitely is not good…an empty inventory when trying to get around in vCenter.  I went through the typical troubleshooting motions and found that it only happened with AD credential logins.  That leads one to believe the issue is directly related to SSO and AD authentication, right?  Exactly!  So what happens next?  You run to the AD guys and say AD authentication is failing between SSO and AD…and they say everything is fine (of course).  Digging into the logs, we see the following error in ds.log:

[2013-11-15 11:05:25,593 pool-11-thread-1  ERROR com.vmware.vim.vcauthenticate.servlets.AuthenticationHelper] Invalid user
com.vmware.vim.dataservices.ssoauthentication.exception.InvalidUserException: Domain does not exist: NT AUTHORITY

NT AUTHORITY????  That isn’t a domain.  Next we need to dig a little deeper into another log file.  Taking a look at vmware-sts-idmd.log, we see a similar error:

2013-11-15 11:35:25,326 WARN   [ActiveDirectoryProvider] obtainDcInfo for domain [NT AUTHORITY] failed Failed to get domain controller information for NT AUTHORITY(dwError - 1212 - ERROR_INVALID_DOMAINNAME)
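
If you want to confirm how widespread the bogus domain reference is, a quick grep of the SSO and Inventory Service logs does the trick.  Here is a minimal Python sketch; the log directory below is an assumption, so point it at wherever your ds.log, vmware-sts-idmd.log, and vmware-identity-sts.log actually live:

import io
import os

LOG_DIR = r"C:\ProgramData\VMware\CIS\logs"   # hypothetical location -- adjust as needed

for root, _, files in os.walk(LOG_DIR):
    for name in files:
        if not name.endswith(".log"):
            continue
        path = os.path.join(root, name)
        with io.open(path, "r", encoding="utf-8", errors="ignore") as fh:
            for lineno, line in enumerate(fh, 1):
                if "NT AUTHORITY" in line:
                    print("{0}:{1}: {2}".format(path, lineno, line.rstrip()))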

Ok, so we now see this NT AUTHORITY domain reference in multiple places.  Next, let’s take a look at the SAML strings being passed; for that we need the vmware-identity-sts.log file.  Searching that file for NT AUTHORITY (or for the login credentials being used), we find the following:

<saml2:AttributeValue xsi:type="xs:string">vsential.net\Domain Admins</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">vsential.net\ESX Admins</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">NT AUTHORITY\Claims Valid</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">vsential.net\Denied RODC Password Replication Group</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">vsphere.local\Administrators</saml2:AttributeValue>
<saml2:AttributeValue xsi:type="xs:string">vsphere.local\Everyone</saml2:AttributeValue>

There is the NT AUTHORITY culprit…great!  So now that we found it, what do we do to fix it?  Come to find out, the NT AUTHORITY\Claims Valid addition to the SAML token is caused by something new to Server 2012 Active Directory Group Policy.  Open your GPO Editor, go to the Default Domain Policy, and look at:  Computer Configuration -> Administrative Templates -> System -> KDC -> KDC support for claims, compound authentication and Kerberos armoring.  This policy will be enabled.  From what I have been told, this policy is new to Server 2012 Active Directory, and it is what adds NT AUTHORITY\Claims Valid to the SAML strings.  Disable this policy, refresh the GPO on the vCenter management VMs, and VOILA!!!!  Everything works like normal!
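
Once the AD team has disabled the policy, the refresh on the vCenter management VMs is just a GPO update.  As a trivial sketch, assuming you run it locally on each management VM:

import subprocess

# Force the updated Default Domain Policy to apply on this management VM,
# then print the output so you can confirm the refresh completed.
result = subprocess.run(["gpupdate", "/force"], capture_output=True, text=True)
print(result.stdout)
print(result.stderr)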

Now this may not be the complete fix, but it resolved the issue in my environment; more testing will confirm whether it is completely valid.  Before disabling this policy, be sure to touch base with your local AD gurus and get an impact assessment.  Hope this helps you out, as this took me a while to find and fix.  Good luck!
