Earlier this week we were greeted with this awesome message:
webclient-fail-sso.

It’s so descriptive we knew exactly where to start! Okay, yeah, not really. Sarcasm aside, you’d think the culprit would be SSO. I began checking the two SSO servers we have in an HA configuration and they appeared fine. What’s even more strange is the fat clients were all authenticating fine. I started checking logs on the SSO servers and saw several things similar to this:

2015-08-25 23:20:49,538 INFO [ActiveDirectoryProvider] Failed to find user snip@snipPrincipal id not found: {Name: snip, Domain: snip} via ldap search
and
2015-08-26 00:29:37.709:t@21945040:ERROR: ldap simple bind failed. Error(4294967295)

So I assumed it was SSO again, maybe related to the domain we auth against.

Great! So now what?


Well, that’s where things get fuzzy. Andre (@AndreSmith_, http://pshell.info/), Alex (@alexpearson), and I dug through the logs, but we didn’t find anything pointing at a suspect. I opened a ticket with VMware, uploaded five (5) total log bundles, still haven’t heard anything back.

It was an interesting phone call at first because I was told using vCNS (vShield Edge Load Balancer) to load balance the HA’d SSO servers was not supported. I’m going to leave this right here: “Deploying a Centralized VMware vCenter™ Single Sign-On™ Server with a Network Load Balancer Technical Reference”

I digress, so moving on…

Did you check the logs?

We checked log locations outlined in this KB: Location of vSphere Web Client 5.x service logs (2004090)

Also the SSO log here: C:\ProgramData\VMware\CIS\logs\vmware-sso\vmware-sts-idmd.log

This was taken from the Web Client virgo log: C:\ProgramData\VMware\vSphere Web Client\serviceability\logs\vsphere_client_virgo.log
[2015-08-26 09:38:31.329] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.VcServiceImpl Vmomi initialization successful for vc - https://snip:443/sdk
[2015-08-26 09:38:31.329] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Processing peer vc servers - https://snip:443/sdk
[2015-08-26 09:38:31.329] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Peer vc servers processed successfully - https://snip:443/sdk
[2015-08-26 09:38:31.329] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Logging into start up VC: https://snip:443/sdk
[2015-08-26 09:38:31.369] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.VcServiceImpl loginByToken successful on vc https://snip:443/sdk
[2015-08-26 09:38:31.369] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.VcServiceImpl logged in VC with id 3000078
[2015-08-26 09:38:31.369] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.VcServiceImpl Successfully logged in for session: 100025
[2015-08-26 09:38:31.369] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Login successful to start up VC: https://snip:443/sdk
[2015-08-26 09:38:31.370] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Login to peer vc servers - https://snip:443/sdk
[2015-08-26 09:38:31.370] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Peer vc servers for https://snip:443/sdk:
[]
[2015-08-26 09:38:31.370] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Login to peer vc servers completed successfully - https://snip:443/sdk
[2015-08-26 09:38:31.377] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vsphere.client.common.util.VcProxyServiceUtil Registering client proxy for https://snip:9443/vsphere-client/j_spring_security_check
[2015-08-26 09:38:31.381] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vsphere.client.common.util.VcProxyServiceUtil Client proxy already registered for https://snip:9443/vsphere-client/j_spring_security_check
[2015-08-26 09:38:31.381] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vsphere.client.common.util.VcProxyServiceUtil unregistering client proxy with spec: com.vmware.vim.binding.vim.ProxyService.EndpointSpec:
serverNamespace = /vsphere-client,
accessMode = httpsWithRedirect
inherited from com.vmware.vim.binding.vim.ProxyService$EndpointSpec@36118d64
[2015-08-26 09:38:31.385] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vsphere.client.common.util.VcProxyServiceUtil Client proxy was unregistered
[2015-08-26 09:38:31.390] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vsphere.client.security.VimAuthenticationHandler Updated proxy end point to snip:9443
[2015-08-26 09:38:31.390] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vsphere.client.security.VimAuthenticationHandler LinkedVcGroupRegistry login complete 100025
[2015-08-26 09:38:31.390] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vsphere.client.security.VimAuthenticationHandler Authentication successful for session : 100025
[2015-08-26 09:38:31.390] [INFO ] http-bio-9443-exec-25 70000244 100025 200006 com.vmware.vsphere.client.security.VimAuthenticationHandler 6 authenticated VIM session(s).
[2015-08-26 09:38:31.390] [INFO ] http-bio-9443-exec-25 70000244 100027 200006 com.vmware.vise.util.session.SessionUtil Mapping: /vsphere-client: sessionid to -> 100027 with clientId: 200006
[2015-08-26 09:38:31.393] [INFO ] http-bio-9443-exec-25 70000244 100027 200006 com.vmware.vise.security.DefaultAuthenticationProvider Successful authentication handler running for sessionId 100027, clientId 200006
[2015-08-26 09:38:31.393] [INFO ] http-bio-9443-exec-25 70000244 100027 200006 com.vmware.vise.security.DefaultAuthenticationProvider Starting authhandlers for sessionId 100027, clientId 200006
[2015-08-26 09:38:31.394] [INFO ] http-bio-9443-exec-25 70000244 100027 200006 com.vmware.vise.security.DefaultAuthenticationProvider Running initSession for sessionId 100027, clientId 200006
[2015-08-26 09:38:31.394] [INFO ] http-bio-9443-exec-25 70000244 100027 200006 com.vmware.vise.security.DefaultAuthenticationProvider Initializing Session: sessionId 100027, clientId 200006
[2015-08-26 09:38:31.394] [INFO ] http-bio-9443-exec-25 70000244 100027 200006 com.vmware.vise.security.DefaultAuthenticationProvider Retrieving session listeners for sessionId 100027, clientId 200006
[2015-08-26 09:39:16.600] [INFO ] http-bio-9443-exec-16 70000245 100012 ###### org.springframework.flex.servlet.MessageBrokerHandlerAdapter Channel endpoint amf received request.
[2015-08-26 09:39:17.042] [INFO ] http-bio-9443-exec-16 70000246 100012 ###### org.springframework.flex.servlet.MessageBrokerHandlerAdapter Channel endpoint amf received request.

That may not look like anything out of the ordinary, but if you look at a properly functioning SSO server log, it looks like this:

[2015-08-26 14:50:49.007] [INFO ] vc-service-pool-13 70000058 100002 200001 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Login successful to start up VC: https://snip:443/sdk
[2015-08-26 14:50:49.007] [INFO ] vc-service-pool-13 70000058 100002 200001 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Login to peer vc servers - https://snip:443/sdk
[2015-08-26 14:50:49.007] [INFO ] vc-service-pool-13 70000058 100002 200001 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Peer vc servers for https://snip:443/sdk:
[]
[2015-08-26 14:50:49.007] [INFO ] vc-service-pool-13 70000058 100002 200001 com.vmware.vise.vim.commons.vcservice.impl.LinkedVcGroupImpl Login to peer vc servers completed successfully - https://snip:443/sdk
[2015-08-26 14:50:49.017] [INFO ] vc-service-pool-13 70000058 100002 200001 com.vmware.vsphere.client.security.VimAuthenticationHandler Logging in from localhost or same IP. Will not update proxy.
[2015-08-26 14:50:49.018] [INFO ] http-bio-9443-exec-10 70000058 100002 200001 com.vmware.vsphere.client.security.VimAuthenticationHandler LinkedVcGroupRegistry login complete 100002
[2015-08-26 14:50:49.018] [INFO ] http-bio-9443-exec-10 70000058 100002 200001 com.vmware.vsphere.client.security.VimAuthenticationHandler Authentication successful for session : 100002
[2015-08-26 14:50:49.018] [INFO ] http-bio-9443-exec-10 70000058 100002 200001 com.vmware.vsphere.client.security.VimAuthenticationHandler 1 authenticated VIM session(s).
[2015-08-26 14:50:49.020] [INFO ] http-bio-9443-exec-10 70000058 100003 200001 com.vmware.vise.util.session.SessionUtil Mapping: /vsphere-client: sessionid to -> 100003 with clientId: 200001
[2015-08-26 14:50:49.020] [INFO ] http-bio-9443-exec-10 70000058 100003 200001 com.vmware.vise.security.DefaultAuthenticationProvider Successful authentication handler running for sessionId 100003, clientId 200001
[2015-08-26 14:50:49.020] [INFO ] http-bio-9443-exec-10 70000058 100003 200001 com.vmware.vise.security.DefaultAuthenticationProvider Starting authhandlers for sessionId 100003, clientId 200001
[2015-08-26 14:50:49.020] [INFO ] http-bio-9443-exec-10 70000058 100003 200001 com.vmware.vise.security.DefaultAuthenticationProvider Running initSession for sessionId 100003, clientId 200001
[2015-08-26 14:50:49.020] [INFO ] http-bio-9443-exec-10 70000058 100003 200001 com.vmware.vise.security.DefaultAuthenticationProvider Initializing Session: sessionId 100003, clientId 200001
[2015-08-26 14:50:49.021] [INFO ] http-bio-9443-exec-10 70000058 100003 200001 com.vmware.vise.security.DefaultAuthenticationProvider Retrieving session listeners for sessionId 100003, clientId 200001
[2015-08-26 14:50:49.025] [INFO ] session-init-pool-17 70000058 100003 200001 com.vmware.vsphere.client.security.sso.SsoTokenLifetimeManager Registering session : 100003
[2015-08-26 14:50:49.027] [INFO ] session-init-pool-17 70000058 100003 200001 com.vmware.vsphere.client.security.sso.SsoTokenLifetimeManager sessionId 100003, clientId 200001, Token expiration time: Fri Sep 25 14:50:47 BST 2015
Token renewal scheduled approximately for: Fri Sep 25 14:45:47 BST 2015

That’s a lot of log!

At first glance those two look about the same, right? The key takeaway is the last few lines containing “Registering session : 100003”, “sessionId 100003, clientId 200001, Token expiration time: Fri Sep 25 14:50:47 BST 2015” and “Token renewal scheduled approximately for: Fri Sep 25 14:45:47 BST 2015”

After those lines the successful log will continue on to show communication with the inventory servers.  That turned out to be some red herring.

Andre stumble across this directory: C:\Program Files\VMware\Infrastructure\vSphereWebClient\server\serviceability\logs\byUser

That shows logs for each user who logs into the web client. It’s also not documented in the KB article I referenced above. Now, here’s where Andre found the problem, take a look at this log:
[2015-08-27 07:59:07.076] [ERROR] session-init-pool-19 70000059 100004 200001 com.vmware.vise.vim.extension.VcExtensionManager Error while downloading or deploying package com.vmware.vcops from https://snip/vcops-ngc.zip, check the URL and the root exception. java.lang.NullPointerException: null
at java.util.AbstractCollection.addAll(Unknown Source)
at com.vmware.vise.vim.extension.VcExtensionManager.addPackages(VcExtensionManager.java:350)
at com.vmware.vise.vim.extension.VcExtensionManager.sessionStarted(VcExtensionManager.java:156)
at com.vmware.vise.security.DefaultAuthenticationProvider$1.run(DefaultAuthenticationProvider.java:484)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

[2015-08-27 07:59:07.317] [WARN ] session-init-pool-19 70000059 100004 200001 com.vmware.vise.util.session.SessionUtil IllegalStateException in getClientId because session was already terminated.

So we found a fix!

So see the URL there where it’s trying to download the vcops plugin? As it turns out the plugin had become corrupt and was causing the web client to fail, NOT SSO. After unregistering the vcops plugin, everything began to work again.

In the event you have a situation where the webclient is saying SSO is failing, and SSO is actually functioning, you may want to check one of these logs and possibly unregister all plugins.

I can’t take the credit for this, it was all Andre, but I wanted to share so I can hopefully help someone else in the same boat we were!