Hello! Long time, no scripting! I’ve been blowing through VCF, deploying, redeploying, and built some scripts to help me with this. Sharing is caring, read on to see what I’ve done…
Today I am midway through setting up my lab and realized VMware Cloud Foundation (VCF) is failing because I set the wrong password in my JSON file for the root account on my vCenter appliance.
No big deal, right? Just SSH in and change it. I tried, and got this:
New password:
BAD PASSWORD: it is based on a dictionary word
passwd: Authentication token manipulation error
passwd: password unchanged
The bypass was actually easy. Presumably you’re already SSH’d in as root, so you just need to edit /etc/pam.d/system-password
# Begin /etc/pam.d/system-password
# use sha512 hash for encryption, use shadow, and try to use any previously
# defined authentication token (chosen password) set by any prior module
password requisite pam_cracklib.so dcredit=-1 ucredit=-1 lcredit=-1 ocredit=-1 minlen=6 difok=4 enforce_for_root
password required pam_pwhistory.so debug use_authtok enforce_for_root remember=5
password required pam_unix.so sha512 use_authtok shadow try_first_pass
# End /etc/pam.d/system-password
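Remove enforce_for_root from the first password line (the one with pam_cracklib.so) and leave the pam_pwhistory.so line alone. Save the file, no need to restart any services, and retry passwd. You'll still see the dictionary warning, but this time the change goes through: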
New password:
BAD PASSWORD: it is based on a dictionary word
Retype new password:
passwd: password updated successfully
After that, I re-added enforce_for_root to the file and clicked RETRY back in VCF and all things are happy once again.
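The edit-and-restore steps above can be sketched as a pair of shell functions. This is only a minimal sketch, assuming a sed that supports in-place editing with -i (GNU sed does); the function names and the .bak backup suffix are made up for illustration and are not part of VCF or the appliance.

```shell
#!/bin/sh
# Hypothetical helpers. The default path is the file named in the post.
PAM_FILE="${PAM_FILE:-/etc/pam.d/system-password}"

# Drop enforce_for_root from the pam_cracklib.so line only,
# keeping a backup copy next to the original file.
relax_cracklib() {
    cp -p "$1" "$1.bak" &&
    sed -i '/pam_cracklib\.so/ s/[[:space:]]*enforce_for_root//' "$1"
}

# Put the original file back once the password has been changed.
restore_cracklib() {
    mv "$1.bak" "$1"
}
```

Usage would look something like: relax_cracklib "$PAM_FILE", then passwd, then restore_cracklib "$PAM_FILE". Restoring from the backup avoids retyping the option by hand and getting the line subtly wrong.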
Create a local user in the NSX Manager’s CLI, then use the API to grant CLI privileges to that user.
Here’s how, using a Linux machine:
ssh admin@[nsxmanagerIP]
enable
config t
user vrops-readonly password plaintext notrealpassword
user vrops-readonly privilege web-interface
Log out of the NSX Manager (type exit) and stay logged into the Linux machine.
Create cli-auditor.xml that contains this (replace brackets with greater/less than):
[?xml version="1.0" encoding="ISO-8859-1" ?]
[accessControlEntry]
[role]auditor[/role]
[resource]
[resourceId]globalroot-0[/resourceId]
[/resource]
[/accessControlEntry]
Add the user as an auditor in the NSX Manager as a CLI user:
curl -i -k -u 'admin:password' -H "Content-Type: application/xml" -X POST --data "@cli-auditor.xml" https://nsxmanagerip/api/2.0/services/usermgmt/role/vrops-readonly?isCli=true
Add your domain/vCenter user as an auditor in the NSX Manager (NOT as a CLI user):
curl -i -k -u 'admin:password' -H "Content-Type: application/xml" -X POST --data "@cli-auditor.xml" https://nsxmanagerip/api/2.0/services/usermgmt/role/[user@domain]?isCli=false
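The file creation and the two API calls above can be wrapped in a small script. This is a rough sketch only: the function names and the NSX_HOST and NSX_AUTH variables are placeholders invented here, while the XML body, the curl flags, and the endpoint path are the ones shown above. The heredoc writes the XML with real angle brackets, so the bracket swap is already done.

```shell
#!/bin/sh
# Placeholder values; replace with your own manager address and credentials.
NSX_HOST="${NSX_HOST:-nsxmanagerip}"
NSX_AUTH="${NSX_AUTH:-admin:password}"

# Write the auditor role definition to the given file.
write_auditor_xml() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="ISO-8859-1" ?>
<accessControlEntry>
<role>auditor</role>
<resource>
<resourceId>globalroot-0</resourceId>
</resource>
</accessControlEntry>
EOF
}

# Build the role URL for a user; the second argument is true for CLI users.
role_url() {
    printf 'https://%s/api/2.0/services/usermgmt/role/%s?isCli=%s\n' \
        "$NSX_HOST" "$1" "$2"
}

# POST the role file for one user: grant_auditor USER ISCLI XMLFILE
grant_auditor() {
    curl -i -k -u "$NSX_AUTH" -H "Content-Type: application/xml" \
        -X POST --data "@$3" "$(role_url "$1" "$2")"
}
```

For example, write_auditor_xml cli-auditor.xml followed by grant_auditor vrops-readonly true cli-auditor.xml covers the CLI user, and the same call with your domain user and false covers the second request. Passing the URL through a quoted command substitution also keeps the shell from treating the ? as a wildcard.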
While building a new environment for my lab, I ran across an interesting thing yesterday.
I looked at my cluster’s VSAN health and saw this error:
It’s complaining that my hosts don’t have matching Virtual SAN advanced configuration items.
If you click on that error, you’ll see at the bottom where it shows comparisons of hosts and the advanced configurations:
It shows VSAN.DomMaxLeafAssocsPerHost and VSAN.DomOwnerInflightOps as being different between a few of my hosts. Looking at the image above, you’ll see node 09 has values of 36000 and 1024, respectively, while the other nodes 10-12 show 12000 and 0.
I immediately went to the host configuration advanced settings in the web client, searched for VSAN, and didn’t see either of those. I even checked through PowerCLI and couldn’t see them:
Have you run into one of these errors before:
[exec] AxisFault
[exec] faultCode: ServerFaultCode
[exec] faultSubcode:
[exec] faultString: fault.drextapi.fault.ConnectionLimitReached.summary
[exec] faultActor:
[exec] faultNode:
[exec] faultDetail:
[exec] {urn:srm0}SrmFaultConnectionLimitReachedFault:<connectionLimit>10</connectionLimit>
[exec] fault.drextapi.fault.ConnectionLimitReached.summary
Or
[exec] AxisFault
[exec] faultCode: ServerFaultCode
[exec] faultSubcode:
[exec] faultString: dr.fault.SessionLimitExceeded
[exec] faultActor:
[exec] faultNode:
[exec] faultDetail:
[exec] {urn:srm0}MethodFaultFault:<vim25:reason>Invalid fault</vim25:reason>
[exec] dr.fault.SessionLimitExceeded
[exec] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[exec] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
[exec] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
Or in the GUI: “Lost connection to remote SRM server. Unable to login. The maximum number of SRM users has been reached.”
RJ, from RJApproves.com, & I had been plagued by these messages for weeks, maybe even months. Well, we finally got it all figured out!
On May 25th, I published this post covering some scenarios on how to use Site Recovery Manager & Active Directory. Michael White from VMware responded with some good info. He had an awesome suggestion of using a script to cold clone a DC daily to use for testing.
Let’s take a look at some ways we can get this done:
I’ve been reading a lot around VMware’s Site Recovery Manager and considerations surrounding Active Directory. Most of what you will read says ‘NEVER’ protect AD with SRM, only use native AD replication, especially since SRM & vCenter at your Recovery Site require AD to be running anyway.
But what if you have multiple domains for different uses? This is where the lines become blurred. Think about this for a second:
Now here is where I say ‘why wouldn’t you protect AD with SRM?’ In a true disaster, the protected site is gone, no AD exists anywhere, so using SRM to bring them up on the recovery site makes sense. Is my logic flawed?
However, if I had my application AD living at both sites, using native replication, I agree 100% in not including your Domain Controllers in your SRM Recovery Plan. This leads to my concern…
To me, the only way to really test your DR plan (in this instance, your SRM Recovery Plan) is to not have anything different between them.
I recently ran into an issue when installing SRM and thought I’d share. I didn’t get a screenshot, but the error was something like this:
Failed to Initialize – dbmanager could not initialize vdb connection – odbc error
If you click skip from there, it’ll fail to create the tables, and eventually get to the point where you’ll have to roll back.
As it turns out, it was due to a c0mp73x”P@s$w0rd! that caused the problem. I’m not sure what characters killed it, but going to a less complex pAs5w0rd worked fine. ODBC worked fine, user & permissions were set up properly, it just came down to SRM not being able to handle the special characters. What’s strange is a similarly complex password works for vCenter.
Hope this helps, have fun out there!
I recently had this problem, but forgot to take a screenshot for the blog, sorry guys.
I was patching an HA/DRS cluster using VUM and none of the VMs would migrate off one specific host. The error it gave was “A general system error occurred: Failed to start migration pre-copy. Error 0xbad010d. The ESX host failed to connect over the VMotion network”.
So you’re still using ESX 3.5 and need to patch it manually? Bummer, I know, I’m in that boat right now, or was. I ran “esxupdate -a query” to find out the latest patch and saw, “ESX Server 3.5.0 Update 4”. Then went to VMware’s downloads site to download Update 5a, and when it prompted me to download all dependencies, I did.
What did that give me? Nineteen (19) bundles/depots/zip files, one of which was ‘ESX350-Update05a.zip’