vCenter UCS Alarm: IPMI SEL, SEL_FULLNESS

Summary:
This alarm means that the server's CIMC system event log (SEL) has filled up.  Below are the steps to clear this type of alarm.

PreReq:
This assumes you are utilizing B-series UCS servers.  C-series may be slightly different in practice.

Resolution:

  1. If your vCenter is configured w/ default alarms, you'll likely see something like pictured below in vCenter under the hardware status tab:
  2. To clear this alert, you'll need to empty the SEL Logs in UCS for the blade associated w/ your service profile.  You are not likely to find SEL Logs as part of the service profile itself, so open the blade instead.

  3. Once you've opened the related blade, select the SEL Logs tab.  Review and/or export the logs so you do not simply clear something that may need to be investigated.  Once done, you can safely clear the logs:
  4. Once the SEL Logs have been cleared, the alert in vCenter should reset to green in a few minutes.

How to: tcpdump UCS Management traffic.

Rather than regurgitate all the information here, this is the skinny:

  1. SSH into your UCS chassis (aka primary fabric interconnect)
  2. connect nxos
  3. ethanalyzer local interface mgmt limit-captured-frames 2000 write volatile:/mycapture.cap
    1. ethanalyzer is the command.
    2. local is the default.
    3. interface tells it where we want to capture packets from.
    4. mgmt is the interface I'm interested in.
    5. limit-captured-frames is there because the capture stops after 10 frames by default, which is over far too quickly when troubleshooting.
    6. write outputs a capture file to volatile memory (deleted when the FI is rebooted).
  4. exit
  5. connect local-mgmt
  6. cp volatile:/mycapture.cap scp://username@linuxservername/somepath
    1. 'scp' is the destination type here; ftp, sftp, tftp, volatile, or workspace can be used as well.
    2. The capture file can be read in applications like Wireshark.
This helped me figure out my LDAP Authentication issues.

Full article and explanation of how to do what I've outlined above was found here:

Thanks to Jeff for his write-up, otherwise I would've never gotten anywhere with TAC.

Command to capture only LDAP related traffic:
ethanalyzer local interface mgmt capture-filter "tcp port 389" limit-captured-frames 2000 write volatile:/mycapture.cap

or if you're using LDAPS, capture on port 636 instead, although since that traffic is encrypted I'm not sure the data will be useful.

ethanalyzer local interface mgmt capture-filter "tcp port 636" limit-captured-frames 2000 write volatile:/mycapture.cap
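
The two filtered captures above differ only in the port, so here is a rough PowerShell sketch that assembles the command text before you paste it into the NX-OS shell.  The function name and its parameters are my own invention for illustration; it only builds a string from the options already shown and never talks to the fabric interconnect.

```powershell
# Hypothetical helper: builds the ethanalyzer command line shown above.
# It only assembles text; paste the result into the 'connect nxos' shell.
function New-EthanalyzerCommand {
    param(
        [int]$Port = 389,                    # 389 = LDAP, 636 = LDAPS
        [int]$FrameLimit = 2000,             # the default capture stops at 10 frames
        [string]$FileName = 'mycapture.cap'  # written to volatile: on the FI
    )
    "ethanalyzer local interface mgmt capture-filter `"tcp port $Port`" " +
        "limit-captured-frames $FrameLimit write volatile:/$FileName"
}

# One command per protocol, each writing to its own capture file.
New-EthanalyzerCommand -Port 389 -FileName 'ldap.cap'
New-EthanalyzerCommand -Port 636 -FileName 'ldaps.cap'
```

Keeping the frame limit as a parameter makes it easy to raise it when 2000 frames still is not enough to catch a login attempt.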

UCS bug around Active Directory

Update:
UCS 2.1 addresses this particular issue.  Bug ID: CSCth96721

Summary:
Found an interesting UCS bug on 2.0(3b).  May be resolved in 2.0(4d), but have not tested yet.  This particular problem only manifests itself if your Active Directory tree structure is elaborate and causes a user account's distinguishedName to be longer than 128 characters.

Detailed:
Essentially, UCS queries Active Directory w/ samAccountFilter and receives the results of the query.  It then makes a bind call against the DN it received.  The problem is that the DN variable for the bind call on the UCS side seems to be limited to 128 characters, so the DN gets truncated when the bind call is made.

Workaround:
The only real workaround is to move the affected account to a higher level OU to shorten its distinguished name.

Powershell:
You can use PowerShell to determine the length of a distinguished name by utilizing the Quest ActiveRoles PS snapin:
(Get-QADUser UserName).DN.Length
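
If you want to check a batch of accounts rather than one, something like the sketch below might do.  Test-DnLength is a made-up helper name and the sample DNs are invented; the 128 character limit is simply the one described above.

```powershell
# UCS 2.0(3b) appeared to truncate the bind DN at 128 characters (see above).
$MaxDnLength = 128

# Hypothetical helper: flags distinguished names longer than the limit.
function Test-DnLength {
    param([string[]]$DistinguishedName, [int]$Limit = $MaxDnLength)
    foreach ($dn in $DistinguishedName) {
        New-Object PSObject -Property @{
            DN      = $dn
            Length  = $dn.Length
            TooLong = ($dn.Length -gt $Limit)
        }
    }
}

# Example DNs are made up for illustration only.
$short = 'CN=jdoe,OU=Users,DC=somedomain,DC=com'
$long  = 'CN=jdoe,' + ('OU=SomeVeryLongOrganizationalUnitName,' * 4) + 'DC=somedomain,DC=com'
Test-DnLength -DistinguishedName $short, $long | Select-Object Length, TooLong
```

With the Quest snapin loaded, the DN strings could come from (Get-QADUser UserName).DN instead of the hard-coded samples.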

Jing, IIS, SWF, and Powershell fun

I've been using Jing to record short tutorial videos and uploading them to a directory on my IIS server.  To view or share them I would have to create a simple HTML file.  I decided to automate this process by having PowerShell generate the formatted HTML file whenever I threw a swf file into the directory.


# Here is the local directory on the IIS server where I'm throwing my swf files.
# This script is meant to run as a scheduled task every 5 minutes or more if you like.
$VidPath = "D:\inetpub\wwwroot\videos"

# Here I'm querying for all the swf files in the directory.
# -eq is used rather than -match so the "." is not treated as a regex wildcard.
$SWFFiles = Get-ChildItem $VidPath | Where-Object {$_.Extension -eq ".swf"}

# This is where I begin to look @ each swf file and check whether they have an associated html file.
foreach ($SWFFile in $SWFFiles)
{
    $HTMLPath = Join-Path $VidPath "$($SWFFile.BaseName).html"
    # If I did not find an associated html file, this is where I would create one.
    if (-not (Test-Path $HTMLPath))
    {
        # A here-string keeps the markup readable without escaping every quote.
        $HTML = @"
<object width="100%" height="100%">
<param name="movie" value="./$($SWFFile.Name)">
<embed src="./$($SWFFile.Name)" width="100%" height="100%">
</embed>
</object>
"@
        $HTML | Out-File $HTMLPath -Encoding ASCII
    }
}

I use this script in conjunction w/ my iPad directory script for fun.


Powershell, WMI, Local Computer Description, and value out of range error...

Summary:
Needed to update the local computer description on servers that I own.  Easy peasy w/ PowerShell, or so I thought.

PreRequisites:

  1. Powershell 2.0+
  2. Quest.ActiveRoles.AdManagement Snapin
  3. SysInternal PSExec
Details:
Windows Server 2008 and other 'Vista' kernel based systems seem to have some kind of WMI bug.  Searching the web turned up only a mention of something regarding the use of "ItemIndex".  I'm @ a loss.  The WMI portion of this script will work for 2008 R2 systems; the only workaround for the others appears to be using Sysinternals PsExec to call the net config command locally on the target system.

Add-PSSnapin quest.activeroles.admanagement
$servers = Get-QADComputer -Name "someprefix*"
$Description = "Something I want to insert" 

foreach ($computer in $servers)
{
    # Simply a check to see whether the system is active or not.
    $Ping = Get-WmiObject Win32_PingStatus -Filter "Address = '$($computer.Name)'" | Select-Object StatusCode
    if ($Ping.StatusCode -eq 0)
    {
        $computer.Name
        # Option 1: This will work for all Windows versions.
        # I'm calling the live version, but for speed you may want to download it to your local system.
        & \\live.sysinternals.com\tools\psexec.exe "\\$($computer.Name)" net config server "/srvcomment:$Description"
        # Option 2: This will work for 2008 R2 systems and above.
        # It will return a "Value out of range" error on 2008 systems.
        Set-WmiInstance -ComputerName $computer.Name -Path "Win32_OperatingSystem=@" -Arguments @{Description=$Description}
    }
    else
    {
        Write-Host "$($computer.Name) unreachable"
    }
}

Value out of Range issue:
This error really bothers me even if it is for a relatively small problem.  Here is what occurs:
  1. powershell returns the expected data in the description field.
  2. Attempting to modify the value, locally or remotely, returns a "value out of range" error.
  3. The type is string and I don't see any reason why it shouldn't update.
  4. Only appears to affect 'Vista' based Operating Systems, such as Windows 2008 Server.
It's most definitely a WMI problem, but I'm rather stumped.
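
Since the error only seems to hit the 'Vista' kernel (version 6.0) while 2008 R2 (version 6.1) is fine, one option is to pick the update method per server based on its OS version.  This is only a sketch of that idea; Get-DescriptionUpdateMethod is a hypothetical name, and the version cut-off is my assumption drawn from the behavior described above.

```powershell
# Hypothetical helper: decide how to set the description based on OS version.
# Assumption: only 6.0.x (Vista / Server 2008) returns "Value out of range".
function Get-DescriptionUpdateMethod {
    param([string]$OSVersion)   # e.g. the Version property of Win32_OperatingSystem
    $v = [version]$OSVersion
    if ($v.Major -eq 6 -and $v.Minor -eq 0) { 'PsExec' } else { 'WMI' }
}

# Sample version strings: 6.0.6002 = Server 2008 SP2, 6.1.7601 = Server 2008 R2 SP1.
'6.0.6002', '6.1.7601' | ForEach-Object {
    "$_ -> $(Get-DescriptionUpdateMethod -OSVersion $_)"
}
```

In the loop above, the version string could be read with (Get-WmiObject Win32_OperatingSystem -ComputerName $computer.Name).Version before choosing which of the two calls to make.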

Configure ESXi Scratch Config w/ Powershell/PowerCLI and other advanced settings...

Summary:
Needed to script the scratch location configuration for all my 100+ ESXi hosts.  Having a permanent scratch location configured is helpful when an error such as a purple screen of death (PSOD) occurs on ESXi.  It is not a requirement, but definitely a best practice.

PreRequisites:
  1. Powershell 2.0 +
  2. PowerCLI 5.1 +
  3. vCenter 4.1 +
  4. Local or Shared Datastore
    • Local is easy if you standardize on naming of a local datastore.  
      • I'll focus on this in my script example.
    • Shared Datastore essentially accomplishes a similar goal to a remote syslog server.  You'll want to be sure to separate each host's logs into its own individual directory.
      • Scaling may become an issue unless you focus these shared datastores among clusters rather than all hosts.
Details:
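
A minimal sketch of the local datastore approach, assuming every host has a local datastore whose name you can derive or look up.  Get-ScratchPath and Set-HostScratchLocation are hypothetical names, and the .locker-hostname folder convention is my assumption.  The PowerCLI function is only defined here, not run; as far as I know the folder has to exist on the datastore first and the host needs a reboot before the new location is used.

```powershell
# Hypothetical helper: builds a per-host scratch path on a datastore.
function Get-ScratchPath {
    param([string]$HostName, [string]$DatastoreName)
    # Use only the short host name so the folder name stays readable.
    $short = $HostName.Split('.')[0]
    "/vmfs/volumes/$DatastoreName/.locker-$short"
}

# Sketch of applying it with PowerCLI 5.1+; defined here but not invoked.
function Set-HostScratchLocation {
    param($VMHost, [string]$DatastoreName)
    $path = Get-ScratchPath -HostName $VMHost.Name -DatastoreName $DatastoreName
    Get-AdvancedSetting -Entity $VMHost -Name 'ScratchConfig.ConfiguredScratchLocation' |
        Set-AdvancedSetting -Value $path -Confirm:$false
}

# Host and datastore names below are examples only.
Get-ScratchPath -HostName 'esx01.somedomain.com' -DatastoreName 'esx01-local'
```

Against a live vCenter the second function could be fed from Get-VMHost, one host at a time, once the naming of the local datastores is standardized.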

UCS SSH LDAP Login Syntax

Summary:
Login syntax using LDAP to Cisco Fabric Interconnects.

Linux/Mac:
Syntax:
ssh ucs-authdomain\\username@UCSIPAddressORDNSName

Example:
ssh ucs-tech.zsoldier.com\\zsoldier@ucs.tech.zsoldier.com


Windows/Putty:
Syntax:
ucs-authdomain\username

Example:
ucs-tech.zsoldier.com\zsoldier
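
The only real difference between the two is the doubled backslash, which Linux/Mac shells need because a single backslash would be eaten as an escape character.  Here is a small sketch that builds both forms from the same inputs; Get-UcsSshLogin is a hypothetical helper name and it does nothing but string formatting.

```powershell
# Hypothetical helper: builds the LDAP login strings shown above.
function Get-UcsSshLogin {
    param([string]$AuthDomain, [string]$UserName, [string]$UcsHost)
    New-Object PSObject -Property @{
        # PuTTY takes the login name as-is, w/ a single backslash.
        PuttyLogin = "ucs-$AuthDomain\$UserName"
        # The shell eats one backslash, so it is doubled for ssh on Linux/Mac.
        SshCommand = "ssh ucs-$AuthDomain\\$UserName@$UcsHost"
    }
}

Get-UcsSshLogin -AuthDomain 'tech.zsoldier.com' -UserName 'zsoldier' -UcsHost 'ucs.tech.zsoldier.com' |
    Format-List PuttyLogin, SshCommand
```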

UCS F0401 <-- Really annoying error

Summary:

I had these ‘faults’ showing up on a chassis.  The error was rather vague, but one thing that hooked me was the idea that a ‘discovery policy’ was incorrectly configured.

Resolution:

  1. Find and change the discovery policy:
    • [Screenshot: UCS_Chassis_Discovery_Policy]
  2. Decommission the Chassis
    • [Screenshot: UCS_Decommission]
  3. Recommission the Chassis
    • [Screenshot: UCS_Recommission]

Voila, those errors should go away if you selected the correct discovery policy for your configuration.

Posting last known good backup to vCenter Custom Attribute (NBU 7.5)

Thought this was kind of a cool function of Netbackup 7.5.  It has the ability to post last good backup date to a vCenter custom attribute.  Here is the article:
http://www.symantec.com/business/support/index?page=content&id=HOWTO71014

The short of it is to simply add the extensions (Register extension, Unregister extension, Update extension) permissions to your NBU role on top of those perms listed here:
http://tech.zsoldier.com/2011/06/netbackup-perms-and-vsphere-4x.html
and
configuring VMware advanced attributes in NetBackup 7.5+:
http://www.symantec.com/business/support/index?page=content&id=HOWTO70998#v62458854
<-- Pointed out by Michael in comments. Cause I forgot to add it.  -->



WinRM, https, Kerberos, and vCO Powershell Plugin 1.0.1

Summary:
Pain in my arse.  I was able to make it work this way; whether this is the correct way to do it is most definitely up for debate.  I started writing this w/ vCO PS Plugin 1.0, so some things might need work.  I welcome corrections.

Details:
  1. WinRM by default only allows users that are members of the administrators group.
    • See here how to add additional users
    • The only way I’ve been able to make this work in Orchestrator is if the service account I’m using is a member of the administrators group on the powershell remote host.
    • It works via standard WinRM or Powershell so a bit puzzled as to why I get access denied errors from vCO.  Still researching...  :-/
  2. Setup IIS
  3. Generate CSR from IIS
  4. Import the CA-signed certificate generated from that CSR
  5. IIS Website -> SSL Settings -> Edit Bindings -> https://  -> Select imported SSL cert.
  6. Command Prompt (not powershell):
    • winrm quickconfig -transport:https
    • winrm set winrm/config/client @{TrustedHosts="NameorIP of VCO host"}
    • winrm set winrm/config/service/auth @{Kerberos="True"}
  7. Assuming you are using the vCenter Orchestrator virtual appliance:
    1. Log into vCenter Orchestrator local console as root
      • Default password for root is “vmware”
      • SSH is disabled by default, so you must log in via the local console.
    2. You need to create a krb5.conf file in the following directory:
      • /opt/vmo/jre/lib/security
      • vi krb5.conf
      • Sample krb5.conf:
        • [libdefaults]    
            default_realm = SOMEDOMAIN.COM    
            udp_preference_limit = 1
          [realms]    
            SOMEDOMAIN.COM = {       
            kdc = kdc1.somedomain.com       
            default_domain = somedomain.com    
          }
          [domain_realm]
            .somedomain.com=SOMEDOMAIN.COM
            somedomain.com=SOMEDOMAIN.COM
        • You can enter multiple kdc servers (in Active Directory, usually the same as a domain controller)
          • kdc = kdc1.somedomain.com
          • kdc = kdc2.somedomain.com
        • krb5.conf is CASE SeNSITIVE!
        • If you use the [domain_realm] section, your domain names will translate into UPPERCASE if using the format above.
      • Once you’re done editing, hit “ESC”, “:”, “wq”, Enter
      • Change ownership/perms on krb5.conf file:
        • chown vco:vco krb5.conf
        • chmod 640 krb5.conf
    3. Restart vCenter Orchestrator Appliance.
      • You can probably restart a specific service, but I’m unsure as to which one.
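
Because the file is case sensitive and easy to fat-finger in vi, you could also generate its text elsewhere and paste it in.  The sketch below is one way to do that under my own assumptions: New-Krb5Conf is a hypothetical helper, it uppercases the realm for you, and it uses the standard [domain_realm] section name.

```powershell
# Hypothetical helper: renders krb5.conf text for one AD domain.
function New-Krb5Conf {
    param([string]$Domain, [string[]]$Kdc)
    $realm = $Domain.ToUpper()      # realm names are upper case by convention
    $dns   = $Domain.ToLower()
    $lines = @(
        '[libdefaults]'
        "  default_realm = $realm"
        '  udp_preference_limit = 1'
        '[realms]'
        "  $realm = {"
    )
    # One kdc line per domain controller you want to list.
    foreach ($k in $Kdc) { $lines += "    kdc = $k" }
    $lines += @(
        "    default_domain = $dns"
        '  }'
        '[domain_realm]'
        "  .$dns = $realm"
        "  $dns = $realm"
    )
    # Join w/ Unix line endings since the file lives on the appliance.
    $lines -join "`n"
}

New-Krb5Conf -Domain 'somedomain.com' -Kdc 'kdc1.somedomain.com', 'kdc2.somedomain.com'
```

The ownership and permission steps (chown/chmod) still have to be done on the appliance itself.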

ESXi Hosts Timing Out During HA Cluster Election

[Guest Post by Jeremy Reiman]

Summary:
  • ESXi hosts timing out during HA cluster election phase after cluster master is selected.  The HA Agent status in vCenter shows as unreachable on all hosts that timed out.

Symptoms:
  • ESXi host fails to enable HA Agent and shows error "operation timed out".
  • Error message "[ClusterManagerImpl::IsBadIP] x.x.x.x is bad ip" showing in /var/log/fdm.log on ESXi hosts.
  • TCPdump capture from ESXi host shows packets destined for IP address of other ESXi host are being sent to the MAC address of the firewall.  These should be going to the MAC address of the ESXi host management interface since both reside on the same VLAN.

Configuration Info:
  • ESXi host management interfaces are on the same VLAN.
  • ESXi 4.1 +
  • Firewall is a Cisco ASA5500 running software version 8.2(2).
  • Firewall Switch Module running 3.2(5) is also applicable.
  • All network ports are open on the firewall between the vCenter server and the ESXi hosts.
Resolution:
  • Disable ProxyARP on the ESXi host management VLAN.  The Cisco ASA5500 command to disable proxyarp on a VLAN is “sysopt noproxyarp <vlan_interface_name>”.


Get a powershell code signing cert from a Microsoft CA.

Summary:
This took me a little while to figure out.  Here are the basic steps for getting a code signing cert from your locally available domain CA.  Scripts you sign with it will be authorized for use on your local domain, which lets you keep your PowerShell execution policy as RemoteSigned rather than Unrestricted.

Replace SSL Cert Emulex OCM for VMware with a signed one.

SSL certs are something of an enigma that has always eluded my proper understanding.  So I took it upon myself to figure this one out.
Summary:
Replace the default OCM cert w/ one that is CA signed.

vCenter Operations 5.x vApp LDAP Configuration

Summary:
I happened to see someone searching for this and coming across my previous post on its wonkiness, so I figured I’d make a post showing how I went about configuring it w/ an Active Directory domain.  This only applies to the vcops-custom page.  The standard vcops-vsphere page uses vCenter’s authentication via role permissions.
Details:
  1. Log into your vcops-custom page as an admin. (example http://yourvCOPsUIvmIP/vcops-custom)
  2. Select Admin –> Security
    • [Screenshot: Admin-Security]
  3. Select the Import from LDAP button
    • [Screenshot: ImportfromLDAP]
  4. Select the add button
    • [Screenshot: ImportUsersDialog]
  5. Now see the screenshot below to see how to fill out the configuration screen:
    • [Screenshot: ManageLDAPHost]
  6. Below details how the auto-sync works:
    • [Screenshot: ManageLDAPHost-2]
  7. You’re pretty much done @ this point.
Auto Sync occurs once every hour, so once you configure it, it’ll take approx. an hour before users are granted access.  The other caveat is that nested groups are not supported.  Users must be direct members of the security group you set up w/ Auto Sync.
Feel free to ask questions in the comments.  I’m always keeping an eye on those.

Symantec and vExpert event

Many moons ago back in April, I and several other vExperts were invited to Symantec HQ for an executive briefing.  What I thought was just going to be a sales pitch, turned into a deep discussion around virtualization philosophy and technical discussion around Symantec’s many products.  Hit the link to read more if you are interested in my experience.

Uninstall HA agent manually

This is something you’ll likely have to do only on a rare occasion.  In case you do though, here is the info needed to do so.

  1. Disable HA on cluster.
  2. SSH into ESX/ESXi box

Run the following:

# Stops management services

/sbin/services.sh stop

# Runs uninstaller script

/opt/vmware/uninstallers/VMware-aam-ha-uninstall.sh

# Sometimes has problems removing the below directory, so we help it.

rm -rf /opt/vmware/aam

# Restarts management services

/sbin/services.sh start

RSA and VMware View iPad App

One of the nifty things about the current VMware View iPad App is its ability to import an RSA token.  Unfortunately, the documentation on how to do this is a bit scarce.  These steps may work for Android too, but I don't have an Android tablet to test with.

PreReqs:

  1. RSA Server 7.1 SP4 <-- This is what I tested against.
  2. View 4.5+ w/ RSA enabled.
  3. VMware View iPad application
Simply go to your RSA self-service page and request a new token.  If it's enabled you should have an option like this:
You'll want to select "I need a specific software token" then select "RSA SecurID Token for iPhone and iPAD/iPOD"

Once you or your RSA admin approve your request, you should get a link and activation code that looks something like this:

Joe, your new or additional software token request has been approved with the following comments from your administrator:
RSAAdmin: approved
Please ensure that the RSA SecurID application is installed on your device before attempting to import your software token.
Download the SecurID Application: com.rsa.securid://ctkip?url=https://yourRSAServer:7004/ctkip/services/CtkipService

How To Import Your Software Token ( true ) Please follow the instructions provided by your administrator to import a token using the following information:
Link: https://yourRSAServer:7004/ctkip/services/CtkipService
Activation Code: 0000000000000

To import the software token into your iPad View app, simply change the link's prefix from com.rsa.securid to viewclient-securid.  So the link would look something like this instead:

When you type/copy/paste this link into Safari, it should open up the View iPad client and ask for your activation code.

Once done, you will be able to simply type your PIN for RSA credentials.
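
If you hand these links out to more than a couple of users, the prefix swap is easy to script.  A rough sketch follows; Convert-ToViewSecurIdLink is a hypothetical helper name and the sample link is the placeholder one from the approval email above.

```powershell
# Hypothetical helper: swaps the RSA app prefix for the View client prefix.
function Convert-ToViewSecurIdLink {
    param([string]$Link)
    # Only the scheme at the very start of the link is replaced.
    $Link -replace '^com\.rsa\.securid://', 'viewclient-securid://'
}

Convert-ToViewSecurIdLink 'com.rsa.securid://ctkip?url=https://yourRSAServer:7004/ctkip/services/CtkipService'
```

Anchoring the pattern to the start of the string means a link that already carries the View prefix, or any other text, passes through unchanged.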

vExpert Gift!

Pretty sweet gift from @VMWare.

[Photo: Apr 02, 1:56:21 PM]

vCOps Enterprise 5 vApp LDAP bug? (One or more users already exist and haven’t been imported)

Summary:

I had been having issues w/ our deployment of the vCenter Operations vApp.  The Web GUI has two pages, https://vCopsServerName/vcops-vsphere and https://vCopsServerName/vcops-custom.  It seems vcops-vsphere simply uses vCenter privileges to determine whether you can log in and what you can view.  vcops-custom, however, does not; it has a separate set of permissions it uses to determine a user’s access.  They both, however, utilize the same useraccount table in the postgres database.

Workaround:

This KB contains the steps needed to workaround the LDAP import problem:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2013440

Step 2 was incorrect as of this post date; it should read as follows:

# su postgres
# psql -d alivevm

I’ve let VMWare and @VMWareKB know of the typo.  So it should get corrected.

Details:

I found that when I log in to vcops-vsphere, it uses my permissions on VC and, if no matching entry is already in the table, creates a user object labeled VC User in the useraccount table on the analytics VM database.  Although it creates an entry in the useraccount table, it does not associate it w/ an ‘account group’ in vcops-custom to allow access.  These accounts, even though listed, do not show up under the ‘Not Grouped’ account group in the user management section of the vcops-custom page.

This will likely become a larger issue, as most users would log into the vcops-vsphere page first and create a crap entry in the useraccount table, and then the LDAP import based on groups would have problems creating an authorized LDAP entry.  This can be somewhat mitigated if users log into vcops-vsphere using their sAMAccountName instead of the userPrincipalName (UPN).  Then no conflicts should arise in the useraccount table.

Should they log in w/ their UPN to vcops-vsphere prior to the import job or that user’s inclusion in an LDAP group, this issue will likely arise, assuming that’s how you configured your LDAP import.  The only recourse that I’m aware of is following the steps detailed in the KB.

Failed to deploy ovf package: Operation Timed Out

Summary:
I’ve found this can occur when you attempt to deploy to a vmfs datastore formatted w/ a block size other than 1MB.  This only applies to vmfs 3.33 and earlier.  vmfs 5 or vSphere 5 formatted datastores should not see this issue as they are all formatted w/ a 1MB block size.
Workaround:
Deploy OVF to a 1MB block sized datastore.
Side Note:
I’m wondering if deployment fails because the vmdks were originally created on 1MB datastores?

CD-Rom connected to another client (VMWare vSphere ESX)

Summary:

VM will not vMotion because CD-rom is detected as mounted by another user.  Option to disconnect or remove CD-rom is unavailable/grayed out under settings of VM.

[Screenshot: CDROM connected]

Resolution:

Open the VM’s Console, select the CD-rom drive icon and select disconnect.

[Screenshot: Disconnect CDROM]

As obvious as this seems, I found myself chasing a rabbit trying out other methods to fix this issue.  This method worked in vSphere 4.1 Update 1.  Later revisions should too; I'm not sure about earlier versions.

PowerCLI: UserVars.CIMoemProviderEnabled, changing to a value of 1 (or 0)

Summary:
This value appears after installing the Dell OMSA vib for ESXi 4.1.  Changing this value to 1 using PowerCLI proved a bit more difficult than I originally thought, even cheating w/ Onyx.
Example:
Using this:
$changedValue = New-Object VMware.Vim.OptionValue[] (1)
$changedValue[0] = New-Object VMware.Vim.OptionValue
$changedValue[0].key = "UserVars.CIMoemProviderEnabled"
$changedValue[0].value = 1
$_this = Get-View -Id 'OptionManager-EsxHostAdvSettings-00000'
$_this.UpdateOptions($changedValue)
I’d get this ‘useful’ error:
Exception calling "UpdateOptions" with "1" argument(s): "A specified parameter was not correct.
"
At line:1 char:21
+ $_this.UpdateOptions <<<< ($changedValue)
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : DotNetMethodException


Solution:
Apparently the ‘value’ property needs to be declared as an int64 type.  By default, PowerShell treats the literal 1 as an Int32, which does not match the type this setting expects.  Below is what will work:
$changedValue = New-Object VMware.Vim.OptionValue[] (1)
$changedValue[0] = New-Object VMware.Vim.OptionValue
$changedValue[0].key = "UserVars.CIMoemProviderEnabled"
[int64]$changedValue[0].value = 1
$_this = Get-View -Id 'OptionManager-EsxHostAdvSettings-00000'
$_this.UpdateOptions($changedValue)

To determine what type of value the 'option' is looking for, you can query it to find out, like so:
($_this.setting | where {$_.key -eq "UserVars.CIMoemProviderEnabled"}).value.gettype().name


Above will return the 'type' of 'value' this particular setting is looking for.
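
To see why the cast matters, here is a small sketch that runs w/out PowerCLI.  It shows what type PowerShell picks for a bare literal, plus a hypothetical helper (Convert-ToSettingType, my own name) that converts a new value to whatever type the current value already has, using the .NET Convert class.

```powershell
# What PowerShell hands over for each way of writing "1".
$default = 1
$asLong  = [int64]1
$asText  = '1'
$default.GetType().Name   # Int32
$asLong.GetType().Name    # Int64
$asText.GetType().Name    # String

# Hypothetical helper: convert a new value to the type of the current value.
function Convert-ToSettingType {
    param($CurrentValue, $NewValue)
    [System.Convert]::ChangeType($NewValue, $CurrentValue.GetType())
}

# Stand-in for a setting whose current value is an Int64.
$current = [int64]0
(Convert-ToSettingType -CurrentValue $current -NewValue 1).GetType().Name   # Int64
```

With a live host, $current could be the value returned by the query above, so the new value always matches the type the option is looking for.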

Side Note:
How would you find that obscure ‘OptionManager-EsxHostAdvSettings-00000’ object and know which host it references?  Here is how you can get to that object in the simplest fashion:
$MyESXHost = Get-VMHost MyESXHost
$MyESXHost.ExtensionData.ConfigManager.AdvancedOption
# Using examples from above, it can also be written out like this:
$changedValue = New-Object VMware.Vim.OptionValue[] (1)
$changedValue[0] = New-Object VMWare.Vim.OptionValue
$changedValue[0].key = "UserVars.CIMoemProviderEnabled"
[int64]$changedValue[0].value = 1
 
$_this = Get-View -ID $MyESXHost.ExtensionData.ConfigManager.AdvancedOption
$_this.UpdateOptions($changedValue)