VMware: vSAN 6.6 not showing all available disks when attempting to claim...

Was going through and attempting to set up a new vSAN cluster, but noticed that the wizard was only showing 3 of 4 disks from 3 of 4 hosts and 0 disks from another host.  This appears to be by design: the setup wizard will only target disks that have 0 partitions.  Makes sense.

This, however, is not obvious in the setup.

Simply delete any partitions from those disks that you'd like to have vSAN claim.  You can do this en masse via PowerCLI or the Web Client interface (as pictured below).
[Warning: This is a destructive process so be sure that you know absolutely for certain that you are targeting the correct storage devices.  This is especially true if you plan to script this process.]

Erase Partition in Web Client
The above process would suck if you were doing it against a large cluster, so learn to do it in PowerShell or some other automated method.

PowerCLI Method:
$TCluster = Get-Cluster TargetClusterName
$TVMHosts = $TCluster | Get-VMHost | Get-View
Foreach ($VMHost in $TVMHosts) {
    $ConfigManager = Get-View $VMHost.ConfigManager.StorageSystem
    #Spec defined and left blank to clear partitions
    $Spec = New-Object VMware.Vim.HostDiskPartitionSpec
    #I'm simply targeting all naa devices and those that state local disk.
    #Reality is that you'd probably want a more in depth filter on the devices you target.
    #My case was a new set of hosts, so this worked for me.
    #The dot is escaped so the regex matches a literal "naa." only.
    $TargetDisks = $VMHost.Config.StorageDevice.ScsiLun | Where-Object {$_.DevicePath -match "naa\." -and $_.LocalDisk}
    Foreach ($Disk in $TargetDisks) {
        #Destructive: an empty spec removes every partition on the device
        $ConfigManager.UpdateDiskPartitions($Disk.DevicePath, $Spec)
    }
}
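
Since the erase is destructive, it can help to preview which devices the filter would pick up before calling UpdateDiskPartitions.  Below is a minimal sketch of that idea, not part of the original script: Select-LocalNaaDisk is a hypothetical helper name, and the mock objects (with made-up device paths) only stand in for the ScsiLun entries a real host returns.

```powershell
# Hypothetical helper: passes through only devices that look like local naa disks.
# It applies the same test as the filter above and erases nothing.
function Select-LocalNaaDisk {
    param(
        [Parameter(ValueFromPipeline)]
        $ScsiLun
    )
    process {
        if ($ScsiLun.DevicePath -match 'naa\.' -and $ScsiLun.LocalDisk) {
            $ScsiLun
        }
    }
}

# Mock objects standing in for $VMHost.Config.StorageDevice.ScsiLun entries.
# The device paths are invented for illustration.
$mockLuns = @(
    [pscustomobject]@{ DevicePath = '/vmfs/devices/disks/naa.5000c5000000001'; LocalDisk = $true }
    [pscustomobject]@{ DevicePath = '/vmfs/devices/disks/naa.6000d3100000002'; LocalDisk = $false }
    [pscustomobject]@{ DevicePath = '/vmfs/devices/disks/mpx.vmhba32:C0:T0:L0'; LocalDisk = $true }
)

# Preview only: list what would be targeted.
$preview = $mockLuns | Select-LocalNaaDisk
$preview | Select-Object DevicePath, LocalDisk
```

Against a live host you would pipe $VMHost.Config.StorageDevice.ScsiLun into the helper instead of the mock list, and review the output before erasing anything.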
Side Note:
vSAN claimed disks have a protection mechanism against being erased via the method above.  Any partition claimed by vSAN that it runs into will be met w/ an exception of "Cannot change the host configuration".
If for some reason you need to delete those partitions, then you'll likely have to try this method:

vSAN: Rebuilding an ESXi host that has vSAN claimed disks...

VMware/Security: Opvizor OpBot, cool, but scary too.

I've posted about OpBot in the past w/ a brief overview of how you can set it up and deploy it.  It's a very cool and immensely useful tool.  However, I must balance this with security.  Responsibly deployed, it can be a very useful tool.  However, there is a dark side to this from a security management perspective.  It also poses the very real risk of allowing generic internet access from within your datacenter.

First off, Opvizor makes it very clear that you should only grant OpBot's integration account read-only access.  You can run 'destructive' PowerCLI commands by passing login info via Slack, but that's also not recommended.  As much as they have created an immensely useful tool, it is also somewhat of a Pandora's box.  It has brought to light a security hole that can be difficult to secure at scale.  Currently Opvizor is the only one that I know of that makes this type of appliance, but that doesn't stop the many possible clones of this type of tech.

Basically, it's a method by which a malicious VMware admin could deploy said appliance, give it an elevated service account (AD or otherwise), and no one would be any the wiser.  Now to be clear, a VMware admin should never be deploying things into a datacenter w/o a proper change/audit control process.  At the very least, anything deployed should be well documented and known.

NSX helps in this aspect w/ micro-segmentation.  Everything placed into service receives a specific policy and can communicate w/ only what is needed.  However, it'll only help as far as the security is implemented.  If complete outbound internet is open as a 'standard', then you've effectively given OpBot or things like it unfettered access.  The first knee-jerk reaction is likely blocking Slack connectivity unless specifically enabled for said purpose.  However, this only covers Slack itself; it does not protect against Slack clones or the like.

It's not super simple, but here are some thoughts (for VMware solutions specifically):
  1. Audit/Change Control over Identity Management System (Active Directory) and whatnot.
    1. Any new service/shared account created should be immensely scrutinized.
    2. Change Auditor is a pretty good tool for this.
  2. Audit/Change Control over "Roles" in vCenter (Log Insight can help somewhat in this aspect; HyTrust CloudControl would give you a workflow engine in addition to audit capabilities.)
    1. Basically any account granted an 'admin-type' role w/o going through a peer-reviewed change control system should be alerted upon.
    2. Any new role implemented should also be scrutinized for scope and alerting/monitoring put in place for 'high-risk' type roles.
    3. Any change to role permissions scrutinized as well.
  3. Audit/Change Control over passwords for 'service/shared' accounts. (HyTrust CloudControl includes password vaulting for ESXi hosts)
    1. Password Repo such as LastPass/1Password/OneIdentity, etc.
    2. No single or group of people should actually EVER know by memory service/shared account passwords.
    3. Passwords should be changed based upon audit of password repo access when an employee leaves the company.
      • This would hopefully mitigate a time-consuming process of changing all passwords that said employee may or may not have used.
    4. Password Repo should have complete audit trail as well as alerts for specific types of access.
      1. More advanced, you could use the password repo system to change passwords automatically after a 'manual' checkout scenario.
      2. HyTrust does this for ESXi root passwords automatically.
  4. Network Security/Audit/Change Control (Palo Alto App ID Security)
    1. Subscribe to the mantra of trust nothing in or out.
    2. Peer Review all changes.
    3. Access to vCenter via NSX security policies audit/change workflow.
      1. Anything allowed access to vCenter should be audited.
    4. Palo Alto Firewalls can add an extra layer of heuristics type security to block anything not defined as allowable outside of just ports using something like app id.
Minimally, HyTrust CloudControl could mitigate a large amount of risk from a Slack type bot by using its workflow engine; however, none of this really matters if you don't have a proper process behind it.  It also doesn't replace proper Identity Management controls.
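
As a rough illustration of the "alert on admin-type roles" idea from item 2, here is a minimal sketch that flags principals holding a high-risk role who aren't on an approved list.  Find-UnapprovedAdminPermission, the principal names, and the approved list are all hypothetical; the mock objects only mimic the Principal and Role properties that PowerCLI's Get-VIPermission returns, and treating 'Admin' as the high-risk role name is an assumption you'd adjust for your own roles.

```powershell
# Hypothetical helper: returns permissions that grant a high-risk role
# to a principal that is not on the approved list.
function Find-UnapprovedAdminPermission {
    param(
        [Parameter(Mandatory)][object[]]$Permission,
        [string[]]$ApprovedPrincipal = @(),
        [string[]]$HighRiskRole = @('Admin')   # assumption: role names to treat as high risk
    )
    $Permission | Where-Object {
        ($HighRiskRole -contains $_.Role) -and ($ApprovedPrincipal -notcontains $_.Principal)
    }
}

# Mock permission objects; all names are made up for illustration.
$mockPermissions = @(
    [pscustomobject]@{ Principal = 'CORP\vmware-admins'; Role = 'Admin' }
    [pscustomobject]@{ Principal = 'CORP\svc-opbot';     Role = 'Admin' }
    [pscustomobject]@{ Principal = 'CORP\svc-monitor';   Role = 'ReadOnly' }
)

$flagged = Find-UnapprovedAdminPermission -Permission $mockPermissions -ApprovedPrincipal 'CORP\vmware-admins'
$flagged | Select-Object Principal, Role
```

In a real environment the input would come from something like Get-VIPermission, and the approved list from whatever your change control system records; this is only a sketch of the comparison, not an alerting solution.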

Bottom Line:
This is a trust problem, which is why security, auditing, and change control processes are essential.  It's not a matter of simply disallowing useful tools, such as OpBot, for the sake of security.  It's about being smart and 'knowing' what's happening in your environment so you can implement productive tools to move the business forward, all while being secure and safe.

VMware: Invalid Configuration for device # when deploying OVF/OVA...

Ran into this message when attempting to import an OVF/OVA to vCenter via the Web Client from a Mac.  Not all OVAs/OVFs have this issue.  Options to work around it:

  • Upload and deploy from a Windows system
    • OR
  • Upload and deploy to a local datastore if available.
    • OR
  • Use OVFTool to deploy
    • Example:
      • ovftool -ds=NameofTargetDatastore -n=NameYouWantVMtoBe --acceptAllEulas --net:bridged=NameofDVSorStdPortGroupYouWantVMattachedTo C:\Path\Turbonomic.ova vi://username%40mysubdomain.myrootdomain.suffix@vCenterNameorIP/virtualDatacenterName/host/ClusterName
        • %40 translates the @ symbol for the OVFTool if you need to authenticate using standard AD UPN or SSO domain user.
        • If Linux/Mac, replace C:\Path\Turbonomic.ova with /Path/Your.ova
        • The --net:bridged switch is optional and can also be different depending on how the OVF has that parameter defined.
        • Targeting the cluster assumes DRS is enabled.  If DRS is not available, go one level further down and put the hostname after the cluster name.
    • OR
  • Use Import-vApp cmdlet from PowerCLI
    • Example:
      • $OVAConfig = Get-OVFConfiguration C:\Path\Turbonomic.ova
        • $OVAConfig.NetworkMapping.NAT.Config = "NameofVMPortGroup"
          • This particular setting is VERY specific to the Turbonomic OVA.  Other OVA's may have several other configurations/properties you may need to provide.
      • $TargetCluster = Get-Cluster NameofCluster
      • Import-vApp -Source C:\Path\Turbonomic.ova -VMHost ($TargetCluster | Get-VMHost | Select -First 1) -Datastore ($TargetCluster | Get-Datastore NameofDatastoreYouWant) -Name NameYouWantRegistered -OVFConfiguration $OVAConfig
        • -VMHost: this cmdlet requires a VMHost target; this example shows how you can target a cluster and have it deploy to the first host in the cluster.
        • -Datastore: this demonstrates how you can target, by name, a datastore that belongs to the cluster you are deploying to.
        • -Name: the name you want the deployed VM to have.
        • -OVFConfiguration: the OVF configuration object built above.
Specifically ran into this deploying to a datastore backed by an FC array.  It ONLY fails when attempting to deploy from macOS to an FC backed datastore using the Web Client.  Targeting a locally backed datastore worked fine.  I could deploy just fine from a Windows system to that same FC backed datastore.  Seems to be a bug w/ the Mac VMware Client Integration Plugin, at least w/ the 6.0 version.
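
The %40 substitution mentioned for the OVFTool target can be done programmatically rather than by hand.  Here's a minimal sketch, assuming the same vi:// layout as the example above; New-OvfToolViLocator is a hypothetical helper name and all of the values passed to it are placeholders.

```powershell
# Hypothetical helper: builds a vi:// target locator for ovftool,
# percent-encoding the user name so the "@" in a UPN becomes %40.
function New-OvfToolViLocator {
    param(
        [Parameter(Mandatory)][string]$UserName,
        [Parameter(Mandatory)][string]$Server,
        [Parameter(Mandatory)][string]$Datacenter,
        [Parameter(Mandatory)][string]$Cluster
    )
    # EscapeDataString encodes reserved characters such as "@".
    $encodedUser = [uri]::EscapeDataString($UserName)
    'vi://{0}@{1}/{2}/host/{3}' -f $encodedUser, $Server, $Datacenter, $Cluster
}

# Placeholder values for illustration only.
$locator = New-OvfToolViLocator -UserName 'admin@corp.example.com' -Server 'vcenter01' -Datacenter 'DC1' -Cluster 'Prod'
$locator
```

The resulting string can then be passed as the last argument to ovftool in place of the hand-typed locator.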

Turbonomic: Network keeps dying when using static IP...

Deployed a new Turbonomic OVA 5.8.3 for some testing.  Logged into appliance via console, ran 'ipsetup' as instructed w/ 'static' selected.  VM stayed online for about 5 min. before dying.

  1. Assuming DHCP is not an option, you simply need to change the 'BOOTPROTO' entry from 'static' to 'none' in the /etc/sysconfig/network-scripts/ifcfg-eth0 configuration file.
  2. You may also need to stop the DHCP client by stopping NetworkManager
    • systemctl stop NetworkManager
    • chkconfig NetworkManager off
      • "NetworkManager" is case sensitive
    • systemctl restart network
  • OR
  1. You can utilize nmtui to modify system eth0 configuration.
    • If issues persist, utilize above steps.
At some point Turbonomic likely upgraded the appliance's OS, but failed to take into account that 'static' is no longer a valid value for that option and has been replaced w/ 'none'.  This seems to affect newer versions of Linux.  It will likely be fixed sooner rather than later.  Also, NetworkManager seems to start its own DHCP client, which kills whatever existing config there is.

Powershell: How to get REST API data in JSON format rather than XML using invoke-restmethod

I was exploring a REST API interface for an internal tool being built.  Being that I'm so accustomed to PowerShell, I wanted to explore how I could get data from it.  The Invoke-RestMethod cmdlet is perfect for this, but I was having issues getting data back in straight JSON format.  Data kept coming back in ugly as hell XML format by default.

The short answer was that I needed to make a hash table to pass to the -Headers parameter of the Invoke-RestMethod cmdlet.  Basically, it looks like this:

$Headers = @{"Accept" = "application/json"}
Invoke-RestMethod -URI "https://myrestapi/endpoint" -Method:Get -Headers $Headers 

Once I did this, I received the data back in JSON format, and PowerShell automatically captured it as a System.Array object, making it immensely easier to work with than the XML return.  See the pictures below for examples of the difference.
JSON returned data.

XML returned data.
As you can see, the JSON return looks like any other object return from something like PowerCLI, whereas the XML return is this ridiculous mess.  Not all REST API endpoints work in the same fashion.  Some will return JSON by default, but listing "Accept" = "application/json" in your request header doesn't seem to hurt those that do, unless you want a different type of return.
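
To show why the JSON return is easier to work with, here's a small sketch that doesn't need a live endpoint.  Invoke-RestMethod deserializes a JSON response into objects much as ConvertFrom-Json does, so the sample below runs a made-up payload through ConvertFrom-Json and then filters it like any other object collection.  The payload and its property names are assumptions for illustration, not output from the API described above.

```powershell
# Same header hash table used with Invoke-RestMethod above.
$Headers = @{ "Accept" = "application/json" }

# Made-up JSON payload standing in for an API response body.
$sampleJson = '[{"name":"vm01","powerState":"on"},{"name":"vm02","powerState":"off"}]'

# Convert the JSON text into objects, as Invoke-RestMethod would for a JSON response.
$items = $sampleJson | ConvertFrom-Json

# Work with the result like any other object collection.
$poweredOn = $items | Where-Object { $_.powerState -eq 'on' }
$poweredOn | Select-Object name, powerState
```

With a real endpoint, the $items assignment would simply be the Invoke-RestMethod call with -Headers $Headers.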