VMware: vSAN 6.6 not showing all available disks when attempting to claim...

Summary:
While setting up a new vSAN cluster, I noticed that the wizard was only showing 3 of 4 disks on 3 of the 4 hosts and 0 disks on the remaining host.  This appears to be by design: the setup wizard only offers disks that have 0 partitions.  Makes sense.

This, however, is not obvious in the setup.

Solution:
Simply delete any partitions from the disks that you'd like vSAN to claim.  You can do this en masse via PowerCLI or one disk at a time in the Web Client interface (as pictured below).
[Warning: This is a destructive process so be sure that you know absolutely for certain that you are targeting the correct storage devices.  This is especially true if you plan to script this process.]


Erase Partition in Web Client
The above process would suck if you were doing it against a large cluster, so learn to do it in PowerShell or some other automated method.

PowerCLI Method:
$TCluster = Get-Cluster TargetClusterName
$TVMHosts = $TCluster | Get-VMHost | Get-View
Foreach ($VMHost in $TVMHosts)
{
    $ConfigManager = Get-View $VMHost.ConfigManager.StorageSystem
    #An empty spec (no partition entries) clears every partition on the disk
    $Spec = New-Object VMware.Vim.HostDiskPartitionSpec
    #I'm simply targeting all naa devices that report as local disks.
    #Reality is that you'd probably want a more in depth filter on the devices you target.
    #My case was a new set of hosts, so this worked for me.
    #The dot is escaped so the regex matches a literal "naa."
    $TargetDisks = $VMHost.Config.StorageDevice.ScsiLun | Where-Object {$_.DevicePath -match "naa\." -and $_.LocalDisk -eq $true}
    Foreach ($Disk in $TargetDisks)
    {
        $ConfigManager.UpdateDiskPartitions($Disk.DevicePath, $Spec)
    }
}
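Since erasing partitions is destructive, it can help to see what would be touched before running the loop above.  The following is a minimal sketch, not a tested production script: `Select-LocalNaaDisk` and `Get-DiskPartitionReport` are hypothetical helper names (not PowerCLI cmdlets), and the sketch assumes that `RetrieveDiskPartitionInfo` on the host storage system view returns one entry per device path with its partitions under `Spec.Partition`.

```powershell
# Hypothetical helper (not part of PowerCLI): keep only local naa.* devices.
# Expects ScsiLun-like objects with DevicePath and LocalDisk properties.
function Select-LocalNaaDisk {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $ScsiLun
    )
    process {
        if ($ScsiLun.DevicePath -match 'naa\.' -and $ScsiLun.LocalDisk -eq $true) {
            $ScsiLun
        }
    }
}

# Hypothetical helper: read-only report of partition counts per candidate disk.
# $VMHostView is one item from Get-VMHost | Get-View;
# $StorageSystem is Get-View of that host's ConfigManager.StorageSystem.
function Get-DiskPartitionReport {
    param(
        [Parameter(Mandatory = $true)] $VMHostView,
        [Parameter(Mandatory = $true)] $StorageSystem
    )
    $disks = @($VMHostView.Config.StorageDevice.ScsiLun | Select-LocalNaaDisk)
    foreach ($disk in $disks) {
        # Assumption: one result entry comes back per device path requested
        $info  = $StorageSystem.RetrieveDiskPartitionInfo(@($disk.DevicePath))
        $parts = $info[0].Spec.Partition
        # A missing or empty partition list counts as 0
        $count = if ($parts) { @($parts).Count } else { 0 }
        [pscustomobject]@{
            Host           = $VMHostView.Name
            DevicePath     = $disk.DevicePath
            PartitionCount = $count
        }
    }
}
```

Against a live cluster, you would feed it the same objects as the script above, e.g. `Get-DiskPartitionReport -VMHostView $VMHost -StorageSystem $ConfigManager` inside the host loop, and review the output before erasing anything.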
Side Note:
vSAN claimed disks have a protection mechanism against being erased via above defined method.  Any partitions that it runs into claimed by vSAN will be met w/ an exception of "Cannot change the host configuration"
If for some reason you need to delete those partitions, then you'll likely have to try this method:

vSAN: Rebuilding an ESXi host that has vSAN claimed disks...

VMware/Security: Opvizor OpBot, cool, but scary too.

I've posted about OpBot in the past w/ a brief overview on how you can set up and deploy it.  It's a very cool and immensely useful tool.  However, I must balance this with security.  Responsibly deployed, it can be a real asset, but there is a dark side to it from a security management perspective.  It also exposes the very real risk of allowing generic internet access from within your datacenter.

First off, Opvizor makes it very clear that you should only grant OpBot's integration account read-only access.  You can run 'destructive' PowerCLI commands by passing login info via Slack, but that is also not recommended.  As much as they have created an immensely useful tool, it is also somewhat of a Pandora's box.  It has brought to light a security hole that can be difficult to secure at scale.  Currently Opvizor is the only vendor I know of that makes this type of appliance, but that doesn't rule out the many possible clones of this type of tech.

Basically, it provides a method by which a malicious VMware admin could deploy said appliance, give it an elevated service account (AD or otherwise), and no one would be any the wiser.  Now to be clear, a VMware admin should never be deploying things into a datacenter w/o a proper change/audit control process.  At the very least, anything deployed should be well documented and known.

NSX helps in this aspect w/ micro-segmentation.  Everything placed into service receives a specific policy and can communicate w/ only what is needed.  However, it'll only help as far as the security is implemented.  If complete outbound internet access is open as a 'standard', then you've effectively given OpBot, or things like it, unfettered access.  The first knee-jerk reaction is likely to block Slack connectivity unless it is specifically enabled for said purpose.  However, that only covers Slack itself; it does not protect against Slack clones or the like.

Solution?:
It's not super simple, but here are some thoughts (for VMware solutions specifically):
  1. Audit/Change Control over Identity Management System (Active Directory) and whatnot.
    1. Any new service/shared account created should be immensely scrutinized.
    2. Change Auditor is a pretty good tool for this.
  2. Audit/Change Control to "Roles" in vCenter (Log Insight can help somewhat in this aspect, HyTrust CloudControl would give you a workflow engine in addition to audit capabilities.)
    1. Basically any account granted an 'admin-type' role w/o going through a peer-reviewed change control system should trigger an alert.
    2. Any new role implemented should also be scrutinized for scope and alerting/monitoring put in place for 'high-risk' type roles.
    3. Any change to role permissions scrutinized as well.
  3. Audit/Change Control over passwords for 'service/shared' accounts. (HyTrust CloudControl includes password vaulting for ESXi hosts)
    1. Password Repo such as LastPass/1Password/OneIdentity, etc.
    2. No single person or group of people should EVER know service/shared account passwords by memory.
    3. Passwords should be changed based upon audit of password repo access when an employee leaves the company.
      • This would hopefully mitigate a time-consuming process of changing all passwords that said employee may or may not have used.
    4. Password Repo should have complete audit trail as well as alerts for specific types of access.
      1. More advanced, you could use the password repo system to change passwords automatically after a 'manual' checkout scenario.
      2. HyTrust does this for ESXi root passwords automatically.
  4. Network Security/Audit/Change Control (Palo Alto App ID Security)
    1. Subscribe to the mantra of trust nothing in or out.
    2. Peer Review all changes.
    3. Access to vCenter via NSX security policies audit/change workflow.
      1. Anything allowed access to vCenter should be audited.
    4. Palo Alto firewalls can add an extra layer of heuristic-type security, using something like App-ID to block anything not defined as allowable beyond just ports.
Minimally, HyTrust CloudControl could mitigate a large amount of risk for a Slack-type bot by using its workflow engine.  However, none of this really matters if you don't have a proper process behind it.  It also may not make up for missing Identity Management controls.
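
The alerting idea in item 2 above (flagging 'admin-type' role grants) can be roughed out in PowerCLI.  This is a minimal sketch under assumptions: `Select-UnapprovedAdminPermission` is a hypothetical helper name, the approved list is something you would maintain through your own change control process, and it assumes the permission objects expose `Role` and `Principal` properties the way `Get-VIPermission` output does.  The built-in administrator role is assumed to be named `Admin`.

```powershell
# Hypothetical helper (not part of PowerCLI): pass through only those
# permissions that grant an admin-type role to a principal that is not
# on the approved list.
function Select-UnapprovedAdminPermission {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Permission,
        # Role names treated as 'admin-type'; adjust for custom roles
        [string[]] $AdminRole = @('Admin'),
        # Principals already reviewed through change control (assumption)
        [string[]] $ApprovedPrincipal = @()
    )
    process {
        # -contains / -notcontains compare case-insensitively
        if ($AdminRole -contains $Permission.Role -and
            $ApprovedPrincipal -notcontains $Permission.Principal) {
            $Permission
        }
    }
}
```

With a vCenter connection, something like `Get-VIPermission | Select-UnapprovedAdminPermission -ApprovedPrincipal 'VSPHERE.LOCAL\Administrator'` would list the grants worth a second look.  It is a point-in-time check only, so it complements rather than replaces log-based alerting.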

Bottom Line:
This is a trust problem, which is exactly why security, auditing, and change control processes are essential.  It's not a matter of simply disallowing useful tools, such as OpBot, for the sake of security.  It's about being smart and 'knowing' what's happening in your environment so you can implement productive tools to move the business forward all while being secure and safe.

Visual Aid: