Posts

VMware | AVS: Content Library or Non vCenter objects on VSAN produces unassociated but valid objects

Summary: Creating or subscribing to a Content Library on vSAN is typical practice, but it has an annoying side effect: the objects created on the vSAN datastore show up as "unassociated" objects and, guess what, they simply inherit the cluster's default vSAN policy at the time of import. Yeah, dumb, right? Also, by vSAN Operations Guide Standards section 6721 subsection 4, you shouldn't do this. So then what? In Azure VMware Solution (AVS), you have a couple of options. You can do what the operations guide tells you not to do, or you can do something better. Regardless of which option you pick, the one thing you should do is create a global content library. Create Global Content Library: here are instructions to create one on an Azure blob store. You should do this; it makes life easier if you can. It's also a fun little experiment to play w/ Azure Functions. Centralized Storage of your ISOs/OVFs etc. Attach external storage (Not necessary, but I'll explain why you s

NSX-T: Release associated invalid node ID from certificate

Summary: I had an expiring certificate registered in NSX-T that was associated with a node_id that was no longer valid. Long story short, there wasn't anything obvious in the API to delete or disassociate a certificate from a node_id in 3.2.2. Not sure how things got into this state, but I'm noting it for future reference. This may change in future releases, so always check the API for the latest. Details: Effectively, I had a stale node associated w/ a certificate that was expiring. I could not delete the certificate until that node was disassociated from it. To get the certificate details and the associated node_ids, you can use the following curl call (the UI works too): curl -k -X GET -H "Content-Type: application/json" -u admin https://<manager ip>/api/v1/trust-management/certificates/<cert UUID> The above will return something like this: The below must be run from one of the manager nodes after elevating to root: ONLY RUN THIS IF YOU ARE ABSOLUTELY SURE OF WHAT YOU ARE DOI

iOS: Sleep Focus activating on wrong time zone

Time is Relative Summary: For some strange reason, my sleep focus was activating based on my home timezone of EST while I was traveling to Japan and Australia. My phone's timezone was correct, as was my Apple Watch, which is set to mirror my iPhone. [ Update: While the resolution below may help in some situations, I found my issue to be that I had left a device (MacBook Pro) logged in and running in my home timezone while I was in Australia. It seems that there is no 'primary' controller for initializing focus modes. It's basically whatever device sends the "It's focus time" message, which now explains why I was having such issues. So Apple needs to fix this by defining a primary device (should be my iPhone/Apple Watch, IMHO) so weird people like me who have multiple devices can get proper sleep outside their home timezone. ] Workaround/Resolution: Check if you have any other Apple device logged in with your Apple ID in your home location. Chances are, if you do, that

Azure VMware Solution: NSX-T Active/Active T0 Edges...but

Summary: Azure VMware Solution (AVS) ships by default w/ a pair of redundant Large NSX-T Edge VMs, each running a T0 in active/active mode. So why is my traffic only going out one Edge VM? Short answer: The default T1 delivered w/ AVS, the one you connect your workloads to, is an active/passive T1. So while traffic could technically take either T0, it's always going to go out the T0 closest to the active T1 "SR". Where do the SRs live? You guessed it, on the Edge VMs. As you can imagine, this can lead to a bottleneck if you try to shove all your traffic through a single Edge VM. Simple Diagram: Longer answer with Options:

vCenter: Cluster Skip Quickstart Workflow via API

Summary: Basically, whenever you reset vCenter, you might end up w/ a warning on a cluster running vSAN that's just annoying. To stop it from alerting, you need to disable QuickStart. Easy enough via the UI, but the API is a little weird here. Details: For one, code capture doesn't seem to understand this, so no help there unfortunately. Secondly, nothing named "quickstart" is in the API, which made this somewhat annoying to find. It seems someone asked this question on the VMware Communities forum 2 years ago and got no answer. Someone asked me internally, so I had to dig into it. Basically, two things: You can create a cluster w/ QuickStart disabled from the get-go by passing a false boolean to a parameter named "InHciWorkflow" via an API/PowerCLI call. Secondly, to "skip QuickStart" on an already created cluster, you can call a method called "AbandonHciWorkflow". So yeah, you can see how "quickstart" and "HCIWorkfl

NSX-T: Find and Delete Orphaned Ports

Summary: I had a bunch of orphaned ports (65000+). I don't know why or how it happened (hypothetically NTP related), but I needed to clean them up. Doing it via the UI was obviously not an option, as it only returns 50 ports per page. Oh, and it wouldn't refresh after every delete. Details: I'm saying 'orphaned', but in reality I'm only keying off the fact that the port is reporting "Operationally Down". This could simply be a powered-off VM, but there is little harm in deleting these types of ports, as they will simply be recreated if that VM is powered on. This may not apply in all situations, so use this with caution. PowerShell Example(s): References: https://www.virten.net/2021/03/error-when-connecting-virtual-machine-to-nsx-t-segments/
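
As a rough illustration of the selection logic in the orphaned-ports post above (this is not the post's PowerShell), here is a minimal Python sketch. It walks a paged listing and keeps only the ports reporting down, so they can be reviewed before anything is deleted. The page shape (`results`, `cursor`) and the `id`/`status` fields are assumptions made for the example, and `fake_fetch` is a hypothetical stand-in for whatever call actually retrieves a page of ports from your manager.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Page = Dict[str, object]


def iter_ports(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every port by following the cursor until a page has none."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        for port in page.get("results", []):
            yield port
        cursor = page.get("cursor")
        if not cursor:
            break


def select_down_ports(ports: Iterable[dict]) -> List[str]:
    """Return the ids of ports whose (assumed) status field is DOWN."""
    return [p["id"] for p in ports if p.get("status") == "DOWN"]


# Hypothetical in-memory pages standing in for real API responses.
_FAKE_PAGES = {
    None: {
        "results": [
            {"id": "port-1", "status": "UP"},
            {"id": "port-2", "status": "DOWN"},
        ],
        "cursor": "page2",
    },
    "page2": {"results": [{"id": "port-3", "status": "DOWN"}]},
}


def fake_fetch(cursor: Optional[str]) -> Page:
    """Hypothetical fetcher: returns a canned page for the given cursor."""
    return _FAKE_PAGES[cursor]


candidates = select_down_ports(iter_ports(fake_fetch))
print(candidates)  # review this list before deleting anything
```

Keeping the selection separate from the delete step means the candidate list can be dumped to a file and sanity-checked first, which matters given that "down" may just mean a powered-off VM.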

vSAN: The cascade scenario that vSAN stretch cluster has issues with...

Summary: While testing stretch cluster, we ran into strange failover behavior: namely, failover simply was not occurring. During this testing, we found a dirty little secret about stretch cluster failovers, one that makes me rethink whether stretch clusters are really worth doing. Documented Failure Scenarios Details: All documented scenarios effectively deal w/ a 'single' type of failure. The problem is that disasters/failures can be multi-faceted and, in some instances, cascading. Take the Secondary Site Failure or Partitioned scenario, add a 'cascading failure' to it, and you end up in a whole world of trouble depending on the next 'failure'. The below effectively depicts the failure of the interconnect between the two sites. What this fails to take into account is that there are typically 3 things involved: the networking between the two sites, the preferred site routers, and the secondary site routers. So here is a slightly more involved d