vCenter 4.0 U1 Performance Statistics w/ mix of ESX 3.5 and 4.0 Hosts

Configuration:
  • ESX 4.0 U1
  • ESX 3.5 U4
  • vCenter 4.0 U1
Issue:
The metrics that vCenter tracks historically do not always match the “Statistics Level” set in vCenter.  For instance, Statistics Level 1 in vCenter 4.0 U1 now includes “CPU Ready” in historical tracking; in vCenter 2.5, this metric was only tracked historically at Level 3.  However, this seems to apply only to VMs hosted on ESX 4.0 and managed by vCenter 4.0.  If the VM is hosted on an ESX 3.5 system, the metric is not tracked.  Another possible cause of missing historical metric data is that the SQL jobs are not completing properly; see the SQL Resolution section.
Hypothesis:
I’m guessing here, but this is likely due to the way ESX 3.5 identifies the metric to vCenter, which is probably as Level 3.  Below is what you might see when looking at a VM’s performance on an ESX 3.5 host and on an ESX 4.0 host.
Examples:
ESX 3.5 VM w/ vCenter 4.0 Stat Lvl 2:
[Image: 3.5 VM]
ESX 4.0 VM w/ vCenter 4.0 Stat Lvl 2:
[Image: 4.0 VM]
SQL Resolution:
This is something I ran into that may have been caused by someone in my group or may have happened during an upgrade.  Essentially, vCenter creates SQL jobs to roll up historical data.  If one of these breaks, you may only notice the issue when you change your vCenter Statistics Level (new counters don’t show up).  In my case, a job was partially working: it reported success on execution, but it skipped a step because the previous step was set to quit on success rather than proceed to the next step.
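One way to spot this kind of misconfiguration is to list each rollup job's steps and their on-success actions from msdb.  This is a minimal sketch, assuming the jobs run under SQL Server Agent and that their names contain "stats rollup" (an assumption based on the job named later in this post); adjust the filter for your environment.

```sql
-- List the steps of the stats rollup jobs and what each step does on success.
-- on_success_action: 1 = quit reporting success, 2 = quit reporting failure,
--                    3 = go to next step, 4 = go to step on_success_step_id
SELECT j.name AS job_name,
       s.step_id,
       s.step_name,
       s.on_success_action,
       s.on_success_step_id
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobsteps AS s
  ON s.job_id = j.job_id
WHERE j.name LIKE '%stats rollup%'   -- assumed name pattern
ORDER BY j.name, s.step_id;
```

Any step other than the last one that shows on_success_action = 1 ends the job early while still reporting success, which matches the behavior described above.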
This ‘broken’ job left me with a ‘hist_stat2’ table of 150 million rows.  Here are some steps to remedy something like this:
  1. Run the following against each ‘hist_stat#’ table to determine whether you need to truncate it.  If it takes longer than 5 minutes to run, you might have a broken SQL job and should probably truncate the table (as long as your SQL server is otherwise performing normally).
   select count(*) from vpx_hist_stat3
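If a full count is too slow on a very large table, an approximate row count can be read from the catalog instead of scanning the table.  This is a sketch, assuming SQL Server 2005 or later, that the query is run in the vCenter database, and that the tables follow the vpx_hist_stat# naming used above; the numbers come from metadata and may be slightly off.

```sql
-- Approximate row counts for the historical stats tables, read from metadata
-- rather than by scanning each table.
SELECT t.name      AS table_name,
       SUM(p.rows) AS approx_rows
FROM sys.tables AS t
JOIN sys.partitions AS p
  ON p.object_id = t.object_id
WHERE t.name LIKE 'vpx[_]hist[_]stat[0-9]'   -- assumed naming pattern
  AND p.index_id IN (0, 1)                   -- heap or clustered index only
GROUP BY t.name
ORDER BY t.name;
```

A table whose count is far larger than the others is a likely candidate for the rollup job that is not doing its work.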


  • The job I had problems with in particular was named “Past Week stats rollup<databasename>”.

  • Step 2 in the job was configured to “Quit the job” on success.  This step should be configured to “Go to the next step” on success.

  • WARNING: These next steps WILL delete all data in the target table, and you will lose some historical performance data.  Be sure to have a backup just in case.

    • Once you’ve determined which table hasn’t been getting rolled up, run the following command to truncate it:
       truncate table vpx_hist_stat3
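The job-step change described above (step 2 set to “Go to the next step” on success) can also be made in T-SQL rather than through the job properties dialog.  This is a sketch, assuming step 2 is the misconfigured step as it was here; the job name keeps the <databasename> placeholder from above and must be replaced with the exact name shown in SQL Server Agent.

```sql
-- Set step 2 of the rollup job to continue to the next step on success.
-- @job_name is a placeholder: substitute the real job name before running.
EXEC msdb.dbo.sp_update_jobstep
     @job_name          = N'Past Week stats rollup<databasename>',
     @step_id           = 2,
     @on_success_action = 3;   -- 3 = go to next step
```

Once the step is corrected, the next scheduled run of the job should execute every step, so the truncated table stays at a manageable size.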