VCSA 6.5 + macOS Sierra

While macOS Sierra is not officially supported for deploying the new VCSA for vSphere 6.5, most folks have had no issues installing or upgrading on it. However, I was not one of those lucky ones and ran into an error about halfway through the installation process. “Error: ovftool is not available” was the message, and the installer would not let me pick the size of the new OVA to deploy.

Reading through the logs, you will see something like this:

2016-11-17T00:33:11.992Z – error: could not find ovftoolCmd: /private/var/folders/lt/w4dsqrqn02x0h26k6vmwdtq40000gn/T/AppTranslocation/vcsa/ovftool/mac/ovftool

The quick fix is to simply copy the entire vcsa folder from the ISO into whatever directory is listed in your error log. Once you copy that entire folder (which contains the ovftool and the OVA file), the installer should continue with no issues.



Why data granularity matters in monitoring

There are a wide variety of solutions on the market, all claiming various levels of machine learning, adaptive baselines, market-driven analytics, learning algorithms, etc. By and large, they all collect the data the same way: poll vCenter for performance data via the real-time APIs, poll all the other aspects of vSphere (inventory, hierarchy, tasks & events, etc.) at the same or a lower frequency, and then store that data at varying levels of granularity.

With every solution collecting the data from the same place, you would think it comes down to a straight fight of one algorithm vs. another, but it turns out that is not the case. There are two aspects that greatly influence the analytics: polling frequency and data retention.

Polling frequency is pretty straightforward. Polling faster equates to more data points and a better chance of catching peaks, valleys and general usage. However, faster polling comes with a performance cost of pulling the data every X minutes/seconds (on vCenter, the data collector and the solution’s database) and a huge longer-term impact on storing that data. Ideally, there should be some middle ground on collecting the data.

Most solutions poll every 15 minutes. Some of these can be turned down (good), and unfortunately many cannot (not so good). Those that can go lower generally stop at polling vSphere every 5 minutes. Five minutes seems like an eternity to anyone focused on performance monitoring and analytics. Fortunately, the vCenter API offers the ability to pull 20-second data for the last 5 minutes, which gets around most complaints: 5 minutes (300 seconds) of history / 20-second samples = 15 data points.

One would think it is not hard to pull 15 samples every 5 minutes, but every object can have dozens of metrics and properties to collect. In a smaller environment this might be doable, but at large scale it can mean enormous data sets to poll, forward and store. That data can expose weaknesses in the core platform of the solutions, and thus 15-minute or less frequent polling is enforced.

Even still, all of that data has to sit somewhere and be analyzed. The algorithms need the historical context in order to consider ‘what might happen’ in 10 minutes, tomorrow or next week. What are the business cycles of a given application? How granularly the data is stored over time, and thus how much detail is available for those super-smart algorithms to analyze, is key to answering those sorts of questions.

For a few solutions, raw data is kept forever or for a configurable amount of time, ideally long enough to analyze full business cycles. The unfortunate answer in most cases, however, is that it is not stored very granularly. Current data is fed in and stored at a highly granular state for a few days, then rolled up over time into hourly or daily chunks. In practice this works well for analyzing short-term spikes, but for longer-term trends it comes with a harsh penalty.

Let’s take a look at this in graphical form. I’m going to use vRealize Operations to visualize the data in the charts below. In figure 1 we see the raw data coming in over a period of a few days with 5-minute granularity. We can see the peak value of 106% memory demand.

Figure 1.

Data Granularity set at 5 minutes


Next, if we take that same data at 1-hour intervals, we see in figure 2 that we now have a peak memory demand of 57%. Some of the granularity and importance is still there, but if we wanted to base anything on peak usage we would be doomed already!

Figure 2.


Data Granularity set at 1 hour

Lastly, look at figure 3, where the data has been rolled up to daily intervals. We have completely lost the peak; in fact, it looks to be part of a trough, with only 32% memory demand for that day.

Figure 3.

Data Granularity set at 1 day


The reality is that any time you take 180 samples per metric/property (60 minutes x 3 samples per minute = 180 samples) and roll them up, you are going to lose important detail. It doesn’t matter whether it stores the minimum value, the maximum, the average, etc., because regardless of which one you choose, you will never be able to go back and reanalyze that data; you will miss a peak, a valley, something important. From then on the data will always be some sort of average of an average, and that makes all of the data correlation, capacity planning, forecasting and reporting suspect.
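To make the penalty concrete, here is a small sketch (plain Python, no vendor API involved; the numbers are made up to mirror the figures above): a day of 5-minute samples with one short spike, rolled up the way many monitoring backends do.

```python
# Generate a day of 5-minute samples with one short spike, then roll them up
# into hourly and daily averages and compare which peaks survive.
import random

random.seed(42)

# 288 five-minute samples = 24 hours; baseline ~30% with one 2-sample spike
samples = [30 + random.uniform(-5, 5) for _ in range(288)]
samples[100] = 106  # the peak we actually care about
samples[101] = 98

def rollup(data, chunk):
    """Average consecutive `chunk` samples together (a typical rollup)."""
    return [sum(data[i:i + chunk]) / chunk for i in range(0, len(data), chunk)]

hourly = rollup(samples, 12)   # 12 x 5 min = 1 hour
daily = rollup(samples, 288)   # the whole day in one value

print(f"5-minute peak: {max(samples):.0f}%")
print(f"hourly peak:   {max(hourly):.0f}%")
print(f"daily 'peak':  {max(daily):.0f}%")
```

The hourly rollup dilutes the 106% spike across twelve samples, and the daily rollup buries it entirely, which is exactly what the three figures show.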

The last thing you want to base your next server purchase on is data that is averaged over long periods of time; you may end up under-buying hardware, and when those peaks come you will not be ready for them. The same goes for things like workload balancing: if your historical data is averaged, there could be times when you end up moving VMs multiple times instead of moving them to the right place the first time.

Make sure whatever performance monitoring, capacity management and alerting solution you use has the ability to keep that granular data for full business cycles! These are just some of the reasons I continue to encourage customers to look at vRealize Operations. It provides all of the granularity and accuracy you need to perform your job and make the right decisions the first time.

Think about the kittens – Compress your PPTs

I’m pretty sure somewhere there is a saying that ‘Every time you send a 20MB+ PPT, someone kills a kitten.’ So please think of the kittens and use Compress Pictures in your presentations.


Compress your PPT images

In most cases this simple, yet highly effective, tip can save you 50% or more on file size.

Installing a VIB via PowerCLI

I’ve been rebuilding part of my home lab and trying to really not use Windows servers, so no Update Manager for me. That means when it comes time to install a VIB, I can use lots of different mechanisms: turn on SSH and use ESXCLI, use the vMA, or use PowerCLI. Since I still have PowerCLI installed on my Windows 7 VM, I figured that is the easiest for now. Unfortunately, there were no clear examples out there of ‘I have a VIB on X datastore -> go install it!’ So I wrote a little code and it works like a charm!

Essentially, I connect to my home lab, get all of my hosts in ‘Cluster’ and then install my VIB from $VIBPath. It took all of 5 minutes vs. lots of manual typing. For those wondering, I was installing the Synology NFS VIB.

Connect-VIServer VC.home.lab
$VIBPath = "/vmfs/volumes/datastore1/synology-nfs.vib"  # example path; point at your VIB
ForEach ($esxHost in (Get-Cluster "Cluster" | Get-VMHost)) {
    $esxcli = Get-EsxCli -VMHost $esxHost
    # Positional args follow the software.vib.install method signature;
    # check $esxcli.software.vib.install in your PowerCLI version if the order differs
    $esxcli.software.vib.install($null, $false, $false, $true, $false, $false, $null, $null, $VIBPath)
}

Data Protection – Family Photos – Part 1

My life seems to be one of constantly trying to corral pictures. My wife and I both have a few picture-taking devices (phones/tablets/etc.), my 4-year-old already has a point-and-shoot, and we have the larger family camera. With all these devices it has been a nightmare to get photos into one place, and at the same time to secure that one place against the worst.

We’ve all heard it: a friend or family member loses everything to theft, a corrupt hard drive, ransomware, etc. So how the heck do you protect some of your most valuable assets, the memories? For me, it’s been a LONG road to get even half of the way there, so I thought I would share what I’ve been doing right and so very, very wrong…

Before I begin, a little about my setup – I’m an Apple user, we have MBPs in the house, a Synology NAS device and random other hardware/NAS/etc…

In the beginning – There was iPhoto… and then I quickly moved to Aperture. The photo management aspect of iPhoto was quite annoying and it lacked a lot of the more advanced features. Either way, I had no issues with file management; one library had everything in it, and that was around 10k pictures back in 2009. Very easy to manage: I would copy the library to a USB drive and I was solid.

Then came the kids – Like so many others, when we had our first child in 2010 I went from 10k photos to over 100k that year. It was a crazy, wonderful, beautiful time, but my library went from 20G to around 150G in no time and I was off to get a bigger USB drive. It was still manageable, but I started looking for a better solution. As the years have gone on, I’m now in the range of 700G of photos and 300G of video. At some point I had to separate my albums for sanity’s sake and went to yearly albums. The USB drives just don’t cut it anymore; even with rsync or other copy utilities, it takes a long time to search for changes and sync, although cutting into smaller albums did fix a lot of this. There HAD to be a better way.

Dark times of file sharing programs – Dropbox, gDrive, Box, SkyDrive and many others promised me hope in a world of unmanageable files and a spouse wanting to kill me for not having a better solution. So, tentatively, I stepped into trying them. I started with my 2009 and prior libraries, which seemed to go well. I had the libraries up in a few days, I tried opening them on different computers, and it worked! And this was my downfall: I assumed the larger libraries would behave the same, copied in the 2010 library and thought nothing of it… until it showed 100 days to sync, and then 99.4 days, and slowly it went as each of the tools crunched away.

I was shocked to see roughly 400k files needing to be cached. I looked at 2010 and there were a little over 100k pictures, so why were there 300k ‘other’ things?! Turns out it’s the way that Aperture stores its files, with thumbnails, previews and other stuff for its database. This caused all of the file sharing tools to slow things WAY down because of hashing, copying, etc. Long story short, it failed miserably and I had to scrap it. I couldn’t let it copy all of the albums for 2 years.

Nirvana – After trying other programs like Lightroom, I realized I needed to change the way I store my files and see what I could do to shrink that massive file count. I didn’t want to move to Lightroom just yet, as my spouse is more comfortable with Aperture. The answer was to move to ‘Referenced’ images. This essentially means storing the files in my own directory structure vs. Aperture storing all of the files inside its package (directory). When I moved to this I realized I could solve 50% of my issue: keep the libraries local for now and get the master files up and shared.

What does this all look like?

Photo Management Software – Aperture 3.X w/ Referenced Images

Aperture Libraries – Stored on local SSD drive

Master images (Referenced) file structure – Filesharing Folder -> Photos Albums -> Year

File sharing tool of choice – Dropbox

File Sync protection – Dropbox sync of Master images to 2+ other computers + Synology NAS

Data backup – Crashplan from primary MBP to Synology NAS + Crashplan Cloud (All photos, libraries, etc…)

When I moved to this method I went from roughly 2 YEARS of syncing to 10 days! All of my master images are now synced and I can look at the next steps.

What’s Next?

Figuring out how to get the Aperture libraries synchronized to all computers in a reasonable time frame AND the bit rot issue…

Gigabyte Brix + ESX 5.5 U2

Updating my lab this week, I realized the BIOS was out of date on my Brix devices and decided to flash them prior to the 5.5 U2 install. Nothing too much out of the ordinary here, but the one thing to make sure you check afterwards is the CPU settings. The upgrade seems to turn VT-x off by default, so make sure you turn it back on!

That said, it being upgrade time again, I always make a custom ISO for my Brix because the Ethernet card that is installed is non-standard. Not much has changed in terms of creating this, but the EsxImageProfile naming changed to date-driven vs. build number, so just make sure you watch out for that. Below is what I used. From there I simply added the ISO to my Update Manager and pushed it out to my hosts.

# Add VMware Image Builder Snap-in
Add-PSSnapin VMware.ImageBuilder

# Add VMware Online depot (the standard public depot URL)
Add-EsxSoftwareDepot https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml
# Clone the ESXi 5.5 U2 profile into a custom profile
$CloneIP = Get-EsxImageProfile ESXi-5.5.0-20140902001-standard
$MyProfile = New-EsxImageProfile -CloneProfile $CloneIP -Vendor $CloneIP.Vendor -Name (($CloneIP.Name) + "-customized") -Description $CloneIP.Description

# Add latest versions of missing driver packages to the custom profile
Add-EsxSoftwarePackage -SoftwarePackage net-r8168 -ImageProfile $MyProfile
Add-EsxSoftwarePackage -SoftwarePackage net-r8169 -ImageProfile $MyProfile
Add-EsxSoftwarePackage -SoftwarePackage net-sky2 -ImageProfile $MyProfile

# Export the custom profile into ISO file
Export-EsxImageProfile -ImageProfile $MyProfile -ExportToISO -FilePath c:\ESXi-5.5.0-20140902001-customized.iso

SpiceWorld Austin 2014

A few weeks back I heard about SpiceWorld through Twitter and discovered a few friends would be attending. Not knowing much about Spiceworks beyond a cursory look, I decided to take a deeper peek and learn more about both the company and their event. I reached out to Spiceworks and had the pleasure of working with Raychelle to secure a pass and visit the conference.

Pre-Conference (Monday)

Living in Austin certainly made the trip quite easy: a short drive downtown (past the Spiceworks office, no less!) and I was there. Monday was essentially the pre-party/get-together. For those of you who attend VMworld, I would compare it slightly to #VMunderground; heck, even Brian Knudtson was there! As many pre-conference parties go, there was food, drinks and many a technical conversation to be had. My biggest thought pre-conference was how diverse a group was attending, but more on that later.

Reminiscent of a time long ago

Not knowing what to expect, I arrived early Tuesday morning, ready to kick things off. I was greeted with the usual registration lines, folks running around and a HOT breakfast. You are reading that right: a HOT breakfast at a conference, and it was actually good. The 1500 or so folks attending were getting right into the swing of things, and I have to say the energy was really refreshing. One thing quickly evident about SpiceWorld: it is very admin-centric. It hasn’t been invaded by thousands of executives and overly ‘salesy’ folks.
To preface, I’ve been working as a consultant, product manager, architect and various other roles for the last 10 years, but before that I had the usual IT administrator day job. I ran our corporate IT data centers for a pharma company. We had AS/400s, Windows, Citrix, you name it… Back then I remember the struggles with users, printers constantly breaking, remote access (dialup back then!) and all of the other pains.
Not dealing with users day in and day out, I’d compartmentalized that portion of my brain and rarely thought of the struggles of the IT admin. Yet here I was, being confronted with them again and with how difficult a job admins really have. No amount of software or hardware can change the personal interactions and struggles that still exist in IT today. Nothing can change that ‘marketing’ person who constantly breaks the printer or wants to overly customize their wallpaper and turn mouse trails on.
Listening to speakers talk about all of this and then give advice on how to deal with different situations and personality types was great, as was seeing IT admins really open up about the difficulties they face and hearing very constructive conversations on how to fix these things. During the Q&A was when it hit me: no one is really afraid to open up at this conference. Normally, in a room of 50 or 100+, when Q&A comes around folks get shy and there are one or two people who will start things off. Here, every session without fail, there were 5-10 hands going up for the mic and people wanted to interact. It was very refreshing to see so many people interested and not shy about speaking up.

Speaker content

Unlike at so many of the other conferences out there, most of the speakers were not paid-for vendors or ‘100% use our products from vendor X’ types. SpiceWorld had THE BEST speakers in terms of knowing their audience, content and participation. Each session I attended was very well tailored to the conference and never strayed too far from topic. I was very impressed, as always, by Stephen Foskett and Chris Wahl, but enjoyed all of the presenters immensely, on topics from marketing to users to storage to networking. I honestly cannot give enough praise to Spiceworks for getting such a great lineup of folks to speak at the conference.


Not knowing much about Spiceworks, other than a lot of friends work there, I really enjoyed learning more about their platform and what they are looking to do. Essentially, they are a social media platform and forum for users to ask questions, interact with vendors and leverage as a helpdesk. It even has some really neat features for monitoring and reporting. The platform itself is free, although there are portions you can buy for more feature completeness, and they have a large user base so it stays pretty active.

I’d say the biggest difference from sites I am used to, mainly VMTN, is that the user base is mostly SMB. For me, it’s almost a time machine back 5-10 years on the ESX side of things, as questions tend to be around getting up and running, configuration and architecture of SMB designs. That said, some of the other areas where I am not an expert, i.e. networking or storage, do have a large mid-market and what I call ‘departmental-enterprise’ feel to the questions. Overall, I can say I will be visiting more often now that I know about the resources.

Data Analytics

One thing I do think about is the sheer amount of data that Spiceworks has and the massive analytics that could be done on that platform. Everything from machine types to CPU sockets to applications running: there is so much data there just begging to be analyzed. I hope that Spiceworks will leverage it and/or open it up to the community to start doing some data mining.

Closing Thoughts

If I had to boil it all down to a single word for SpiceWorld Austin, it would be ‘community’. Even being new to Spiceworks/SpiceWorld, every person I met was genuinely kind and inquisitive about what brought me to the conference, and then just opened up about what we do and how we all do it. The conference was a huge success in my mind, for both Spiceworks and for myself. I can certainly see myself back next year, and hopefully I can snag a speaking slot.


Community matters

I’ve had the pleasure of working with VMware products for the last 10 years now, and over that time I’ve changed jobs more times than I can easily count. Throughout all of the change, one thing that has remained constant is the wonderful people in the community that VMware has around itself.

I’ve seen my own contributions ebb and flow throughout my time in the community. But just being part of the community makes you part of a special family. I use the word family, because that is what it truly feels like.

I haven’t contributed a lot lately, but I’ve been in Silicon Valley for the last two weeks and it amazes me how, thousands of miles from home, you can bond with folks over something common like virtualization and become friends. Talking to folks on Twitter, organizing a dinner, or attending the SV VMUG yesterday, it was like I never left Austin or hadn’t stopped blogging/tweeting for a while.

While this is not a technical post, I think we need to take a step back and just enjoy the moments we have with each other and the relationships we form. Individual companies and products may come and go, but this community and the relationships we build will always be here, and for that I want to say to everyone reading this:

Thank you.


Behavioral Analytics – Food for thought

One of the hot topics in the performance analysis/monitoring space as of late has been the concept of behavioral analytics, learning algorithms or baselining. The concept itself is quite simple: look for patterns in data sets, and as the data set gets larger, the algorithms can get closer and closer to predicting behavior over time or through correlation across multiple metrics/data points.

Take internet usage at your company, for example. The heaviest internet traffic tends to occur when folks arrive and log into the network in the morning, around lunch time, and at the end of the day. This sort of repeatable pattern can be analyzed and over time become a baseline for the expected behavior of the metric(s) for internet usage. This way, you don’t have to understand the behavior yourself and set the more traditional threshold of ‘internet usage over X is abnormal, so alert on it’. This is an oversimplified version of what happens, but it gets the point across.

No single person, or team of people, can be expected to truly understand the expected behavior of all of the workloads running, let alone how they would change over time.  When you start to scale up to larger environments, these sort of behavioral analytics are crucial to the enterprise.  For that reason large enterprises are increasingly looking for tools that can help them understand when they really have pain vs. noise of traditional thresholding methods.

But… there is a downside to learning algorithms that is hard to program around. Suppose that in your current environment you have 100ms of latency to your storage devices. This is obviously not good, but if you feed it into a learning algorithm, it will learn that behavior as normal. You know it is not good, but the algorithms will come to expect it, and you get the inverse: when latency deviates from that expected behavior, you could get false positives. In that way, a metric like latency is not a good fit for a learning algorithm by itself. It needs to be correlated with multiple metrics, or use a traditional threshold-based value where ‘latency over Y’ is bad.
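As a toy illustration of that inversion (hypothetical numbers, and a deliberately naive baseline rather than any vendor’s actual algorithm), consider learning a mean and standard deviation from history and flagging anything more than three deviations away:

```python
# Naive baseline: learn mean/stddev from history, flag values outside 3 sigma.
import statistics

def baseline(history):
    return statistics.mean(history), statistics.stdev(history)

def is_anomaly(value, mu, sigma, k=3):
    return abs(value - mu) > k * sigma

# A month of hourly storage latency that has *always* been bad: ~100ms +/- 3ms.
history = [100 + (i % 7) - 3 for i in range(30 * 24)]

mu, sigma = baseline(history)

# The array gets fixed and latency drops to a healthy 5ms...
print(is_anomaly(5, mu, sigma))    # the improvement is flagged as abnormal
# ...while the genuinely bad steady-state 100ms looks "normal"
print(is_anomaly(100, mu, sigma))
```

The chronically bad 100ms reads as ‘normal’, while the fix to 5ms is what trips the alarm: exactly the false-positive inversion described above.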

Just some food for thought.

Disk Space – What matters?

One of the more common questions people ask me about monitoring is how to accurately know when I am going to run out of disk space. The common method is to alert when remaining capacity hits 80, 90, 95% or something to that effect. But what if the drive is 2TB? Even at 95% full, that means there is ~102G free. So would I really want to be told it is low on space? Would I want to alert only on size? What about growth rates? If I have a server that is normally full but not growing, would I even want to know that its percentage or size is high? What action would I take?

So what should you care about?  How can I reliably tell when I am running out of drive space?

The simple answer is growth rate: if it is growing, when will it reach full? This has to be looked at long term AND short term, with something like a moving average so we don’t get too much noise. But we also want to know if it starts filling up quickly, so a balance has to be struck between two different growth rates (short and long term, imo).

Beyond growth rate, you still want to allow for the more traditional gates. If the host/VM/drive was just built, it will jump from 0 to 25% or so, and that skews the data. So you have to balance the freshness of the data with the traditional 80/90/95% thresholds and the MB/GB remaining that you are comfortable with.

What I propose is something like this:

  • Check the age of the object – If it is new, calculate growth, but don’t use it until we have enough datapoints.
  • Calculate a daily growth rate (from many 2-10 minute chunks) that becomes part of the overall weekly growth rate (many hourly chunks).  You can then even take the weekly rate over time and look at that for longer-term trending.
  • If we are established and we have high growth, tell me when I get 4 hours, 2 days and 5 days out.
    • Anything else is noise; why would I care that I’m a month from running out of space?  I want to know prior to the weekend that I could have issues next week (5 days).  I am busy, so tell me when I have a day or two to deal with it (2 days).  OK, it’s still growing and getting close; time to expand the drive, delete stuff, etc. (4 hours)
  • If we don’t have a reliable set of growth rate data, fall back to space free and/or percentage based.  Set reasonable gates, 90/95/97% and/or 20G, 5G, 1G.  Clearly this part is more about you knowing your environment, because one size doesn’t fit all.
    • Right – Your D: drive is 97% consumed and only has 300M free.
    • Wrong – Your D: drive is 97% consumed and you have 200G free.  (Really?)

In the end, growth rate should be the focus, as it tells me what I really want to know: you are going to run out of space in X hours, so do something! When we don’t have enough data, we can’t ignore drive space; we just fall back to the traditional methods. This becomes actionable data, and we all already know what to do when drive space runs low.
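The logic above could be sketched roughly like this (plain Python; the sample numbers, window sizes and alert gates are all illustrative assumptions, not a reference implementation):

```python
def growth_rate(free_gb_samples, interval_hours):
    """Average consumption in GB/hour over a window of free-space samples
    (free space shrinking = positive growth)."""
    deltas = [free_gb_samples[i] - free_gb_samples[i + 1]
              for i in range(len(free_gb_samples) - 1)]
    return sum(deltas) / (len(deltas) * interval_hours)

def hours_until_full(free_gb, short_window, long_window):
    # The faster (worse) of the short- and long-term rates drives the estimate.
    rate = max(growth_rate(short_window, 1 / 6),  # 10-minute samples
               growth_rate(long_window, 1.0))     # hourly samples
    if rate <= 0:
        return None  # not growing: no time-to-full alert at all
    return free_gb / rate

def alert_window(eta_hours):
    """Map time-to-full onto the 4-hour / 2-day / 5-day gates; beyond 5 days is noise."""
    if eta_hours is None or eta_hours > 120:
        return None
    if eta_hours <= 4:
        return "critical: act now"
    if eta_hours <= 48:
        return "warning: a day or two to deal with it"
    return "heads-up: could be an issue within 5 days"

# Example: 20GB free; recent 10-minute samples show faster consumption
# (1.8GB/hr) than the hourly trend (1.2GB/hr), so the short-term rate wins.
short_term = [21.5, 21.2, 20.9, 20.6, 20.3, 20.0]
long_term = [26.0, 25.0, 24.5, 24.0, 22.0, 20.0]
eta = hours_until_full(20.0, short_term, long_term)
print(f"full in ~{eta:.1f} hours -> {alert_window(eta)}")
```

The new-object check from the first bullet would simply gate whether `hours_until_full` is consulted at all, with the percentage/size-free thresholds as the fallback.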

So now what?  If we stop and look at the file system, we could do the same thing there: look at what files are being touched and growing, what is new on the drive that is filling it up, and even point to common file types that could be removed to free space.  That can get a bit more complicated, but again we want actionable information.  Why hunt and peck for the top consumers, whether I have 10k temp files that are pretty safe to delete, or which directories are doing the growing?  These are all things monitoring tools can do today, so why not surface them in the alert?

“Your C: drive is currently at 500M free and will fill up in 3.5 hours.  The fastest growing folder is C:\Windows\Temp @ 140M/hr and the largest folders are C:\Program Files\Something at 40G and C:\App\Custom\database at 27G”

Something like that is pretty powerful.  I know instantly what is wrong and what is likely causing the problem.  I can now go add space or look at the file system to see why things are growing.