I’ve had the pleasure of working with VMware products for the last 10 years now, and over that time I’ve changed jobs more times than I can easily count. Throughout all of the change, one thing that has remained constant is the wonderful people in the community that VMware has around itself.
I’ve seen my own contributions ebb and flow throughout my time in the community. But just being part of the community makes you part of a special family. I use the word family, because that is what it truly feels like.
I haven’t contributed a lot lately, but I’ve been in Silicon Valley for the last two weeks, and it amazes me how, thousands of miles from home, you can bond with folks over something common like virtualization and become friends. Whether talking to folks on Twitter, organizing a dinner, or attending the SV VMUG yesterday, it was like I never left Austin or hadn’t stopped blogging/tweeting for a while.
While this isn’t a technical post, I think we need to take a step back and just enjoy the moments we have with each other and the relationships we form. Individual companies and products may come and go, but this community and the relationships we build will always be here, and for that I want to say to everyone reading this: thank you.
One of the hotter topics lately in the performance analysis/monitoring space has been behavioral analytics, also called learning algorithms or baselining. The concept itself is quite simple: look for patterns in data sets, and as the data set grows, the algorithms get closer and closer to predicting behavior, either over time or by correlating multiple metrics/data points.
Take internet usage at your company, for example. The heaviest internet traffic tends to occur when folks arrive and log into the network in the morning, around lunchtime, and at the end of the day. This sort of repeatable pattern can be analyzed and, over time, become a baseline for the expected behavior of the metric(s) for internet usage. That way you don’t have to understand the behavior yourself and set a more traditional threshold like ‘Internet usage over X is abnormal, so alert on it.’ This is an oversimplified version of what happens, but it gets the point across.
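To make the idea concrete, here’s a minimal sketch of baselining: learn the typical value and spread for each hour of the day from history, then flag readings that deviate too far from that learned behavior. All the names, sample values, and the three-sigma rule below are illustrative assumptions, not any particular product’s implementation.

```python
from statistics import mean, stdev

# Hypothetical hourly internet-usage samples (Mbps) collected over several
# days, keyed by hour of day. These numbers are made up for illustration.
history = {
    9:  [480, 510, 495, 505, 490],   # morning login spike
    12: [450, 470, 460, 455, 465],   # lunchtime spike
    15: [200, 210, 190, 205, 195],   # mid-afternoon lull
}

def is_abnormal(hour, value, sigmas=3):
    """Flag a reading that deviates more than `sigmas` standard
    deviations from the learned baseline for that hour."""
    samples = history[hour]
    baseline, spread = mean(samples), stdev(samples)
    return abs(value - baseline) > sigmas * spread

print(is_abnormal(15, 205))  # typical afternoon reading -> False
print(is_abnormal(15, 480))  # morning-level traffic at 3 PM -> True
```

Note that nobody had to declare what “normal” 3 PM traffic looks like; the history itself defines it, which is the whole appeal of the approach.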
No single person, or team of people, can be expected to truly understand the expected behavior of all of the workloads running, let alone how they change over time. When you scale up to larger environments, this sort of behavioral analytics becomes crucial. For that reason, large enterprises are increasingly looking for tools that can help them distinguish real pain from the noise of traditional thresholding methods.
But… there is a downside to learning algorithms that is hard to program around. Suppose that in your current environment you have 100ms of latency to your storage devices. That is obviously not good, but if you feed it into a learning algorithm, the algorithm will learn that this behavior is normal. The algorithm comes to expect the bad behavior, and you get the inverse problem: when latency deviates from that expected behavior, even by improving, you could get false positives. In that way, a metric like latency is not a good fit for a learning algorithm by itself. The algorithm needs to correlate multiple metrics, or fall back to a traditional threshold-based value where ‘latency over Y’ is bad.
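One way around the pitfall above is a hybrid check: keep the learned baseline, but pair it with a traditional static threshold so that chronically bad values still fire an alert. This is only a sketch of the idea; the 20ms ceiling and the three-sigma cutoff are assumptions I’ve chosen for illustration.

```python
LATENCY_CEILING_MS = 20  # traditional static threshold; value is illustrative

def latency_alert(observed_ms, sigmas_away):
    """Alert if latency deviates from the learned baseline OR exceeds a
    static ceiling. A learned baseline alone would accept a chronically
    bad 100ms as 'normal'; the ceiling catches that case."""
    deviates = sigmas_away > 3                    # abnormal vs. learned behavior
    too_slow = observed_ms > LATENCY_CEILING_MS   # bad in absolute terms
    return deviates or too_slow

# An environment that has always run at 100ms: the algorithm learned that
# as normal (0 sigmas away), but the static threshold still flags it.
print(latency_alert(100, 0))  # -> True
# A healthy 5ms reading that matches its baseline raises nothing.
print(latency_alert(5, 0))    # -> False
```

The design choice is simply to let each mechanism cover the other’s blind spot: the baseline catches sudden changes the threshold misses, and the threshold catches bad steady states the baseline has learned to accept.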
Just some food for thought.