If your server, rack or cage is in one of the UK’s established and well-run data centres, you can be fairly sure that they are going to keep the temperature within a well-controlled range (unless something goes very wrong).
You also probably already have onboard temperature monitoring in your servers, disk arrays and switches so I bet you have never thought about any other type of temperature sensing or even mapping.
But if you’re simply looking at onboard data, the chances are you are missing out on a lot of interesting, actionable information. Let’s talk about the useful stuff that we at Purrmetrix can help you learn.
1. Locating a hot spot
When you look at your onboard monitors you get a number on a screen, maybe even a graph. But how, and where, does that fit into the bigger picture? I mean that literally. By putting the data into a heat map you can see instantly how it all fits together. Is the machine you are looking at the actual problem? Is it a wider air flow issue in the rack? Is it one of the machines next to it? A picture paints a thousand words, after all. You could use this information to make a better-informed choice about where to fit your next piece of kit or how to set up the next rack you buy.
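To make the heat map idea concrete, here is a minimal sketch in plain Python. It is not the Purrmetrix interface: the readings, the `(face, slot)` position keys and the function names are all made up for illustration. It simply shows how tagging each reading with its position lets you lay the numbers out as a grid and see which spot stands out.

```python
# Hypothetical readings in degrees C, keyed by (face, slot).
# Slot 0 is the top of the rack; this layout is an assumption for illustration.
readings = {
    ("front", 0): 21.5, ("front", 1): 22.0, ("front", 2): 22.4, ("front", 3): 21.8,
    ("back", 0): 29.1, ("back", 1): 36.7, ("back", 2): 30.2, ("back", 3): 28.4,
}

def build_grid(readings, faces=("front", "back"), slots=4):
    """Arrange readings as rows (top to bottom) with one column per rack face."""
    return [[readings[(face, slot)] for face in faces] for slot in range(slots)]

def hottest(readings):
    """Return the (face, slot) position with the highest temperature."""
    return max(readings, key=readings.get)

def render(grid, faces=("front", "back")):
    """Render the grid as plain text, one row per slot."""
    lines = ["slot  " + "  ".join(f"{face:>6}" for face in faces)]
    for slot, row in enumerate(grid):
        lines.append(f"{slot:>4}  " + "  ".join(f"{temp:6.1f}" for temp in row))
    return "\n".join(lines)

grid = build_grid(readings)
print(render(grid))
print("hottest position:", hottest(readings))
```

In this made-up data the back of the rack at slot 1 is clearly warmer than its neighbours, which points at one machine rather than at the whole rack.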
2. Tracking access to racks
If you have third-party engineers accessing your kit for maintenance or replacements, or just remote hands from the DC checking something out for you, wouldn’t you like to know a bit more about what they are up to? When they opened the door, which racks they accessed, how long they left the door open while they walked away from the DC floor, or even whether they left the doors open when they left? Yes, the data that you get from the kittens really are that sensitive, as the picture below shows.
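As a rough sketch of how a door opening might be picked out of a temperature trace, the snippet below flags any stretch where a reading strays from its usual baseline by more than a threshold. The baseline, the threshold, the sample data and the function name are all assumptions for illustration; the real signature of an open door will depend on your rack and room.

```python
def door_open_intervals(samples, baseline, threshold=1.5):
    """Return (start, end) timestamp pairs where the reading strays from baseline.

    samples: list of (timestamp_seconds, temperature_c) in time order.
    baseline and threshold are assumed values chosen for illustration.
    """
    intervals = []
    start = None
    for ts, temp in samples:
        disturbed = abs(temp - baseline) > threshold
        if disturbed and start is None:
            start = ts  # disturbance begins
        elif not disturbed and start is not None:
            intervals.append((start, ts))  # reading is back near baseline
            start = None
    if start is not None:
        # Still disturbed when the data ends: the door may have been left open.
        intervals.append((start, None))
    return intervals

samples = [(0, 22.0), (60, 22.1), (120, 24.3), (180, 24.8), (240, 22.2), (300, 22.0)]
intervals = door_open_intervals(samples, baseline=22.0)
print(intervals)
```

An interval whose end is `None` is the interesting case here: the reading never came back to normal, so it may be worth asking whether someone walked away with the door open.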
3. Remedying problem rack conditions
So what can the kittens tell you that will help you find out a bit more about what’s going on when a fault occurs? Whether the temperature of the air coming into the rack has risen, whether the air flow is being obstructed, whether someone has been into your rack, whether the humidity has gone up (temperature and humidity monitors only), which could suggest a leak or a liquid spill, or whether heat from a neighbouring rack is affecting yours. You could set up alerting in your account so that you are informed when readings go outside your preferred parameters.
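The idea behind "preferred parameters" can be sketched in a few lines of Python. This is only an illustration of a threshold check, not how alerting is configured in a Purrmetrix account: the limits, metric names and sensor name below are invented, and you would pick ranges that suit your own kit.

```python
# Hypothetical preferred ranges; pick limits that suit your own kit.
LIMITS = {
    "temperature_c": (18.0, 27.0),
    "humidity_pct": (20.0, 80.0),
}

def check_reading(sensor, reading, limits=LIMITS):
    """Return a list of alert messages for values outside their preferred range."""
    alerts = []
    for metric, (low, high) in limits.items():
        value = reading.get(metric)
        if value is None:
            continue  # e.g. a temperature-only sensor reports no humidity
        if value < low:
            alerts.append(f"{sensor}: {metric} {value} below {low}")
        elif value > high:
            alerts.append(f"{sensor}: {metric} {value} above {high}")
    return alerts

alerts = check_reading("rack1-back-top", {"temperature_c": 31.2, "humidity_pct": 45.0})
print(alerts)
```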
4. Analysing what went wrong
Let’s say that you had a total network meltdown and you really didn’t have time to study exactly what was going on with the temperature in your rack. That’s OK, because you can go back and replay the data when you have the chance. You might not have been able to figure out what happened or why at the time. Perhaps the same scenario keeps cropping up and causing a disk to fail or a server to reboot. You could find that some small event is causing the issue, or, now that you know how it shows up in the data, you could recognise it sooner, set up an alert and stop it before it causes the problem.
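Going back over the data after an incident could look something like the sketch below. It assumes you have the history as a simple list of `(timestamp, temperature)` pairs, which is an illustrative format rather than anything the product exports; the function names and the ten-minute lookback are also assumptions. It pulls out the samples leading up to the incident and finds the biggest jump between consecutive readings.

```python
def replay_window(history, incident_ts, lookback=600):
    """Return the samples recorded in the lookback seconds up to the incident."""
    return [(ts, temp) for ts, temp in history
            if incident_ts - lookback <= ts <= incident_ts]

def largest_rise(window):
    """Return (timestamp, rise) for the biggest jump between consecutive samples."""
    best = (None, 0.0)
    for (_, prev_temp), (ts, temp) in zip(window, window[1:]):
        rise = temp - prev_temp
        if rise > best[1]:
            best = (ts, rise)
    return best

# Hypothetical history: (seconds, degrees C), one sample every five minutes.
history = [(0, 22.0), (300, 22.1), (600, 22.0), (900, 25.5), (1200, 27.0), (1500, 27.2)]
window = replay_window(history, incident_ts=1200)
ts, rise = largest_rise(window)
print(f"largest rise of {rise:.1f} C at t={ts}s")
```

Once you know what the run-up to a fault looks like, the same rise could become the condition for an alert.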
The importance of time and space
There are a lot of possibilities but only with the correct level of monitoring. To make sense of the data you need to have information on where it is coming from, as well as when, and you can’t get this just from your onboard sensing. It’s time to bring our thinking up to date.
If you think all of this sounds interesting then ordering a starter kit would be a great place to start. A medium bundle contains 1 gateway and 8 kittens, which would do the trick for one rack, with 4 in the front and 4 in the back spaced equally from top to bottom. If you want to have a chat with us about your current setup, problems you think you may have and how best you can detect them using kittens, then you can contact us on 01223 967301 or firstname.lastname@example.org
Liz Fletcher is Purrmetrix’s project manager. After nearly a decade in IT, miles of cabling and gallons of tea, she is currently dividing her time between Purrmetrix and the UKNOF Programme Committee.