Hotspots. Real data on rack cooling.

Since last year we have been running a few pilots investigating cooling in data centres across the UK. We now have nearly 15 months of highly detailed data on rack cooling from several data centre locations, covering racks that hold standard servers, blade servers, hard discs, switches and even a bunch of Mac Minis, and it seems like a good idea to share a bit of what the data is telling us.

Health warning: these are pretty preliminary ideas on what is, after all, quite a small sample of data. We’ll need more sites and more time to confirm these hypotheses. But now seems like a good time to take stock and see if we can answer some of the simple questions and think about where we could go with this approach.

It’s also likely that data centre professionals will want to ask this data more sophisticated questions about cooling and environmental monitoring than we have: we’ve listed a few ideas for investigation below, and as we add more sites and data we’ll come back to these.

Rewind: what’s Purr all about?

Here’s an Internet of Things recipe we love. Take some very dull data (let’s say temperature). Add a few bits of context (try time, exact location and location relative to other sensors). Sprinkle it with Internet Sauce and voila! Information…to inform decisions and help people decide what action to take next.

For Purr, these actions are about making HVAC more efficient. We think it’s a bit crazy, in the age of the IoT, that most data centres are using three times more cooling capacity than they need. We also think that looks like an expensive problem, or maybe an opportunity.

Temperature in server rack 1

Context for the data: What and where are our pilots?

For practical and ethical reasons (we have an absolute ban on publishing identifiable data without customer clearance) we won’t go into where these pilots are.

There are five test racks, each instrumented with 6–8 temperature sensors and a gateway. All of them are in co-location facilities in the south-east of England, sitting alongside other customers’ racks.

Data centre temperatures: what does the data tell us?

A few big stories jump out. After several months, the most noticeable feature is the difference between the best- and worst-performing racks over time. Our champion rack held a mean temperature of 20.6°C and varied by less than a degree throughout monitoring. By contrast, one rack’s mean temperature of 20.27°C hid swings of over 16 degrees during the trial, including peaks above 31°C.
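To illustrate why a mean on its own can mislead, here is a minimal Python sketch that summarises one rack’s readings with spread measures (standard deviation, min, max, range) alongside the mean. The function name `summarise` and the sample readings are hypothetical, made up for illustration; they are not data from our trial.

```python
from statistics import mean, pstdev


def summarise(readings_c):
    """Summarise one rack's temperature readings (degrees C).

    The mean alone can look healthy while the range reveals
    large swings, so report both.
    """
    if not readings_c:
        raise ValueError("need at least one reading")
    lo, hi = min(readings_c), max(readings_c)
    return {
        "mean": round(mean(readings_c), 2),
        "stdev": round(pstdev(readings_c), 2),  # population std dev
        "min": lo,
        "max": hi,
        "range": round(hi - lo, 2),
    }


# Hypothetical readings, for illustration only (not trial data).
steady = [20.4, 20.6, 20.8, 20.6, 20.6]
swingy = [15.0, 18.0, 31.0, 17.0, 20.0]
print(summarise(steady))  # mean 20.6, range 0.4
print(summarise(swingy))  # mean 20.2, range 16.0
```

Two racks with almost identical means can have ranges that differ by a factor of forty, which is exactly the pattern the champion and worst racks showed.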

Our sensors also revealed gradients within the racks: ambient intake temperature differs from one height to another. Again, this was much more pronounced in some facilities than others, with one rack showing a 4.4-degree difference across its intake side. Others had a much tighter range, down to 0.2 of a degree.
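A gradient like this can be measured by comparing intake sensors mounted at different heights at the same moment. The sketch below is a minimal Python illustration, assuming one reading per intake sensor keyed by its height in rack units (lowest number at the bottom); the function name `intake_gradient` and the example numbers are hypothetical. It reports both the bottom-to-top difference and the overall max-min spread, because the warmest sensor is not always the top one.

```python
def intake_gradient(intake_by_height):
    """Measure the spread of intake temperatures at one moment.

    intake_by_height: dict mapping a sensor's height (assumed to be
    a rack-unit number, lowest first) to its reading in degrees C.
    Returns (top minus bottom, max minus min), rounded to 0.01.
    """
    if len(intake_by_height) < 2:
        raise ValueError("need at least two intake sensors")
    heights = sorted(intake_by_height)
    bottom = intake_by_height[heights[0]]
    top = intake_by_height[heights[-1]]
    values = list(intake_by_height.values())
    return round(top - bottom, 2), round(max(values) - min(values), 2)


# Hypothetical readings: the second rack is warmest mid-height,
# so its spread is larger than its top-to-bottom difference.
print(intake_gradient({2: 18.5, 21: 19.0, 40: 21.0}))  # (2.5, 2.5)
print(intake_gradient({2: 18.5, 21: 22.0, 40: 21.0}))  # (2.5, 3.5)
```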

Here’s a quick summary of the data from the trial:

Temperature for server racks

Why mean data can be mean – the stories behind the numbers.

Averages and max/min figures obscure some of the interesting stories, which are easy to see in a heatmap or graph. What, for example, happened on this afternoon?

Server rack temperature showing open door 

In this case we know the answer: that sharp fall in temperature is characteristic of someone opening the door of the server rack while they work inside.
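A sharp fall like this is fairly simple to flag automatically. Here is a rough Python heuristic, assuming evenly spaced samples from a single sensor: it flags any sample that sits at least `min_drop_c` degrees below the reading `window` samples earlier, then groups consecutive flags into events. The function names and default thresholds are hypothetical, not values tuned against our trial data, and a flagged drop is only consistent with an open door, not proof of one.

```python
def sharp_drops(readings_c, window=3, min_drop_c=3.0):
    """Return indices where the temperature has fallen by at least
    min_drop_c compared with the reading `window` samples earlier."""
    flagged = []
    for i in range(window, len(readings_c)):
        if readings_c[i - window] - readings_c[i] >= min_drop_c:
            flagged.append(i)
    return flagged


def group_events(indices):
    """Merge consecutive flagged indices into (first, last) events."""
    events = []
    for i in indices:
        if events and i == events[-1][1] + 1:
            events[-1][1] = i  # extend the current event
        else:
            events.append([i, i])  # start a new event
    return [tuple(e) for e in events]


# Hypothetical trace: a dip and recovery around samples 3-7.
trace = [24.0, 24.1, 24.0, 22.0, 19.5, 19.0, 21.0, 23.5, 24.0]
print(group_events(sharp_drops(trace)))  # [(4, 5)]
```

Comparing against a sample a few steps back, rather than the immediately previous one, stops a drop spread over several samples from slipping under the threshold.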

This one is more mysterious: why does this cold side suddenly develop a hot spot? Is it a variation in air pressure in the facility? Some equipment slipped inside the rack?

Server rack temperature in datacentre 1

What’s next: more questions, more data

Clearly we have a long way to go before we have enough data to undertake the sort of detailed analysis that, say, Backblaze can do. (Oh, how happy that would make me!)

But there is already a queue of things on our bucket list for investigation, including:

  1. Can the delta between intake and exhaust tell us anything useful about the type of equipment and the work it is doing?
  2. Can it give us a useful indication of where a rack has excess capacity?
  3. How many short excursions out of recommended operating range should be tolerated?
  4. Is there a characteristic pattern of temperature change that shows when air is mixing?
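Answering question 3 starts with counting excursions. A minimal Python sketch, assuming evenly spaced samples and, as a default band, the ASHRAE recommended inlet range of 18–27°C (swap in whatever limits apply in your facility): it finds each run of consecutive out-of-range readings and records when it started, how many samples it lasted, and its most extreme reading. The names `Excursion` and `excursions` are hypothetical; deciding how many short runs are tolerable is the open question, and this code does not answer it.

```python
from dataclasses import dataclass


@dataclass
class Excursion:
    start: int      # index of the first out-of-range sample
    length: int     # number of consecutive out-of-range samples
    peak_c: float   # reading furthest outside the band


def _distance_outside(t, low_c, high_c):
    """How far a reading lies outside [low_c, high_c]; 0 if inside."""
    return max(low_c - t, t - high_c, 0.0)


def excursions(readings_c, low_c=18.0, high_c=27.0):
    """Find runs of consecutive readings outside [low_c, high_c]."""
    found = []
    current = None
    for i, t in enumerate(readings_c):
        if _distance_outside(t, low_c, high_c) > 0:
            if current is None:
                current = Excursion(start=i, length=1, peak_c=t)
            else:
                current.length += 1
                # Keep whichever reading lies furthest outside the band.
                if (_distance_outside(t, low_c, high_c)
                        > _distance_outside(current.peak_c, low_c, high_c)):
                    current.peak_c = t
        elif current is not None:
            found.append(current)
            current = None
    if current is not None:  # run still open at the end of the data
        found.append(current)
    return found


# Hypothetical trace with one hot and one cold excursion.
for e in excursions([22.0, 23.0, 28.0, 29.5, 26.0, 25.0, 17.5, 22.0]):
    print(e)
```

With evenly spaced samples, `length` multiplied by the sampling interval gives each excursion's duration, which is the figure a tolerance policy would be written against.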

If you have questions about the effects of temperature on your servers, hard discs or switches, please let us know; we’d love to add them to the list.

Better yet, if you have a site where you would like to know more about temperature changes, join our beta programme and we’ll install Purr and work with you to analyse the data.

This is the first in a series of posts called ‘Data (Centres), meet Information’, in which we share what we’re learning from our installations. We’re hoping to foster some discussion about the best ways to use information to improve data centre efficiency. We’d love to hear your thoughts, whether that’s better ways to do this or a sense that we’re barking up the wrong tree.
