This spring AMD released their new line of server and workstation processors, the Opteron 6100-series “Magny Cours” processors. Their previous Opteron products were lagging behind the competition from Intel, so a refresh was seriously needed. The 6100-series is not a complete re-design of the Opteron architecture, but it offers significant performance improvements and warrants serious consideration. Extremely cost-effective 48-core Opteron servers are shipping now.
These processors are designed for high-end workstations and servers, so they will compete against Intel’s Xeon processors. The same line of processors will go up against both the Xeon 5600-series and the Xeon 7500-series, which I analyzed earlier this year. This is a new twist, as both Intel and AMD have historically produced two separate lines of processors. With this release, AMD has designed the same processors to operate in both 2-socket and multi-socket SMP servers.
With a product this complex, it’s very difficult to cover every aspect of the design. I will concentrate on the performance of the new processors, particularly for HPC, as that is the market with which I’m most familiar.
Introduction and Disclaimer
No matter which model you choose, you’ll get a serious performance boost compared to older processors. They are the latest iteration of AMD’s multi-core Opteron architecture, which has been the preferred processor in HPC clusters for many years. When 64-bit Opteron processors were first released, Intel had nothing offering competitive performance. Several years later, Intel was able to take back the performance crown with the Nehalem and Westmere Xeon processors. AMD has won it back with their release of the 6100-series “Magny Cours” Opterons.
It’s worth noting that this write-up is simply an analysis of the processor architecture and features. A follow-up article will cover the real performance metrics. But important considerations come to light simply from reviewing the design, power consumption and price of each model. Hopefully this analysis will help you know what to look for as you consider your options.
Processor Cores and Computational Throughput
The processor lineup is fairly straightforward, with 8-core and 12-core processor models. These are the highest core counts currently available in x86 processors, but are not without a caveat. Each processor is built by gluing together two 4-core or 6-core dies. However, this hasn’t turned out to be a significant performance hindrance.
Where it becomes less straightforward is the overall computational capacity of each processor. Even the lowest-end server will be endowed with 16 processor cores, so you’ll need to provide a lot of jobs or threads to achieve full performance. The total computational capacity of the server will be a combination of the number of cores and their clock speed. Processor clock speeds are no longer increasing these days (they rarely even reach 3GHz), so efficiency and higher numbers of cores have to make up the difference.
Two arcs are visible in the above plot – one for the 8-core processors and one for the 12-core processors. The 12-core models are a step or two behind the 8-core models, simply due to the requirements of fitting higher core counts in the same thermal envelope. But the top-end, higher-wattage SE model brings the 12-cores close to the top of the pile.
The three HE models are high-efficiency, low-wattage processors and thus have some of the lowest clock speeds. However, there are also standard-power models in the same neighborhood, which are offered at very attractive prices. As we’ll see below, the HE models are not likely to be of interest to HPC users.
Continuing with the idea of determining total computational capacity, consider the total number of processor cycles provided by each model. This is simply the number of cores multiplied by the clock speed:
This is the most straightforward plot of the bunch – almost a completely linear curve. As you move up the model lines, the total number of processor cycles increases linearly. Half of these increases are due to processor frequency bumps, and half are from increased core counts.
Consider a comparison between the 6136 and 6174 models. The clock speed of the 6136 is 200MHz higher (2.4GHz versus 2.2GHz), but the increased core count of the 6174 provides roughly 37% more total processor cycles. This is the type of difference that requires benchmarking: is your application threaded efficiently enough to take advantage of all 12 cores of the Opteron 6174?
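The cores-times-clock arithmetic is easy to reproduce. The following is a minimal sketch, not the analysis behind the plots: the function name and the two parts are hypothetical, and the core counts and clock speeds are placeholder values chosen only to show the trade-off between fewer fast cores and more slow ones.

```python
def total_ghz(cores: int, clock_ghz: float) -> float:
    """Aggregate processor cycles per second for one processor, in GHz."""
    return cores * clock_ghz


# Hypothetical parts; these are NOT figures from the article's data set.
fewer_fast_cores = total_ghz(cores=8, clock_ghz=2.5)
more_slow_cores = total_ghz(cores=12, clock_ghz=2.0)

# Relative advantage in raw cycles of the higher core count part.
advantage_pct = (more_slow_cores / fewer_fast_cores - 1.0) * 100.0

print(f"8 cores:  {fewer_fast_cores:.1f} GHz total")
print(f"12 cores: {more_slow_cores:.1f} GHz total")
print(f"Raw cycle advantage of the 12-core part: {advantage_pct:.0f}%")
```

The advantage computed this way is only an upper bound. It is realized in full only if the application keeps every core busy.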
Despite the simplicity of the plot above, it’s very important to recognize that the raw number of cycles a processor provides rarely translates into real performance. Every processor architecture has its own efficiency – the amount of useful work actually accomplished during each processor cycle. These efficiencies vary widely depending on the design, and will vary within the Opteron 6100-series.
For HPC applications, the largest deciding factor of efficiency is often the memory bandwidth. Both the 8-core and 12-core models provide four channels to 1333MHz DDR3 memory. This is the largest memory bandwidth currently available on commodity servers, with Intel “Westmere” Xeons providing three channels to 1333MHz DDR3 and Intel “Nehalem-EX” Xeons providing four channels to 1066MHz DDR3 memory. However, the 12 cores on a 12-core Opteron will have to share the same amount of bandwidth that 8 cores on an 8-core Opteron would have access to. This may cause some applications to scale poorly beyond 8 cores.
Also keep in mind that all models have the same quantity of L3 cache (12MB), so the 12-core models have less L3 cache per core than the 8-core models. HPC applications are unlikely to be severely impacted by this, as they typically are too large to fit in cache, but this may make some difference.
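To put rough numbers on the per-core share of memory bandwidth and L3 cache, here is a small sketch. It assumes each DDR3 channel is 64 bits wide and reads the quoted "MHz" memory speeds as millions of transfers per second, so the results are theoretical peaks rather than measured bandwidth. The channel counts, memory speeds and 12MB cache size are the ones stated above; the function and variable names are illustrative.

```python
BYTES_PER_TRANSFER = 8  # assumption: each DDR3 channel is 64 bits (8 bytes) wide


def peak_bandwidth_gbs(channels: int, mega_transfers: int) -> float:
    """Theoretical peak memory bandwidth per socket in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers * BYTES_PER_TRANSFER / 1000.0


# Channel counts and memory speeds as stated in the text above.
platforms = {
    "Opteron 6100": peak_bandwidth_gbs(4, 1333),
    "Westmere Xeon": peak_bandwidth_gbs(3, 1333),
    "Nehalem-EX Xeon": peak_bandwidth_gbs(4, 1066),
}

L3_CACHE_MB = 12  # per processor, as stated in the text

# Per-core share of bandwidth (GB/s) and L3 cache (MB) for each Opteron core count.
per_core = {
    cores: (platforms["Opteron 6100"] / cores, L3_CACHE_MB / cores)
    for cores in (8, 12)
}

for name, gbs in platforms.items():
    print(f"{name}: {gbs:.1f} GB/s peak per socket")
for cores, (gbs, cache_mb) in per_core.items():
    print(f"{cores}-core Opteron: {gbs:.2f} GB/s and {cache_mb:.2f} MB L3 per core")
```

Under these assumptions, each core of a 12-core model gets two thirds of the bandwidth and cache available to each core of an 8-core model.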
Few datacenters, if any, can ignore the power consumption of their servers. It’s naive to simply look at the overall power consumption of each server – you have to balance the power it consumes with the performance it provides. There is some variation between the new Opterons:
The above plot is a bit different from the others, as the bars represent the power consumed by each processor core (in watts). The points represent the number of watts consumed per billion processor cycles. Examining the bars is the naive approach, as this ignores performance per watt (but it’s worth looking at once). Examining the points reveals the true performance per watt of each model.
As you might expect, the 12-core models are more efficient than the 8-core models, because they provide more cycles within the same power envelope. But it’s surprising to see where some of the “high-efficiency” models land. If your servers are rarely idle, you’ll achieve better power efficiency using the higher-wattage 6176SE than any of the 8-core low-wattage HE models. In fact, the best efficiency is achieved by the 6174 rather than one of the HE models. This is just another reason to examine the data carefully before making your choice.
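The two power metrics can be sketched as follows. The `Part` class and all three entries are hypothetical: the core counts, clock speeds and wattages are placeholder values picked to show how a part can look poor on watts per core yet do well on watts per billion cycles. They are not the data behind the plot.

```python
from dataclasses import dataclass


@dataclass
class Part:
    name: str
    cores: int
    clock_ghz: float
    watts: float

    @property
    def watts_per_core(self) -> float:
        """The naive metric: power divided evenly across cores."""
        return self.watts / self.cores

    @property
    def watts_per_ghz(self) -> float:
        """Watts per billion cycles per second (lower is better)."""
        return self.watts / (self.cores * self.clock_ghz)


# Hypothetical parts with placeholder values, for illustration only.
parts = [
    Part("low-wattage 8-core", cores=8, clock_ghz=1.8, watts=65.0),
    Part("standard 12-core", cores=12, clock_ghz=2.2, watts=80.0),
    Part("high-wattage 12-core", cores=12, clock_ghz=2.3, watts=105.0),
]

# Rank from most to least efficient by watts per billion cycles.
ranked = sorted(parts, key=lambda p: p.watts_per_ghz)

for p in ranked:
    print(f"{p.name}: {p.watts_per_core:.2f} W/core, {p.watts_per_ghz:.2f} W/GHz")
```

With these placeholder values, the high-wattage 12-core part draws the most power per core but still needs fewer watts per billion cycles than the low-wattage 8-core part.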
Of course, the final decision frequently comes down to price:
Similar to power consumption, it’s wiser to look at performance per dollar than just the raw price of each model. The results are a bit jumbled, but it’s clear that the 6128 is the most cost-effective and the 6176SE is the least.
The HE models have a bit of a premium, but the highest premiums are paid for the 12-core models.
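The same idea applies to price. This sketch computes dollars per billion cycles for two hypothetical parts. The prices, core counts and clock speeds are invented placeholders, not the list prices used in the plot, so substitute current pricing before drawing any conclusions.

```python
def dollars_per_ghz(price_usd: float, cores: int, clock_ghz: float) -> float:
    """Price per billion processor cycles per second (lower is better)."""
    return price_usd / (cores * clock_ghz)


# Hypothetical (price in USD, cores, clock in GHz); placeholder values only.
candidates = {
    "budget 8-core": (300.0, 8, 2.0),
    "top-end 12-core": (1400.0, 12, 2.3),
}

cost = {name: dollars_per_ghz(*spec) for name, spec in candidates.items()}
best_value = min(cost, key=cost.get)

for name, value in cost.items():
    print(f"{name}: ${value:.2f} per GHz")
print(f"Most cost-effective: {best_value}")
```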
One model can’t be recommended for everyone – none of the processors is better than the others in every possible way – but there are a few excellent options:
6128: Considering this model is the lowest cost by a considerable margin, it’s often the first choice for that reason alone. However, it is the least efficient in terms of power consumed per processor cycle. Its lower core count and slower clock speed put it towards the bottom of the performance curve, but it is quite likely the most cost-effective x86 processor on the market.
6136: Providing the highest clock speed available for this generation, this model may be of interest to users who want the shortest runtimes for applications that do not scale to higher numbers of cores. Aside from that, this model is less attractive, offering mediocre power efficiency and fairly high cost.
6174: With the absolute best power efficiency and second-highest overall computational throughput, the 6174 is almost certainly the best choice if you can afford it. This model will provide exceptional processing power and consume a relatively low amount of power.
6176SE: For users who must have the shortest runtimes, the 6176SE will provide them. Its power efficiency is better than that of the 8-core models, but it is by no means power efficient. You’ll pay the most for this model.
Some users and administrators will be tempted to consider Intel’s Xeon processors as competition for the Opteron 6100-series. The Xeons do win in some cases, but these Opterons win for almost all HPC workloads. I’ll review the performance numbers in my next article.
Data analyzed with R:
R Development Core Team (2009). R: A language and environment for
statistical computing. R Foundation for Statistical Computing,
Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.