In the HPC industry, it seems that history is always doomed to repeat itself. The CPU isn’t fast enough, so we add a co-processor to handle the really serious calculations. Then process technology improves, we can fit more transistors on a chip, and the co-processor is moved onto the CPU die.
For the last half-decade, we’ve been in the midst of this cycle. Researchers realized that graphics cards (GPUs) were basically huge vector processors. Why make a couple of CPU cores churn away on the math when the graphics card has a couple hundred cores? Thus we have General-Purpose GPU computing (GPGPU). Some have resisted this trend, but a lot of very serious scientists and institutions are using GPUs extensively. As with many cutting-edge technologies, the tools change constantly and it takes more effort to get everything working, but these co-processors offer significant benefits.
I wasn’t really around for the previous batch of co-processors in the 1980s, but it’s clear that this time there is more at stake. Multibillion-dollar corporations (with billion-dollar R&D budgets) are building the co-processors. Astronomers, biologists, physicists, chemists, doctors, surgeons, mathematicians, engineers, and bankers are taking advantage of the performance. The fields of data analytics and computational modelling are serious business. Some in the life sciences are calling these accelerated systems the “computational microscope” because they offer so much potential.
In May, NVIDIA introduced their new Tesla GPU Accelerators based on the “Kepler” architecture. These will offer up to 4.58 TFLOPS of single-precision and over 1 TFLOPS of double-precision floating-point performance. Intel’s MIC co-processors are also expected soon and will offer similar performance.
Performance continues to increase at an exponential rate. I won’t go into details here, but the trend is worth reading about if you haven’t before. Three years ago I wrote about the future of computer memory – those predictions are still on track, and a similar curve can be drawn for computational throughput.
What’s worth thinking about now is the next phase of the cycle. How will the GPU be absorbed into the CPU?
First, understand that it’s going to take another half-decade before we’re likely to get there. AMD has a “Fusion” CPU+GPU architecture, but it is designed for low-wattage consumer and mobile products. The same technology could be applied to high-performance server chips, but that doesn’t seem to be the plan. AMD’s CPUs are popular in the HPC space, but their GPUs are often eclipsed by NVIDIA’s products. The HPC market is only ~8% of the total server market. Cloud computing, virtualization and other datacenter-intensive initiatives are growing. Some combination of these factors must be dictating AMD’s plans for the future (unless they have a big surprise in store for us).
Intel has gone in a different direction entirely. Rather than GPU cores, they’re building co-processors based on a streamlined x86 CPU core. There is a lot of potential here, given their R&D budget and the amount of time they’ve devoted to the product, but it is unlikely to be anything but a co-processor for some time.
Last is NVIDIA. They’ve just announced their new Kepler GPU products, so those co-processors will probably be their main contender for 2012, 2013 and 2014. But they are building flexibility into the architecture with technologies such as “Dynamic Parallelism” and “GPUDirect”. With Kepler GPUs and CUDA 5, programmers will be able to run applications almost entirely on the GPU (with very little required of the CPU). Data and commands can be sent directly from the GPUs to storage, networking and other GPUs. Essentially, the CPU will be used to launch the operating system and load the applications. The GPUs can take it from there.
NVIDIA and a few of their customers are planning to mix ARM CPUs with NVIDIA GPUs. Others seem to be planning to move their CPUs as far out of the picture as is possible right now. You can purchase an ARM+GPU development kit today (NVIDIA CARMA). Over the next couple of years, we’ll become accustomed to powerful GPUs attached to fairly low-end CPUs.
This is all speculation, and it seems foolish to suggest the imminent extinction of the CPU, but there are possibilities. Rather than AMD’s CPU+GPU Fusion products, we may see an NVIDIA GPU+CPU product: a GPU with a small number of ARM cores. The GPU will perform all the calculations and the ARM will be used to manage mundane operating system details.
Moving users and developers from x86 to ARM wouldn’t be easy – there is a lot of legacy code out there. But NVIDIA very rapidly moved a huge number of people from x86 to x86+GPU. Given the performance and energy-efficiency demands of “exascale” computing, this is possibly one of the best ways to get there.