Mar 02, 2016

I get the feeling it’s time for someone to set the record straight on OpenHPC. Social media and the press have pushed a strong narrative that OpenHPC is Intel and OpenPOWER is IBM. While each vendor did start its own “open” initiative, the two are addressing different challenges and are not as much at odds as you might think.

First, I should state that these are strictly my own opinions and I’m not entirely impartial. I am not a founding or current member of either organization, but I have been actively watching both since their inception. I have fairly strong opinions on each, but I also expect to be working with both. In other words, OpenHPC and OpenPOWER could work together!

Logo of the OpenHPC initiative

I say this because OpenHPC really does appear to be opening up. When the OpenHPC git repo was put up, the naysayers pointed out that all existing commits were from Intel. However, this repo is merely a set of build scripts and unit tests for existing open-source tools. OpenHPC is built using the open-source packages that HPC sites have been using for years (SLURM, Warewulf, OpenMPI, MVAPICH, etc). The building blocks of OpenHPC have been open for years.

Intel is pushing this initiative along through the generous contributions of Intel employees, but they are not locking others out. For instance, Intel’s Chief Evangelist of Software Products has openly invited IBM to join OpenHPC (and has stated it would be an “extreme disappointment” if the foundation didn’t feel open enough for such a move).

Furthermore, there’s at least one person looking into a port of OpenHPC to ARM64. I doubt Intel will assist too much in such an effort, but there is a clear precedent of cooperation (that individual has been given access to the OpenHPC build system).

Only time will tell, but I expect OpenHPC and OpenPOWER to significantly impact HPC in the coming years. The end of 2015 was very exciting for HPC, with clear support from President Obama and one of the most lively HPC conferences on record. I’m very excited for 2016.

Jan 02, 2015

I just finished watching Particle Fever, which follows the ~30-year path that physicists endured before the Higgs boson was finally confirmed. Thousands of people spent years of excruciatingly painstaking effort to confirm one aspect of our reality. Yet there were setbacks (some taking years), and the collider won’t even be operating at full power until 2015 (although the original schedule called for full-power operation in 2008)…

Candidate Higgs boson event in CERN CMS detector

I know (from both colleagues and personal experience) that the efforts from the IT and computational folks backing up these experiments are no less painstaking and mundane. Keeping a single computer operating correctly can be a pain. Keeping hundreds or thousands operating correctly (along with the incredible diversity of dodgy scientific software packages) is basically impossible.

Continue reading »

Mar 31, 2014

I’ve been thinking a lot about the ways we automate tasks and abstract away difficult/complicated aspects of our lives. That’s what all the “progress” of the last 100 years has been – better ways to save labor and still get the same tasks accomplished. Our species is growing increasingly efficient.

I’m sure the HPC industry has also seen improvements in personal efficiency. Certainly we can get the compute portions of our work done much more quickly. But do you still find yourself fighting many of the same systems/software issues you faced years ago? I feel our industry still has a long way to go as far as making the everyday user’s life simpler.

Continue reading »

Nov 30, 2013

Computers are complex systems, which makes them difficult to predict. Oftentimes the hardware layers are fairly sophisticated, with the software adding even more factors – many more than a person can fit in their head. That’s why unit tests, integration tests, compatibility tests, performance tests, etc are so important. It’s also why leadership compute facilities (e.g., ORNL Titan, TACC Stampede) have such onerous acceptance tests. Until you’ve verified that an installed HPC system is fully functioning (compute, communication, I/O, reliability, …), it’s pretty likely that something isn’t.

Stampede InfiniBand Topology

The Stampede cluster at TACC contains over 320 56Gbps FDR InfiniBand switches; counting both node-to-switch and switch-to-switch links, over 11,520 cables are installed. How much testing would you perform before you said “everything is working”?
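To give a sense of the scale, here is a minimal sketch (in Python) of the bookkeeping involved in sweeping node pairs and flagging suspect links. The node names, the threshold, and the run_p2p_bandwidth() helper are all hypothetical placeholders; in practice you would wire that function up to whatever benchmark you trust (an MPI point-to-point bandwidth test, for example).

    # Sketch: sweep node pairs and flag links that underperform.
    # run_p2p_bandwidth() is a stand-in for a real two-node benchmark.
    import itertools

    EXPECTED_GB_S = 6.0                # ballpark achievable FDR rate; tune for your fabric
    THRESHOLD = 0.90 * EXPECTED_GB_S   # flag anything more than 10% low

    def run_p2p_bandwidth(node_a, node_b):
        """Launch a two-node bandwidth test and return GB/s (placeholder value here)."""
        return 6.0

    def sweep(nodes):
        suspect = []
        for a, b in itertools.combinations(nodes, 2):
            rate = run_p2p_bandwidth(a, b)
            if rate < THRESHOLD:
                suspect.append((a, b, rate))
        return suspect

    if __name__ == "__main__":
        nodes = ["c%03d" % i for i in range(1, 65)]    # made-up node names
        for a, b, rate in sweep(nodes):
            print("%s <-> %s: %.2f GB/s (below threshold)" % (a, b, rate))

In practice you would pick pairs that exercise each cable and switch tier rather than running the full O(n²) sweep, but the bookkeeping looks much the same; the hard part is chasing down the handful of links that come back slow.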

Continue reading »

Jun 02, 2012

In the HPC industry, it seems that history is always doomed to repeat itself. The CPU isn’t fast enough, so we add a co-processor to handle the really serious calculations. Then process technology improves, more transistors fit on a chip, and the co-processor is moved onto the CPU die.

For the last half-decade, we’ve been in the midst of this cycle. Researchers realized that graphics cards (GPUs) were basically huge vector processors. Why make a couple CPU cores churn away on the math when the graphics card has a couple hundred cores? Thus we have General-Purpose GPU computing (GPGPU). Some have resisted this trend, but a lot of very serious scientists and institutions are using GPUs extensively. As with many cutting-edge technologies, there is constant change and it takes extra effort to get everything working, but these co-processors offer significant benefits.

I wasn’t really around for the previous batch of co-processors in the 1980s, but it’s clear that this time there is more at stake. Multi-billion-dollar corporations (with billion-dollar R&D budgets) are building the co-processors. Astronomers, biologists, physicists, chemists, doctors, surgeons, mathematicians, engineers, and bankers are taking advantage of the performance. The fields of data analytics and computational modeling are serious business. Some in the life sciences have taken to calling these tools a “computational microscope” because they offer so much potential.

Continue reading »

Oct 08, 2010

This spring AMD released their new line of server and workstation processors, the Opteron 6100-series “Magny-Cours” processors. Their previous Opteron products were lagging behind the competition from Intel, so a refresh was sorely needed. The 6100-series is not a complete re-design of the Opteron architecture, but it offers significant performance improvements and warrants serious consideration. Extremely cost-effective 48-core Opteron servers are shipping now.

These processors are designed for high-end workstations and servers, so they will compete against Intel’s Xeon processors. The same line of processors will go up against both the Xeon 5600-series and the Xeon 7500-series, which I analyzed earlier this year. This is a new twist, as both Intel and AMD have historically produced two separate lines of processors. With this release, AMD has designed the same processors to operate in both 2-socket and multi-socket SMP servers.

With a product this complex, it’s very difficult to cover every aspect of the design. I will be focusing primarily on the performance of the new processors, particularly in HPC, as that is the market with which I’m most familiar.

Continue reading »

Aug 18, 2010

RSH and RLogin aren’t that difficult to set up once you’ve gone through the man pages and done the installation a few times, but those first few times are a pain. They’re old and insecure, but still frequently used on small compute clusters. I get the impression that a lot of beginners get stuck fiddling with them for hours or days. They’re quite possibly the biggest stumbling block one faces when setting up a compute cluster by hand (setting /etc/hosts.equiv, setting /root/.rhosts, making sure the right flags are being sent to the rsh and rlogin daemons, etc).
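For anyone stuck at that stage, the files themselves are short once you have seen a working example. Here is a rough sketch for a hypothetical three-node cluster (the hostnames are made up, and your daemons may also want usernames listed explicitly):

    # /etc/hosts.equiv -- hosts trusted for rsh/rlogin by ordinary users
    node01
    node02
    node03

    # /root/.rhosts -- consulted separately for root; must be owned by
    # root and not writable by group or others, or rshd will typically reject it
    node01 root
    node02 root
    node03 root

Note that hosts.equiv is not consulted for root logins, which is exactly why the separate /root/.rhosts file shows up in the list above.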

Both use the xinetd daemon, which is one of those carry-overs from ancient Unix. Plenty of old Unix stuff made sense, but inetd got this one backwards: to enable a service, you set disable = no; to disable a service, you set disable = yes.
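For reference, a typical stanza looks roughly like this (a sketch only; the exact paths and flags vary by distribution):

    # /etc/xinetd.d/rsh -- the rsh service is named "shell" in /etc/services
    service shell
    {
            disable         = no          # yes, "no" is how you turn it ON
            socket_type     = stream
            wait            = no
            user            = root
            log_on_success  += USERID
            log_on_failure  += USERID
            server          = /usr/sbin/in.rshd
    }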

Putting double negatives in your configuration file is not a good idea. When a setting this basic takes a couple seconds of thought, you’re doing it wrong. Were it something more complicated, administrators would be selecting the wrong option all the time.

Apr 19, 2010

Last month Intel released their new line of enterprise-class x86 server processors, the Xeon 7500-series “Nehalem-EX” processors. This is very significant, as their existing enterprise x86 processors (7400-series) were getting quite old and were not particularly competitive. The new Xeons provide much higher computational performance, as well as many enhancements for reliability, availability, and serviceability (RAS). They are immediately available in 4-socket configurations and will also be appearing in 8-socket configurations.

With a product this complex, it’s very difficult to cover every aspect of the new design. I will be focusing primarily on the performance of the new processors, particularly in HPC, as that is the market with which I’m most familiar.

To the best of my knowledge, the Xeon 7500s are some of the most diverse processors released under the same name. Core counts range from 4 to 8, clock speeds from 1.87GHz to 2.67GHz, and L3 cache from 12MB to 24MB. This makes the decision of which processor to purchase more difficult than ever before, as one can’t easily determine which processor is “best”. You have to carefully evaluate your application and requirements, as well as the capabilities of each model.
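As a rough illustration of the trade-off, peak double-precision throughput is simply cores × clock × FLOPs per cycle (up to 4 per cycle for these SSE-based cores), while a poorly-scaling or per-core-licensed application cares far more about the clock. The quick sketch below compares two hypothetical pairings drawn from the ranges above; they are for illustration only and do not correspond to specific SKUs.

    # Rough peak throughput: cores * GHz * FLOPs/cycle.
    # Nehalem-class cores can retire up to 4 double-precision FLOPs per
    # cycle (one 128-bit SSE add plus one 128-bit SSE multiply).
    FLOPS_PER_CYCLE = 4

    def peak_gflops(cores, ghz):
        return cores * ghz * FLOPS_PER_CYCLE

    # Hypothetical pairings drawn from the stated ranges, not actual SKUs:
    print("8 cores @ 2.00 GHz: %5.1f GFLOPS peak" % peak_gflops(8, 2.00))
    print("4 cores @ 2.67 GHz: %5.1f GFLOPS peak" % peak_gflops(4, 2.67))

A well-parallelized, throughput-bound code wants the first; a code that scales poorly may be happier with the second, and cache size and memory bandwidth muddy the picture further.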

Continue reading »

Oct 22, 2009

When working on high-performance supercomputers, network latency and bandwidth are of the utmost importance. If messages cannot be passed quickly enough between compute nodes, a supercomputer may actually perform more poorly than a standard server or workstation: so much time is spent waiting for data from other nodes that nothing actually gets accomplished. These days, more effort goes into optimizing latency than bandwidth, since most clusters already have all the bandwidth they need (2-4GB/sec).
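The usual back-of-the-envelope model makes the point: delivery time is roughly latency plus message size divided by bandwidth, so small messages are dominated almost entirely by latency. Here is a quick sketch with ballpark numbers (they are illustrative, not measurements from any particular interconnect):

    # Transfer time ~= latency + size / bandwidth
    LATENCY_S = 2e-6            # 2 microseconds, a plausible cluster interconnect latency
    BANDWIDTH_B_PER_S = 3e9     # 3 GB/sec, in the range quoted above

    def transfer_time(size_bytes):
        return LATENCY_S + size_bytes / BANDWIDTH_B_PER_S

    for size in (8, 1024, 1024 * 1024):
        total = transfer_time(size)
        moving = (size / BANDWIDTH_B_PER_S) / total
        print("%8d bytes: %8.2f us total, %5.1f%% of it spent moving data"
              % (size, total * 1e6, moving * 100))

At eight bytes, well over 99% of the time is pure latency; at a megabyte the ratio flips, which is why tightly-coupled codes obsess over latency while bulk transfers care about bandwidth.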

I’ve discovered that an analogy can be made with traffic on the highway. Computers keep getting faster, and now maybe traffic can too. I promise you’ll notice this the next time you head out on the road.

Continue reading »