hpc

AMD Opteron 6100-series "Magny Cours" Processor Lineup

This spring AMD released their new line of server and workstation processors, the Opteron 6100-series "Magny Cours" processors. Their previous Opteron products were lagging behind the competition from Intel, so a refresh was seriously needed. The 6100-series is not a complete re-design of the Opteron architecture, but it offers significant performance improvements and warrants serious consideration. Extremely cost-effective 48-core Opteron servers are shipping now.

These processors are designed for high-end workstations and servers, so they will compete against Intel's Xeon processors. The same line of processors will go up against both the Xeon 5600-series and the Xeon 7500-series, which I analyzed earlier this year. This is a new twist, as both Intel and AMD have historically produced two separate lines of processors. With this release, AMD has designed the same processors to operate in both 2-socket and multi-socket SMP servers.

With a product this complex, it's very difficult to cover every aspect of the design. I will be focusing primarily on the performance of the new processors, with a particular focus on HPC as that is the market with which I'm most familiar.

RSH, RLogin, and Bad User Interfaces

RSH and RLogin aren't that difficult to set up once you've gone through the man pages and done the installation a few times, but those first few times are a pain. They're old and insecure, but still frequently used on small compute clusters. I get the impression that a lot of beginners get stuck fiddling with them for hours or days. They're quite possibly the biggest stumbling block one faces when setting up a compute cluster by hand (setting up /etc/hosts.equiv and /root/.rhosts, making sure the right flags are being passed to the rsh and rlogin daemons, etc.).

Both run under the xinetd daemon, the successor to inetd, which is one of those carry-overs from ancient Unix. Plenty of old Unix stuff made sense, but xinetd's configuration is backwards. To enable a service, you set "disable = no". To disable a service, you set "disable = yes".

Putting double negatives in your configuration file is not a good idea. When a setting this basic takes a couple seconds of thought, you're doing it wrong. Were it something more complicated, administrators would be selecting the wrong option all the time.
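To make the double negative concrete, here is a minimal Python sketch that decides whether a service is enabled by reading its xinetd stanza. The stanza is only illustrative (written in the style of a typical /etc/xinetd.d/rsh file; a real system may differ), the helper name service_enabled is hypothetical, and the code assumes that a stanza with no "disable" line is enabled.

```python
import re

# Illustrative stanza in the style of /etc/xinetd.d/rsh.
# The attributes on a real system may differ.
SAMPLE_STANZA = """
service shell
{
    socket_type = stream
    wait        = no
    user        = root
    server      = /usr/sbin/in.rshd
    disable     = no
}
"""


def service_enabled(stanza: str) -> bool:
    """Return True if the service described by the stanza is enabled.

    Assumption: a stanza with no `disable` line counts as enabled.
    """
    match = re.search(r"^\s*disable\s*=\s*(\w+)\s*$", stanza, re.MULTILINE)
    if match is None:
        return True
    # The double negative: the service is enabled when it is NOT disabled.
    return match.group(1).lower() != "yes"


if __name__ == "__main__":
    print("rsh enabled:", service_enabled(SAMPLE_STANZA))
```

Note that the function has to invert the value it reads, which is exactly the mental step the configuration file forces on the administrator.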

Intel Xeon 7500-series "Nehalem-EX" Processor Lineup

Last month Intel released their new line of enterprise-class x86 server processors, the Xeon 7500-series "Nehalem-EX" processors. This is very significant, as their existing enterprise x86 processors (7400-series) were getting quite old and were not particularly competitive. The new Xeons provide much higher computational performance, as well as many enhancements for reliability, availability, and serviceability (RAS). They are immediately available in 4-socket configurations and will also be appearing in 8-socket configurations.

With a product this complex, it's very difficult to cover every aspect of the new design. I will be focusing primarily on the performance of the new processors, with a particular focus on HPC as that is the market with which I'm most familiar.

To the best of my knowledge, the Xeon 7500s are some of the most diverse processors released under the same name. Their core counts range from 4 to 8, with clock speeds ranging from 1.87GHz to 2.67GHz and L3 cache ranging from 12MB to 24MB. This makes the decision of which processor to purchase more difficult than ever before, as one can't easily determine which processor is "best". You have to carefully evaluate your application and requirements, as well as the capabilities of each model.

Throughput and Latency

When working on high performance supercomputers, network latency and bandwidth are of utmost importance. If messages cannot be sent quickly enough between compute nodes, a supercomputer may actually perform more poorly than a standard server or workstation. So much time is spent waiting for input from other locations that nothing is actually accomplished. These days, more time is spent optimizing latency than bandwidth since most clusters have all the bandwidth they need (2-4GB/sec).
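A rough way to see why latency gets the attention is the simple model: time = latency + size / bandwidth. The Python sketch below applies that model to a few message sizes. The 2 microsecond latency is an assumed figure chosen for illustration, the bandwidth is the low end of the 2-4GB/sec range mentioned above, and the function name transfer_time is hypothetical.

```python
def transfer_time(message_bytes: float, latency_s: float,
                  bandwidth_bytes_per_s: float) -> float:
    """Estimate one-way message time with a latency + size/bandwidth model."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


LATENCY_S = 2e-6   # assumed per-message latency (illustrative, not measured)
BANDWIDTH = 2e9    # 2GB/sec, the low end of the range quoted in the text

if __name__ == "__main__":
    for size in (64, 4096, 1_048_576):
        total = transfer_time(size, LATENCY_S, BANDWIDTH)
        latency_share = LATENCY_S / total
        print(f"{size:>9} bytes: {total * 1e6:9.2f} us total, "
              f"{latency_share:6.1%} of it latency")
```

Under these assumptions, small messages spend nearly all of their time in latency, so adding bandwidth does little for them; only large transfers are bandwidth-bound.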

I've discovered that an analogy can be made with traffic on the highway. Computers keep getting faster, and now maybe traffic can too. I promise you'll notice this the next time you head out on the road.
