Computational Science Graduate Fellowship
DOE Labs Research

Pacific Northwest National Laboratory

Harnessing Hundreds of Thousands of Processors

(Page 2 of 3)

Shared memory, however, comes at a price.  “One-sided sharing is like accessing somebody else’s mailbox without involving the post office, so everybody has to know where everything is,” Nieplocha says.  “That means Global Arrays must maintain indexes to track the physical location of the data, and employ communications techniques that optimize how the data flows between processors.”

The payoff, however, is huge. Instead of spending time describing handshaking routines for thousands of memory locations, programmers can access shared memory the same way they would on an individual PC.  They don’t even need to know the underlying mechanics of memory manipulation.
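The index Nieplocha describes can be pictured with a toy model. The sketch below is not the Global Arrays API; the class `ToyGlobalArray`, its block distribution, and its method names are assumptions made up for illustration. It shows only the idea that a lookup table turns a global position into an owner and a local offset, so the caller reads and writes as if the array were one local list.

```python
class ToyGlobalArray:
    """Toy stand-in for a distributed array; not the Global Arrays API."""

    def __init__(self, length, num_owners):
        # Assumed block distribution: each owner holds one contiguous chunk.
        self.length = length
        self.chunk = -(-length // num_owners)  # ceiling division
        self.local = [
            [0.0] * max(0, min(self.chunk, length - p * self.chunk))
            for p in range(num_owners)
        ]

    def locate(self, i):
        """The index: map a global position to (owner, local offset)."""
        if not 0 <= i < self.length:
            raise IndexError(i)
        return i // self.chunk, i % self.chunk

    def get(self, i):
        # The caller never names the owner; the index finds it.
        owner, offset = self.locate(i)
        return self.local[owner][offset]

    def put(self, i, value):
        owner, offset = self.locate(i)
        self.local[owner][offset] = value


ga = ToyGlobalArray(length=10, num_owners=4)
ga.put(7, 3.5)
owner_of_7 = ga.locate(7)
value_at_7 = ga.get(7)
```

In a real system the chunks would sit in the memory of different processors and `get` and `put` would move data over the network; here they are plain Python lists so the example runs on one machine.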


ScalaBLAST gives a sizeable performance boost over BLAST, a conventional sequence analysis tool.

Nieplocha and colleague Chris Oehmen recently put Global Arrays to work in BLAST, a bioinformatics program.  BLAST lets researchers match snippets of newly sequenced DNA or protein sequences against known genetic information.  In August 2006, Nieplocha and Oehmen published a paper showing that their new Global Arrays-based ScalaBLAST software is the most scalable and highest-performing parallel implementation of the BLAST algorithm.

Smarter Memory

With so many processors yoked together in modern supercomputers, Nieplocha explains, the speed of an individual processor hardly matters.  Instead, the most common impediment to high-performance computing is the speed at which the network shuttles data between memory, processors, and hard disk storage.

By doing away with time-consuming handshaking, Global Arrays let data flow smoothly along the network to tens of thousands of thirsty processors.  But what if supercomputers could process some types of data without even taking them out of storage?

Storage on computers means hard drives, which act like filing cabinets to stockpile information.  When computers need the data, they go to the cabinet, open a file, read it, and do something with the data.  Then they write the results to another file in the cabinet.

But what if the filing cabinet were smarter? What if you could write measurements in feet and inches, but pull out the same data in meters and centimeters when you needed metric units?

Better yet, suppose you wanted to study rainfall at certain elevations. Ordinarily, the supercomputer would have to send all the rainfall data to a processor to sort out the information you want.  But suppose your hard drive did the sorting for you?  This would reduce the flow of unwanted data over the network and also cut the time processors waste on menial calculations.
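The rainfall example can be sketched in a few lines. Everything here is hypothetical: the records are made-up values, and `conventional_read`, `active_read`, and the 1,000-meter cutoff are names and numbers chosen only for illustration. The point is the difference in how many records have to cross the network.

```python
RECORDS = [
    # (elevation in meters, rainfall in millimeters): made-up values
    (120, 14.0),
    (950, 30.5),
    (1800, 42.0),
    (2400, 51.2),
    (300, 9.8),
]


def conventional_read(records):
    """Ship every record over the network; the caller filters later."""
    return list(records)


def active_read(records, keep):
    """Filter next to the data; ship only the matching records."""
    return [r for r in records if keep(r)]


def high_elevation(record):
    # Assumed cutoff for the example: 1,000 meters and above.
    return record[0] >= 1000


shipped_all = conventional_read(RECORDS)
wanted = [r for r in shipped_all if high_elevation(r)]
shipped_filtered = active_read(RECORDS, high_elevation)
```

Both paths end with the same two high-elevation records, but the conventional path moves all five records to the processor first, while the storage-side filter moves only the two that are wanted.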

Visionaries first suggested the concept, called active or intelligent disks, 20 years ago.  They proposed using small processors to run simple calculations on stored data.  Hard drive manufacturers balked at the idea.  They did not want anyone tinkering with drive electronics, since it might cause unexpected failures.

In 2005, the PNNL team found a way to do those calculations using another part of the computer that’s usually better left alone: the kernel.  The kernel contains the core functions of the computer operating system.  Like brain surgery, tinkering with the kernel is both delicate and costly.  “In the long term, such modifications are too complex to be practical,” Nieplocha says.

One year later, however, the group developed a way to achieve the same results with software that intercepts requests for data as they move through the network.

“Now we don’t have to modify the operating system or any system software,” Nieplocha explains.  “And we can also do some types of processing, such as floating point computation, that you cannot do in the kernel.” The group is testing the approach on bioinformatics software.  “We still have a few issues, but there are no major research challenges,” he concludes.
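As a rough picture of interception, the sketch below wraps a data source in a thin layer that processes records on their way to the application. It is not the PNNL software, and it runs inside one program rather than on a network; `InterceptingReader` and `feet_to_meters` are hypothetical names. It borrows the earlier feet-to-meters idea because the conversion is a floating point computation of the kind Nieplocha mentions.

```python
import io

FEET_TO_METERS = 0.3048  # exact by definition of the international foot


class InterceptingReader:
    """Hypothetical shim that sits between an application and its storage."""

    def __init__(self, raw, transform):
        self._raw = raw              # the underlying file-like object
        self._transform = transform  # processing applied to each record

    def readlines(self):
        # Intercept the request, then process each record before returning it.
        return [self._transform(line) for line in self._raw.readlines()]


def feet_to_meters(line):
    # Each stored line holds one measurement in feet.
    return float(line) * FEET_TO_METERS


stored = io.StringIO("10\n100\n")  # stands in for a file on disk
reader = InterceptingReader(stored, feet_to_meters)
meters = reader.readlines()
```

The application calls `readlines` as it would on an ordinary file, and neither the stored data nor the operating system has to change; only the layer in between does the work.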
