DEIXIS 2007-2008  THE DOE CSGF ANNUAL

Lawrence Berkeley National Laboratory

Power Play

NERSC Makes High-Performance Computing Accessible

By Thomas R. O’Donnell

The personal computers in Lawrence Pratt’s laboratory weren’t cutting it.  His research on the structures and interactions of lithium compounds was hindered because the Pentium 4-type machines he uses at Fisk University in Tennessee couldn’t keep up with the demands of modern computational chemistry.

His work finally kicked into high gear with a grant of 150,000 high-performance computer processor hours — and the help of the National Energy Research Scientific Computing (NERSC) Center, based at the Department of Energy’s Lawrence Berkeley National Laboratory (LBNL) in California.

“I use every last minute” of computer time, says Pratt, who got onto the powerful NERSC machines through the DOE’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. He’s since qualified for another grant of 150,000 processor hours and “I’m burning through it like crazy, but I’m also publishing a lot of papers — five so far this year.”

As INCITE projects go, Pratt’s is small.  Most INCITE projects, which hold potential for major scientific breakthroughs, are awarded millions of processor hours — but they consume up to 20 percent of the hours available on NERSC’s massively parallel computers, says center director Horst Simon.  The remaining 80 percent is divided among 300 or so projects, each using tens of thousands to millions of processor hours per year.  A small amount of time is set aside for “startups,” or researchers who are still preparing their software for massively parallel processing.

Simon would be happy to have more such projects use NERSC computers as a gateway to high-performance computing.  In particular, he’s working to provide more students — especially DOE Computational Science Graduate Fellowship (DOE CSGF) participants — more opportunities to try their programs.  “We want to get students interested in using NERSC or any of the other DOE computational resources, so they have a positive experience and become used to integrating scientific computing into their work,” especially after graduation, says Simon, who is leaving his post soon to concentrate on other roles at the Berkeley lab.

The center is a perfect place for a first taste of science on massively parallel computers.  For more than 30 years, it’s been DOE’s main production center for scientific computing, and it hosts some of the department’s largest, fastest systems for unclassified research.  More than 2,500 users from dozens of universities, private research institutions and DOE laboratories work on around 300 projects each year.  Yet, users rarely visit the NERSC facility.  Most connect to and use center computers via ESnet, DOE’s high-speed network, and the Internet.  Their work produces mountains of results — for 2006, researchers published more than 1,400 papers related to calculations on NERSC computers.

The NERSC Center’s role as a DOE service facility means the projects running on its computers cover virtually every strategic theme pursued by DOE and its Office of Science — “Everything from astrophysics down to nanoscience,” Simon says.  Fusion energy, materials science, chemistry, climate, genomics, computational biology, applied math and computer science are just some of the disciplines with research on NERSC Center machines.  “Our mission is really to be the high-end production resource for the Office of Science, so general purpose and diverse applications have been part of our mission,” Simon adds.

In-house expertise is part of what makes the NERSC Center popular with researchers.  The staff works with computational scientists to tune applications for the best performance, visualize their results and make their research more effective.  There’s a strong culture focused on assisting users, and the staff is experienced and stable, Simon says.  The center stages regular training sessions, encourages user communities to exchange information, and hosts databases of user questions.

“If there’s a difficult project that NERSC staff can work with the scientific users on, it often turns into a scientific collaboration,” with the staff member listed as a joint author on research, Simon adds.

Pratt, who scaled his mathematical models up from the single processor in his desktop computers to eight processors, says NERSC experts were helpful.  Without access to high-performance computing, “There are projects I would not have been able to complete,” he says.  “We were able to find out a lot about the chemistry through computational mechanisms that would not have been easily obtained by experiment.”

Large allocations of NERSC computer time generally are awarded competitively through annual requests for proposals, with DOE program managers making the decisions.  Computing experts at NERSC, Argonne National Laboratory, Oak Ridge National Laboratory, and Pacific Northwest National Laboratory judge whether the codes and algorithms described in the proposals are ready to run on massively parallel machines.  Peer review panels scrutinize the proposals for their impact on science.  “This is sort of self-selecting, because the people who apply know what the reviewers are looking for,” Simon says.

In the past, requests often totaled more than 10 times the amount of available time.  Now the deployment of ever-more-powerful computers allows NERSC to meet about half of the requests.  While NERSC seeks to accommodate projects and investigators with little or no background in parallel processing or high-end computing, those are becoming rare as parallel machines become ubiquitous.

The process and facilities may seem daunting, but it’s not hard to get a foot in the door.  New users who want to try their codes or develop new ones on NERSC machines can apply online for a startup allocation. Startups must meet the Office of Science mission and require high-performance computing.

NERSC is making it even easier for DOE CSGF fellows.  It’s allocating 40,000 to 50,000 startup hours to between 50 and 70 projects fellows put forward.  The allocations will let students see how well their codes scale in parallel, and enable them to work with NERSC consultants to improve their projects and code performance.
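One common way to judge how well a code scales in parallel is to compare run times at different processor counts. The Python sketch below illustrates the idea only. The timings are made-up placeholders, not NERSC measurements, and the function names are hypothetical.

```python
def speedup(t_serial, t_parallel):
    """Return how many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Return speedup divided by processor count (1.0 means ideal scaling)."""
    return speedup(t_serial, t_parallel) / n_procs


def scaling_table(timings):
    """Build (processors, speedup, efficiency) rows from {processors: seconds}.

    Assumes the dictionary contains a one-processor baseline under key 1.
    """
    baseline = timings[1]
    return [
        (n, speedup(baseline, t), parallel_efficiency(baseline, t, n))
        for n, t in sorted(timings.items())
    ]


# Hypothetical wall-clock times in seconds; not measured on any real system.
example_timings = {1: 800.0, 2: 420.0, 4: 230.0, 8: 140.0}

if __name__ == "__main__":
    for n, s, e in scaling_table(example_timings):
        print(f"{n:>2} processors: speedup {s:.2f}, efficiency {e:.0%}")
```

Efficiency that falls as processors are added suggests that communication or serial sections are limiting the code, which is the kind of problem a consultant can help diagnose.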

The idea is to get fellows to look beyond the computational resources they have at hand through their major professor or department, says David Skinner, leader of NERSC’s Open Software and Programming Group and coordinator for the SciDAC Outreach Center (see sidebar).  Some students may have discovered NERSC Center computer resources if their advisor has used them, but the goal is to attract even those who have not worked with the center before.

Fellows can go to a Web page and fill out a survey about their computing needs.  About half of respondents through summer 2007 “were people who said ‘This is where I’m generally headed, but I’m not there yet,’ ” Skinner says, but other fellows said they can’t get their research done fast enough with their present computer resources.

“A couple…said ‘My workstation is too slow for the work,’ ” Skinner says. “For those, my response was to get them onto some high-performance computing facilities,” either at NERSC or another national laboratory, such as Oak Ridge or Argonne.  The startup allocations should let fellows try out their codes for up to 18 months.

Students, Simon says, often are shy about asking for help getting their codes to run on NERSC Center computers.  Many are accustomed to solving their own computer problems and working with university computing centers that often were staffed by their fellow students, Simon adds.  They’re “quite surprised when they come to NERSC, because we are a full-service organization.”

“We don’t discriminate against students,” Simon says; they’re often the people doing the nitty-gritty coding for their major professors’ research.  DOE CSGF fellows who get access to NERSC can work with some powerful computers.

NERSC also has two data storage systems.

In the fall of 2007, NERSC also will bring its latest computer system on line.  Dubbed Franklin, the Cray XT4 will have 19,344 compute CPUs, at least two gigabytes of memory per CPU, and a sustained performance of 16 teraflops, as opposed to a theoretical peak performance of 100 teraflops.  With future upgrades, Franklin could have a theoretical peak of 1 petaflops — one quadrillion calculations per second.
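The Franklin figures quoted above imply a few derived quantities. The following back-of-the-envelope Python sketch uses only the numbers given in this article. The names are illustrative, and because the memory figure is a stated minimum, the total it produces is a lower bound.

```python
# Figures for Franklin (Cray XT4) as quoted in the article.
COMPUTE_CPUS = 19_344
MIN_MEMORY_PER_CPU_GB = 2      # "at least two gigabytes" per CPU
SUSTAINED_TFLOPS = 16
PEAK_TFLOPS = 100
UPGRADED_PEAK_TFLOPS = 1_000   # 1 petaflops = 1,000 teraflops


def franklin_summary():
    """Derive simple ratios and totals from the quoted figures."""
    return {
        # Share of theoretical peak delivered as sustained performance.
        "sustained_fraction": SUSTAINED_TFLOPS / PEAK_TFLOPS,
        # Theoretical peak per CPU, in gigaflops (1 teraflops = 1,000 gigaflops).
        "peak_gflops_per_cpu": PEAK_TFLOPS * 1_000 / COMPUTE_CPUS,
        # Lower bound on total memory, in gigabytes.
        "min_total_memory_gb": COMPUTE_CPUS * MIN_MEMORY_PER_CPU_GB,
        # How much larger the possible upgraded peak is than the initial peak.
        "upgrade_factor": UPGRADED_PEAK_TFLOPS / PEAK_TFLOPS,
    }


if __name__ == "__main__":
    for name, value in franklin_summary().items():
        print(f"{name}: {value:,.2f}")
```

On these numbers, sustained performance is 16 percent of theoretical peak, each CPU contributes roughly 5.2 gigaflops of peak, and the machine holds at least 38,688 gigabytes of memory.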

Franklin will increase the number of NERSC computer cycles available for research by a factor of 16, Simon says — and every one of them is needed. “Researchers often ask for many more processor hours than we can actually accommodate.  In the last couple of years we had requests that were more than six or seven times what we had available,” he adds.  Most researchers got only a fraction of the processor hours they wanted.  With Franklin, “We expect we will have, for once, enough cycles to keep everybody happy” — but not for long. Demand for computing time is constantly growing.

That’s why, as soon as Franklin is stabilized and running, NERSC will begin preparing for the next system, called NERSC-6 for now. NERSC-6 is likely to start life as a 1-petaflops-capable machine.  It’s also likely to have even more processing cores on a single chip, a change that poses challenges for the future NERSC director.

“There’s always enough work to do,” Simon says.  “There’s exciting stuff to do as long as computers grow and become more powerful.  We never stand still.”


References

Horst Simon was named Associate Laboratory Director for Computing Sciences at Berkeley Lab in 2004.  He represents the interests of the lab’s scientific computing divisions — the NERSC Center and Computational Research — in the formulation of laboratory policy, and leads the overall direction of the two divisions.  He also coordinates constructive interactions within the computing sciences divisions to seek coupling with other scientific programs.  Simon joined LBNL in early 1996 as director of the newly formed NERSC Division, and was one of the key architects in establishing NERSC at its new location in Berkeley.  Simon also is the founding director of Berkeley Lab’s Computational Research Division, which conducts applied research and development in computer science, computational science, and applied mathematics.  His research interests are in the development of sparse matrix algorithms, algorithms for large-scale eigenvalue problems, and domain decomposition algorithms for unstructured domains for parallel processing.

Simon’s recursive spectral bisection algorithm is regarded as a breakthrough in parallel algorithms for unstructured computations, and his algorithm research efforts were honored with the 1988 Gordon Bell Prize for parallel processing research.  He also is one of four editors of the twice-yearly “TOP500” list of the world’s most powerful computing systems.

David Skinner was the lead high-performance computing (HPC) consultant for the Department of Energy’s first six Innovative and Novel Computational Impact on Theory and Experiment (INCITE) projects before starting the SciDAC Outreach Center.  The INCITE program has since blossomed into a multi-lab allocation process for large-scale computing.  In that work and other projects Skinner often focuses on making scientific applications run fast and scale well.  The core areas of Skinner’s present HPC research include improving application performance, characterizing scientific workloads, and analysis of emerging architectures.

Skinner heads the NERSC Center’s Open Software and Programming Group, which is active in making software deliver for HPC centers and users and in promoting software development practices that enhance reliability and performance in the overall HPC process. His group works on a variety of software, both parallel computing applications themselves and HPC center infrastructure software for system monitoring, allocation banking, and Web services.  Skinner also publishes scientific research in the areas of molecular dynamics, chemical quantum dynamics and kinetics.

Further Reading:
NERSC Center Web page:
http://www.nersc.gov/

NERSC Center strategic plan:
http://www.nersc.gov/news/reports/LBNL-57582.pdf

NERSC Center newsletter archive:
http://www.nersc.gov/news/nerscnews/

Contact:
Horst Simon
HDSimon@lbl.gov

David Skinner
deskinner@lbl.gov

Practicum Coordinator:
Daniel Martin
DFMartin@lbl.gov

The Krell Institute
http://www.krellinst.org/