A Critical Time In High Performance Computing

OpenHPC Member Spotlight


OpenHPC recently spoke with Prof. Thomas Sterling, Associate Director and Chief Scientist of the Indiana University Center for Research in Extreme Scale Technologies (CREST), to find out more about CREST, how it is advancing academic supercomputing with the ParalleX execution model and the HPX-5 runtime system, and why it strongly supports OpenHPC.

Prof. Sterling is perhaps best known as the father of Beowulf clusters, developed in collaboration with Don Becker, and for his research on petaflops computing architecture. He is the co-author of six books and holds six patents. He was awarded the Gordon Bell Prize with collaborators in 1997 and last year was inducted as a Fellow of the AAAS.

What is CREST and how is it different from other university supercomputing centers?
CREST is a research center with a focus on high performance computing, not a supercomputing center. It is organizationally situated under the IU School of Informatics and Computing, and its products and outcomes are strongly related to supercomputing systems and applications for both the present and future. Its deliverables are the result of theory, experimentation, and application work.

CREST is a context and an environment that allows a full-featured set of skills to work together towards common goals. The purpose of the center is to facilitate academic research. To this end it integrates professors, research scientists, post-docs, graduate students, software developers, and technical and administrative support. About half of the 70+ people at CREST are students, mostly doctoral students.

CREST contributes to 20 or more open source research projects. Can you tell us about them?
Open source is extremely important for CREST. We use open source as a means of technology transfer and sharing. This permits others, if they so choose, to take advantage of our results and products, which are always experimental.

The centerpiece outcome is the HPX-5 runtime system. It serves a number of purposes across multiple funded projects, either directly advancing the state of runtime systems or, less directly, supporting different kinds of applications.

For example, the National Science Foundation (NSF) funds a new dynamic library based on HPX for fast multipole methods and N-body codes. Under the Department of Energy (DoE) National Nuclear Security Administration (NNSA) we’re one of six PSAAP-2 projects, in which we’re working with our colleagues at the University of Notre Dame and Purdue to develop a hybrid shockwave and materials code to demonstrate the value of dynamic adaptive computing to very difficult, highly non-linear and irregular applications.

Also with DoE, we’re exploring the abstraction of execution models. This is a key contribution that we’ve made to the community. It’s controversial. Some people think it’s important, some people think it’s an anathema, some people think it has no meaning. We find it is an important abstraction to allow us to think holistically about total system structures and operation.

The execution model that we use is called “ParalleX.” It reflects some prior art and some unique contributions in synergy, and it provides a foundation for the development of our HPX-5 runtime system, as well as guiding possible parallel architecture advances and informing possible APIs.

Can you tell us about your work with ParalleX?
There was a time when I thought everyone thought in terms of execution models. And it was a shock to me when I realized bringing up execution models as a medium of information exchange turned out to be somewhere between a contribution and a disruption.

Good people can take either side. But there’s a long history, even in the mainstream of computing, of alternative execution models that have been used. You’re certainly familiar with the Vector Model. There was a point, back in the mid- to late-70s, when the technology was ideal for the vector model using underlying pipelining: pipelining of communications to memory, pipelining of registers, and pipelining of floating point ALUs. You’re familiar with the SIMD Model; it was ideal for large scale integration (not very large scale), where a simple broadcast control stream could manage hundreds or thousands of proto-cores. Over the last couple of decades we’ve been using what Tony Hoare referred to as the Communicating Sequential Processes (CSP) model, or variations thereof, and other message passing models. All of these are execution models.

We have gone through at least 5 epochs of different execution models where the technologies have changed, driving the need for different methodologies. Execution models allow a bridging of programming methods, systems software and physical underlying architecture.

So the question is, after 20 years of technology change, what is the execution model that we need now and that we will need over the next decade? Some say we already have it. Some say it is communicating sequential processes, or some hybrid combined with multithreading or asynchronous multitasking such as, at the language level, OpenMP combined with MPI. I’m sympathetic to that. There will be certain problem classes that will be well served by that into the exascale regime. But there are many others that simply won’t.

We worry about performance effects including starvation, latency, overheads, and the time spent waiting for the resolution of contention for shared resources, both physical and abstract. ParalleX is an experimental, admittedly academic, abstraction that introduces alternative techniques. Some of these techniques are well known in research; some are unique.

Why did CREST want to participate in OpenHPC?
We want to make a specific contribution. When I did the Beowulf Project, totally by accident we ended up pretty much starting, in the realm of supercomputing, the use of the Linux operating system for commodity clusters; not by doing anything wonderful or brilliant, but by filling in a desperate gap with Ethernet drivers. We did this because we were looking for low cost, and because the Berkeley Software Distribution (BSD) stack, which was funded by DARPA, was being litigated against by AT&T. At that time, my team and I were supported by NASA, and NASA would not allow us to use BSD, which we otherwise would have for our experiments in scientific cluster computing. Having been a hacker in those days (the good kind!) and associated with people like Don Becker, who were already playing with piles of floppy disks from Linus’ activities, I realized that there was a possibility to achieve our goals if we made the necessary contributions. So Linux, which now dominates supercomputing (something in excess of 95% of the world’s supercomputers run one of its many distributions), was one of the contributions we made.

What was important was the fact that no one – individual or small group – could literally create a whole new class of supercomputing. But many people, across the country and around the world, could together. By associating ourselves with an emergent framework, in which we could benefit from the work of many different people interested in different things but under the unifying guidance of scaffolding interfaces, we were able to achieve our objective of low-cost HPC for end users.

If, and I have to say if, OpenHPC does this right, you will provide that framework. And CREST could be a proactive contributor that others can benefit from. Or if they desire to go forward without our work, they can choose to do so. But ideally our work in runtime systems will complement the work of others.

We’re focusing on determining how to make dynamic adaptive execution work for architectures, operating systems, runtime systems, and programming interfaces. Our principal contribution, other than our conceptual work and our experimental work, is in the deployment of the HPX-5 runtime system. That’s why we want to be part of OpenHPC.

The key thing here is the opportunity to integrate with others without intruding. We’re looking for a win-win opportunity.

What does the head of a supercomputing center do on weekends?
Sadly, I’m normally returning from a trip on Saturdays or leaving for a trip on Sundays! My interests are in four areas. The first is machine intelligence: not the typical AI or machine learning, but a machine that actually does understand something, which machines don’t currently do. The second is maybe my only sport, sailing. I love sailing! The third is amateur astronomy. And the fourth is reading history, in particular about the Bronze Age. So I am a human being, not just a geek!