In Search of an Exascale Roadmap
In this exclusive interview with Indiana University’s Thomas Sterling, we break down a number of exascale topics with a candid look at the good, the bad, and the – well, not so attractive efforts and results of 2011.
If you are interested in exascale, you really should read the article and listen to the audio podcast, which is not a transcription of the article but a separate interview that expands on points made in this thought-provoking piece.
The past year, 2011, has been a critical year for Exascale computing: it has been a disappointment and, at the same time, has created an emergent context in which responsible revolution for essential innovation may occur. In a sense the Exascale community, if that can be considered a meaningful term, has grown up. The disappointment has come from DARPA, the agency historically noted for leading HPC system research and development. This strategic commitment had wavered but was seen in resurgence with important funded studies and the initiation of the UHPC and OHPC programs. While UHPC was not formally an Exascale program, it would have developed the hardware and software technologies necessary for, and capable of supporting, large, but not prohibitively large, Exascale computational systems. OHPC was an explicitly Exascale technologies-related program that would have augmented the four principal UHPC projects. OHPC was cancelled shortly after the awards were made, and there appears little likelihood that UHPC will extend beyond its first phase, although it was originally planned to run for four such phases and produce final proof-of-concept experimental platforms. In truth, much has been learned even from this single 2-year phase, but it is disappointing that the leadership momentum capturing the talents of industry, national laboratories, and academia is being dissipated and will be lost.
If not DARPA, then who? In the US, it may be DOE. But the US is no longer the only, perhaps not even the leading, nation in the field of HPC and, more importantly, its application to critical economic and national security related objectives. Asia has dominated, at least visibly, with peak performance systems over the last year. Europe has demonstrated impressive progress in HPC applications as well as user tools.
The International Exascale Software Project (IESP) is an important initiative that more than any other has engaged the combined interests and leadership across many nations to consider and organize future directions in the domain of supporting software for Exascale computing. There have been seven such meetings organized by such key leaders in the field as Pete Beckman, Jack Dongarra, Paul Messina (all from the US), Bernd Mohr (Germany), and Satoshi Matsuoka (Japan) among others. This has been successful in creating a worldwide forum to draw attention to the goal of Exascale software and provide a medium of exchange. Much information from this cooperative endeavor is available at www.exascale.org. However (and this is a personal opinion), there has been a relatively narrow view exerted as to how the problem of Exascale software should be addressed, with strong resistance to considering revolutionary approaches. Indeed, those who would otherwise have represented dramatic change have had limited speaking time, while incremental changes to conventional practices have been advocated by the mainstream. This intransigence has spilled over to the US debate as a whole, slowing down progress in needed advances; again, a personal opinion. It should be noted that in Cologne at IESP-7, a working group on “Revolutionary Approaches” was organized, responsibly conducted, and well received.
As might be expected there are differences of opinion. For example, the EU through the European Exascale Software Initiative (EESI) represents an intention of taking a strong leadership position internationally with little willingness to simply follow the US lead. Asia through its actions is exhibiting the same intent. The Japanese Kei machine not only boasts a quantum leap in performance but an original system structure. We can expect an all-Chinese (no US designed parts) HPC system soon. So a clear lesson is: the US may not own the Exascale space. The key question is: can it even influence its direction to maintain a strategic lead in its deployment and application for critical economic and security related interests? My opinion is: not if we continue to eschew essential innovation in favor of shortsighted near-term incrementalism.
As suggested by the IESP experience, there is a strong, perhaps even majority view, that Exascale can be achieved without having to break the old model. It is captured in the phrase: “MPI+X” which suggests maintaining the MPI framework while augmenting it with additional parallel semantics. The intent is to expose the necessary parallelism without losing continuity with past codes and practices. This can even work, to a degree, and there are examples of such methods from UT Knoxville, UT Austin (different ‘T’), and Barcelona Supercomputing Center, among others. There is a push to incorporate OpenMP as the unknown X. There are even those who suggest that there should be a Y as well (“MPI+X+Y”), perhaps CUDA or OpenCL. I would suggest that such a composite programming language be named “BABEL” both for its multiplicity of disparate language components and also for a construction of one language on top of another and another … In the quiet recesses of the debate, the question is whispered: “Why not just ‘X’?”
While sympathetic to the desire to retain unbroken continuity with the past, I believe it is not only infeasible; it is a dangerous disregard of a history that exhibits multiple paradigm shifts across its seven decades. The myopic focus on MPI+X is distracting us from taking leadership in determining the next paradigm of HPC to establish a strong foundation for Exascale and beyond. If we continue down what I believe to be a dead-end path, we will not be prepared for Exascale computing systems and applications, and we will have left open the opportunity for others in the international community to own the space and dictate their new standards to us. We will have become a nation of domesticated users, as we have in the economic product domain of consumer electronics. It is frankly deeply worrisome to me that we are abrogating our traditional role as conceptual leaders with the false hope that standards established decades ago will continue to suffice with a few patches here and there. The strategy of identifying “gaps” that don’t work and simply trying to fill them is naïve, as it assumes that what we are doing now is mostly adequate for what has to occur 3 orders of magnitude from now in multiple dimensions as we approach Nano-scale technologies and Exascale systems.
But this is not just some polemic representing those who wish the answer to take some radically different form (and there are those). Rather, it reflects a recognition of the demonstrable, unalterable changes that are a consequence of Moore’s Law in both hardware and software. We are all conscious of multicore but must be reminded that everything has to be parallel now to improve performance with improved technology, and this applies to everyone, not just a few experts in national labs. Parallel computing is now central to the US economy and security. Yet an increasing number of applications are failing to scale beyond a moderate number of cores. For them, Moore’s Law (or its performance consequence) is dead. And of course those applications written for sequential execution are even more constrained. While many real-world problems do scale using conventional practices, their measured efficiencies are very low, often in single-digit percentages. With power a premier concern along with cost, even achieving the status quo at Exascale would be prohibitively expensive and preclude its wide adoption or application. In this sense, incremental advances suggest a success that itself would be a failure.
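One classical way to see why an application can stop scaling past a moderate number of cores is Amdahl's law. The short sketch below is an illustration only and is not taken from the interview: the 99% parallel fraction and the core counts are assumed values, and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup of a fixed-size problem under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def parallel_efficiency(parallel_fraction: float, cores: int) -> float:
    """Fraction of the machine doing useful work: speedup divided by cores."""
    return amdahl_speedup(parallel_fraction, cores) / cores


if __name__ == "__main__":
    # Assumed example: a code whose runtime is 99% parallelizable.
    for cores in (16, 1024, 1_000_000):
        speedup = amdahl_speedup(0.99, cores)
        efficiency = parallel_efficiency(0.99, cores)
        print(f"{cores:>9} cores: speedup {speedup:8.1f}, efficiency {efficiency:.4%}")
```

Under these assumed numbers the speedup can never exceed 100 no matter how many cores are added, so efficiency falls into single digits well before a thousand cores. Real codes also lose time to latency, contention, and load imbalance, which this simple model ignores.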
Fortunately, in 2011 a body of experimental evidence has been emerging that suggests elements of a future roadmap to Exascale that is feasible, addresses key challenges, opens new opportunities for problems left behind, and can serve as a conceptual framework for the development of an Exascale software system and its codesign with Exascale hardware system architecture and end applications. Some of this work is a direct product of the projects sponsored by UHPC, now in its twilight. NSF and NSA are also contributors to early pathfinding work, as is DOE. Conspicuously absent is NASA, but that is another sad story in HPC. Although these efforts take many forms, they all share some or all of a set of characteristics that suggest a radical departure from the MPI+X strategy, setting a new trajectory to Exascale. While differences abound, the similarities include: 1) moving from static to dynamic resource management and task scheduling through advanced runtime systems to exploit runtime information and introspective closed-loop optimizing control, using lightweight user threads with rapid context switching, 2) message-driven computation to reduce the effects of latency by moving work to the data rather than always having to move the data to a fixed locus of work, 3) a global address space that allows systems to deal directly and efficiently with global access, 4) powerful synchronization mechanisms for rich control of asynchronous cross-system operation, 5) work queue methods for employing heterogeneous components while continuing operation of multicore elements, and 6) in-memory partial checkpointing and fault containment for resiliency. Work in these areas during 2011 has been conducted at UC Berkeley and LBNL, MIT and Intel, ETI, UIUC, LSU and Indiana University, SNL, PNNL, Stanford and Nvidia, and others.
A host of contributors such as Sanjay Kale, John Shalf, Bill Dally, Rishi Khan, Anant Agarwal, Hartmut Kaiser, Richard Murphy, Guang Gao, Kathy Yelick, Adolfy Hoisie, Andrew Lumsdaine, myself, and a number of others have been exploring this promising space of opportunity.
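To make the idea of message-driven computation and work queues a little more concrete, here is a toy, single-process sketch. It is not the design of any of the runtime systems mentioned above: the `Locality` class, its methods, and `global_sum` are hypothetical names, and threads stand in for nodes of a machine. Each locality owns a slice of data and a queue of incoming work, and a small action is sent to wherever the data lives instead of the data being shipped to the caller.

```python
import queue
import threading


class Locality:
    """Hypothetical node: owns a slice of data and a queue of incoming work."""

    def __init__(self, name, data):
        self.name = name
        self.data = data              # stays put; never copied to the caller
        self.inbox = queue.Queue()    # work queue of incoming "parcels"
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Message-driven loop: work is executed only when a parcel arrives.
        while True:
            parcel = self.inbox.get()
            if parcel is None:        # shutdown signal
                break
            action, reply = parcel
            reply.put(action(self.data))

    def send(self, action):
        """Ship an action to the data; return a one-slot queue used as a future."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((action, reply))
        return reply

    def shutdown(self):
        self.inbox.put(None)
        self._thread.join()


def global_sum(localities):
    # Issue all requests first so the localities work concurrently,
    # then wait on the replies (a simple synchronization point).
    replies = [loc.send(sum) for loc in localities]
    return sum(reply.get() for reply in replies)


if __name__ == "__main__":
    nodes = [Locality("n0", list(range(0, 1000))),
             Locality("n1", list(range(1000, 2000)))]
    print(global_sum(nodes))
    for node in nodes:
        node.shutdown()
```

In this sketch only the action and one number per locality cross the boundary between caller and owner, which is the latency-hiding point of moving work to the data. A real runtime would add a global address space, lightweight user threads, and fault handling, none of which is modeled here.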
It's now 2012; enter DOE. Over the last year DOE appears to have taken up the challenge, the responsibility, and US leadership in the domain of Exascale computing. As this is clearly mandatory in meeting its own mission-critical role, it is understandable that its senior management might do so. But its impact on US economic and defense-related needs will extend far beyond the ethereal domains of the national labs. Indeed, it may be the last chance the US has of reasserting its leadership and placing itself again at the forefront of HPC influence and standards definition for its industry and commerce as well as its defense agencies and national security. Over the last year, DOE has hosted a series of interrelated workshops on Exascale architecture and programming models, among others. It has also initiated a couple of projects related to execution models. This resurgence is being driven through the management of Sonia Sachs and Lenore Mullin under the unique leadership of Bill Harrod and Dan Hitchcock, all at ASCR, but also in close coordination and cooperation with NNSA. From this has come the new X-stack program, for which an FOA has been issued with proposals due in February 2012. The goal of the X-stack program is to derive and provide the results of the new research needed to establish the conceptual and experimental foundation for setting the path to the development of Exascale software. This program will work closely with earlier projects related to the codesign of DOE applications and systems for Exascale computing. But even as this pivotal program is being put into place, new ones are being considered, with RFPs also expected in 2012. If DOE is permitted to follow through on what must be a decade-long concerted program, the US will provide the innovation, systems, tools and methodologies, programming languages, and most importantly the applications.
Through this series of programs it will inform industry in their development of core and system architectures, operating systems, and programming models that they must ultimately support.
There cannot be success until all three pillars of US technical innovation work in harness towards the same broad goal and until means are found to coordinate and fund such innovative work. In my view, we have most of the resources we need in industry but not enough of the necessary will. Again, the majority view is to “stay the course.” In a brief talk given recently in South Africa by an Intel representative, the speaker said that Intel knows how to get to Exascale and that users will be programming their applications in the same way they have been. In my view, this is disingenuous. Intel is among many at the forefront of this work, and their technical leaders understand the challenge if not the full solution. Cray has continued to find a path that combines solvency with sustained performance growth. Sadly, they have moved away from processor architecture. IBM is a mainstream contributor, although its recent decision about Blue Waters was a disappointment for the HPC community. I think there is a critical need for small commercial concerns like PGI, Reservoir, and ETI to provide early environments and tool sets to give programmers adequate time to tool up and train in the new techniques. But many of the innovative ideas required will come from academic institutions, and they must be integrated into the process. This was not done well in the DARPA HPCC program, but must be a key element of any future program. The real value of running a national program from DOE is that it is responsible to a strong and challenging user base, thus providing a strong dose of reality at all stages of the development of future systems. Of course I have thus far not mentioned NVIDIA, as many readers might have expected. Certainly this company has tremendous exposure and much buzz, with significant instances of dramatic success. But their products are not well positioned within a broader system concept and certainly hang at the wrong end of the I/O bus.
Very bright people like Steve Scott, David Kirk, Steve Keckler, and Don Becker are all looking at the problem and one cannot but expect dramatic advances from them. But I am still waiting to see a holistic execution model-based approach before I can understand the long-term impact of this technology.
Ironically, even as the US must be in a world leadership position to ensure its ownership of the technologies that serve its welfare, it must also be a partner, on equal terms, with the international community. Only as a one-world community can HPC succeed in advancing the needs of civilization in the 21st century. No one has a monopoly on good ideas, financial resources, or skilled talent. Only with a goal towards synergy should future systems be devised across the globe. This was perhaps the great success of MPI, and it is this legacy that must be retained into the next HPC phase. China, Japan, Europe, and Russia all have a seminal role to play in setting direction, devising and integrating tools, and exploiting these for needed applications and performance.
There is an epilogue to be considered, that of legacy codes. There is no barrier to implementing current execution models and their representative APIs targeted towards the new dynamic adaptive modalities and intermediate forms of the future models and X-stack software. This is not to suggest that there is a silver bullet; that some magical transformation tool will take a sow’s ear and turn it into gold (perhaps an unfortunate metaphor). But if an MPI program on a new system cannot perform as well as it would have on a native system, then there is something clearly wrong with the new system. It is possible that the new strengths of such innovative systems may help even old codes by taking advantage of runtime information such that the performance will be somewhat improved. And, yes, clever people will be able to find ways to do some of the parallelism extraction automatically to improve operational characteristics even more.
In closing, 2012 is the year that the US and the world will choose a launch vector to the new undiscovered world of Exascale. If we have the courage to get it right, we will accelerate computational opportunity by perhaps years. If we fail to embrace innovation, we may have let such opportunities slip through our fingers and passed on the future opportunities to others.