With an ever-increasing number of hardware architectures and numerous processor and accelerator options to choose from, finding the right system architecture for future high performance computing systems is becoming an increasingly complex task. It requires a forward-looking understanding of the users’ and applications’ evolving needs together with an understanding of the performance, power, and cost implications of each selection. To address this challenge, the Future Computing Group at Leibniz Supercomputing Centre (LRZ) works closely with Intel and MEGWARE to plan, install and operate a wide variety of latest-technology hardware for evaluation. In the presentation, we will explain the configuration of the latest upgrade to the test cluster, which is based on 3rd Generation Intel® Xeon® Scalable processors (both Cooper Lake and Ice Lake based) and Intel® Optane™ Persistent Memory. In addition, the systems are already prepared for Intel® Xe GPGPUs and will be upgraded once these become available. We will go into further detail on how the workloads at LRZ are evolving and explain our reasoning for the system’s architecture based on the requirements of different use cases. By using DAOS, Intel-optimized software frameworks and the oneAPI toolkit, the system also makes an ideal case for a flexible approach to tackling both current HPC tasks and emerging workloads in the field of data analytics and machine learning. By joining forces and expertise, LRZ, MEGWARE and Intel together help to ensure that HPC centers continue to work at the forefront of technology while providing the best computing infrastructures for their scientific customers.
Josef Weidendorfer is the head of the Future Computing Group at the Leibniz Supercomputing Centre (LRZ). The group develops smooth migration strategies for future HPC systems and evaluates novel technologies. This includes improving system-level and workload analysis tools as well as assessing which parallel programming models best serve LRZ users on upcoming systems. Previously, he was a researcher and teaching assistant at the Technical University of Munich (TUM) in the fields of computer architecture, parallel programming models, and performance analysis tools. He still teaches lectures and lab courses at TUM.
Axel Auweter is CTO and general manager at MEGWARE. In his role, he oversees a team of hardware, firmware and software engineers working on improving MEGWARE's award-winning ColdCon® liquid cooling technology for energy-efficient high-performance computing systems and MEGWARE's ClustWare® cluster management software. His academic background is in system monitoring, computer architecture, system-level programming and operating systems.