Summary: Containerisation for High Performance Computing Systems (arxiv.org)
17,670 words - PDF document
One Line
The paper surveys the widespread use of containerization in cloud and HPC systems, highlighting challenges such as library mismatches and security threats, and research opportunities including containerizing AI applications and improving performance and security.
Key Points
- Containerization is widely used in both cloud and high-performance computing (HPC) environments to improve application deployment efficiency.
- There are differences in containerization between cloud and HPC systems, such as security levels and container size/portability.
- HPC container engines like Singularity and Shifter have shown near-native performance in terms of CPU, memory, network bandwidth, and GPU usage.
- HPC systems typically rely on workload managers like PBS and Slurm for container orchestration, while cloud systems use platforms like Kubernetes.
- Challenges in HPC containerization include library mismatches, compatibility issues, security concerns, and performance degradation with GPUs.
- Research opportunities include containerizing AI applications, integrating DevOps practices, enabling resource elasticity, and using minimal operating systems.
- Containerization can bridge the gap between on-premise HPC clusters and public clouds, providing flexibility in resource usage.
- Further research and engineering efforts are needed to fully implement container orchestrators in HPC clusters and address challenges in containerization.
Summaries
46 word summary
Containerization is widely used in cloud and HPC systems, with engines like Shifter, Charliecloud, Singularity, SARUS, and UDocker. Challenges include library mismatches, compatibility issues, security threats, and degraded GPU performance. Research opportunities include containerizing AI apps, integrating DevOps, enabling resource elasticity, and improving performance and security.
92 word summary
Containerization is widely used in cloud and HPC systems, with engines like Shifter, Charliecloud, Singularity, SARUS, and UDocker supporting non-root privileges, MPI, and GPU. HPC systems use workload managers like PBS, LSF, Grid Engine, OAR, and Slurm. Challenges include library mismatches, compatibility issues, security threats, and degraded GPU performance. Research opportunities include containerizing AI apps, integrating DevOps, enabling resource elasticity, and improving performance and security. Containerization optimizes resource usage and bridges the gap between on-premise HPC clusters and public clouds. Further research is needed to fully implement container orchestrators in HPC clusters.
166 word summary
Containerization is widely used in both cloud and high-performance computing (HPC) systems. Efforts have been made to enable container orchestration on HPC systems, with container engines like Shifter, Charliecloud, Singularity, SARUS, and UDocker offering features such as non-root privileges and support for MPI and GPU. Performance evaluations show that HPC container engines can achieve near-native performance. HPC systems rely on workload managers like PBS, Spectrum LSF, Grid Engine, OAR, and Slurm for container orchestration. Challenges in containerization for HPC systems include library mismatches, compatibility issues, kernel optimization limitations, security threats, and degraded performance with GPUs and accelerators. Research opportunities include containerizing AI applications, integrating DevOps practices, enabling resource elasticity, and improving performance, security, and usability of containerized applications in HPC environments. Containerization can bridge the gap between on-premise HPC clusters and public clouds, optimizing resource usage. Further research and engineering are needed to fully implement container orchestrators in HPC clusters. Containerization will continue to play a crucial role in application development and simplifying HPC software stacks.
428 word summary
Containerization is a widely used technology in both cloud and high-performance computing (HPC) systems. While there are differences in containerization and container orchestration between these two types of systems, efforts have been made to enable container orchestration on HPC systems. Several container engines designed for HPC systems, such as Shifter, Charliecloud, Singularity, SARUS, and UDocker, offer features like non-root privileges and support for MPI and GPU. Performance evaluations have shown that HPC container engines can achieve near-native performance in terms of CPU, memory, network bandwidth, and GPU usage.
HPC systems typically rely on workload managers like PBS, Spectrum LSF, Grid Engine, OAR, and Slurm for container orchestration. Cloud orchestrators like Kubernetes and Docker Swarm automate configuration, coordination, and management of cloud systems. Container orchestration strategies for HPC systems often leverage the mechanisms of existing cloud orchestrators or utilize the capabilities of HPC workload managers or software tools.
Challenges in containerization for HPC systems include library mismatches, compatibility issues between container engines and images, kernel optimization limitations, security threats, and performance degradation with GPUs and accelerators. Research opportunities include containerizing AI applications on HPC systems, integrating DevOps practices, enabling resource elasticity, moving towards minimal operating systems, and improving the performance, security, and usability of containerized applications in HPC environments.
To facilitate containerization of AI applications on HPC systems, up-to-date documentation, versatile base container images, and instructions on software package installation or updates are important. Container registries can provide pre-built container images for easy access and ensure container security. Linux namespaces offer isolation and resource control, and clear instructions on their availability should be provided by HPC centers.
DevOps integration in HPC environments can be achieved through containerization with tools like Jenkins. Middleware systems can bridge container building environments with HPC resource managers and schedulers, providing a portable way to enable DevOps in HPC centers. Improving the elasticity of HPC infrastructure can be done by introducing containerization to workload managers. Containers can also replace parts of the HPC software stack, reducing complexity and enabling quick replacement of services.
Containerization can help reduce the performance gap and deployment complexity between on-premise HPC clusters and public clouds. It allows for the movement of containers between HPC and cloud environments to optimize resource usage.
The paper concludes by emphasizing the need for further research and engineering to fully implement container orchestrators within HPC clusters. Containerization will continue to play a crucial role in application development, resource elasticity, and simplifying HPC software stacks.
References to various container technologies, container orchestration platforms, container engines, middleware systems, and HPC workload managers are provided in the paper.
1495 word summary
Containerization has become widely used in both cloud and high-performance computing (HPC) environments. Containers improve the efficiency of application deployment by encapsulating complex programs with their dependencies in isolated environments. However, there are differences in containerization between cloud and HPC systems. HPC systems often enforce higher security levels, which restrict users' ability to customize environments. As a result, containers on HPC systems bundle a heavy package of libraries, making them larger and compromising portability; cloud containers, in contrast, are smaller and more portable. Additionally, container orchestration, which facilitates the deployment and management of containers at scale, is more prevalent in cloud systems than in HPC systems, though there have been proposals to enable it on HPC systems. This paper surveys containerization and its orchestration strategies on HPC systems, highlighting differences from cloud systems, and discusses challenges and potential directions for research and engineering.
Containerization is a virtualization technology that provides separation of application execution environments. Containers utilize the dependencies in their host kernel, resulting in faster startup times compared to virtual machines (VMs). Docker is one of the most popular container engines, supporting multiple platforms and providing resource isolation and limitation through namespaces and cgroups. Other container engines designed for HPC systems include Shifter, Charliecloud, Singularity, SARUS, and UDocker. These engines are designed to meet the high-security requirements of HPC systems and offer features such as non-root privileges and support for MPI and GPU.
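The namespaces and cgroups that engines like Docker build on are ordinary kernel features, and any Linux process can inspect its own membership in them through `/proc`. A minimal sketch (assumes a Linux host):

```python
import os

# Each symlink under /proc/self/ns names one namespace this process
# belongs to; container engines create new entries here for isolation.
ns_dir = "/proc/self/ns"
for name in sorted(os.listdir(ns_dir)):
    print(name, "->", os.readlink(os.path.join(ns_dir, name)))

# cgroup membership (the mechanism behind resource limitation) is
# likewise listed per process.
with open("/proc/self/cgroup") as f:
    print(f.read().strip())
```

Running this inside and outside a container shows different namespace inode numbers, which is exactly the separation of execution environments described above.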
Performance evaluations of HPC container engines have shown that containers can achieve near-native performance in terms of CPU, memory, network bandwidth, and GPU usage. Singularity has been found to provide close-to-native performance on CPU, memory, and network bandwidth, with a slight overhead on GPU usage. Shifter has demonstrated CPU performance comparable to bare metal, while Charliecloud has shown large overhead on Lustre's metadata and object storage servers (MDS and OSS) because its images are stored as flat, unpacked directory trees. SARUS has shown strong scaling capability on Cray XC systems with hybrid GPU and CPU nodes.
In terms of container orchestration, HPC systems typically rely on workload managers such as PBS, Spectrum LSF, Grid Engine, OAR, and Slurm. These workload managers allocate resources, schedule jobs, and enforce resource limits. Cloud orchestrators, on the other hand, automate configuration, coordination, and management of cloud systems. They include platforms like Kubernetes and Docker Swarm. Container orchestration strategies for HPC systems often leverage the mechanisms of existing cloud orchestrators or utilize the capabilities of HPC workload managers or software tools.
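As an illustration of how a workload manager can orchestrate a container job, a hypothetical Slurm batch script might wrap an MPI application in a Singularity image; the job name, image file, and application binary below are assumptions, not taken from the paper:

```bash
#!/bin/bash
#SBATCH --job-name=container-demo
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00

# Slurm's srun handles resource allocation and process placement,
# while Singularity supplies the encapsulated environment; --nv
# exposes the host's NVIDIA GPU stack inside the container.
srun singularity exec --nv myapp.sif ./mpi_app
```

This division of labor, scheduler outside, container engine inside, is the pattern the surveyed HPC orchestration strategies build on.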
In conclusion, containerization has been widely adopted in both cloud and HPC systems. While there are differences in containerization and container orchestration between these two types of systems, efforts have been made to enable container orchestration on HPC systems. Performance evaluations have shown that HPC container engines can achieve near-native performance in various aspects. Future research and engineering efforts should focus on addressing the challenges specific to containerization and container orchestration in HPC systems.
Containerisation is being increasingly used in high-performance computing (HPC) systems. However, there are several challenges and open issues that need to be addressed. Compatibility issues arise due to library mismatches between container images and host systems, as well as compatibility issues between container engines and images. Standardisation efforts such as the Open Container Initiative (OCI) aim to address these challenges. Kernel optimisation is another area of concern, as containers are generally not allowed to install their own kernel modules on the host. Security is also a major consideration, with threats such as privilege escalation, denial-of-service attacks, and information leaks. Performance degradation can occur when using containers with GPUs and accelerators, as customised libraries may be required for optimal performance.
To overcome these challenges, several research and engineering opportunities have been identified. One area of focus is the containerisation of AI applications in HPC systems. Leveraging the compute power and resources of HPC clusters can greatly benefit AI model training. Private container registries within HPC centres can ensure container security and provide pre-built images accessible to users. Guidelines for Linux namespaces can help ensure security within HPC environments.
DevOps practices can also be integrated into HPC systems to improve reproducibility and streamline application deployment. Singularity containers can be integrated with DevOps tools such as Jenkins for automated workflows. Middleware systems that make it easy to plug new components in and out can further enhance DevOps capabilities in HPC environments.
Resource elasticity is another important area of research, with the goal of enabling flexible usage of hardware resources in HPC systems. Integrating container orchestration platforms like Kubernetes with HPC workload managers can introduce resource elasticity to traditional batch scheduling systems.
Moving towards minimal operating systems (OS) can help reduce maintenance efforts in HPC environments. By maintaining a minimal OS kernel and containerising the rest of the HPC software stack, administrators can simplify system management and updates.
Overall, containerisation in HPC systems offers numerous benefits, but also presents several challenges. Ongoing research efforts are focused on addressing these challenges and finding innovative solutions to improve the performance, security, and usability of containerised applications in HPC environments.
Containerization is a valuable solution for deploying AI applications on high-performance computing (HPC) systems. Unlike applications written in compiled languages such as C/C++, AI applications written in Python cannot be compiled into a single executable file with all dependencies included. This poses a challenge for deploying AI applications on HPC infrastructures, which often host closed-source applications and impose restricted user privileges and security policies. Containerization offers a way to customize execution environments while taking advantage of HPC hardware and optimized AI libraries. To facilitate containerization of AI applications on HPC systems, it is important to provide up-to-date documentation and tutorials, maintain versatile base container images, and give instructions on installing or updating software packages.
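A versatile base image of the kind described could be expressed as a short Singularity definition file; this is a hedged sketch, and the vendor base image tag and added package are assumptions for illustration:

```singularity
Bootstrap: docker
# Assumed vendor image with GPU-optimized AI libraries preinstalled.
From: nvcr.io/nvidia/pytorch:24.01-py3

%post
    # Layer site- or project-specific Python packages on top of the base.
    pip install --no-cache-dir scikit-learn

%runscript
    exec python "$@"
```

Users then only extend `%post` with their own dependencies, rather than rebuilding the optimized AI stack from scratch.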
A container registry is a useful repository for providing pre-built container images that users can access easily. It also aids portability when deploying applications on cloud clusters and can ensure container security by signing images and pulling only from trusted registries. To simplify usage, future work can enable HPC workload managers to boot default containers on compute nodes that match the environments of user login nodes. Jobs could then be started without users being aware of the containers or needing any additional intervention.
Linux namespaces are used within container engines to provide isolation and resource control. HPC centers should provide clear instructions on which namespaces are available, with a minimal set enabled for general user groups; advanced use cases may require additional namespaces. Workload managers can then start containers with the appropriate namespaces enabled when users submit container jobs.
DevOps, which integrates development and operations, has been widely adopted in cloud computing but is not well suited for HPC environments. HPC-specific DevOps tools are needed to overcome the inflexibility and optimization challenges of HPC environments. Containerization can provision DevOps environments in HPC systems, enabling the integration of DevOps workflows. Singularity has been integrated with Jenkins, a popular automation platform, to bring continuous integration, delivery, and deployment practices into HPC workflows.
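A hedged sketch of what a Singularity/Jenkins integration could look like, a declarative Jenkinsfile that builds and smoke-tests an image; the file names and stage names are assumptions, not details from the paper:

```groovy
pipeline {
    agent any
    stages {
        stage('Build image') {
            steps {
                // Build the image from a definition file kept in the repo;
                // --fakeroot avoids needing real root on the build agent.
                sh 'singularity build --fakeroot app.sif app.def'
            }
        }
        stage('Smoke test') {
            steps {
                // Run the containerized test suite before any deployment.
                sh 'singularity exec app.sif ./run_tests.sh'
            }
        }
    }
}
```

The same image artifact that passes CI can then be submitted unchanged to the HPC scheduler, which is what makes this workflow reproducible.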
Middleware systems can bridge container building environments with HPC resource managers and schedulers. They perform job deployment, management, and data staging, and can be located on an HPC cluster or connected to it with secure authentication. Middleware systems provide a portable way to enable DevOps in HPC centers and can be a future research direction.
Resource elasticity is a major difference between HPC and cloud computing. Containerization can help improve the elasticity of HPC infrastructure when introduced to workload managers. Kubernetes, for example, has been used to instantiate containerized HPC workload managers dynamically, creating single-tenant or multi-tenant environments.
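Dynamically instantiating containerized workload-manager daemons might look like the following Kubernetes manifest; this is a sketch under assumed names (the `slurmd` image and resource figures are hypothetical):

```yaml
# Hypothetical manifest: Kubernetes keeps a pool of containerized Slurm
# compute daemons running, giving the batch system elastic "nodes".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slurmd-pool
spec:
  replicas: 4            # scale up or down to resize the virtual cluster
  selector:
    matchLabels:
      app: slurmd
  template:
    metadata:
      labels:
        app: slurmd
    spec:
      containers:
      - name: slurmd
        image: example.org/hpc/slurmd:latest   # assumed image
        resources:
          limits:
            cpu: "4"
            memory: 8Gi
```

Adjusting `replicas` is then all it takes to grow or shrink the pool, which is precisely the elasticity that traditional batch scheduling lacks.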
Containers can partially substitute the current HPC software stack by using minimal operating system (OS) base images on compute nodes. This reduces the number of components in the kernel image and simplifies post-boot configurations. Containerized services can be quickly replaced without affecting the entire system when failures occur. Long-term research is needed to control the software stack and workloads that are partially native and partially containerized on HPC systems.
Containerization plays a vital role in reducing the performance gap and deployment complexity between on-premise HPC clusters and public clouds. With advancements in low-latency networks and accelerators like GPUs and TPUs, containers can be moved from HPC to cloud to temporarily relieve peak demands or from cloud to HPC to exploit powerful hardware resources.
The paper concludes by discussing the opportunities and challenges of containerization in HPC systems. It emphasizes the need for further research and engineering to fully implement container orchestrators within HPC clusters. Containerization will continue to play an essential role in application development, resource elasticity, and reducing the complexity of HPC software stacks. The authors acknowledge the funding received for their projects and express gratitude to Dr. Joseph Schuchart for proofreading the contents.
References: The paper cites various container technologies, container orchestration platforms, container engines, middleware systems, and HPC workload managers, as well as specific software frameworks such as TensorFlow and Singularity.