Parallel programming in Python: mpi4py (part 1)

In previous posts we have introduced the multiprocessing module which makes it possible to parallelize Python programs on shared memory systems. The limitation of the multiprocessing module is that it does not support parallelization over multiple compute nodes (i.e. on distributed memory systems). To overcome this limitation and enable cross-node parallelization, we can use MPI for Python, that is, the mpi4py module. This module provides an object-oriented interface that resembles the message passing interface (MPI),  and hence allows Python programs to exploit multiple processors on multiple compute nodes. The mpi4py module supports both point-to-point and collective communications for Python objects as well as buffer-like objects. This post will briefly introduce the use of the mpi4py module in communicating generic Python objects, via all-lowercase methods including send, recv, isend, irecv, bcast, scatter, gather, and reduce.


Top500 list: a brief introduction

You may have heard of the Top500 list. It ranks the world’s 500 most powerful supercomputers based on their performance as measured by the Linpack benchmark. Published twice per year (in June and November) since 1993, the Top500 list records the development of supercomputers over the past two to three decades. In addition to performance, the Top500 list also summarises the main characteristics of the supercomputers in the list. Therefore, it contains much richer information than a mere ranking of the supercomputers by performance. In this post, we’ll have a closer look at the Top500 list and relevant topics, including supercomputers, performance, and statistics on processors and co-processors.


The Top500 list is all about supercomputers. It therefore makes sense to have a brief overview of supercomputers before going into the details of the Top500 list. As shown in the image below, a supercomputer usually consists of many cabinets (also called racks), which are each about the size of a fridge. Each cabinet contains a stack of blades (with each blade being about the size of a PC turned sideways). Each blade has several compute nodes mounted in it, with each compute node having one or more multicore processors. For example, PDC’s Beskow system has 11 cabinets, 515 blades, 2,060 compute nodes, and a total of 67,456 cores . Because of their extraordinary computational capability, supercomputers have been used in many fields including molecular modelling, quantum mechanics, physical simulations, and climate research. This is vividly reflected in the design on the Titan supercomputer, which is still among the world’s top 10 most powerful supercomputers (as of the November 2018 Top500 list).


Parallel programming in Python: multiprocessing (part 2)

In the previous post we introduced the Pool class of the multiprocessing module. In this post we continue on and introduce the Process class, which makes it possible to have direct control over individual processes.

A process can be created by providing a target function and its input arguments to the Process constructor. The process can then be started with the start method and ended using the join method. Below is a very simple example that prints the square of a number.

import multiprocessing as mp

def square(x):
    print(x * x)

p = mp.Process(target=square, args=(5,))


Parallel programming in Python: multiprocessing (part 1)

Parallel programming solves big numerical problems by dividing them into smaller sub-tasks, and hence reduces the overall computational time on multi-processor and/or multi-core machines. Parallel programming is well supported in traditional programming languages like C and FORTRAN, which are suitable for “heavy-duty” computational tasks. Traditionally, Python is considered to not support parallel programming very well, partly because of the global interpreter lock (GIL). However, things have changed over time. Thanks to the development of a rich variety of libraries and packages, the support for parallel programming in Python is now much better.

This post (and the following part) will briefly introduce the multiprocessing module in Python, which effectively side-steps the GIL by using subprocesses instead of threads. The multiprocessing module provides many useful features and is very suitable for symmetric multiprocessing (SMP) and shared memory systems. In this post we focus on the Pool class of the multiprocessing module, which controls a pool of worker processes and supports both synchronous and asynchronous parallel execution.


Using Jupyter Notebooks to manage SLURM jobs

Jupyter Notebooks are gaining in popularity across many academic and industrial research fields . The in-browser, cell-based user interface of the Notebook application enables researchers to interleave code (in many different programming languages) with rich text, graphics, equations, and so forth. Typical use cases involve quick prototyping of code, interactive data analysis and visualisation, keeping digital notebooks for daily tasks, and as a teaching tool. Notebooks are also being used for reproducible workflows and to share scientific analysis with colleagues or whole research communities. High Performance Computing (HPC) is rapidly catching up with this trend and many HPC providers, including PDC, now offer Jupyter Notebooks as a way to interact with their HPC resources.

In October last year, PDC organised a workshop on “HPC Tools for the Modern Era“. One of the workshop modules focused on possible use cases of Jupyter Notebooks in an HPC environment. The notebooks that were used during the workshop are available from the PDC Support GitHub repository (, and the rest of this post discusses an example use case based on one of those workshop notebooks.

Jupyter Notebooks are suitable for various HPC usage patterns and workflows. In this blog post we will demonstrate one possible use case: Interacting with the SLURM scheduler from a notebook running in your browser, submitting and monitoring batch jobs and performing light-weight interactive analysis of running jobs. Note that this usage of Jupyter is possible on both Tegner and Beskow. In a future blog post, we will demonstrate how to run interactive analysis directly on a Tegner compute node to perform heavy analysis on large datasets, which might, for instance, be generated from other jobs running at Tegner or Beskow (since both clusters share the klemming file system). This use case will however not be possible on Beskow since the Beskow compute nodes have restricted network access to the outside world.

The following overview will assume that you have some familiarity with Jupyter Notebooks, but if you’ve never tried them out, there are plenty of online resources to get you started. Apart from using the notebooks from the PDC/PRACE workshop (which were mentioned earlier), you can, for example: