In part 1 of this post, we introduced the
mpi4py module (MPI for Python) which provides an object-oriented interface for Python resembling the message passing interface (MPI) and enables Python programs to exploit multiple processors on multiple compute nodes.
The mpi4py module provides methods for communicating various types of Python objects in different ways. In part 1 we showed how to communicate generic Python objects between MPI processes – the methods for doing this have names that are all lowercase letters. It is also possible to send buffer-like objects, whose data is exposed in a raw format and can be accessed without copying, directly between MPI processes. The methods for doing this start with an uppercase letter.
In this post we continue introducing the
mpi4py module, with a focus on the direct communication of buffer-like objects using the latter type of methods (that is, those starting with a capital letter), including
Reduce, as well as
Gatherv, which is the vector variant of
Gather.
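To give a flavour of these capitalized methods, below is a minimal sketch that sums a NumPy array across all MPI processes with Reduce (it assumes mpi4py and NumPy are available; the buffer contents are arbitrary choices for illustration):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# each rank contributes a small integer buffer filled with its own rank number
sendbuf = np.full(3, rank, dtype='i')
recvbuf = np.zeros(3, dtype='i') if rank == 0 else None

# buffer-based (capitalized) collective: element-wise sum on the root rank
comm.Reduce(sendbuf, recvbuf, op=MPI.SUM, root=0)

if rank == 0:
    print(recvbuf)  # with 4 ranks: [6 6 6], i.e. 0+1+2+3 in each element

Such a script would be launched with, e.g., mpirun -n 4 python reduce_demo.py (the file name here is just an example).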
In previous posts we have introduced the
multiprocessing module which makes it possible to parallelize Python programs on shared memory systems. The limitation of the
multiprocessing module is that it does not support parallelization over multiple compute nodes (i.e. on distributed memory systems). To overcome this limitation and enable cross-node parallelization, we can use MPI for Python, that is, the
mpi4py module. This module provides an object-oriented interface that resembles the message passing interface (MPI), and hence allows Python programs to exploit multiple processors on multiple compute nodes. The
mpi4py module supports both point-to-point and collective communications for Python objects as well as buffer-like objects. This post will briefly introduce the use of the
mpi4py module in communicating generic Python objects, via all-lowercase methods including send/recv, bcast, scatter, gather, and reduce.
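For illustration, here is a minimal sketch of the lowercase point-to-point methods send and recv, which can communicate any picklable Python object (the sketch assumes the script is run on at least two MPI processes; the dictionary contents are arbitrary):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # any picklable Python object can be sent with the lowercase methods
    data = {'a': 7, 'b': [1.0, 2.5]}
    comm.send(data, dest=1, tag=11)
elif rank == 1:
    data = comm.recv(source=0, tag=11)
    print('rank 1 received:', data)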
You may have heard of the Top500 list. It ranks the world’s 500 most powerful supercomputers based on their performance as measured by the Linpack benchmark. Published twice per year (in June and November) since 1993, the Top500 list records the development of supercomputers over the past two to three decades. In addition to performance, the Top500 list also summarises the main characteristics of the supercomputers in the list. Therefore, it contains much richer information than a mere ranking of the supercomputers by performance. In this post, we’ll have a closer look at the Top500 list and relevant topics, including supercomputers, performance, and statistics on processors and co-processors.
The Top500 list is all about supercomputers. It therefore makes sense to have a brief overview of supercomputers before going into the details of the Top500 list. As shown in the image below, a supercomputer usually consists of many cabinets (also called racks), which are each about the size of a fridge. Each cabinet contains a stack of blades (with each blade being about the size of a PC turned sideways). Each blade has several compute nodes mounted in it, with each compute node having one or more multicore processors. For example, PDC’s Beskow system has 11 cabinets, 515 blades, 2,060 compute nodes, and a total of 67,456 cores. Because of their extraordinary computational capability, supercomputers have been used in many fields including molecular modelling, quantum mechanics, physical simulations, and climate research. This is vividly reflected in the design on the Titan supercomputer, which is still among the world’s top 10 most powerful supercomputers (as of the November 2018 Top500 list).
In the previous post we introduced the
Pool class of the
multiprocessing module. In this post we continue and introduce the
Process class, which makes it possible to have direct control over individual processes.
A process can be created by providing a target function and its input arguments to the
Process constructor. The process can then be started with the
start method, and the
join method waits for it to finish. Below is a very simple example that prints the square of a number.
import multiprocessing as mp

def square(x):
    print(x * x)

p = mp.Process(target=square, args=(5,))
p.start()  # launch the child process
p.join()   # wait for it to finish
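Because each Process object is handled individually, it is just as easy to launch several workers and wait for them all. Below is a minimal sketch (the number of processes is an arbitrary choice for illustration):

import multiprocessing as mp

def square(x):
    print(x * x)

procs = [mp.Process(target=square, args=(i,)) for i in range(4)]
for p in procs:
    p.start()  # launch all four processes
for p in procs:
    p.join()   # wait for each process to finish

Note that on platforms where new processes are spawned rather than forked (e.g. Windows), the process-creation code should be placed under an if __name__ == '__main__': guard.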
Parallel programming solves big numerical problems by dividing them into smaller sub-tasks, thereby reducing the overall computational time on multi-processor and/or multi-core machines. Parallel programming is well supported in traditional languages like C and Fortran, which are suitable for “heavy-duty” computational tasks. Python, by contrast, has traditionally been considered a poor fit for parallel programming, partly because of the global interpreter lock (GIL). However, things have changed over time: thanks to the development of a rich variety of libraries and packages, the support for parallel programming in Python is now much better.
This post (and the following part) will briefly introduce the
multiprocessing module in Python, which effectively side-steps the GIL by using subprocesses instead of threads. The
multiprocessing module provides many useful features and is very suitable for symmetric multiprocessing (SMP) and shared memory systems. In this post we focus on the
Pool class of the
multiprocessing module, which controls a pool of worker processes and supports both synchronous and asynchronous parallel execution.
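As a minimal sketch of these two modes, the example below uses a Pool of four workers, first with the blocking map method and then with the non-blocking map_async method (the worker function and pool size are arbitrary choices for illustration):

import multiprocessing as mp

def cube(x):
    return x ** 3

if __name__ == '__main__':
    with mp.Pool(processes=4) as pool:
        # synchronous: map blocks until all results are ready
        print(pool.map(cube, range(8)))

        # asynchronous: map_async returns immediately; get() collects results
        async_result = pool.map_async(cube, range(8))
        print(async_result.get())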