PDC Blog

Working with Python Virtual Environments

Xin Li

, 2020-11-10

Note: This post has been updated to reflect the modules on Dardel (December 2021).

When we use Python in our work or personal projects, it is often necessary to use a number of packages that are not distributed as standard Python libraries. We therefore need to install those packages based on the specific requirements of every project. In the scenario of working on multiple projects, it is not uncommon that different projects have conflicting requirements in terms of packages. For example, project A may require version 1.0 of a certain package, while project B may require version 2.0 of the same package. A solution to this conflict is to separate the packages for different projects or purposes with the help of a so-called “virtual environment”.

A Python virtual environment is an isolated run-time environment that makes it possible to install and execute Python packages without interfering with the outside world. Without a virtual environment, Python packages are installed either in the system site directory, which can be located via the following command:

$ python -c 'import site; print(site.getsitepackages())'

or in the so-called Python user base, which is usually in the “$HOME/.local” folder. A Python package installed in this way can have only one version, and it is therefore not possible to work with two or more projects that have conflicting requirements regarding the versions of a certain Python package. With the help of a virtual environment, we can have different Python site directories for different projects and have those site directories isolated from each other and from the system site directory.

This blog post will briefly introduce two ways of creating and managing a Python virtual environment: venv or conda.

(more…)

Parallel programming in Python: mpi4py (part 2)

Xin Li

, 2019-11-06

In part 1 of this post, we introduced the mpi4py module (MPI for Python) which provides an object-oriented interface for Python resembling the message passing interface (MPI) and enables Python programs to exploit multiple processors on multiple compute nodes.

The mpi4py module provides methods for communicating various types of Python objects in different ways. In part 1 of this post we showed you how to communicate generic Python objects between MPI processes – the methods for doing this have names that are all lowercase letters. Some of these methods were introduced in part 1 of this post. It is also possible to directly send buffer-like objects, where the data is exposed in a raw format and can be accessed without copying, between MPI processes. The methods for doing this start with an uppercase letter.

In this post we continue introducing the mpi4py module, with a focus on the direct communication of buffer-like objects using the latter type of methods (that is, those starting with a capital letter), including Send, Recv, Isend, Irecv, Bcast, and Reduce, as well as Scatterv and Gatherv, which are vector variants of Scatter and Gather, respectively.

(more…)

Parallel programming in Python: mpi4py (part 1)

Xin Li

, 2019-08-02

In previous posts we have introduced the multiprocessing module which makes it possible to parallelize Python programs on shared memory systems. The limitation of the multiprocessing module is that it does not support parallelization over multiple compute nodes (i.e. on distributed memory systems). To overcome this limitation and enable cross-node parallelization, we can use MPI for Python, that is, the mpi4py module. This module provides an object-oriented interface that resembles the message passing interface (MPI), and hence allows Python programs to exploit multiple processors on multiple compute nodes. The mpi4py module supports both point-to-point and collective communications for Python objects as well as buffer-like objects. This post will briefly introduce the use of the mpi4py module in communicating generic Python objects, via all-lowercase methods including send, recv, isend, irecv, bcast, scatter, gather, and reduce.

(more…)

Top500 list: a brief introduction

Xin Li

, 2019-04-15

You may have heard of the Top500 list. It ranks the world’s 500 most powerful supercomputers based on their performance as measured by the Linpack benchmark. Published twice per year (in June and November) since 1993, the Top500 list records the development of supercomputers over the past two to three decades. In addition to performance, the Top500 list also summarises the main characteristics of the supercomputers in the list. Therefore, it contains much richer information than a mere ranking of the supercomputers by performance. In this post, we’ll have a closer look at the Top500 list and relevant topics, including supercomputers, performance, and statistics on processors and co-processors.

Supercomputer

The Top500 list is all about supercomputers. It therefore makes sense to have a brief overview of supercomputers before going into the details of the Top500 list. As shown in the image below, a supercomputer usually consists of many cabinets (also called racks), which are each about the size of a fridge. Each cabinet contains a stack of blades (with each blade being about the size of a PC turned sideways). Each blade has several compute nodes mounted in it, with each compute node having one or more multicore processors. For example, PDC’s Beskow system has 11 cabinets, 515 blades, 2,060 compute nodes, and a total of 67,456 cores . Because of their extraordinary computational capability, supercomputers have been used in many fields including molecular modelling, quantum mechanics, physical simulations, and climate research. This is vividly reflected in the design on the Titan supercomputer, which is still among the world’s top 10 most powerful supercomputers (as of the November 2018 Top500 list).

(more…)

Parallel programming in Python: multiprocessing (part 2)

Xin Li

, 2019-03-08

In the previous post we introduced the Pool class of the multiprocessing module. In this post we continue on and introduce the Process class, which makes it possible to have direct control over individual processes.

A process can be created by providing a target function and its input arguments to the Process constructor. The process can then be started with the start method and ended using the join method. Below is a very simple example that prints the square of a number.

import multiprocessing as mp

def square(x):
    print(x * x)

p = mp.Process(target=square, args=(5,))
p.start()
p.join()

(more…)

Studies

Research

Collaboration

About KTH

Library

Working with Python Virtual Environments

Parallel programming in Python: mpi4py (part 2)

Parallel programming in Python: mpi4py (part 1)

Top500 list: a brief introduction

Supercomputer

Parallel programming in Python: multiprocessing (part 2)