
Working with Python Virtual Environments


Note: This post has been updated to reflect the modules on Dardel (December 2021).


When we use Python in our work or personal projects, we often need packages that are not part of the standard Python library, and we install them according to the requirements of each project. When working on multiple projects, it is not uncommon for different projects to have conflicting package requirements. For example, project A may require version 1.0 of a certain package, while project B may require version 2.0 of the same package. A solution to this conflict is to separate the packages for different projects or purposes with the help of a so-called “virtual environment”.

A Python virtual environment is an isolated run-time environment that makes it possible to install and execute Python packages without interfering with the outside world. Without a virtual environment, Python packages are installed either in the system site directory, which can be located via the following command:

$ python -c 'import site; print(site.getsitepackages())'

or in the so-called Python user base, which is usually in the “$HOME/.local” folder. A Python package installed in this way can have only one version, and it is therefore not possible to work with two or more projects that have conflicting requirements regarding the versions of a certain Python package. With the help of a virtual environment, we can have different Python site directories for different projects and have those site directories isolated from each other and from the system site directory.

This blog post will briefly introduce two ways of creating and managing Python virtual environments: venv and conda.
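As a quick taste of the venv approach, the commands below sketch a typical workflow: create an environment, activate it, and verify that the active Python now lives inside it. The directory name /tmp/demo-venv is an arbitrary example, and the commented pip line is only illustrative.

```shell
# Create a virtual environment in the directory "/tmp/demo-venv"
# (the name and location are arbitrary examples)
python3 -m venv /tmp/demo-venv

# Activate it; the shell prompt usually changes to show the environment name
source /tmp/demo-venv/bin/activate

# Inside the environment, "python" and "pip" point into the environment,
# and packages are installed into its own, isolated site directory
python -c 'import sys; print(sys.prefix)'

# Install packages as usual without touching the system site directory, e.g.:
#   pip install numpy==1.21.4

# Leave the environment when done
deactivate
```

Each project can have its own such environment, so conflicting package versions never collide.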

(more…)

Parallel programming in Python: mpi4py (part 2)

In part 1 of this post, we introduced the mpi4py module (MPI for Python) which provides an object-oriented interface for Python resembling the message passing interface (MPI) and enables Python programs to exploit multiple processors on multiple compute nodes.

The mpi4py module provides methods for communicating various types of Python objects in different ways. In part 1 of this post we showed how to communicate generic Python objects between MPI processes; the methods for doing this have names in all lowercase letters. It is also possible to directly send buffer-like objects, where the data is exposed in a raw format and can be accessed without copying, between MPI processes. The methods for doing this start with an uppercase letter.

In this post we continue introducing the mpi4py module, with a focus on the direct communication of buffer-like objects using the latter type of methods (that is, those starting with a capital letter), including Send, Recv, Isend, Irecv, Bcast, and Reduce, as well as Scatterv and Gatherv, which are vector variants of Scatter and Gather, respectively.

(more…)

Parallel programming in Python: mpi4py (part 1)

In previous posts we have introduced the multiprocessing module which makes it possible to parallelize Python programs on shared memory systems. The limitation of the multiprocessing module is that it does not support parallelization over multiple compute nodes (i.e. on distributed memory systems). To overcome this limitation and enable cross-node parallelization, we can use MPI for Python, that is, the mpi4py module. This module provides an object-oriented interface that resembles the message passing interface (MPI), and hence allows Python programs to exploit multiple processors on multiple compute nodes. The mpi4py module supports both point-to-point and collective communications for Python objects as well as buffer-like objects. This post will briefly introduce the use of the mpi4py module in communicating generic Python objects, via all-lowercase methods including send, recv, isend, irecv, bcast, scatter, gather, and reduce.

(more…)

Parallel programming in Python: multiprocessing (part 2)

In the previous post we introduced the Pool class of the multiprocessing module. In this post we continue on and introduce the Process class, which makes it possible to have direct control over individual processes.

A process can be created by providing a target function and its input arguments to the Process constructor. The process is then launched with the start method, and the join method blocks until the process has finished. Below is a very simple example that prints the square of a number.

import multiprocessing as mp

def square(x):
    print(x * x)

if __name__ == '__main__':
    # Create a process that will run square(5)
    p = mp.Process(target=square, args=(5,))
    p.start()  # launch the process
    p.join()   # wait for it to finish

(more…)

Parallel programming in Python: multiprocessing (part 1)

Parallel programming solves big numerical problems by dividing them into smaller sub-tasks, and hence reduces the overall computational time on multi-processor and/or multi-core machines. Parallel programming is well supported in traditional programming languages like C and Fortran, which are suitable for “heavy-duty” computational tasks. Python has traditionally been considered to support parallel programming poorly, partly because of the global interpreter lock (GIL). However, things have changed over time, and thanks to the development of a rich variety of libraries and packages, the support for parallel programming in Python is now much better.

This post (and the following part) will briefly introduce the multiprocessing module in Python, which effectively side-steps the GIL by using subprocesses instead of threads. The multiprocessing module provides many useful features and is very suitable for symmetric multiprocessing (SMP) and shared memory systems. In this post we focus on the Pool class of the multiprocessing module, which controls a pool of worker processes and supports both synchronous and asynchronous parallel execution.

(more…)