Parallel programming in Python: multiprocessing (part 1)

Parallel programming solves big numerical problems by dividing them into smaller sub-tasks, thereby reducing the overall computational time on multi-processor and/or multi-core machines. Parallel programming is well supported in traditional programming languages like C and FORTRAN, which are suitable for “heavy-duty” computational tasks. Python has traditionally been considered to support parallel programming rather poorly, partly because of the global interpreter lock (GIL). However, things have changed over time: thanks to the development of a rich variety of libraries and packages, the support for parallel programming in Python is now much better.

This post (and the following part) will briefly introduce the multiprocessing module in Python, which effectively side-steps the GIL by using subprocesses instead of threads. The multiprocessing module provides many useful features and is very suitable for symmetric multiprocessing (SMP) and shared memory systems. In this post we focus on the Pool class of the multiprocessing module, which controls a pool of worker processes and supports both synchronous and asynchronous parallel execution.
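To give a flavour of what the Pool class looks like in practice, here is a minimal, self-contained sketch. The square function, the worker count and the inputs are chosen purely for illustration and are not taken from the full post:

    import multiprocessing as mp

    def square(x):
        # a trivial task used only for illustration
        return x * x

    if __name__ == '__main__':
        # create a pool of four worker processes
        with mp.Pool(processes=4) as pool:
            # synchronous: map blocks until all results are ready
            results = pool.map(square, range(10))
            # asynchronous: apply_async returns an AsyncResult immediately
            async_result = pool.apply_async(square, (10,))
            print(results)             # [0, 1, 4, ..., 81]
            print(async_result.get())  # 100

The synchronous and asynchronous variants shown here (map and apply_async) are the two modes of execution discussed in the rest of the post.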


Scalability: strong and weak scaling

High performance computing (HPC) clusters are able to solve big problems using a large number of processors. This is also known as parallel computing, where many processors work simultaneously to produce exceptional computational power and to significantly reduce the total computational time. In such scenarios, scalability or scaling is widely used to indicate the ability of hardware and software to deliver greater computational power when the amount of resources is increased. For HPC clusters, it is important that they are scalable, in other words that the capacity of the whole system can be proportionally increased by adding more hardware. For software, scalability is sometimes referred to as parallelization efficiency — the ratio between the actual speedup and the ideal speedup obtained when using a certain number of processors.

In this post we focus on software scalability and discuss two common types of scaling. The speedup in parallel computing can be straightforwardly defined as

speedup = t1 / tN

where t1 is the computational time for running the software using one processor, and tN is the computational time running the same software with N processors. Ideally, we would like software to have a linear speedup that is equal to the number of processors (speedup = N), as that would mean that every processor would be contributing 100% of its computational power. Unfortunately, this is a very challenging goal for real applications to attain.
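To make these definitions concrete, the small helper below computes the speedup and the parallelization efficiency from measured timings. The timing values in the example are made up for illustration:

    def speedup_and_efficiency(t1, tN, N):
        # speedup relative to the single-processor run
        speedup = t1 / tN
        # parallelization efficiency: actual speedup divided by the ideal speedup N
        efficiency = speedup / N
        return speedup, efficiency

    # hypothetical timings: 100 s on 1 processor, 30 s on 4 processors
    s, e = speedup_and_efficiency(100.0, 30.0, 4)
    print(f"speedup = {s:.2f}, efficiency = {e:.0%}")  # speedup = 3.33, efficiency = 83%

In this made-up case the 4-processor run achieves a speedup of about 3.3 rather than the ideal 4, i.e. an efficiency of roughly 83%.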


Getting Started with SLURM


Note: This post has been updated to reflect the changes in the queueing system after the software upgrade of Beskow in June 2019.


Our supercomputer clusters at PDC, equipped with thousands of multi-core processors, can be used to solve large scientific/engineering problems. Because of their much higher performance compared to desktop computers or workstations, supercomputer clusters are also called high performance computing (HPC) clusters. Common application fields of HPC clusters include machine learning, galaxy simulation, climate modelling, bioinformatics, computational physics, quantum chemistry, etc.

Building an HPC cluster demands sophisticated technologies and hardware, but fortunately a regular HPC user doesn’t have to worry too much about that. As an HPC user you can submit jobs that request the compute nodes (physical groups of processors) to do the calculations/simulations you want. But note that you are not the only user of an HPC cluster: there are typically many users using the cluster during the same time period, all of them submitting their own jobs. You may have realized by now that there needs to be some sort of queueing system that organizes the jobs and distributes them to the compute nodes. This post will briefly introduce SLURM, which is used in all PDC clusters and is the most widely used workload manager for HPC clusters.

What is SLURM?

SLURM, or Simple Linux Utility for Resource Management, is an open-source cluster management and job scheduling system. It provides three key functions:

  • Allocation of resources (compute nodes) to users’ jobs
  • Framework for starting/executing/monitoring jobs
  • Queue management to avoid resource contention

In other words, SLURM oversees all the resources in the whole HPC cluster. Users send their jobs (requests to run calculations/simulations) to SLURM for later execution. SLURM keeps all the submitted jobs in a queue, decides their priorities and determines how they are distributed to the compute nodes. SLURM provides a series of useful commands for the user, which we will now go through.
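As a rough sketch of the workflow, a batch job is described in a small script and then handed to SLURM. The script name (job.sh), the allocation name, the time limit, the node count and the executable below are placeholders; the exact settings to use on a particular PDC cluster are covered later:

    #!/bin/bash -l
    # hypothetical batch script saved as job.sh
    #SBATCH -J myjob          # job name
    #SBATCH -A 2019-1-234     # project/allocation to charge (placeholder)
    #SBATCH -t 00:10:00       # requested wall-clock time
    #SBATCH -N 1              # number of compute nodes
    srun -n 32 ./my_program   # launch the (placeholder) program on 32 cores

The job is then submitted, monitored and, if necessary, cancelled with:

    sbatch job.sh        # submit the script to the queue
    squeue -u $USER      # list your jobs and their states
    scancel <jobid>      # cancel a job by its job ID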
