{"id":471,"date":"2019-08-02T14:27:52","date_gmt":"2019-08-02T12:27:52","guid":{"rendered":"https:\/\/www.kth.se\/blogs\/pdc\/?p=471"},"modified":"2019-08-02T14:27:52","modified_gmt":"2019-08-02T12:27:52","slug":"parallel-programming-in-python-mpi4py-part-1","status":"publish","type":"post","link":"https:\/\/www.kth.se\/blogs\/pdc\/2019\/08\/parallel-programming-in-python-mpi4py-part-1\/","title":{"rendered":"Parallel programming in Python: mpi4py (part 1)"},"content":{"rendered":"<div class=\"post-content-wrapper\"><p>In <a href=\"https:\/\/www.kth.se\/blogs\/pdc\/2019\/02\/parallel-programming-in-python-multiprocessing-part-1\/\">previous posts<\/a> we have introduced the <code>multiprocessing<\/code> module which makes it possible to parallelize Python programs on shared memory systems. The limitation of the <code>multiprocessing<\/code> module is that it does not support parallelization over multiple compute nodes (i.e. on distributed memory systems). To overcome this limitation and enable cross-node parallelization, we can use <a href=\"https:\/\/mpi4py.readthedocs.io\/en\/stable\/index.html\">MPI for Python<\/a>, that is, the <code>mpi4py<\/code> module. This module provides an object-oriented interface that resembles the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Message_Passing_Interface\">message passing interface (MPI)<\/a>,\u00a0 and hence allows Python programs to exploit multiple processors on multiple compute nodes. The <a href=\"https:\/\/mpi4py.readthedocs.io\/en\/stable\/index.html\"><code>mpi4py<\/code><\/a> module supports both point-to-point and collective communications for Python objects as well as buffer-like objects. 
This post will briefly introduce the use of the <code>mpi4py<\/code> module in communicating generic Python objects, via all-lowercase methods including <code>send<\/code>, <code>recv<\/code>, <code>isend<\/code>, <code>irecv<\/code>, <code>bcast<\/code>, <code>scatter<\/code>, <code>gather<\/code>, and <code>reduce<\/code>.<\/p>\n<p><!--more--><\/p>\n<h2>Basics of mpi4py<\/h2>\n<p>The <code>mpi4py<\/code> module has been installed on <a href=\"https:\/\/www.pdc.kth.se\/hpc-services\/computing-systems\/beskow-1.737436\">Beskow<\/a>. To use <code>mpi4py<\/code> on Beskow, we need to load the module first<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"># on Beskow\r\nmodule load mpi4py\/3.0.2\/py37<\/pre>\n<\/div>\n<\/div>\n<p>The <code>mpi4py\/3.0.2\/py37<\/code> module will automatically load the <code>anaconda\/2019.03\/py37<\/code> module, which is an open-source distribution of Python for scientific computing (see <a href=\"https:\/\/www.anaconda.com\/distribution\/\">Anaconda<\/a>). We recommend Python 3 because <a href=\"https:\/\/pythonclock.org\/\">Python 2 is going to be retired soon<\/a>. 
After loading the <code>mpi4py<\/code> module, we can check that the <code>python3<\/code> command (which is needed to run\u00a0 <code>mpi4py<\/code> code) is available in the <code>PATH<\/code> environment variable, using the <code>which<\/code> command<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">$ which python3\r\n\/pdc\/vol\/anaconda\/2019.03\/py37\/bin\/python3<\/pre>\n<\/div>\n<\/div>\n<p>Now we can write our first Python program with <code>mpi4py<\/code><\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">from<\/span> mpi4py <span style=\"color: #800000;font-weight: bold\">import<\/span> MPI\r\n\r\ncomm <span style=\"color: #808030\">=<\/span> MPI<span style=\"color: #808030\">.<\/span>COMM_WORLD\r\nrank <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_rank<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\nsize <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_size<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'Hello from process {} out of {}'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">,<\/span> size<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>To understand the code we need to know several concepts in MPI. In parallel programming with MPI, we need the so-called <strong>communicator<\/strong>, which is a group of processes that can talk to each other. 
To identify the processes within that group, each process is assigned a <strong>rank<\/strong> that is unique within the communicator. It also makes sense to know the total number of processes, which is often referred to as the <strong>size<\/strong> of the communicator.<\/p>\n<p>In the above code, we defined the variable <code>comm<\/code> as the default communicator <code>MPI.COMM_WORLD<\/code>. You may recognize that <code>MPI.COMM_WORLD<\/code> in <code>mpi4py<\/code> corresponds to <code>MPI_COMM_WORLD<\/code> in <a href=\"https:\/\/en.wikipedia.org\/wiki\/Message_Passing_Interface#Example_program\">MPI programs written in C<\/a>. The rank of each process is retrieved via the <code>Get_rank<\/code> method of the communicator, and the size of the communicator is obtained by the <code>Get_size<\/code> method. Now that the rank and size are known, we can print a \u201cHello\u201d message from each process.<\/p>\n<p>The usual way to execute <code>mpi4py<\/code> code in parallel is to use <code>mpirun<\/code> and <code>python3<\/code>; for example, \u201c<code>mpirun -n 4 python3 hello.py<\/code>\u201d will run the code on 4 processes, assuming that the code is saved in a file named \u201chello.py\u201d. On <a href=\"https:\/\/www.pdc.kth.se\/hpc-services\/computing-systems\/beskow-1.737436\">Beskow<\/a>, however, the setup is different since the resources (compute nodes) are managed by the <a href=\"https:\/\/slurm.schedmd.com\/quickstart.html\">SLURM workload manager<\/a>. We&#8217;ll first need to request one or more compute nodes using <code>salloc<\/code>, and then execute \u201c<code>python3 hello.py<\/code>\u201d via <code>srun<\/code>. 
For example, the output from running the code on 4 processes is as follows:<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">$ salloc --nodes 1 -t 00:10:00 -A &lt;your-project-account&gt;\r\nsalloc: Granted job allocation &lt;your-job-id&gt;\r\n\r\n$ srun -n 4 python3 hello.py\r\nHello from process 0 out of 4\r\nHello from process 1 out of 4\r\nHello from process 2 out of 4\r\nHello from process 3 out of 4<\/pre>\n<\/div>\n<\/div>\n<h2>Point-to-point communication<\/h2>\n<p>For point-to-point communication between Python objects, <code>mpi4py<\/code> provides the <code>send<\/code>\u00a0and <code>recv<\/code> methods that are similar to those in MPI. An example of code for passing (which is usually referred to as \u201ccommunicating\u201d) a Python dictionary object between the master process (which has rank = 0) and the worker processes (that all have rank &gt; 0) is given below.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">from<\/span> mpi4py <span style=\"color: #800000;font-weight: bold\">import<\/span> MPI\r\n\r\ncomm <span style=\"color: #808030\">=<\/span> MPI<span style=\"color: #808030\">.<\/span>COMM_WORLD\r\nrank <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_rank<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\nsize <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_size<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #696969\"># master process<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">if<\/span> rank <span style=\"color: #44aadd\">==<\/span> <span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">:<\/span>\r\n    data <span style=\"color: #808030\">=<\/span> 
<span style=\"color: #800080\">{<\/span><span style=\"color: #0000e6\">'x'<\/span><span style=\"color: #808030\">:<\/span> <span style=\"color: #008c00\">1<\/span><span style=\"color: #808030\">,<\/span> <span style=\"color: #0000e6\">'y'<\/span><span style=\"color: #808030\">:<\/span> <span style=\"color: #008000\">2.0<\/span><span style=\"color: #800080\">}<\/span>\r\n    <span style=\"color: #696969\"># master process sends data to worker processes by<\/span>\r\n    <span style=\"color: #696969\"># going through the ranks of all worker processes<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">for<\/span> i <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #008c00\">1<\/span><span style=\"color: #808030\">,<\/span> size<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n        comm<span style=\"color: #808030\">.<\/span>send<span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">,<\/span> dest<span style=\"color: #808030\">=<\/span>i<span style=\"color: #808030\">,<\/span> tag<span style=\"color: #808030\">=<\/span>i<span style=\"color: #808030\">)<\/span>\r\n        <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'Process {} sent data:'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">,<\/span> data<span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #696969\"># worker processes<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">else<\/span><span style=\"color: #808030\">:<\/span>\r\n    <span style=\"color: #696969\"># each worker process receives data from master process<\/span>\r\n    data <span style=\"color: #808030\">=<\/span> comm<span style=\"color: 
#808030\">.<\/span>recv<span style=\"color: #808030\">(<\/span>source<span style=\"color: #808030\">=<\/span><span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">,<\/span> tag<span style=\"color: #808030\">=<\/span>rank<span style=\"color: #808030\">)<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'Process {} received data:'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">,<\/span> data<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>In this example (which we will name \u201csend-recv.py\u201d), the dictionary object <code>data<\/code> is sent from the master process to each worker process. We check the value of <code>rank<\/code> to determine if a process is the master process. The <code>for<\/code> loop is only executed for the master process, and <code>i<\/code> (starting from 1) goes through the ranks of the worker processes. We can also use <code>tag<\/code> in the <code>send<\/code>\u00a0and <code>recv<\/code> methods to distinguish between the messages if there are multiple communications between two processes. The output from running the example on 4 processes is as follows:<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">$ srun -n 4 python3 send-recv.py\r\nProcess 0 sent data: {'x': 1, 'y': 2.0}\r\nProcess 0 sent data: {'x': 1, 'y': 2.0}\r\nProcess 0 sent data: {'x': 1, 'y': 2.0}\r\nProcess 1 received data: {'x': 1, 'y': 2.0}\r\nProcess 2 received data: {'x': 1, 'y': 2.0}\r\nProcess 3 received data: {'x': 1, 'y': 2.0}<\/pre>\n<\/div>\n<\/div>\n<p>Note, however, that the above example uses blocking communication methods, which means that the execution of code will not proceed until the communication is completed. 
This blocking behaviour is not always desirable in parallel programming; in some cases it is beneficial to use non-blocking communication methods. As shown in the code below, <code>isend<\/code> and <code>irecv<\/code> are non-blocking methods that immediately return <code>Request<\/code> objects, and we can use the\u00a0<code>wait<\/code> method to manage the completion of the communication. If necessary, one can do some computations between <code>comm.isend(...)<\/code> and <code>req.wait()<\/code> to increase the efficiency by overlapping the communications and computations.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">if<\/span> rank <span style=\"color: #44aadd\">==<\/span> <span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">:<\/span>\r\n    data <span style=\"color: #808030\">=\u00a0<span style=\"color: #800080\">{<\/span><span style=\"color: #0000e6\">'x'<\/span>: <span style=\"color: #008c00\">1<\/span>, <span style=\"color: #0000e6\">'y'<\/span>: <span style=\"color: #008000\">2.0<\/span><span style=\"color: #800080\">}<\/span><\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">for<\/span> i <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #008c00\">1<\/span><span style=\"color: #808030\">,<\/span> size<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n        req <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>isend<span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">,<\/span> dest<span style=\"color: #808030\">=<\/span>i<span style=\"color: #808030\">,<\/span> tag<span style=\"color: #808030\">=<\/span>i<span style=\"color: #808030\">)<\/span>\r\n        req<span style=\"color: 
#808030\">.<\/span>wait<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\n        <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'Process {} sent data:'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">,<\/span> data<span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #800000;font-weight: bold\">else<\/span><span style=\"color: #808030\">:<\/span>\r\n    req <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>irecv<span style=\"color: #808030\">(<\/span>source<span style=\"color: #808030\">=<\/span><span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">,<\/span> tag<span style=\"color: #808030\">=<\/span>rank<span style=\"color: #808030\">)<\/span>\r\n    data <span style=\"color: #808030\">=<\/span> req<span style=\"color: #808030\">.<\/span>wait<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'Process {} received data:'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">,<\/span> data<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<h2>Collective communication<\/h2>\n<p>In parallel programming, it is often useful to perform what is known as collective communication, for example, broadcasting a Python object from the master process to all the worker processes. 
The example code below broadcasts a <a href=\"https:\/\/numpy.org\/\">Numpy<\/a> array using the\u00a0<code>bcast<\/code> method.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">from<\/span> mpi4py <span style=\"color: #800000;font-weight: bold\">import<\/span> MPI\r\n<span style=\"color: #800000;font-weight: bold\">import<\/span> numpy <span style=\"color: #800000;font-weight: bold\">as<\/span> np\r\n\r\ncomm <span style=\"color: #808030\">=<\/span> MPI<span style=\"color: #808030\">.<\/span>COMM_WORLD\r\nrank <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_rank<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\nsize <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_size<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #800000;font-weight: bold\">if<\/span> rank <span style=\"color: #44aadd\">==<\/span> <span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">:<\/span>\r\n    data <span style=\"color: #808030\">=<\/span> np<span style=\"color: #808030\">.<\/span>arange<span style=\"color: #808030\">(<\/span><span style=\"color: #008c00\">4.0<\/span><span style=\"color: #808030\">)<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">else<\/span><span style=\"color: #808030\">:<\/span>\r\n    data <span style=\"color: #808030\">=<\/span> <span style=\"color: #074726\">None<\/span>\r\n\r\ndata <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>bcast<span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">,<\/span> root<span style=\"color: #808030\">=<\/span><span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #800000;font-weight: bold\">if<\/span> rank 
<span style=\"color: #44aadd\">==<\/span> <span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">:<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'Process {} broadcast data:'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">,<\/span> data<span style=\"color: #808030\">)<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">else<\/span><span style=\"color: #808030\">:<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'Process {} received data:'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">,<\/span> data<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>The output from running the above code (broadcast.py) on 4 processes is as follows:<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">$ srun -n 4 python3 broadcast.py\r\nProcess 0 broadcast data: [0. 1. 2. 3.]\r\nProcess 1 received data: [0. 1. 2. 3.]\r\nProcess 2 received data: [0. 1. 2. 3.]\r\nProcess 3 received data: [0. 1. 2. 3.]<\/pre>\n<\/div>\n<\/div>\n<p>If one needs to divide a task and distribute the sub-tasks to the processes, the <code>scatter<\/code> method will be useful. Note, however, that <code>scatter<\/code> delivers exactly one element to each process, so it is not possible to distribute more elements than there are processes. If one has a big list or array, it is therefore necessary to slice it into one piece per process before calling the <code>scatter<\/code> method. 
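The slicing step itself needs no MPI at all, so it can be tried out in plain Python. Below is a minimal sketch of that logic, mirroring the <code>divmod</code>-based slicing used in the scatter example that follows; the helper name <code>partition</code> is our own, not part of <code>mpi4py</code>:

```python
# Minimal sketch (not part of mpi4py): split n elements as evenly as
# possible among nprocs processes. The first `res` processes get one
# extra element; process p owns the half-open index range [start, end).
def partition(n, nprocs):
    ave, res = divmod(n, nprocs)
    counts = [ave + 1 if p < res else ave for p in range(nprocs)]
    starts = [sum(counts[:p]) for p in range(nprocs)]
    ends = [sum(counts[:p + 1]) for p in range(nprocs)]
    return [(starts[p], ends[p]) for p in range(nprocs)]

# 15 elements over 4 processes: the first three ranges hold 4 elements
# each and the last holds 3.
print(partition(15, 4))  # [(0, 4), (4, 8), (8, 12), (12, 15)]
```

With such ranges in hand, <code>data[start:end]</code> gives the slice destined for each process.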
The code below is an example of scattering a <a href=\"https:\/\/numpy.org\/\">Numpy<\/a> array, which is converted into a list of array slices before the <code>scatter<\/code> method is called.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">from<\/span> mpi4py <span style=\"color: #800000;font-weight: bold\">import<\/span> MPI\r\n<span style=\"color: #800000;font-weight: bold\">import<\/span> numpy <span style=\"color: #800000;font-weight: bold\">as<\/span> np\r\n\r\ncomm <span style=\"color: #808030\">=<\/span> MPI<span style=\"color: #808030\">.<\/span>COMM_WORLD\r\nrank <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_rank<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\nnprocs <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_size<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #800000;font-weight: bold\">if<\/span> rank <span style=\"color: #44aadd\">==<\/span> <span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">:<\/span>\r\n    data <span style=\"color: #808030\">=<\/span> np<span style=\"color: #808030\">.<\/span>arange<span style=\"color: #808030\">(<\/span><span style=\"color: #008000\">15.0<\/span><span style=\"color: #808030\">)\r\n<\/span>\r\n    <span style=\"color: #696969\"># determine the size of each sub-task<\/span>\r\n    ave<span style=\"color: #808030\">,<\/span> res <span style=\"color: #808030\">=<\/span> <span style=\"color: #400000\">divmod<\/span><span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">.<\/span>size<span style=\"color: #808030\">,<\/span> nprocs<span style=\"color: #808030\">)<\/span>\r\n    counts <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span>ave <span 
style=\"color: #44aadd\">+<\/span> <span style=\"color: #008c00\">1<\/span> <span style=\"color: #800000;font-weight: bold\">if<\/span> p <span style=\"color: #44aadd\">&lt;<\/span> res <span style=\"color: #800000;font-weight: bold\">else<\/span> ave <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]\r\n<\/span>\r\n    <span style=\"color: #696969\"># determine the starting and ending indices of each sub-task\r\n<\/span>    starts <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span><span style=\"color: #400000\">sum<\/span><span style=\"color: #808030\">(<\/span>counts<span style=\"color: #808030\">[<\/span><span style=\"color: #808030\">:<\/span>p<span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]<\/span>\r\n    ends <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span><span style=\"color: #400000\">sum<\/span><span style=\"color: #808030\">(<\/span>counts<span style=\"color: #808030\">[<\/span><span style=\"color: #808030\">:<\/span>p<span style=\"color: #44aadd\">+<\/span><span style=\"color: #008c00\">1<\/span><span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: 
#808030\">]\r\n\r\n<span style=\"color: #696969\">    # converts data into a list of arrays<\/span> <\/span>\r\n    data <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span>data<span style=\"color: #808030\">[<\/span>starts<span style=\"color: #808030\">[<\/span>p<span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">:<\/span>ends<span style=\"color: #808030\">[<\/span>p<span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">]<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">else<\/span><span style=\"color: #808030\">:<\/span>\r\n    data <span style=\"color: #808030\">=<\/span> <span style=\"color: #074726\">None<\/span>\r\n\r\ndata <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>scatter<span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">,<\/span> root<span style=\"color: #808030\">=<\/span><span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'Process {} has data:'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">,<\/span> data<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>The output from running the above code (scatter-array.py) on 4 processes is as follows.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">$ srun -n 4 python3 scatter-array.py\r\nProcess 0 has data: [0. 
1. 2. 3.]\r\nProcess 1 has data: [4. 5. 6. 7.]\r\nProcess 2 has data: [8. 9. 10. 11.]\r\nProcess 3 has data: [12. 13. 14.]<\/pre>\n<\/div>\n<\/div>\n<p>Since there are 15 elements in total, we can see that each of the first 3 processes received 4 elements, and that the last process received 3 elements.<\/p>\n<p>The <code>gather<\/code> method does the opposite to <code>scatter<\/code>. If each process has an element, one can use <code>gather<\/code> to collect them into a list of elements on the master process. The example code below uses <code>scatter<\/code> and <code>gather<\/code> to compute \u03c0 in parallel.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">from<\/span> mpi4py <span style=\"color: #800000;font-weight: bold\">import<\/span> MPI\r\n<span style=\"color: #800000;font-weight: bold\">import<\/span> time\r\n<span style=\"color: #800000;font-weight: bold\">import<\/span> math\r\n\r\nt0 <span style=\"color: #808030\">=<\/span> time<span style=\"color: #808030\">.<\/span>time<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\ncomm <span style=\"color: #808030\">=<\/span> MPI<span style=\"color: #808030\">.<\/span>COMM_WORLD\r\nrank <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_rank<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\nnprocs <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>Get_size<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #696969\"># number of integration steps<\/span>\r\nnsteps <span style=\"color: #808030\">=<\/span> <span style=\"color: #008c00\">10000000<\/span>\r\n<span style=\"color: #696969\"># step size<\/span>\r\ndx <span style=\"color: #808030\">=<\/span> <span style=\"color: #008000\">1.0<\/span> <span 
style=\"color: #44aadd\">\/<\/span> nsteps\r\n\r\n<span style=\"color: #800000;font-weight: bold\">if<\/span> rank <span style=\"color: #44aadd\">==<\/span> <span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">:<\/span>\r\n    <span style=\"color: #696969\"># determine the size of each sub-task<\/span>\r\n    ave<span style=\"color: #808030\">,<\/span> res <span style=\"color: #808030\">=<\/span> <span style=\"color: #400000\">divmod<\/span><span style=\"color: #808030\">(<\/span>nsteps<span style=\"color: #808030\">,<\/span> nprocs<span style=\"color: #808030\">)<\/span>\r\n    counts <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span>ave <span style=\"color: #44aadd\">+<\/span> <span style=\"color: #008c00\">1<\/span> <span style=\"color: #800000;font-weight: bold\">if<\/span> p <span style=\"color: #44aadd\">&lt;<\/span> res <span style=\"color: #800000;font-weight: bold\">else<\/span> ave <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]\r\n<\/span>\r\n    <span style=\"color: #696969\"># determine the starting and ending indices of each sub-task\r\n<\/span>    starts <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span><span style=\"color: #400000\">sum<\/span><span style=\"color: #808030\">(<\/span>counts<span style=\"color: #808030\">[<\/span><span style=\"color: #808030\">:<\/span>p<span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: 
#808030\">]<\/span>\r\n    ends <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span><span style=\"color: #400000\">sum<\/span><span style=\"color: #808030\">(<\/span>counts<span style=\"color: #808030\">[<\/span><span style=\"color: #808030\">:<\/span>p<span style=\"color: #44aadd\">+<\/span><span style=\"color: #008c00\">1<\/span><span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]\r\n\r\n<span style=\"color: #696969\">    # save the starting and ending indices in data <\/span> <\/span>\r\n    data <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span><span style=\"color: #808030\">(<\/span>starts<span style=\"color: #808030\">[<\/span>p<span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">,<\/span> ends<span style=\"color: #808030\">[<\/span>p<span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">else<\/span><span style=\"color: #808030\">:<\/span>\r\n    data <span style=\"color: #808030\">=<\/span> <span style=\"color: #074726\">None<\/span>\r\n\r\ndata <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>scatter<span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">,<\/span> root<span style=\"color: #808030\">=<\/span><span style=\"color: #008c00\">0<\/span><span 
style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #696969\"># compute partial contribution to pi on each process<\/span>\r\npartial_pi <span style=\"color: #808030\">=<\/span> <span style=\"color: #008000\">0.0<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">for<\/span> i <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">[<\/span><span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">,<\/span> data<span style=\"color: #808030\">[<\/span><span style=\"color: #008c00\">1<\/span><span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n    x <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">(<\/span>i <span style=\"color: #44aadd\">+<\/span> <span style=\"color: #008000\">0.5<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #44aadd\">*<\/span> dx\r\n    partial_pi <span style=\"color: #44aadd\">+<\/span><span style=\"color: #808030\">=<\/span> <span style=\"color: #008000\">4.0<\/span> <span style=\"color: #44aadd\">\/<\/span> <span style=\"color: #808030\">(<\/span><span style=\"color: #008000\">1.0<\/span> <span style=\"color: #44aadd\">+<\/span> x <span style=\"color: #44aadd\">*<\/span> x<span style=\"color: #808030\">)<\/span>\r\npartial_pi <span style=\"color: #44aadd\">*<\/span><span style=\"color: #808030\">=<\/span> dx\r\n\r\npartial_pi <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span>gather<span style=\"color: #808030\">(<\/span>partial_pi<span style=\"color: #808030\">,<\/span> root<span style=\"color: #808030\">=<\/span><span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #800000;font-weight: bold\">if<\/span> rank <span style=\"color: 
==<\/s">
#44aadd\">==<\/span> <span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">:<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'pi computed in {:.3f} sec'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>time<span style=\"color: #808030\">.<\/span>time<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #44aadd\">-<\/span> t0<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">)<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'error is {}'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span><span style=\"color: #400000\">abs<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #400000\">sum<\/span><span style=\"color: #808030\">(<\/span>partial_pi<span style=\"color: #808030\">)<\/span> <span style=\"color: #44aadd\">-<\/span> math<span style=\"color: #808030\">.<\/span>pi<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>In addition to the <code>gather<\/code> method, which collects the elements into a list on the root process, one can also use the <code>reduce<\/code> method to combine the results. 
The last five lines of the above example can be rewritten as follows, where \u201c<code>op=MPI.SUM<\/code>\u201d specifies that <code>pi<\/code> is computed on the root process as the sum of the <code>partial_pi<\/code> values from all processes.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">pi <span style=\"color: #808030\">=<\/span> comm<span style=\"color: #808030\">.<\/span><span style=\"color: #400000\">reduce<\/span><span style=\"color: #808030\">(<\/span>partial_pi<span style=\"color: #808030\">,<\/span> op<span style=\"color: #808030\">=<\/span>MPI<span style=\"color: #808030\">.<\/span><span style=\"color: #400000\">SUM<\/span><span style=\"color: #808030\">,<\/span> root<span style=\"color: #808030\">=<\/span><span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">)<\/span>\r\n\r\n<span style=\"color: #800000;font-weight: bold\">if<\/span> rank <span style=\"color: #44aadd\">==<\/span> <span style=\"color: #008c00\">0<\/span><span style=\"color: #808030\">:<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'pi computed in {:.3f} sec'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span>time<span style=\"color: #808030\">.<\/span>time<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #44aadd\">-<\/span> t0<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">)<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #0000e6\">'error is {}'<\/span><span style=\"color: #808030\">.<\/span>format<span style=\"color: #808030\">(<\/span><span style=\"color: #400000\">abs<\/span><span style=\"color: #808030\">(<\/span>pi <span style=\"color: #44aadd\">-<\/span> math<span style=\"color: #808030\">.<\/span>pi<span style=\"color: 
)<\/sp">
#808030\">)<\/span><span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<h2>Summary<\/h2>\n<p>We have briefly introduced the <code>mpi4py<\/code> module and the ways it enables communication of Python objects via all-lowercase methods.<\/p>\n<ul>\n<li><code>Get_rank<\/code> and <code>Get_size<\/code> provide the rank of the process and the size of the communicator (total number of processes), respectively.<\/li>\n<li><code>send<\/code> and <code>recv<\/code> are blocking point-to-point communication methods, while <code>isend<\/code> and <code>irecv<\/code> are their non-blocking counterparts.<\/li>\n<li><code>bcast<\/code>, <code>scatter<\/code>, <code>gather<\/code> and <code>reduce<\/code> are collective communication methods.<\/li>\n<\/ul>\n<p>You may read the <a href=\"https:\/\/mpi4py.readthedocs.io\/en\/stable\/tutorial.html\">MPI for Python tutorial page<\/a> for more information about the <code>mpi4py<\/code> module.<\/p>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>In previous posts we have introduced the multiprocessing module which makes it possible to parallelize Python programs on shared memory systems. The limitation of the multiprocessing module is that it does not support parallelization over multiple compute nodes (i.e. on distributed memory systems). 
To overcome this limitation and enable cross-node parallelization, we can use MPI [&hellip;]<\/p>\n","protected":false},"author":1140,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[5],"tags":[9,19],"class_list":["post-471","post","type-post","status-publish","format-standard","hentry","category-performance","tag-parallelization","tag-python"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p9W9Im-7B","_links":{"self":[{"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/posts\/471","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/users\/1140"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/comments?post=471"}],"version-history":[{"count":29,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/posts\/471\/revisions"}],"predecessor-version":[{"id":508,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/posts\/471\/revisions\/508"}],"wp:attachment":[{"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/media?parent=471"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/categories?post=471"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/tags?post=471"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}