{"id":258,"date":"2019-02-13T10:16:46","date_gmt":"2019-02-13T09:16:46","guid":{"rendered":"https:\/\/www.kth.se\/blogs\/pdc\/?p=258"},"modified":"2019-03-28T12:32:53","modified_gmt":"2019-03-28T11:32:53","slug":"parallel-programming-in-python-multiprocessing-part-1","status":"publish","type":"post","link":"https:\/\/www.kth.se\/blogs\/pdc\/2019\/02\/parallel-programming-in-python-multiprocessing-part-1\/","title":{"rendered":"Parallel programming in Python: multiprocessing (part 1)"},"content":{"rendered":"<div class=\"post-content-wrapper\"><p>Parallel programming solves big numerical problems by dividing them into smaller sub-tasks, and hence reduces the overall computational time on multi-processor and\/or multi-core machines. Parallel programming is well supported in traditional programming languages like C and FORTRAN, which are suitable for &#8220;heavy-duty&#8221; computational tasks. Traditionally, Python is considered to not support parallel programming very well, partly because of the <a href=\"https:\/\/docs.python.org\/dev\/c-api\/init.html#thread-state-and-the-global-interpreter-lock\">global interpreter lock (GIL)<\/a>. However, things have changed over time. Thanks to the development of a rich variety of libraries and packages, the support for parallel programming in Python is now much better.<\/p>\n<p>This post (and the following part) will briefly introduce <a href=\"https:\/\/docs.python.org\/dev\/library\/multiprocessing.html\">the\u00a0<code>multiprocessing<\/code>\u00a0module<\/a> in Python, which effectively side-steps the GIL by using subprocesses instead of threads. The\u00a0<code>multiprocessing<\/code>\u00a0module provides many useful features and is very suitable for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Symmetric_multiprocessing\">symmetric multiprocessing (SMP)<\/a> and shared memory systems. 
In this post we focus on <a href=\"https:\/\/docs.python.org\/dev\/library\/multiprocessing.html#module-multiprocessing.pool\">the\u00a0<code>Pool<\/code>\u00a0class<\/a>\u00a0of the <code>multiprocessing<\/code>\u00a0module, which controls a pool of worker processes and supports both synchronous and asynchronous\u00a0parallel execution.<\/p>\n<p><!--more--><\/p>\n<h2>The Pool class<\/h2>\n<h3>Creating a Pool object<\/h3>\n<p>To use the <code>multiprocessing<\/code>\u00a0module, you need to import it first.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">import<\/span> multiprocessing <span style=\"color: #800000;font-weight: bold\">as<\/span> mp\r\n<\/pre>\n<\/div>\n<\/div>\n<p>Documentation for the module can be displayed with the built-in <code>help<\/code>\u00a0function.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #400000\">help<\/span><span style=\"color: #808030\">(<\/span>mp<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>The module can report the number of CPU cores in the system via the <code>cpu_count<\/code>\u00a0function. 
(Note that we use a Python 3 f-string, available from Python 3.6 onwards, to print the resulting number.)<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">nprocs <span style=\"color: #808030\">=<\/span> mp<span style=\"color: #808030\">.<\/span>cpu_count<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span>f<span style=\"color: #0000e6\">\"Number of CPU cores: {nprocs}\"<\/span><span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>In practice it is desirable to have one process per CPU core, so it is a good idea to set <code>nprocs<\/code>\u00a0to be the number of available CPU cores. A <code>Pool<\/code>\u00a0object can be created by passing the desired number of processes to the constructor.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">pool <span style=\"color: #808030\">=<\/span> mp<span style=\"color: #808030\">.<\/span>Pool<span style=\"color: #808030\">(<\/span>processes<span style=\"color: #808030\">=<\/span>nprocs<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<h3>The map method<\/h3>\n<p>To demonstrate the usage of the <code>Pool<\/code>\u00a0class, let&#8217;s define a simple function that calculates the square of a number.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">def<\/span> square<span style=\"color: #808030\">(<\/span>x<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">return<\/span> x <span style=\"color: #44aadd\">*<\/span> x\r\n<\/pre>\n<\/div>\n<\/div>\n<p>Suppose we want to use the <code>square<\/code>\u00a0function to calculate the squares of a list of integers. 
In serial programming we can use the following code to compute and print the result, via\u00a0<a href=\"https:\/\/docs.python.org\/dev\/tutorial\/datastructures.html#list-comprehensions\">list comprehension<\/a>.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">result <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span>square<span style=\"color: #808030\">(<\/span>x<span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> x <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #008c00\">20<\/span><span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span>result<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>To execute the computation in parallel, we can use the <code>map<\/code>\u00a0method of the <code>Pool<\/code>\u00a0class, which is similar to <a href=\"https:\/\/docs.python.org\/dev\/library\/functions.html#map\">the built-in\u00a0<code>map<\/code>\u00a0function<\/a> in Python.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">result <span style=\"color: #808030\">=<\/span> pool<span style=\"color: #808030\">.<\/span><span style=\"color: #400000\">map<\/span><span style=\"color: #808030\">(<\/span>square<span style=\"color: #808030\">,<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #008c00\">20<\/span><span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">)<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span>result<span style=\"color: 
#808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>The above parallel code will print exactly the same result as the serial code, but the computations are actually distributed and executed in parallel on the worker processes. The <code>map<\/code>\u00a0method will guarantee that the order of the output is correct.<\/p>\n<h3>The starmap method<\/h3>\n<p>You may have noticed that the <code>map<\/code>\u00a0method is only applicable to computational routines that accept a single argument (e.g. the previously defined <code>square<\/code>\u00a0function). For routines that accept multiple arguments, the <code>Pool<\/code>\u00a0class also provides the <code>starmap<\/code>\u00a0method. For example, we can define a more general routine that computes a power of arbitrary order<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">def<\/span> power_n<span style=\"color: #808030\">(<\/span>x<span style=\"color: #808030\">,<\/span> n<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">return<\/span> x <span style=\"color: #44aadd\">**<\/span> n\r\n<\/pre>\n<\/div>\n<\/div>\n<p>and pass this <code>power_n<\/code>\u00a0routine and a list of input arguments\u00a0to the <code>starmap<\/code>\u00a0method.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">result <span style=\"color: #808030\">=<\/span> pool<span style=\"color: #808030\">.<\/span>starmap<span style=\"color: #808030\">(<\/span>power_n<span style=\"color: #808030\">,<\/span> <span style=\"color: #808030\">[<\/span><span style=\"color: #808030\">(<\/span>x<span style=\"color: #808030\">,<\/span> <span style=\"color: #008c00\">2<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> x <span style=\"color: 
#800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #008c00\">20<\/span><span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">)<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span>result<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>Note that both <code>map<\/code>\u00a0and <code>starmap<\/code>\u00a0are synchronous methods. In other words, the call blocks the calling process until every sub-task has finished and all results are available. If the workload is not well balanced among the worker processes, some of them may sit idle while the slowest ones finish, which degrades performance.<\/p>\n<h3>The apply_async method<\/h3>\n<p>The <code>Pool<\/code>\u00a0class also provides the <code>apply_async<\/code>\u00a0method, which submits a task to the pool and returns immediately, making asynchronous execution possible. Unlike the <code>map<\/code>\u00a0method, which executes a computational routine over a list of inputs, each call to the <code>apply_async<\/code>\u00a0method executes the routine only once. 
Therefore, in the previous example, we would need to define another routine, <code>power_n_list<\/code>, that computes the values of a list of numbers raised to a particular power.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">def<\/span> power_n_list<span style=\"color: #808030\">(<\/span>x_list<span style=\"color: #808030\">,<\/span> n<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">return<\/span> <span style=\"color: #808030\">[<\/span>x <span style=\"color: #44aadd\">**<\/span> n <span style=\"color: #800000;font-weight: bold\">for<\/span> x <span style=\"color: #800000;font-weight: bold\">in<\/span> x_list<span style=\"color: #808030\">]<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>To use the <code>apply_async<\/code>\u00a0method, we also need to divide the whole input list <code>range(20)<\/code>\u00a0into sub-lists (which are known as slices) and distribute them to the worker processes. 
The slices can be prepared by the following <code>slice_data<\/code>\u00a0function, which gives each of the first <code>len(data) % nprocs<\/code>\u00a0slices one extra element so that the slice sizes differ by at most one.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">def<\/span> slice_data<span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">,<\/span> nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n    aver<span style=\"color: #808030\">,<\/span> res <span style=\"color: #808030\">=<\/span> <span style=\"color: #400000\">divmod<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #400000\">len<\/span><span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">,<\/span> nprocs<span style=\"color: #808030\">)<\/span>\r\n    nums <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span><span style=\"color: #808030\">]<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">for<\/span> proc <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n        <span style=\"color: #800000;font-weight: bold\">if<\/span> proc <span style=\"color: #44aadd\">&lt;<\/span> res<span style=\"color: #808030\">:<\/span>\r\n            nums<span style=\"color: #808030\">.<\/span>append<span style=\"color: #808030\">(<\/span>aver <span style=\"color: #44aadd\">+<\/span> <span style=\"color: #008c00\">1<\/span><span style=\"color: #808030\">)<\/span>\r\n        <span style=\"color: #800000;font-weight: bold\">else<\/span><span style=\"color: #808030\">:<\/span>\r\n            nums<span style=\"color: #808030\">.<\/span>append<span style=\"color: #808030\">(<\/span>aver<span style=\"color: #808030\">)<\/span>\r\n    count <span style=\"color: #808030\">=<\/span> <span 
style=\"color: #008c00\">0<\/span>\r\n    slices <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span><span style=\"color: #808030\">]<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">for<\/span> proc <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n        slices<span style=\"color: #808030\">.<\/span>append<span style=\"color: #808030\">(<\/span>data<span style=\"color: #808030\">[<\/span>count<span style=\"color: #808030\">:<\/span> count<span style=\"color: #44aadd\">+<\/span>nums<span style=\"color: #808030\">[<\/span>proc<span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">]<\/span><span style=\"color: #808030\">)<\/span>\r\n        count <span style=\"color: #44aadd\">+<\/span><span style=\"color: #808030\">=<\/span> nums<span style=\"color: #808030\">[<\/span>proc<span style=\"color: #808030\">]<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">return<\/span> slices\r\n<\/pre>\n<\/div>\n<\/div>\n<p>Then we can pass the <code>power_n_list<\/code>\u00a0routine and the sliced input lists to the <code>apply_async<\/code>\u00a0method.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">inp_lists <span style=\"color: #808030\">=<\/span> slice_data<span style=\"color: #808030\">(<\/span><span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span><span style=\"color: #008c00\">20<\/span><span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">,<\/span> nprocs<span style=\"color: #808030\">)<\/span>\r\nmulti_result <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span>pool<span style=\"color: #808030\">.<\/span>apply_async<span style=\"color: #808030\">(<\/span>power_n_list<span 
style=\"color: #808030\">,<\/span> <span style=\"color: #808030\">(<\/span>inp<span style=\"color: #808030\">,<\/span> <span style=\"color: #008c00\">2<\/span><span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> inp <span style=\"color: #800000;font-weight: bold\">in<\/span> inp_lists<span style=\"color: #808030\">]<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>The actual result can be obtained using the <code>get<\/code>\u00a0method and nested\u00a0<a href=\"https:\/\/docs.python.org\/dev\/tutorial\/datastructures.html#list-comprehensions\">list comprehension<\/a>.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">result <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span>x <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> multi_result <span style=\"color: #800000;font-weight: bold\">for<\/span> x <span style=\"color: #800000;font-weight: bold\">in<\/span> p<span style=\"color: #808030\">.<\/span>get<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">print<\/span><span style=\"color: #808030\">(<\/span>result<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>Note that the <code>apply_async<\/code>\u00a0method itself does not guarantee the correct order of the output. 
In the above example, the <code>get<\/code>\u00a0method was called on the returned result objects in the order in which the tasks were submitted (via\u00a0<a href=\"https:\/\/docs.python.org\/dev\/tutorial\/datastructures.html#list-comprehensions\">list comprehension<\/a>), so that the result remained ordered (see also the <a href=\"https:\/\/docs.python.org\/dev\/library\/multiprocessing.html?highlight=process#examples\">examples<\/a>).<\/p>\n<h2>Example: computing \u03c0<\/h2>\n<p>After that brief introduction, we can use the <code>Pool<\/code>\u00a0class to do useful things. Here we use the calculation of \u03c0 as a simple example to demonstrate the parallelization of Python code. The formula for computing \u03c0, the integral of 4\/(1 + x<sup>2<\/sup>) from 0 to 1, is given below.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-290 aligncenter\" src=\"https:\/\/www.kth.se\/blogs\/pdc\/files\/2019\/02\/pi.png\" alt=\"\" width=\"150\" height=\"50\" \/><\/p>\n<h3>Serial code<\/h3>\n<p>With the above formula we can compute the value of\u00a0\u03c0 via numerical integration (here the midpoint rule) over a large number of points. For example, we can use 10 million points. 
The serial code is shown below.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">nsteps <span style=\"color: #808030\">=<\/span> <span style=\"color: #008c00\">10000000<\/span>\r\ndx <span style=\"color: #808030\">=<\/span> <span style=\"color: #008000\">1.0<\/span> <span style=\"color: #44aadd\">\/<\/span> nsteps\r\npi <span style=\"color: #808030\">=<\/span> <span style=\"color: #008000\">0.0<\/span>\r\n<span style=\"color: #800000;font-weight: bold\">for<\/span> i <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nsteps<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n    x <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">(<\/span>i <span style=\"color: #44aadd\">+<\/span> <span style=\"color: #008000\">0.5<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #44aadd\">*<\/span> dx\r\n    pi <span style=\"color: #44aadd\">+<\/span><span style=\"color: #808030\">=<\/span> <span style=\"color: #008000\">4.0<\/span> <span style=\"color: #44aadd\">\/<\/span> <span style=\"color: #808030\">(<\/span><span style=\"color: #008000\">1.0<\/span> <span style=\"color: #44aadd\">+<\/span> x <span style=\"color: #44aadd\">*<\/span> x<span style=\"color: #808030\">)<\/span>\r\npi <span style=\"color: #44aadd\">*<\/span><span style=\"color: #808030\">=<\/span> dx\r\n<\/pre>\n<\/div>\n<\/div>\n<h3>Parallel code<\/h3>\n<p>To parallelize the serial code for computing \u03c0, we need to divide the\u00a0<span style=\"font-family: Consolas, Monaco, Lucida Console, monospace\">for<\/span>\u00a0loop into sub-tasks and distribute them to the worker processes. In other words, we need to evenly distribute the task of evaluating the integrand at 10 million points. 
This can be conveniently done by providing the start, stop and step arguments to <a href=\"https:\/\/docs.python.org\/dev\/library\/functions.html#func-range\">the built-in\u00a0<code>range<\/code>\u00a0function<\/a>. The first integer in the <code>range<\/code>\u00a0function is the start of the sequence, and should be set as the index or rank of the process. The second integer is the number of integration points, namely the end of the sequence. The third integer is the step between the adjacent elements in the sequence, and is set as the number of processes to avoid double counting. For example, the following <code>calc_partial_pi<\/code>\u00a0function uses the <code>range<\/code>\u00a0function for the sub-task on a worker process.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\"><span style=\"color: #800000;font-weight: bold\">def<\/span> calc_partial_pi<span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">,<\/span> nprocs<span style=\"color: #808030\">,<\/span> nsteps<span style=\"color: #808030\">,<\/span> dx<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n    partial_pi <span style=\"color: #808030\">=<\/span> <span style=\"color: #008000\">0.0<\/span>\r\n    <span style=\"color: #800000;font-weight: bold\">for<\/span> i <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">,<\/span> nsteps<span style=\"color: #808030\">,<\/span> nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">:<\/span>\r\n        x <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">(<\/span>i <span style=\"color: #44aadd\">+<\/span> <span style=\"color: #008000\">0.5<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #44aadd\">*<\/span> dx\r\n        partial_pi <span 
style=\"color: #44aadd\">+<\/span><span style=\"color: #808030\">=<\/span> <span style=\"color: #008000\">4.0<\/span> <span style=\"color: #44aadd\">\/<\/span> <span style=\"color: #808030\">(<\/span><span style=\"color: #008000\">1.0<\/span> <span style=\"color: #44aadd\">+<\/span> x <span style=\"color: #44aadd\">*<\/span> x<span style=\"color: #808030\">)<\/span>\r\n    partial_pi <span style=\"color: #44aadd\">*<\/span><span style=\"color: #808030\">=<\/span> dx\r\n    <span style=\"color: #800000;font-weight: bold\">return<\/span> partial_pi\r\n<\/pre>\n<\/div>\n<\/div>\n<p>With the <code>calc_partial_pi<\/code>\u00a0function we can prepare the input arguments for the sub-tasks and compute the value of \u03c0 using the <code>starmap<\/code>\u00a0method of the <code>Pool<\/code>\u00a0class, as shown below.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">nprocs <span style=\"color: #808030\">=<\/span> mp<span style=\"color: #808030\">.<\/span>cpu_count<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span>\r\ninputs <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span><span style=\"color: #808030\">(<\/span>rank<span style=\"color: #808030\">,<\/span> nprocs<span style=\"color: #808030\">,<\/span> nsteps<span style=\"color: #808030\">,<\/span> dx<span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> rank <span style=\"color: #800000;font-weight: bold\">in<\/span> <span style=\"color: #400000\">range<\/span><span style=\"color: #808030\">(<\/span>nprocs<span style=\"color: #808030\">)<\/span><span style=\"color: #808030\">]<\/span>\r\n\r\npool <span style=\"color: #808030\">=<\/span> mp<span style=\"color: #808030\">.<\/span>Pool<span style=\"color: #808030\">(<\/span>processes<span style=\"color: #808030\">=<\/span>nprocs<span style=\"color: #808030\">)<\/span>\r\nresult <span style=\"color: 
#808030\">=<\/span> pool<span style=\"color: #808030\">.<\/span>starmap<span style=\"color: #808030\">(<\/span>calc_partial_pi<span style=\"color: #808030\">,<\/span> inputs<span style=\"color: #808030\">)<\/span>\r\npi <span style=\"color: #808030\">=<\/span> <span style=\"color: #400000\">sum<\/span><span style=\"color: #808030\">(<\/span>result<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>Asynchronous parallel calculation can be carried out with the <code>apply_async<\/code>\u00a0method of the <code>Pool<\/code>\u00a0class. We can make use of the <code>calc_partial_pi<\/code>\u00a0function and the <code>inputs<\/code>\u00a0list since both <code>starmap<\/code>\u00a0and <code>apply_async<\/code>\u00a0support multiple arguments. The difference is that a single <code>starmap<\/code>\u00a0call blocks and returns the list of results for all sub-tasks, while each <code>apply_async<\/code>\u00a0call returns immediately with a result object for a single sub-task, whose value is retrieved with the <code>get<\/code>\u00a0method. The code using the <code>apply_async<\/code>\u00a0method is shown below.<\/p>\n<div class=\"highlight-default\">\n<div class=\"highlight\">\n<pre style=\"color: #000000;background: #ffffff\">multi_result <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span>pool<span style=\"color: #808030\">.<\/span>apply_async<span style=\"color: #808030\">(<\/span>calc_partial_pi<span style=\"color: #808030\">,<\/span> inp<span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> inp <span style=\"color: #800000;font-weight: bold\">in<\/span> inputs<span style=\"color: #808030\">]<\/span>\r\nresult <span style=\"color: #808030\">=<\/span> <span style=\"color: #808030\">[<\/span>p<span style=\"color: #808030\">.<\/span>get<span style=\"color: #808030\">(<\/span><span style=\"color: #808030\">)<\/span> <span style=\"color: #800000;font-weight: bold\">for<\/span> p <span style=\"color: #800000;font-weight: bold\">in<\/span> multi_result<span style=\"color: #808030\">]<\/span>\r\npi <span 
style=\"color: #808030\">=<\/span> <span style=\"color: #400000\">sum<\/span><span style=\"color: #808030\">(<\/span>result<span style=\"color: #808030\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>In a previous post we discussed the <a href=\"https:\/\/www.kth.se\/blogs\/pdc\/2018\/11\/scalability-strong-and-weak-scaling\/\">scaling of parallel programs<\/a>. We can also run a scaling test for the parallel Python code based on the <code>starmap<\/code>\u00a0and <code>apply_async<\/code>\u00a0methods of the <code>Pool<\/code>\u00a0class. From the figure below we can see that the two methods provide very similar scaling for computing the value of \u03c0.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-299 aligncenter\" src=\"https:\/\/www.kth.se\/blogs\/pdc\/files\/2019\/02\/pool-scaling-new.png\" alt=\"\" width=\"560\" height=\"280\" \/><\/p>\n<h2>Summary<\/h2>\n<p>We have briefly shown the basics of the <code>map<\/code>, <code>starmap<\/code>\u00a0and <code>apply_async<\/code>\u00a0methods from the <code>Pool<\/code>\u00a0class.<\/p>\n<ul>\n<li><code>map<\/code> and <code>starmap<\/code>\u00a0are synchronous methods: the call blocks until all results are ready.<\/li>\n<li><code>map<\/code> and <code>starmap<\/code>\u00a0guarantee the correct order of output.<\/li>\n<li><code>starmap<\/code> and <code>apply_async<\/code>\u00a0support multiple arguments.<\/li>\n<\/ul>\n<p>You may read the <a href=\"https:\/\/docs.python.org\/dev\/library\/multiprocessing.html#module-multiprocessing.pool\">Python documentation page<\/a> for details about other methods in the <code>Pool<\/code>\u00a0class.<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Parallel programming solves big numerical problems by dividing them into smaller sub-tasks, and hence reduces the overall computational time on multi-processor and\/or multi-core machines. 
Parallel programming is well supported in traditional programming languages like C and FORTRAN, which are suitable for &#8220;heavy-duty&#8221; computational tasks. Traditionally, Python is considered to not support parallel programming very well, [&hellip;]<\/p>\n","protected":false},"author":1140,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[5],"tags":[9,19],"class_list":["post-258","post","type-post","status-publish","format-standard","hentry","category-performance","tag-parallelization","tag-python"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p9W9Im-4a","_links":{"self":[{"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/posts\/258","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/users\/1140"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/comments?post=258"}],"version-history":[{"count":48,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/posts\/258\/revisions"}],"predecessor-version":[{"id":398,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/posts\/258\/revisions\/398"}],"wp:attachment":[{"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/media?parent=258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/categories?post=258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kth.se\/blogs\/pdc\/wp-json\/wp\/v2\/tags?post=258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}