Tag: AI

Parallel Python in classes…now you are in a pickle

In the past, I discussed how to create a python script which runs your calculations in parallel.  Using the multiprocessing library, you can circumvent the GIL and employing the async version of the multiprocessing functions, calculations are even performed in parallel. This works quite well, however, when using this within a python class you may run into some unexpected behaviour and errors due to the pickling performed by the multiprocessing library.

For example, if the doOneRun function is a class function defined as

class MyClass:
...
    def doOneRun(self, id:int):
       return id**3
...

and you perform some parallel calculation in another function of your class as

class MyClass:
...
    def ParallelF(self, NRuns:int):
       import multiprocessing as mp

       nproc=10
       pool=mp.Pool(processes=nprocs) 
       drones=[pool.apply_async(self.doOneRun, args=nr) for nr in range(NRuns)] 

       for drone in drones: 
           Results.collectData(drone.get()) 
       pool.close() 
       pool.join() 
       
...

you may run into a runtime error complaining that a function totally unrelated to the parallel work (or even to the class itself) can not be pickled. 😯

So what is going on? In the above setup, you would expect the pool.apply_async function to take just a function pointer to the doOneRun function. However, as it is provided by a the call self.doOneRun, the pool-function grabs the entire class and everything it contains, and tries to pickle it to distribute it to all the processes.  In addition to the fact that such an approach is hugely inefficient, it has the side-effect that any part associated to your class needs to be pickleable, even if it is a class-function of a class used to generate an object which is just a property of the MyClass Class above.

So both for reasons of efficiency and to avoid such side-effects, it is best to make the doOneRun function independent of a class, and even placing it outside the class.

def doOneRun(id:int):
    return id**3
  
class MyClass:
...
    def ParallelF(self, NRuns:int):
       import multiprocessing as mp

       nproc=10
       pool=mp.Pool(processes=nprocs) 
       drones=[pool.apply_async(doOneRun, args=nr) for nr in range(NRuns)] 

       for drone in drones: 
           Results.collectData(drone.get()) 
       pool.close() 
       pool.join() 
       
...

This way you avoid pickling the entire class, reducing initialization times of the processes and the  unnecessary communication-overhead between processes. As a bonus, you also reduce the risk of unexpected crashes unrelated to the calculation performed.

Parallel Python?

As part of my machine learning research at AMIBM, I recently ran into the following challenge: “Is it possible to do parallel computation using python.” It sent me on a rather long and arduous journey, with the final answer being something like: “very reluctantly“.

Python was designed with one specific goal in mind; make it easy to implement small test programs to see if an idea is worth pursuing. This gave rise to a scripting language with a lot of flexibility, but also with significant limitations, most of which the “intended” user would never meet. However, as a consequence of its success, many are using it going far beyond this original scope (yours truly as well 🙂 ).

Python offers various libraries to parallelize your scripts…most of them wrappers adding minor additional functionality. However, digging down to the bottom one generally ends up at one of the following two libraries: the threading module and the multiprocessing module.

Of course, as with many things python, there is a huge amount of tutorials available with many of great quality.

import threading

Programmers experienced in a programming language such as C/C++, Pascal, or Fortran, may be familiar with the concept of multi-threading. With multi-threading, a CPU allows a program to distribute its work over multiple program-threads which can be performed in parallel by the different cores of the CPU (or while a core is idle, e.g., since a thread is waiting for data to be fetched).  One of the most famous API’s for writing multi-threaded applications is OpenMP. In the past I used it to parallelize my Hirshfeld-I implementation and the phonon-module of HIVE.

For Python, there is no implementation of the OpenMP API, instead there is the threading module. This provides access to the creation of multiple threads, each able to perform their own tasks while sharing data-objects. Unfortunately, python has also the Global Interpreter Lock, GIL for short, which allows only a single thread to access the interpreter at a time. This effectively reduces thread-based parallelization to a complex way of running a code in a serial way.

For more information on “multi-threading” in python, you can look into this tutorial.

import multiprocessing

In addition to the threading module, there is also the multiprocessing module. This module side-steps the GIL by creating multiple processes, each having its own interpreter. This however comes at a cost. Firstly, there is a significant computational cost starting the different processes. Secondly, objects are not shared between processes, so additional work is needed to collect and share data.

Using the “Pool” class, things are somewhat simplified, as can be seen in the code-fragment below.  With the pool class one creates a set of threads/processes available for your program. Then through the function apply_async function it is possible to run processes in parallel. (Note that you need to use the “async” version of the function, as otherwise you end up with running things serial …again)

  1. import multiprocessing as mp
  2.  
  3. def doOneRun(id:int): #trivial function to run in parallel
  4. return id**3
  5.  
  6.  
  7.  
  8. num_workers=10 #number of processes
  9. NRuns=1000 #number of runs of the function doOneRun
  10.  
  11. pool=mp.Pool(processes=num_workers) # create a pool of processes
  12. drones=[pool.apply_async(doOneRun, args=nr) for nr in range(NRuns)] #and run things in parallel
  13.  
  14. for drone in drones: #and collect the data
  15. Results.collectData(drone.get()) #Results.collectData is a function you write to recombine the separate results into a single result and is not given here.
  16.  
  17. pool.close() #close the pool...no new tasks can be run on any of the processes
  18. pool.join() #collapse all threads back into the main thread

 

how many cores does my computer have?

If you are used to HPC applications, you always want to get as much out of your machine as possible. With regard to parallelization this often means making sure no CPU cycle is left unused. In the example above we manually selected the number of processes to spawn. However, would it not be nice if the program itself could just set this value to be equal to the number of physical cores accessible?

Python has a large number of functions claiming to do just that. A few of them are given below.

  •  multiprocessing.cpu_count(): returns the number of logical cores it can find. So if you have a modern machine with hyper-threading technology, this will return a multiple of the number of physical cores (and you will be over-subscribing your CPU.
  • os.cpu_count(): same as multiprocessing.cpu_count().
  • psutil.cpu_count(logical=False): This implementation gives the same default behavior, however, the parameter logical allows for this function to return the correct number of cores in a single CPU. Indeed a single CPU. HPC architectures which contain multiples CPUs per node will again return an incorrect number, as the implementation makes use of a python “set”, and as such doesn’t increment for the same index core on a different CPU.

In conclusion, there seems to be no simple way to obtain the correct number of physical cores using python, and one is forced to provide this number manually. (If you do have knowledge of such a function which works in both windows and unix environments and both desktop and HPC architectures feel free to let me know in the comments.)

All in all, it is technically possible to run code in parallel using python, but you have to deal with a lot of python quirks such as GIL.

New year’s resolution

A new year, a new beginning.

For most people this is a time of making promises, starting new habits or stopping old ones. In general, I forgo making such promises, as I know they turn out idle in a mere few weeks without external stimulus or any real driving force.

In spite of this, I do have a new years resolution for this year: I am going to study machine learning and use it for any suitable application I can get my hands on (which will mainly be materials science, but one never knows).  I already have a few projects in mind, which should help me stay focused and on track. With some luck, you will be reading about them here on this blog. With some more luck, they may even end up being part of an actual scientific publication.

But first things first, learn the basics (beyond hear-say messages of how excellent and world improving AI is/will be). What are the different types of machine learning available, is it all black box or do you actually have some control over things. Is it a kind of magic? What’s up with all these frameworks (isn’t there anyone left who can program?), and why the devil seem they all to be written in a script langue (python) instead of a proper programming language? A lot of questions I hope to see answered. A lot of things to learn. Lets start by building some foundations…the old fashioned way: By studying using a book, with real paper pages!

Happy New Year, and best wishes to you all!