Tag Archive: fortran

Jul 14

Simple Parallelization in Fortran: OpenMP

Pentium2speedThe first PC we got at our home was a Pentium II. My dad got it, because I was going to university, and I would be able to do something “useful” with it. (Yup, I survived my entire high school career searching stuff in the library and the home encyclopedia. Even more, Google didn’t even exist before we got¬†our computer, as the company was only founded in 1998 ūüôā ). The machine was advertised as state of the art with a clock rate of a whooping 233 MHz!¬†During the decade that followed, the evolution of the clock rates kept going at a steady pace, until it saturated at about 3-4 GHz(15 times faster than the 233 MHz) around 2005. Since then, the clock rate has not increased a bit. If anything, the average clock rate has even decreased to the range 2-3 GHz.¬†As power-consumption grows quadratically with with the clock rate, this means that (1) there is much more heat produced, that needs to be transported away from your CPU (otherwise it get’s destroyed), (2) reducing the clock rate¬†by a factor 2, allows you to power 4 CPU’s at half the clock rate, effectively doubling your calculation power.¬†(There are even more tricks involved in modern CPU’s which crack up performance such that the clock rate isn’t a real measure for performance any longer, and sales people need to learn more new buzzword to sell your computer/laptop ūüĎŅ )

intelCoreCloneWhere in 2005 you bought a single¬†CPU with a high clock rate, you now get a machine with multiple cores. Most machines you can get these days have a minimum of 2 cores, with¬†quad-core machines becoming more and more common. But, there is always a but, even though you now have access to multiple¬†times¬†the processing power of 2005, this does not mean that your own code will be able to use it.¬†Unfortunately there is no simple compiler switch which makes your code parallel (like the -m64 switch which makes your code 64-bit), you have to do this yourself (the free lunch is over). Two commonly used frameworks for this task are OpenMP and MPI. The former mainly focuses on shared memory configurations (laptops, desktops, single nodes in a cluster), while the latter focuses of large distributed memory setups (multi-node clusters) and is thus well-suited for creating codes that need to run on hundreds or even thousands of CPU’s. The two frameworks differ significantly in their complexity, fortunately for us, OpenMP is both the easier one, and the one most suited for a modern multi-core computer. The OpenMP framework consists of pragma’s (or directives) which can be added in an existing code as comment lines, and tell a compiler knowledgeable of OpenMP how to parallelize the code. (It is interesting to note that MPI and OpenMP are inteded for parallel programming in either C, C++ or fortran … a hint that what the important programming languages are.)

OpenMP in Fortran: Basics

A. Compiler-options and such

As most modern fortran compilers also are well aware of openMP (you can check which version of openMP is supported here), you generally will not need to install a new compiler to write parallel fortran code. You only need to add a single compiler flag: -fopenmp (gcc/gfortran),¬†-openmp (intel compiler), or -mp (Portland Group). In¬†Code::Blocks you will find this option under Settings > Compiler > Compiler Settings tab > Compiler Flags tab (If the option isn’t present try adding it to “other compiler options” and hope your compiler¬†recognizes one of the flags).¬†

Secondly, you need to link in the OpenMP¬†library. In Code::Blocks go to¬†Settings > Compiler > Linker¬†Settings tab >¬†Link Libraries: add. Where you add the libgomp.dll.a library (generally found in the folder of your compiler…in case of 64 bit compilers, make sure you get the 64 bit version)

Finally, you may want to get access to OpenMP functions inside your code. This can be achieved by a use statement: use omp_lib.

B. Machine properties

OpenMP contains several functions which allow you to query and set several environment variables (check out these cheat-sheets for OpenMP v3.0 and v4.0).

  • omp_get_num_procs() : returns the number of processors your code sees (in hyper-threaded CPU’s this will be double of the actual number of processor cores).
  • omp_get_num_threads() : returns the number of threads available in a specific section of the code.
  • omp_set_num_threads(I):¬†Sets the number of threads for the openMP parallel section to I
  • omp_get_thread_num() : returns the index of the specific thread you are in [0..I[

 

  1. subroutine OpenMPTest1()
  2.         use omp_lib;
  3.  
  4.         write(*,*) "Running OpenMP Test 1: Environment variables"
  5.         write(*,*) "Number of threads :",omp_get_num_threads()
  6.         write(*,*) "Number of CPU's available:",omp_get_num_procs()
  7.         call omp_set_num_threads(8) ! set the number of threads to 8
  8.         write(*,*) "#Threads outside the parallel section:",omp_get_num_threads()
  9.         !below we start a parallel section
  10.         !$OMP PARALLEL
  11.         write(*,*) "Number of threads in a parallel section :",omp_get_num_threads()
  12.         write(*,*) "Currently in thread with ID = ",omp_get_thread_num()
  13.         !$OMP END PARALLEL
  14.  
  15. end subroutine OpenMPTest1

 

Notice in the example code above that outside the parallel section indicated with the directives $OMP PARALLEL and $OMP END PARALLEL, the program only sees a single thread, while inside the parallel section 8 threads will run (independent of the number of cores available). 

C. Simple parallelization

The OpenMP frameworks consists of an set of directives¬†which can be used to manage the parallelization of your code (cheat-sheets for OpenMP v3.0¬†and¬†v4.0). I will not describe them in detail as there exists several very well written and full tutorials on the subject, we’ll just have a look at a quick and easy parallelization of a big for-loop. As said, OpenMP makes use of directives (or Pragma’s) which are placed as comments inside the code. As such they will not interfere with your code when it is compiled as a serial code (i.e. without the -fopenmp compiler flag). The directives are preceded by what is called a sentinel ( $OMP ). In the above example code, we already saw a first directive: PARALLEL. Only inside blocks delimited by this directive, can your code be parallel.

  1. subroutine OMPTest2()
  2.         use omp_lib;
  3.  
  4.         integer :: IDT, NT,nrx,nry,nrz
  5.         doubleprecision, allocatable :: A(:,:,:)
  6.         doubleprecision :: RD(1:1000)
  7.         doubleprecision :: startT, TTime, stt
  8.  
  9.         call random_seed()
  10.         call random_number(RD(1:1000))
  11.         IDT=500 ! we will make a 500x500x500 matrix
  12.         allocate(A(1:IDT,1:IDT,1:IDT))
  13.  
  14.         write(*,'(A)') "Number of preferred threads:"
  15.         read(*,*) NT
  16.         call omp_set_num_threads(NT)
  17.         startT=omp_get_wtime()
  18.         !$OMP PARALLEL PRIVATE(stt)
  19.         stt=omp_get_wtime()
  20.        
  21.         !$OMP DO
  22.         do nrz=1,IDT
  23.            do nry=1,IDT
  24.               do nrx=1,IDT
  25.               A(nrx,nry,nrz)=RD(modulo(nrx+nry+nrz,1000)+1)
  26.               end do
  27.            end do
  28.         end do
  29.         !$OMP END DO
  30.         write(*,*) "time=",(omp_get_wtime()-stt)/omp_get_wtick()," ticks for thread ",omp_get_thread_num()
  31.         !$OMP END PARALLEL
  32.         TTime=(omp_get_wtime()-startT)/omp_get_wtick()
  33.         write(*,*)" CPU-resources:",Ttime," ticks."
  34.  
  35.         deallocate(A)
  36.     end subroutine RunTest2

The program above fills up a large 3D array with random values taken from a predetermined list. The user is asked to set the number of threads (lines 14-16), and the function omp_get_wtime() is used to obtain the number of seconds since epoch, while the function omp_get_wtick() gives the number of seconds between ticks. These functions can be used to get some timing data for each thread, but also for the entire program. For each thread, the starting time is stored in the variable stt. To protect this variable of being overwritten by each separate thread, this¬†variable is declared as private to the thread (line 18: PRIVATE(stt) ). As a result, each thread will have it’s own private copy of the stt variable.

The DO directive on line 21, tells the compiler that the following loop needs to be parallelized. Putting the !$OMP DO pragma around the outer do-loop has the advantage that it minimizes the overhead produced by the parallelization (i.e. resources required to make local copies of variables, calculating the distribution of the workload over the different threads at the start of the loop, and combining the results at the end of the loop).

As you can see, parallelizing a loop is rather simple. It takes only 4 additional comment lines (!$OMP PARALLEL , !$OMP DO, !$OMP END DO and !$OMP END PARALLEL) and some time figuring out which variables should be private for each thread, i.e. which are the variables that get updated during each cycle of a loop. Loop counters you can even ignore as these are by default considered private. In addition, the number of threads is set on another line giving us 5 new lines of code in total. It is of course possible to go much further, but this is the basis of what you generally need.

Unfortunately, the presented example is not that computationally demanding, so it will be hard to see the full effect of  the parallelization. Simply increasing the array size will not resolve this as you will quickly run out of memory. Only with more complex operations in the loop will you clearly see the parallelization. An example of a more complex piece of code is given below (it is part of the phonon-subroutine in HIVE):

  1. !setup work space for lapack
  2.         N = this%DimDynMat
  3.         LWORK = 2*N - 1
  4.         call omp_set_num_threads(this%nthreads)
  5.         chunk=(this%nkz)/(this%nthreads*5)
  6.         chunk=max(chunk,1)
  7.         !$OMP PARALLEL PRIVATE(WORK, RWORK, DM, W, RPart,IO)
  8.         allocate(DM(N,N))
  9.         allocate( WORK(2*LWORK), RWORK(3*N-2), W(N) )
  10.         !the write statement only needs to be done by a single thread, and the other threads do not need to wait for it
  11.         !$OMP SINGLE
  12.         write(uni,'(A,I0,A)') " Loop over all ",this%nkpt," q-points."
  13.         !$OMP END SINGLE NOWAIT
  14.         !we have to loop over all q-points
  15.         !$OMP DO SCHEDULE(DYNAMIC,chunk)
  16.         do nrz=1,this%nkz
  17.             do nry=1,this%nky
  18.                 do nrx=1,this%nkx
  19.                     if (this%kpointListBZ1(nrx,nry,nrz)) then
  20.                         !do nrk=1,this%nkpt
  21.                         WORK = 0.0_R_double
  22.                         RWORK = 0.0_R_double                                                
  23.                         DM(1:this%DimDynMat,1:this%DimDynMat)=this%dynmatFIpart(1:this%DimDynMat,1:this%DimDynMat) ! make a local copy
  24.                         do nri=1,this%poscar%nrions
  25.                             do nrj=1,this%poscar%nrions
  26.                                 Rpart=cmplx(0.0_R_double,0.0_R_double)
  27.                                 do ns=this%vilst(1,nri,nrj),this%vilst(2,nri,nrj)
  28.                                     Rpart=Rpart + exp(i*(dot_product(this%rvlst(1:3,ns),this%kpointList(:,nrx,nry,nrz))))
  29.                                 end do
  30.                                 Rpart=Rpart/this%mult(nri,nrj)
  31.                                 DM(((nri-1)*3)+1:((nri-1)*3)+3,((nrj-1)*3)+1:((nrj-1)*3)+3) = &
  32.                                     & DM(((nri-1)*3)+1:((nri-1)*3)+3,((nrj-1)*3)+1:((nrj-1)*3)+3)*Rpart
  33.                             end do
  34.                         end do
  35.                         call MatrixHermitianize(DM,IOS=IO)
  36.                         call ZHEEV( 'V', 'U', N, DM, N, W, WORK, LWORK, RWORK, IO )
  37.                         this%FullPhonFreqList(:,nrx,nry,nrz)=sign(sqrt(abs(W)),W)*fac
  38.                     end if
  39.                 end do
  40.             end do
  41.         end do
  42.         !$OMP END DO
  43.         !$OMP SINGLE
  44.         write(uni,'(A)') " Freeing lapack workspace."
  45.         !$OMP END SINGLE NOWAIT
  46.         deallocate( WORK, RWORK,DM,W )
  47.         !$OP END PARALLEL

In the above code, a set of equations is solved using the LAPACK eigenvalue solver ZHEEV to obtain the energies of the phonon-modes in each point of the Brillouin zone. As the calculation of the eigenvalue spectrum for each point is independent of all other points, this is extremely well-suited for parallelization, so we can add !$OMP PARALLEL and !$OMP END PARALLEL on lines 7 and 47. Inside this parallel section there are several variables which are recycled for every grid point, so we will make them PRIVATE (cf. line 7, most of them are work-arrays for the ZHEEV subroutine).

Lines 12 and 44 both contain a write-statement. Without further action, each thread will perform this write action, and we’ll end up with multiple copies of the same line (Although this will not break your code it will look very sloppy to any user of the code). To circumvent this problem we make use of the !$OMP SINGLE directive. This directive makes sure only 1 thread (the first to arrive) will perform the write action. Unfortunately, the¬†SINGLE block will create an implicit barrier at which all other threads will wait. To prevent this from happening, the NOWAIT clause is added at the end of the block. In this specific case, the NOWAIT clause will have only very limited impact due to the location of the write-statements. But this need not always to be the case.

On line 15 the !$OMP DO pragma indicates a loop will follow that should be parallelized. Again we choose for the outer loop, as to reduce the overhead due to the parallelization procedure. We also tell the compiler how the work should be distributed using the SCHEDULE(TYPE,CHUNK) clause. There are three types of scheduling:

  1. STATIC: which is best suited for homogeneous workloads. The loop is split in equal pieces (size given by the optional parameter CHUNK, else equal pieces with size=total size/#threads)
  2. DYNAMIC: which is better suited if the workload is not homogeneous.(in this case the central if-clause on line 19 complicates things). CHUNK can again be used to define the sizes of the workload blocks.
  3. GUIDED: which is a bit like dynamic but with decreasing block-sizes.

From this real life example, it is again clear that OpenMP parallelization in fortran can be very simple.

D. Speedup?

On my loyal sidekick (with hyper-threaded quad-core core i7) I was able to get following speedups for the phonon-code (the run was limited to performing only a phonon-DOS calculation):

speedup due to openMP parallelization

Speedup of the entire phonon-subroutine due to parallelization of the main-phonon-DOS loop.

The above graph shows the speed-up results for the two different modes for calculating the phonon-DOS. The reduced mode (DM red), uses a spectrum reduced to that of a unit-cell, but needs a much denser sampling of the Brillouin zone (second approach), and is shown by the black line. The serial calculation in this specific case only took 96 seconds, and the maximum speedup obtained was about x1.84. The red and green curves give the speedup of the calculation mode which makes use of the super-cell spectrum (DM nored, i.e. much larger matrix to solve), and shows for increasing grid sizes a maximum speedup of x2.74 (serial time: 45 seconds) and x3.43 (serial time 395 seconds) respectively. The reason none of the setups reaches a speedup of 4 (or more) is twofold:

  1. Amdahl’s law puts an upper limit to the global speedup of a calculation by taking into account that only part of the code is parallelized (e.g. write sections to a single file can not be parallelized.)
  2. There needs to be sufficient blocks of work for all threads (indicated by nkz in the plot)

In case of the DM nored calculations, the parallelized loop clearly takes the biggest part of the calculation-time, while for the DM red calculation, also the section generating the q-point grid takes a large fraction of the calculation time limiting the effect of parallelization. An improvement here would be to also parallelize  the subroutine generating the grid, but that will be for future work. For now, the expensive DM nored calculations show an acceptable speedup.

 

Jun 19

To x64 or not to x64: Installing a 64-bit fortran compiler

Current day computers generally have 64-bit processors, and most even have 64-bit operating systems. On such systems, 32-bit programs will run fine, but 64-bit programs can make more efficient use of the underlying system. When we installed a fortran compiler and the code::blocks IDE, the default fortran compiler generated 32-bit programs.¬†This generally is not an issue, unless you need a large amount of memory, for example to store a temporary array with¬†4003 double precision coordinates (as I did for a project I’m currently working on). You may first start to look for ways of increasing the stack-size of your program, but you will soon discover that the problem is more profound: a 32-bit program cannot access address spacing beyond 4Gb. (In practice, generally you will not even reach 4Gb before running into problems.) This is because the memory address of your data is stored as a 32-bit value (232¬†= 4 294 967 296 = 4Gb) so the only way out of this predicament is a “larger address” aka 64-bit. So you need to install a new compiler capable of providing 64-bit programs.

  1. Installing minGW64 for code::blocks
    1. Installing the compiler
    2. Setting the PATH-variable (win10)
    3. Adding the compiler to code::blocks
  2. Upgrading Lapack to 64-bit

Read the rest of this entry »

Apr 06

Animated Primes

Optimus Prime

Optimus Prime, the most prime of all autobots.[source]

2, 3, 5, 7, 11, 13, 17, 19, 23, …and on toward infinity. Prime numbers are a part of mathematics which seduce most of us at some point in our lives to try and “solve them”. There is structure in them, but too little to be easily understandable, and too much to let go and consider them chaos.

Recently, my girlfriend got sucked into this madness. Luckily she is not trying to generate the next largest prime (which would need to have more than 22 Million digits, although I might have jinxed it now). Instead she is looking for order into chaos; leading to pen and paper graphs, evolving into excel worksheets being abused as ordinary bitmap pictures and finally mathematica applications for even more heavy lifting.

These last exercises came to a somewhat grinding halt when mathematica refused to generate a gif-animation (it just failed to do the job without providing any warning that something might have gone awry). However, because it could generate the separate frames without any issues, I was tempted to dig up an old fortran module which allowed me to generate gif images and animations. I remembered correctly that it was able to both read and write gif-images, and so, after 10 minutes of writing a small program (9 minutes remembering how things worked) and another 10 linking in all dependencies and fixing some inconsistencies, I had a working program for generating a gif animation from separate gif images.

Animated gif, generated in fortran.

The initial test went smoothly. Using some simple gif images I created in paint, the resulting 4 frame animation is the one on the left (which is probably giving you a headache as you read this ;-).

The real test, with the single frame gifs Sylvia had prepared, was rather interesting: It failed miserably. Fortunately it also taught us the origin of the problem: mathematica was generating frames of different dimensions. And although this does not give rise to problems in the workbook or generated avi, it does kill an animated gif. After fixing this, also mathematica successfully generated the animated gif Sylvia wanted to see. Also my own small program was now able to perform the intended trick, although it showed that a different color-map was created for each frame by mathematica, something my own implementation did not handle well (we’ll fix that somewhere in the future).

The executable¬†(including prerequisite¬†dll’s) can be downloaded here.

If you do not suffer from epileptic seizures (or not yet), you could check out the results in the spoiler below.

Spoiler Inside: gif-animations SelectShow

Aug 29

Fortran dll’s and libraries: a Progress bar

In the previous fortran tutorials, we learned the initial aspects of object oriented programming (OOP) in fortran 2003. And even though our agent-based opinion-dynamics-code is rather simple, it can quickly take several minutes for a single run of the program to finish. Two tools which quickly become of interest for codes that need more than a few minutes to run are: (1) a progress bar, to track the advance of the “slow” part of the code and prevent you from killing the program 5 seconds before it is to finish, and (2) a timer, allowing you to calculate the time needed to complete certain sections of code, and possibly make predictions of the expected total time of execution.

In this tutorial, we will focus on the progress bar. Since our (hypothetical) code is intended to run on High-Performance Computing (HPC) systems and is written in the fortran language, there generally is no (or no easy) access to GUI’s. So we need our¬†progress bar class to¬†run in a command line user interface. Furthermore, because it is such a widely useful tool we want to build it into a (shared) library (or dll in windows).progress_1pct

The progress bar class

What do we want out of our progress bar? It needs to be easy to use,¬†flexible and smart enough to work nicely even for a lazy user. The output it should provide is formatted as follows: <string> <% progress> <text progress bar>, where the string is a custom character string provided by the user, while ‘%progress’ and ‘text progress bar’ both show the progress.¬†The first shows the progress as an updating number (fine grained), while the second shows it visually as a growing bar (coarse grained).

  1. type, public :: TProgressBar
  2. private
  3. logical :: init
  4. logical :: running
  5. logical :: done
  6. character(len=255) :: message
  7. character(len=30) :: progressString
  8. character(len=20) :: bar
  9. real :: progress
  10. contains
  11. private
  12. procedure,pass(this),public :: initialize
  13. procedure,pass(this),public :: reset
  14. procedure,pass(this),public :: run
  15. procedure,pass(this),private:: printbar
  16. procedure,pass(this),private:: updateBar
  17. end type TProgressBar

All properties of the class are private (data hiding), and only 3 procedures are available to the user: initialize, run and reset. The procedures, printbar and updatebar are private, because we intend the class to be smart enough to decide if a new print and/or update is required. The reset procedure is intended to reset all properties of the class. Although one might consider to make this procedure private as well, it may be useful to allow the user to reset a progress bar in mid progress.(The same goes for the initialize procedure.)

  1. subroutine run(this,pct,Ix,msg)
  2. class(TProgressBar) :: this
  3. real::pct
  4. integer, intent(in), optional :: Ix
  5. character(len=*),intent(in),optional :: msg
  6.  
  7. if (.not. this%init) call this%initialize(msg)
  8. if (.not. this%done) then
  9. this%running=.true.
  10. this%progress=pct
  11. call this%updateBar(Ix)
  12. call this%printbar()
  13. if (abs(pct-100.0)<1.0E-6) then
  14. this%done=.true.
  15. write(*,'(A6)') "] done"
  16. end if
  17. end if
  18.  
  19. end subroutine run

In practice, the run procedure is the heart of the class, and the only procedure needed in most applications. It takes 3 parameters: The progress (pct), the number of digits to print of pct (Ix),and the <string> message (msg). The later two parameters are even optional, since msg may already have been provided if the initialize procedure was called by the user. If the class was not yet initialized it will be done at the start of the procedure. And while the progress bar has not yet reached 100% (within 1 millionth of a %) updates and prints of the bar are performed. Using a set of Boolean properties (init, running, done), the class keeps track of its status. The update and print procedures just do this: update the progress bar data and print the progress bar. To print the progress bar time and time again on the same line, we need to make use of the carriage return character (character 13 of the ASCII table):

write(*,trim(fm), advance='NO') achar(13), trim(this%message),trim(adjustl(this%progressString)),'%','[',trim(adjustl(this%bar))

The advance=’NO‘ option¬†prevents the write statement to¬†move to the next line. This can sometimes have the unwanted side-effect that the write statement above does not appear on the screen. To force this, we can use the fortran 2003 statement¬†flush(OUTPUT_UNIT), where “OUTPUT_UNIT” is a constant defined in the intrinsic fortran 2003 module iso_fortran_env. For older versions of fortran, several compilers provided a (non standard) flush subroutine that could be called to perform the same action.¬†As such, we now have our class ready to be used. The only thing left to do is to turn it into a dll or shared library.progress_25pct

How to create a library and use it

There are two types of libraries: static and dynamic.

Static libraries¬†are used to provide access to functions/subroutines at compile time to the library user. These functions/subroutines are then included in the executable that is being build.¬†In linux environments these will have the extension “.a”, with the .a referring to archive. In a windows environment the extension is “.lib”, for library.

Dynamic libraries are used to provide access to functions/subroutines at run time. In contrast to static libraries, the functions are not included in the executable, making it smaller in size. In linux environments these will have the extension “.so”, with the .so referring to shared object. In a windows environment the extension is “.dll”, for dynamically linked library.

In contrast to C/C++, there is relatively little information to be found on the implementation and use of libraries in fortran. This may be the reason why many available fortran-“libraries” are not really libraries, in the sense meant here.¬†Instead they are just¬†one or more files of fortran code shared by their author(s), and there is nothing wrong with that.¬†These files can then be compiled and used¬†as any other module file.

So how do we create a library from our Progressbar class? Standard examples start from a set of procedures one wants to put in a library. These procedures are put into a .f or .f90 file. Although they are not put into a module (probably due to the idea of having compatibility with fortran 77) which is required for our class, this is not really an issue. The same goes for the .f03 or .f2003 extension for our file containing a fortran 2003 class. To have access to our class and its procedures in our test program, we just need to add the use progressbarsmodule clause. This is because our procedures and class are incorporated in a module (in contrast to the standard examples). Some of the examples I found online also include compiler dependent pragmas to export and import procedures from a dll. Since I am using gfortran+CB for development, and ifort for creating production code, I prefer to avoid such approaches since it hampers workflow and introduces another possible source of bugs.

The compiler setups I present below should not be considered perfect, exhaustive or fool-proof, they are just the ones that work fine for me. I am, however, always very interested in hearing other approaches and fixes in the comments.progress_52pct

Windows

The windows approach is very easy. We let Code::Blocks do all the hard work.

shared library: PBar.dll

Creating the¬†dll : Start a new project, and select the option “Fortran DLL“. Follow the instructions, which are similar to the setup of a standard fortran executable. Modify/replace/add the fortran source you wish to include into your library and build your code (you can not run it since it is a library).

Creating a user program : The program in which you will be using the dll is setup in the usual way. And to get the compilation running smoothly the following steps are required:

  • Add the use myspecificdllmodule¬†clause where needed, with myspecificdllmodule the name of the module included in the dll you wish to use at that specific point.
  • If¬†there are modules included in the dll, the *.mod files need to be present for the compiler to access upon compilation of the user program. (Which results in a limitation with regard to distribution of the dll.)
  • Add the library to the linker settings of the program (project>build options>linker settings), and then add the .dll file.
  • Upon running the program you only need the program executable and the dll.

static library

The entire setup is the same as for the shared library. This time, however,¬†choose the “Fortran Library” option instead of Fortran dll. As the static library is included in the executable, there is no need to ship it with the executable, as is the case for the dll.

Unix

For the unix approach we will be working on the command line, using the intel compiler, since this compiler is often installed at HPC infrastructures.

static library: PBar.a

After having created the appropriate fortran files you wish to include in your library (in our example this is always a single file: PBar.f03, but for multiple files you just need to replace PBar.f03 with the list of files of interest.)

  1. Create the object files:
    ifort -fpic -c -free -Tf Pbar.f03

    Where -fpic tells the compiler to generate position independent code, typical for use in a shared object/library, while -c tells the compiler to create an object file. The -free and -Tf compiler options are there to convince the compiler that the f03 file is actual fortran code to compile and that it is free format.

  2. Use the GNU ar tool to combine the object files into a library:
    ar rc PBarlib.a PBar.o
  3. Compile the program with the library
    ifort TestProgram.f90 PBarlib.a -o TestProgram.exe

    Note that also here the .mod file of our Progressbarsmodule needs to be present for the compilation to be successful.

shared library: PBar.so

For the shared library the approach does not differ that much.

  1. Create the object files:
    ifort -fpic -c -free -Tf Pbar.f03

    In this case the fpic option is not optional in contrast to the static library above. The other options are the same as above.

  2. Compile the object files into a shared library:
    ifort -shared PBar.o -o libPBar.so

    The compiler option -shared creates a shared library, while the -o option allows us to set the name of the library.

  3. Compile the program with the library
    ifort TestProgram.f90 libPBar.so -o TestProgram.exe

    Note that also here the .mod file of our Progressbarsmodule needs to be present for the compilation to be successful. To run the program you also need to add the location of the library file libPBar.so to the environment variable LD_LIBRARY_PATH

One small pickle

HPC systems may perform extensive buffering of data before output, to increase the efficiency of the machine (disk-writes are the slowest¬†memory access option)…and as a result this can sometimes overrule our flush command. The progressbar in turn will not show much progress until it is actually finished, at which point the entire bar will be shown at once. There are options to force the infrastructure not to use this buffering (and the system administrators in general will not appreciate this), for example by setting the compiler flag -assume nobuffered_stdout. So the best solution for HPC applications will be the construction of a slightly modified¬†progress bar, where the carriage return is not used.

progress_100pct

 

Special thanks also to the people of stack-exchange for clarifying some of the issues with the modules.

Source files for the class and test-program can be downloaded here.

 

May 22

Tutorial OOP(II): One problem, different possible classes

additional resources
agent paper: Sobkowicz
source-code: AgentTutorials
Arxiv: Full Tutorial

In the previous tutorial, we saw how to tackle an opinion dynamics problem using agents as a class in an Object Oriented Programming (OOP) approach. In many topics of interest in (socio-)physics and chemistry, we deal with a large number of particles, be it electrons, atoms, agents, stars, … These are contained in a superstructure (electrons‚áíatom, atoms‚áímolecule/solid, agents‚áípopulation, stars‚áígalaxy,…) which is generally represented in the code as an array. As we noted in the previous tutorial, there were several variables which were global to the agents, but we implemented them as properties of the agents anyhow. As a result, a significant amount of additional memory needed to be allocated for storing in essence the same data. This was done to prevent the need of having to provide this information at every function call.

Returning to our problem of interest, we now consider two object classes: The TAgent-class and the TPopulation-class. This leads to several possible ways this problem can be implemented.

  1. Array of TAgents: As was done in the previous tutorial, we only make a class of the agents, and put them in an array.
  2. TPopulation of TAgents: In this case we construct a class called TPopulation of which one property is the set of TAgents. The TPopulation-class also contains some of the global variables as properties, and operations on this set are methods of the TPopulation-class.
  3. TPopulation-class without TAgent-class: In this last case, the agents are dissolved, and their properties are stored in array-properties of the TPopulation-class. The methods of the TAgent-class now become methods of the TPopulation-class. And the global variables become additional properties of the TPopulation-class.

Although the true OOP-programmer may only consider the  second option the way to go, we will consider the third option in this tutorial.

Read the rest of this entry »

May 03

Tutorial OOP(I): Objects in Fortran 2003

After having set up our new project in the first session of this tutorial, we now come to an important second step: choosing and creating our Objects. In OOP, the central focus is not a (primitive) variable or a function, but ‚Äúan object‚ÄĚ. In Object Oriented Programming (OOP) most if not all variables and functions are incorporated in one or more (types of) objects. An object, is just like a real-life object; It has properties and can do things. For example: a car. It has properties (color, automatic or stick, weight, number of seats,…) and can do things (drive, break down, accelerate,…). In OOP, the variables containing the values that give the color, weight, stick or not,… of the car are called the properties of the car-object. The functions that perform the necessary calculations/modifications of the variables to perform the actions of driving, breaking down,… are called the methods.

Since we are still focusing on the opinion dynamics paper of Sobkowicz, let us use the objects of that paper to continue our tutorial. A simplified version of the research question in the paper could be as follows:

How does the (average) opinion of a population of agents evolve over time?

For our object-based approach, this already contains much of the information we need. It tells us what our ‚Äúobjects‚ÄĚ could be: agents. It gives us properties for these objects: opinion. And it also tells us something of the methods that will be involved: opinion…evolve over time.

Let us now put this into Fortran code. A class definition in Fortran uses the TYPE keyword, just like complex data types.

  1. Type, public :: TAgentClass
  2. private
  3. real :: oi !< opinion
  4. contains
  5. private
  6. procedure, pass(this), public :: getOpinion
  7. procedure, pass(this), public :: setOpinion
  8. procedure, pass(this), public :: updateOpinion
  9. end type TAgentClass

Read the rest of this entry »

Apr 26

Tutorial: Starting a new Object Oriented Fortran-project in Code::Blocks

Because Fortran has been around since the time of the dinosaurs, or so it seems, one persistent myth is that modern constructs and techniques are not supported. One very important modern programming approach is Object Oriented Programming (OOP). Where procedural programming focuses on splitting the program in smaller functional units (functions and procedures), OOP is centered on “Objects and Classes”. These objects, just like real life objects, have properties (eg.¬†name, color, position, …) and contain methods (i.e. procedures representing their functionality). This sounds more complex than it actually is. In a very simplistic sense, objects and classes are created¬†by extending¬†complex data types in procedural programming languages¬†with¬†internally contained functions/procedures, referred to as methods. In the table below, you see a limited comparison between the two. The first three languages are procedural oriented languages, while the last two allow for OOP.

C Pascal F95 C++ OOP F2003 OOP
data type keyword struct record type class type
containing variables field field field property property
containing functions / / / method method
inheritance no no no yes yes

In this tutorial series, I want to show how you can use OOP in Fortran in scientific codes. As a first example, I will show you how to create an agent based opinion dynamics code implementing the model presented by Sobkowicz in a publication I recently had  to review.

I choose this as our tutorial example (over for example geometric shapes) because it is a real-life scientific example which is simple enough to allow for several aspects in scientific and OO-programming to be shown. In addition, from the physicists point of view: particle¬†or¬†agent, it is all the same to us when we are implementing their behavior into a program.(You could even start thinking in terms of NPC’s for a game, although game-dynamics in general require quite a bit more implementation)

So let us get started. First, you might wish to grab a copy of the paper by Sobkowicz (it’s freely accessible), just to be able to more easily track the notation I¬†will be using in the code. After having installed code::blocks it is rather straight-forward to create a new Fortran program.

  1. Open code::blocks, and start a new project (File > New > Project) by selecting “Fortran application”newProject
  2. Provide a “Project title” and a location to store the files of the new project.setupProject1
  3. Choose your compiler (GNU Fortran Compiler, make sure this is the one selected) and Finish.setupProject2
  4. Now Code::Blocks will present you a new project, with already one file present: main.f90 HelloWorld
  5. First we are going to rename this main file, e.g. to main.f95, to indicate that we will be  using Fortran 95 code in this file. By right-clicking on the file name (indicated with the red arrow) you see a list of possible op w this option while the file is open in the right-hand pane. Close the file there, and now you can rename main.f95.
  6. Although we will be following the OOP paradigm for the main internal workings of our program, a procedural setup will be used to set up the global lines of the program. I split the code in three levels: The top-level being the main program loop which is limited to the code below, the middle level which contains what you may normally consider the program layout (which you can easily merge with the top level) and the bottom level which contains the OO innards of our program. All subroutines/functions belonging to the middle level will be placed in one external module (Tutorial1) which is linked to the main program through the use-statement at line 2 and the single subroutine call (RunTutorial1).
  1. program AgentTutorials
  2. use Tutorial1;
  3. implicit none
  4.  
  5. write(*,'(A)') "Welcome to the Agents OOP-Fortran Tutorial"
  6. call RunTutorial1()
  7. write(*,'(A)') "-- Goodbye"
  8. end program AgentTutorials
  1. Next step, we need a place for the RunTutorial1 subroutine, and all other subroutines/functions that will provide the global layout of our agents-program. Make a new file in Code::Blocks as followsNewFile
  2. and select a Fortran source file and finish by placing it in a directory of your choice and click OK.NewFile2
  3. In this empty file, a module is setup in the following fashion:
  1. module Tutorial1
  2. implicit none
  3. private
  4.  
  5. public :: RunTutorial1
  6.  
  7. contains
  8. !+++++++++++++++++++++++++++++++++++++
  9. !>\brief Main program routine for the first agents tutorial
  10. !!
  11. !! This is the only function accessable outside this module.
  12. !<------------------------------------
  13. subroutine RunTutorial1()
  14. write(*,'(A)') "----Starting Turorial 1 subprogram.----"
  15. write(*,'(A)') "-------End Turorial 1 subprogram.------"
  16. end subroutine RunTutorial1
  17.  
  18. end module Tutorial1
  1. implicit none : Makes sure that you have to define all variables.
  2. private : The private statement on line 3 makes sure that everything inside the module remains hidden for the outside world, unless you explicitly make it visible. In addition to data/information-hiding this also helps you keep your namespace in check.
  3. public :: Runtutorial1 : Only the function Runtutorial1 is available outside this module.(Because we want to use it in program  AgentTutorials.)
  4. contains : Below this  keyword all functions and subroutines of the module will be placed.
  5. “!” : The exclamation marks indicate a comment block. In this case it is formatted to be parsed by doxygen.
  6. subroutine XXX ¬†/ end subroutine XXX : The subroutine we will be implementing/using as the main loop of our agents program. Note that in recent incarnations of fortran the name of the subroutine/function/module/… is repeated at their end statement. For small functions this may look ridiculous, but for larger subroutines it improves readability.

At this point or program doesn’t do that much, we have only set up the framework in which we will be working. Compiling and running the program (F9 or Build > Build and run) should show you:

AgentsRun1 In the next session of this tutorial we will use OO-Fortran 2003 to create the agent-class and create a first small simulation.

 

 

Mar 02

Start to Fortran

Code-statistics for the hive3 code (feb.2015)

Code-statistics for the hive3 code (feb.2015)

If you are used to programming in C/C++, Java or Pascal, you probably do this using an Integrated Development Environment (IDE‚Äôs) such as Dev-Cpp/Pascal, Netbeans, Eclipse, … There are dozens of free IDE‚Äôs for each of these languages. When starting to use fortran, you are in for a bit of a surprise. There are some commercial IDE‚Äôs that can handle fortran (MS Visual Studio, or the Lahey IDE). Free fortran IDEs are rather scarce and quite often are the result of the extension of a C++ focused IDE. This, however, does not make them less useful. Code::Blocks is such an IDE. It supports several¬†programming and scripting languages including C and fortran, making it also suited for mixed languages development. In addition, this IDE has been developed for Windows, Linux and Mac Os X, making it highly portable. Furthermore, installing this IDE combined with for example the gcc compiler can be done quickly and without much hassle, as is explained in this excellent tutorial. In 5¬†steps everything is installed and you are up and running:

  1. Get a gfortran compiler at https://gcc.gnu.org/wiki/GFortran

    Go for binaries and get the installer if you are using Windows. This will provide you with the latest build. Be careful if you are doing this while upgrading from gfortran 4.8 to 4.9 or 4.10. The latter two are known to have a recently fixed compiler-bug related to the automatic finalization of objects. A solution to this problem is given in this post.

    UPDATE 03/02/2017: As the gcc page has changed significantly since this post was written, I suggest to follow the procedure described here for the installation of a 64bit version of the compiler.

  2. Get the Code::Blocks IDE at http://www.codeblocks.org/ or http://cbfortran.sourceforge.net/ (preferred)

    Since version 13.12 the Fortranproject plugin is included in the Code::Blocks installation.

  3. Setup gfortran

    Run the installer obtained at step 1…i.e. keep clicking OK until all is finished.

  4. Setup Code::Blocks for fortran
    1. Run the installer or Unzip the zip-file obtained in step 2.
    2. Run Code::Blocks and set your freshly installed GNU fortran compiler as default.
    3. Associate file types with Code::Blocks. If you are not using other IDE’s this may be an interesting idea
    4. Go to settings, select ‚ÄúCompiler and Debugger‚ÄĚ, click on ‚ÄúToolchain executables‚ÄĚ and set the correct paths.
    5. Code::blocks has been configured.
  5. Your first new fortran program
    1. Go to ‚ÄúFile‚ÄĚ ‚Üí ‚ÄúNew‚ÄĚ ‚Üí¬†‚ÄúProject‚ÄĚ.
    2. Select ‚ÄúFortran Application‚ÄĚ.
    3. Follow the Wizard: provide a project folder and title.
    4. Make sure the compiler is set to ‚ÄúGNU Fortran Compiler‚ÄĚ, and click Finish.
    5. A new project is now being created, containing a main file named ‚Äúmain.f90‚ÄĚ
    6. Click ‚ÄúBuild‚ÄĚ, to build this program, and then ‚ÄúRun‚ÄĚ.
    7. Congratulations your first Fortran program is a fact.

 

Of course any real project will contain many files, and when you start to create fortran 2003/2008 code you will want to use ‚Äú.f2003‚ÄĚ or ‚Äú.f03‚ÄĚ instead of ‚Äú.f90‚ÄĚ . The Code::Blocks IDE is well suited for the former tasks, and we will return to these later. Playing with this IDE is the only way to learn about all its options. Two really nice plugins are ‚ÄúFormat Fortran Indent‚ÄĚ and ‚ÄúCode statistics‚ÄĚ. The first one can be used to auto-indent your Fortran code, making it easier to find those nasty missing ‚Äúend‚ÄĚ statements. The code statistics tool runs through your entire project and tells you how many lines of code you have, and how many lines contain comments.

Feb 07

Fortran: Tales of the Living Dead?

Fortran, just like COBOL (not to be confused with cobold), is a programming language which is most widely known for its predicted demise. It has been around for over half a century, making it the oldest high level programming language. Due to this age, it is perceived by many to be a language that should have gone the way of the dinosaur a long time ago, however, it seems to persist, despite futile attempts to retire it.¬†One of the main concerns is that the language is not up to date, and there are so many nice high level languages which allow you to do things much easier and ‚Äúas fast‚ÄĚ. Before we look into that, it is important to know that there are roughly two types of programming languages:

  1. Compiled languages (e.g. Fortran, C/C++, Pascal)
  2. Interpreted and scripting languages (e.g. Java, PHP, Python, Perl)

The former languages result in binary code that is executed directly on the machine of the user. These programs are generally speaking fast and efficient. Their drawback, however, is that they are not very transferable (different hardware, e.g. 32 vs. 64 bit, tend to be a problem). In contrast, interpreted languages are ‚Äėcompiled/interpreted‚Äô at run-time by additional software installed on the machine of the user (e.g.¬†JVM: Java virtual machine), making these scripts rather slow and inefficient since they are reinterpreted on each run. Their advantage, however, is their transferability and ease of use. ¬†Note that Java is a bit of a borderline case; it is not entirely a full programming language like C and Fortran, since it requires a JVM to run, however, it is also not a pure scripting language like python or bash where the language is mainly used to glue other programs together.

Fortran has been around since the age of the dinosaurs. (https://onionesquereality.wordpress.com/)

Fortran has been around since the age of the dinosaurs. (via Onionesque Reality)

Now let us return to Fortran. As seen above, it is a compiled language, making it pretty fast. It was designed for scientific number-crunching purposes (FORTRAN comes from FORmula TRANslation) and as such it is used in many numerical high performance libraries (e.g. (sca) LAPACK and BLAS). The fact that it appeared in 1957 does not mean nothing has happened since. Over the years the language evolved from a procedural language, in FORTRAN II, to one supporting also modern Object Oriented Programming (OOP) techniques, in Fortran 2003. It is true that new techniques were introduced later than it was the case in other languages (e.g. the OOP concept), and many existing scientific codes contain quite some ‚Äúold school‚ÄĚ FORTRAN 77. This gives the impression that the language is rather limited compared to a modern language like C++.

So why is it still in use? Due to its age, many (numerical) libraries exist written in Fortran, and due to its performance in a HPC environment. This last point is also a cause of much debate between Fortran and C adepts. Which one is faster? This depends on many things. Over the years compilers have been developed for both languages aiming at speed. In addition to the compiler, also the programming skills of the scientist writing the code are of importance. As a result, comparative tests end up showing very little difference in performance [link1, link2]. In the end, for scientific programming, I think the most important aspect to consider is the fact that most scientist are not that good at programming as they would like/think (author included), and as such, the difference between C(++) and Fortran speeds for new projects will mainly be due to this lack of skills.

However, if you have no previous programming experience, I think Fortran may be easier and safer to learn (you can not play with pointers as is possible with C(++) and Pascal, which is a good thing, and you are required to define your variables, another good coding practice (Okay, you can use implicit typing in theory, but more or less everybody will suggest against this, since it is bad coding practice)). It is also easier to write more or less bug free code than is possible in C(++) (remember defining a global constant PI and ending up with the integer value 3 instead of 3.1415…). Also its long standing procedural setup keeps things a bit more simple, without the need to dive into the nitty gritty details of OOP, where you should know that you are handling pointers (This may be news for people used to Java, and explain some odd unexpected behavior) and getting to grips with concepts like inheritance and polymorphism, which, to my opinion, are rather complex in C++.

In addition, Fortran allows you to grow, while retaining ‚Äėold‚Äô code. You can start out with simple procedural design (Fortran 95) and move toward Object Oriented Programming (Fortran 2003) easily. My own Fortran code is a mixture of Fortran 95 and Fortran 2003. (Note for those who think code written using OOP is much slower than procedural programming: you should set the relevant compiler flags, like ‚Äďipo )

In conclusion, we end up with a programming language which is fast, if not the fastest, and contains most modern features (like OOP). Unlike some more recent languages, it has a more limited user base since it is not that extensively used for commercial purposes, leading to a slower development of the compilers (though these are moving along nicely, and probably will still be moving along nicely when most of the new languages have already been forgotten again). Tracking the popularity of programming languages is a nice pastime, which will generally show you C/C++ being one of the most popular languages, while languages like Pascal and Fortran dangle somewhere around 20th-40th position, and remain there over long timescales.

The fact that Fortran is considered rather obscure by proponents of newer scripting languages like Python can lead to slightly funny comment like:‚ÄúWhy don‚Äôt you use Python instead of such an old and deprecated language, it is so such easier to use and with the NumPy and SciPy library you can also do number-crunching.‚ÄĚ. First of all, Python is a scripting language (which in my mind unfortunately puts it at about the same level as HTML and CSS ūüôā ¬†), but more interestingly, those libraries are merely wrappers around C-wrappers around FORTRAN 77 libraries like MINPACK. So people suggesting to use Python over Fortran 95/2003 code, are actually suggesting using FORTRAN 77 over more recent Fortran standards. Such comments just put a smile on my face. ūüėÄ ¬†With all this in mind, I hope to show in this blog that modern Fortran can tackle all challenges of modern scientific programming.

Jan 16

CA: Coders Anonymous

It has been 2 weeks since I last wrote some code, today I started again.

Two weeks ago  I started the, for me, daunting task of upgrading my IDE and compiler to their most recent version. The upgrade itself went smoothly, since it basically consisted of uninstalling the old versions and installing the new ones. The the big finale, recompiling my fortran codebase, went just a little bit less smoothly. It crashed straight into a compiler-bug, nicely introduced in version 4.9 of the gcc fortran compiler, and carefully nurtured up to version 4.10 I had just installed. The bug sounds as follows:

error: internal compiler error: in gfc_conv_descriptor_data_get, at fortran/trans-array.c:145
end module TPeriodicTableModule

Clear sounding as it is, it required some further investigation to find out what was actually the problem and if and how it can be resolved. The problem appeared to be a rather simple one; The compiler seems to be unable to generate the finalization code for some object based constructions involving both fixed size and allocatable arrays, the strong suite of the fortran language.  A minimal example allowing you to bump into this compiler bug goes as follows:

 

Bug 59765   
  1. module bug59765
  2. type TSubObject
  3. integer, dimension(:), allocatable :: c
  4. end type TSubObject
  5. type TObject
  6. type(TSubObject), dimension(1) :: u
  7. end type TObject
  8. contains
  9.  
  10. subroutine add(s)
  11. class(TObject), intent(inout) :: s
  12. end subroutine add
  13.  
  14. end module bug59765

 

The issue arises when the compiler tries to setup the deallocation of the allocatable array of the TSubObject elements of the array u. Apparently the combination of the static array u and allocatable arrays c in the elements of u result in confusion. It was suggested that the compiler wants to perform the deallocation procedure as an array operation (one of the neat tricks fortran has up its sleeves):

deallocate(s%u(:)%c)

instead it should just use a normal do-loop and run over all elements of u.

One of the main ironies of this story is that this bug is strongly connected to object oriented programming, a rather new concept in the world of fortran. Although introduced in fortran 2003, more than 10 year ago, compiler support for these features have only reached basic maturity in recent years. The problem we are facing is one in the destructor of an object: the smart compiler wants to make our life easy, and implicitly create a good destructor for us. As with most smart solutions of modern day life, such things have a tendency to fail when you least expect it.

However, this bug (and the fact that it persists in more recent versions of the compiler) forces us to employ¬†good coding practices: write a destructor yourself. Where C++ has implemented keywords for both constructor and destructor, the fortran programmer, as yet, only has a keyword for a destructor: final. This finalization concept was introduced in the fortran 2003 standard as part of the introduction of the Object Oriented Programming paradigm. A final procedure/function also works slightly different than what you may be used to in for example C++, namely, it is not directly callable by the programmer as an object-function/procedure. A final procedure/function is only called upon in an automatic way when an object is destroyed. So for those of us who also implement ¬†‘free()‘ procedures to clean-up objects at runtime, this means some extra work may be needed (I haven’t checked this in detail).

So how is our example-problem healed from bug 59765? Through the introduction of our own destructor.

  1. module fixbug59765
  2. type TSubObject
  3. integer, dimension(:), allocatable :: c
  4. contains
  5. final :: destroy_TSubObject
  6. end type TSubObject
  7. type TObject
  8. type(TSubObject), dimension(1) :: u
  9. end type TObject
  10. contains
  11.  
  12. subroutine add(s)
  13. class(TObject), intent(inout) :: s
  14. end subroutine add
  15.  
  16. subroutine destroy_TSubObject(this)
  17. type(TSubObject) :: this !note: this needs to be a type not a class
  18.  
  19. if (allocated(this%c)) deallocate(this%c)
  20. end subroutine destroy_TSubObject
  21.  
  22. end module fixbug59765

In my own code, both the TSubObject and TObject classes got their own final procedure, due to the slightly higher complexity of the objects involved. The resulting code compiled without further complaints, and what is more, it also still compiled with the recent intel ifort compiler. Unfortunately, final procedures are only included in the gcc compiler since version 4.9, making code containing them incompatible with the gcc version 4.8 and earlier.