Simple Parallelization in Fortran: OpenMP

The first PC we got at our home was a Pentium II. My dad got it because I was going to university, and I would be able to do something “useful” with it. (Yup, I survived my entire high school career searching stuff in the library and the home encyclopedia. Even more, Google didn’t even exist before we got our computer, as the company was only founded in 1998 🙂 ). The machine was advertised as state of the art with a clock rate of a whopping 233 MHz! During the decade that followed, the evolution of the clock rates kept going at a steady pace, until it saturated at about 3-4 GHz (15 times faster than the 233 MHz) around 2005. Since then, the clock rate has not increased a bit. If anything, the average clock rate has even decreased to the range of 2-3 GHz. As power consumption grows quadratically with the clock rate, this means that (1) much more heat is produced, which needs to be transported away from your CPU (otherwise it gets destroyed), and (2) reducing the clock rate by a factor of 2 allows you to power 4 CPUs at half the clock rate, effectively doubling your calculation power. (There are even more tricks involved in modern CPUs which crank up performance, such that the clock rate is no longer a real measure for performance, and salespeople need to learn more new buzzwords to sell your computer/laptop 👿 )

Where in 2005 you bought a single CPU with a high clock rate, you now get a machine with multiple cores. Most machines you can get these days have a minimum of 2 cores, with quad-core machines becoming more and more common. But, there is always a but: even though you now have access to multiple times the processing power of 2005, this does not mean that your own code will be able to use it. Unfortunately, there is no simple compiler switch which makes your code parallel (like the -m64 switch which makes your code 64-bit); you have to do this yourself (the free lunch is over). Two commonly used frameworks for this task are OpenMP and MPI. The former mainly focuses on shared memory configurations (laptops, desktops, single nodes in a cluster), while the latter focuses on large distributed memory setups (multi-node clusters) and is thus well-suited for creating codes that need to run on hundreds or even thousands of CPUs. The two frameworks differ significantly in their complexity; fortunately for us, OpenMP is both the easier one and the one most suited for a modern multi-core computer. The OpenMP framework consists of pragmas (or directives) which can be added to an existing code as comment lines, and which tell a compiler knowledgeable of OpenMP how to parallelize the code. (It is interesting to note that MPI and OpenMP are intended for parallel programming in either C, C++ or fortran … a hint at what the important programming languages are.)

OpenMP in Fortran: Basics

A. Compiler-options and such

As most modern fortran compilers are well aware of OpenMP (you can check which version of OpenMP is supported here), you generally will not need to install a new compiler to write parallel fortran code. You only need to add a single compiler flag: -fopenmp (gcc/gfortran), -openmp (intel compiler), or -mp (Portland Group). In Code::Blocks you will find this option under Settings > Compiler > Compiler Settings tab > Compiler Flags tab (if the option isn’t present, try adding it to “other compiler options” and hope your compiler recognizes one of the flags).

Secondly, you need to link in the OpenMP library. In Code::Blocks go to Settings > Compiler > Linker Settings tab > Link Libraries: add, and add the libgomp.dll.a library (generally found in the folder of your compiler; in the case of 64-bit compilers, make sure you get the 64-bit version).

Finally, you may want to get access to OpenMP functions inside your code. This can be achieved by a use statement: use omp_lib.

B. Machine properties

OpenMP contains several functions which allow you to query and set several environment variables (check out these cheat-sheets for OpenMP v3.0 and v4.0).

  • omp_get_num_procs() : returns the number of processors your code sees (on hyper-threaded CPUs this will be double the actual number of processor cores).
  • omp_get_num_threads() : returns the number of threads available in a specific section of the code.
  • omp_set_num_threads(I) : sets the number of threads for the OpenMP parallel section to I.
  • omp_get_thread_num() : returns the index of the specific thread you are in (0 up to I-1).

 

  subroutine OpenMPTest1()
          use omp_lib

          write(*,*) "Running OpenMP Test 1: Environment variables"
          write(*,*) "Number of threads :",omp_get_num_threads()
          write(*,*) "Number of CPU's available:",omp_get_num_procs()
          call omp_set_num_threads(8) ! set the number of threads to 8
          write(*,*) "#Threads outside the parallel section:",omp_get_num_threads()
          !below we start a parallel section
          !$OMP PARALLEL
          write(*,*) "Number of threads in a parallel section :",omp_get_num_threads()
          write(*,*) "Currently in thread with ID = ",omp_get_thread_num()
          !$OMP END PARALLEL

  end subroutine OpenMPTest1

 

Notice in the example code above that outside the parallel section, delimited by the directives !$OMP PARALLEL and !$OMP END PARALLEL, the program only sees a single thread, while inside the parallel section 8 threads will run (independent of the number of cores actually available).

C. Simple parallelization

The OpenMP framework consists of a set of directives which can be used to manage the parallelization of your code (cheat-sheets for OpenMP v3.0 and v4.0). I will not describe them in detail, as there exist several well-written and complete tutorials on the subject; we’ll just have a look at a quick and easy parallelization of a big do-loop. As said, OpenMP makes use of directives (or pragmas) which are placed as comments inside the code. As such, they will not interfere with your code when it is compiled as a serial code (i.e. without the -fopenmp compiler flag). The directives are preceded by what is called a sentinel (!$OMP). In the above example code, we already saw a first directive: PARALLEL. Only inside blocks delimited by this directive can your code run in parallel.

  1. subroutine OMPTest2()
  2.         use omp_lib;
  3.  
  4.         integer :: IDT, NT,nrx,nry,nrz
  5.         doubleprecision, allocatable :: A(:,:,:)
  6.         doubleprecision :: RD(1:1000)
  7.         doubleprecision :: startT, TTime, stt
  8.  
  9.         call random_seed()
  10.         call random_number(RD(1:1000))
  11.         IDT=500 ! we will make a 500x500x500 matrix
  12.         allocate(A(1:IDT,1:IDT,1:IDT))
  13.  
  14.         write(*,'(A)') "Number of preferred threads:"
  15.         read(*,*) NT
  16.         call omp_set_num_threads(NT)
  17.         startT=omp_get_wtime()
  18.         !$OMP PARALLEL PRIVATE(stt)
  19.         stt=omp_get_wtime()
  20.        
  21.         !$OMP DO
  22.         do nrz=1,IDT
  23.            do nry=1,IDT
  24.               do nrx=1,IDT
  25.               A(nrx,nry,nrz)=RD(modulo(nrx+nry+nrz,1000)+1)
  26.               end do
  27.            end do
  28.         end do
  29.         !$OMP END DO
  30.         write(*,*) "time=",(omp_get_wtime()-stt)/omp_get_wtick()," ticks for thread ",omp_get_thread_num()
  31.         !$OMP END PARALLEL
  32.         TTime=(omp_get_wtime()-startT)/omp_get_wtick()
  33.         write(*,*)" CPU-resources:",Ttime," ticks."
  34.  
  35.         deallocate(A)
  36.     end subroutine OMPTest2

The program above fills a large 3D array with random values taken from a predetermined list. The user is asked to set the number of threads (lines 14-16), and the function omp_get_wtime() is used to obtain the elapsed wall-clock time in seconds (measured from some arbitrary reference point in the past), while the function omp_get_wtick() gives the number of seconds between two successive clock ticks. These functions can be used to collect timing data for each thread, but also for the entire program. For each thread, the starting time is stored in the variable stt. To protect this variable from being overwritten by the other threads, it is declared as private to each thread (line 18: PRIVATE(stt)). As a result, each thread will have its own private copy of the stt variable.

The DO directive on line 21 tells the compiler that the following loop needs to be parallelized. Putting the !$OMP DO directive around the outer do-loop has the advantage that it minimizes the overhead produced by the parallelization (i.e. the resources required to make local copies of variables, to calculate the distribution of the workload over the different threads at the start of the loop, and to combine the results at the end of the loop).

As you can see, parallelizing a loop is rather simple. It takes only 4 additional comment lines (!$OMP PARALLEL, !$OMP DO, !$OMP END DO and !$OMP END PARALLEL) and some time figuring out which variables should be private for each thread, i.e. which variables get updated during each cycle of the loop. Loop counters you can even ignore, as these are considered private by default. In addition, the number of threads is set on another line, giving us 5 new lines of code in total. It is of course possible to go much further (a common extension is sketched below), but this is the basis of what you generally need.
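One such extension: when every thread updates the same variable, for example when accumulating a sum, declaring it PRIVATE is not enough, as the partial results of the threads still need to be combined. OpenMP provides the REDUCTION clause for exactly this. A minimal sketch (this is not part of the example above; the names are mine):

  subroutine OMPSumExample()
          implicit none
          integer :: i
          double precision :: total, x(1000)

          call random_number(x)
          total = 0.0d0
          ! the loop counter i is private by default; total, however, is
          ! updated by every thread, so the partial sums of the threads
          ! are combined through the REDUCTION clause
          !$OMP PARALLEL DO REDUCTION(+:total)
          do i = 1, 1000
             total = total + x(i)
          end do
          !$OMP END PARALLEL DO
          write(*,*) "sum = ", total
  end subroutine OMPSumExample

Without the REDUCTION clause the threads would race to update total and the result would be wrong (or total would have to be PRIVATE, and each private copy would be lost at the end of the parallel section).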

Unfortunately, the presented example is not that computationally demanding, so it will be hard to see the full effect of the parallelization. Simply increasing the array size will not resolve this, as you will quickly run out of memory. Only with more complex operations in the loop will you clearly see the effect of the parallelization. An example of a more complex piece of code is given below (it is part of the phonon subroutine in HIVE):

  1. !setup work space for lapack
  2.         N = this%DimDynMat
  3.         LWORK = 2*N - 1
  4.         call omp_set_num_threads(this%nthreads)
  5.         chunk=(this%nkz)/(this%nthreads*5)
  6.         chunk=max(chunk,1)
  7.         !$OMP PARALLEL PRIVATE(WORK, RWORK, DM, W, RPart,IO)
  8.         allocate(DM(N,N))
  9.         allocate( WORK(2*LWORK), RWORK(3*N-2), W(N) )
  10.         !the write statement only needs to be done by a single thread, and the other threads do not need to wait for it
  11.         !$OMP SINGLE
  12.         write(uni,'(A,I0,A)') " Loop over all ",this%nkpt," q-points."
  13.         !$OMP END SINGLE NOWAIT
  14.         !we have to loop over all q-points
  15.         !$OMP DO SCHEDULE(DYNAMIC,chunk)
  16.         do nrz=1,this%nkz
  17.             do nry=1,this%nky
  18.                 do nrx=1,this%nkx
  19.                     if (this%kpointListBZ1(nrx,nry,nrz)) then
  20.                         !do nrk=1,this%nkpt
  21.                         WORK = 0.0_R_double
  22.                         RWORK = 0.0_R_double                                                
  23.                         DM(1:this%DimDynMat,1:this%DimDynMat)=this%dynmatFIpart(1:this%DimDynMat,1:this%DimDynMat) ! make a local copy
  24.                         do nri=1,this%poscar%nrions
  25.                             do nrj=1,this%poscar%nrions
  26.                                 Rpart=cmplx(0.0_R_double,0.0_R_double)
  27.                                 do ns=this%vilst(1,nri,nrj),this%vilst(2,nri,nrj)
  28.                                     Rpart=Rpart + exp(i*(dot_product(this%rvlst(1:3,ns),this%kpointList(:,nrx,nry,nrz))))
  29.                                 end do
  30.                                 Rpart=Rpart/this%mult(nri,nrj)
  31.                                 DM(((nri-1)*3)+1:((nri-1)*3)+3,((nrj-1)*3)+1:((nrj-1)*3)+3) = &
  32.                                     & DM(((nri-1)*3)+1:((nri-1)*3)+3,((nrj-1)*3)+1:((nrj-1)*3)+3)*Rpart
  33.                             end do
  34.                         end do
  35.                         call MatrixHermitianize(DM,IOS=IO)
  36.                         call ZHEEV( 'V', 'U', N, DM, N, W, WORK, LWORK, RWORK, IO )
  37.                         this%FullPhonFreqList(:,nrx,nry,nrz)=sign(sqrt(abs(W)),W)*fac
  38.                     end if
  39.                 end do
  40.             end do
  41.         end do
  42.         !$OMP END DO
  43.         !$OMP SINGLE
  44.         write(uni,'(A)') " Freeing lapack workspace."
  45.         !$OMP END SINGLE NOWAIT
  46.         deallocate( WORK, RWORK,DM,W )
  47.         !$OMP END PARALLEL

In the above code, a set of equations is solved using the LAPACK eigenvalue solver ZHEEV to obtain the energies of the phonon modes in each point of the Brillouin zone. As the calculation of the eigenvalue spectrum for each point is independent of all other points, this is extremely well-suited for parallelization, so we add !$OMP PARALLEL and !$OMP END PARALLEL on lines 7 and 47. Inside this parallel section there are several variables which are recycled for every grid point, so we make them PRIVATE (cf. line 7; most of them are work arrays for the ZHEEV subroutine).

Lines 12 and 44 both contain a write statement. Without further action, each thread will perform this write action, and we’ll end up with multiple copies of the same line (although this will not break your code, it will look very sloppy to any user of the code). To circumvent this problem we make use of the !$OMP SINGLE directive, which makes sure only one thread (the first to arrive) performs the write action. Unfortunately, the SINGLE block creates an implicit barrier at which all other threads wait. To prevent this from happening, the NOWAIT clause is added at the end of the block. In this specific case the NOWAIT clause will have only a very limited impact, due to the location of the write statements, but this need not always be the case.

On line 15 the !$OMP DO directive indicates that the loop that follows should be parallelized. Again we choose the outer loop, so as to reduce the overhead of the parallelization procedure. We also tell the compiler how the work should be distributed, using the SCHEDULE(TYPE,CHUNK) clause. There are three types of scheduling:

  1. STATIC: best suited for homogeneous workloads. The loop is split into equal pieces (of the size given by the optional parameter CHUNK; otherwise equal pieces of size total size/#threads).
  2. DYNAMIC: better suited if the workload is not homogeneous (in this case the central if-clause on line 19 complicates things). CHUNK can again be used to define the size of the workload blocks.
  3. GUIDED: a bit like DYNAMIC, but with decreasing block sizes.

From this real life example, it is again clear that OpenMP parallelization in fortran can be very simple.

D. Speedup?

On my loyal sidekick (with a hyper-threaded quad-core Core i7) I was able to get the following speedups for the phonon code (the run was limited to performing only a phonon-DOS calculation):

Speedup of the entire phonon-subroutine due to parallelization of the main phonon-DOS loop.

The above graph shows the speedup results for the two different modes for calculating the phonon-DOS. The reduced mode (DM red) uses a spectrum reduced to that of a unit-cell, but needs a much denser sampling of the Brillouin zone (second approach), and is shown by the black line. The serial calculation in this specific case took only 96 seconds, and the maximum speedup obtained was about x1.84. The red and green curves give the speedup of the calculation mode which makes use of the super-cell spectrum (DM nored, i.e. a much larger matrix to solve), and show for increasing grid sizes a maximum speedup of x2.74 (serial time: 45 seconds) and x3.43 (serial time: 395 seconds), respectively. The reason none of the setups reaches a speedup of 4 (or more) is twofold:

  1. Amdahl’s law puts an upper limit on the global speedup of a calculation, by taking into account that only part of the code is parallelized (e.g. write sections to a single file cannot be parallelized); a quick estimate is sketched below this list.
  2. There need to be sufficient blocks of work for all threads (indicated by nkz in the plot).
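To make the first point quantitative: if a fraction p of the runtime is spent in perfectly parallelized code, Amdahl’s law limits the speedup on N threads to

S(N) = \frac{1}{(1-p) + p/N}

For example, with p = 0.9 and N = 4 this gives S \approx 3.08, and even for N \rightarrow \infty the speedup saturates at 1/(1-p) = 10. (The value of p here is purely illustrative; it was not fitted to the curves above.)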

In the case of the DM nored calculations, the parallelized loop clearly takes the biggest part of the calculation time, while for the DM red calculation the section generating the q-point grid also takes a large fraction of the calculation time, limiting the effect of the parallelization. An improvement here would be to also parallelize the subroutine generating the grid, but that will be future work. For now, the expensive DM nored calculations show an acceptable speedup.

 

I have a Question: about thermal expansion

“I have a question” (ik heb een vraag). This is the name of a Belgian (Flemish) website aimed at bringing Flemish scientists and the general public together through scientific or science-related questions. The basic idea is rather simple. Someone poses a scientific question on the website, and a scientist provides an answer. It is an excellent opportunity for the latter to hone his/her science communication skills (and do some outreach) and for the former to get a good answer to his/her question.

All questions and answers are collected in a searchable database, which currently contains about fifteen thousand questions answered by a (growing) group of nearly one thousand scientists. This is rather impressive for a region of about 6.5 million people. I recently joined the group of scientists providing answers.

An interesting materials-related question was posed by Denis (my translation of his question and context):

What is the relation between the density of a material and its thermal expansion?

I was wondering if there exists a relation between the density of a material and the thermal expansion (at the same temperature)? In general, gasses expand more than solids, so can I extend this to the following: Materials with a small density will expand more because the particles are separated more and thus experience a small cohesive force. If this statement is true, then this would imply that a volume of alcohol should expand more than the same volume of air, which I think is puzzling. Can you explain this to me?

Answer (a bit more expanded than the Dutch one):

Unfortunately there exists no simple relation between the density of a material and its thermal expansion coefficient.

Let us first correct something in the example given: the molar mass of alcohol (ethanol) is 46.07 g/mol (methanol would be 32.04 g/mol), which is significantly more than that of air, 28.96 g/mol. So following the suggested assumption, air should expand more. If we look at liquids, it is better to compare ethanol (0.789 g/cm³) with water (1 g/cm³), as liquid air (0.87 g/cm³) needs to be cooled below -196 °C (77 K). The thermal expansion coefficients of water and ethanol are 207×10⁻⁶/°C and 750×10⁻⁶/°C, respectively. So in this case we see that alcohol will expand more than water (at 20 °C), supporting Denis’ statement.

Unfortunately, these are just two simple materials at a very specific temperature for which this statement is true. In reality, there are many interesting aspects complicating life. A few things to keep in mind are:

  • A gas (in contrast to a liquid or solid) has no boundary of its own. So if you do not put it in some type of container, it will just keep expanding. The change in volume observed when a gas is heated is due to an increase in pressure (the higher kinetic energy of the gas molecules makes them bounce harder off the walls of your container, which can make a piston move or a balloon grow). In a liquid or a solid, on the other hand, the expansion is rather a stretching of the material itself.
  • Furthermore, the density does not play a role at all in the case of the expansion of an ideal gas, since pV = nRT. From this it follows that 1 mole of H₂ gas, at 20 °C and a pressure of 1 atmosphere, has the exact same volume as 1 mole of O₂ gas at the same temperature and pressure, even though the latter has a density which is 16 times higher.
  • There are quite a lot of materials which show a negative thermal expansion in a certain temperature region (i.e. they shrink when you increase the temperature). One well-known example is water. The density of liquid water at 0 °C is lower than that of water at 4 °C. This is the reason why there remains some liquid water at the bottom of a pond when it is frozen over.
  • There are also materials which show “breathing” behavior (these are reversible volume changes in solids which made the originators of the term think of human breathing: inhaling expands our lungs and chest, while exhaling contracts them again). One specific class of these materials are the breathing Metal-Organic Frameworks (MOFs). Some of these look like wine racks (see figure here) which can open and close due to temperature variations. These volume variations can be 50% or more! 😯

The way a material expands due to temperature variations is a rather complex combination of different aspects. It depends on how thermal vibrations (or phonons) propagate through the material, but also on the possible presence of phase transitions. In some materials there are even phase transitions between solid phases with a different crystal structure. These, just like solid/liquid phase transitions, can lead to very sudden jumps in volume during heating or cooling. The different crystal phases can also have very different physical properties. During the Middle Ages, tin pest was a large source of worries for organ builders. At temperatures below 13 °C, α-tin is more stable than β-tin, the phase that was used in organ pipes. Fortunately, a high activation energy prevents the phase transformation from β-tin to α-tin from happening too readily; only at temperatures of -30 °C and lower is this barrier easily overcome. The phase transition comes with a volume increase of about 27%, and the resulting α-tin is a brittle material which easily disintegrates. During the Middle Ages this led to the rapid deterioration and collapse of organ pipes in church organs during strong winters. It is also said to have caused the buttons of the clothing of Napoleon’s troops to disintegrate during his Russian campaign. As a result, the troops’ clothing fell apart during the cold Russian winter, letting many of them freeze to death.

 

 

Folding Phonons

About a year ago, I discussed the possibility of calculating phonons (the collective vibrations of atoms) in the entire Brillouin zone for Metal-Organic Frameworks. Now, one year later, I return to this topic, but this time the subject matter is diamond. In contrast to Metal-Organic Frameworks, the unit-cell of diamond is very small (only 2 atoms). Because a phonon spectrum is calculated through the gradients of the forces felt by one atom due to all other atoms, it is clear that within a single diamond unit-cell these forces will not be converged. As such, a supercell is needed to make sure the contributions to the experienced forces due to the most distant atoms are negligible.

Using such a supercell has the unfortunate drawback that the dynamical matrix (which is 3N \times 3N, for N atoms) explodes in size, and, more importantly, that the number of eigenvalues, or phonon frequencies, also increases (3N), where we only want 6 frequencies ( 3 \times 2 atoms) for diamond. For an M \times M \times M supercell we end up with 24M^3 - 6 additional phonon bands, which are the result of band folding. Put differently: 24M^3 - 6 phonon bands coming from the other unit-cells in the supercell. This is not a problem when calculating the phonon density of states. It is, however, a problem when one is interested in the phonon band structure.

The phonon spectrum at a specific q-point in the first Brillouin zone is given by the square root of the eigenvalues of the dynamical matrix of the system. For simplicity, we first assume a finite system of n atoms (a molecule). In that case, the first Brillouin zone is reduced to a single point q=(0,0,0) and the dynamical matrix looks more or less like the Hessian:
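Schematically (a sketch of the block structure in terms of the 3 \times 3 blocks defined below; the mass-weighting factors 1/\sqrt{m_a m_b} usually included in a dynamical matrix are omitted for simplicity):

D(q=0) = \begin{pmatrix} \varphi(1,1) & \varphi(1,2) & \cdots & \varphi(1,n) \\ \varphi(2,1) & \varphi(2,2) & \cdots & \varphi(2,n) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi(n,1) & \varphi(n,2) & \cdots & \varphi(n,n) \end{pmatrix}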

With \varphi (N_a,N_b) = [\varphi_{i,j}(N_a , N_b)] the 3 \times 3 matrices with elements \varphi_{i,j}(N_a,N_b)=\frac{\partial^2\varphi}{\partial x_i(N_a) \partial x_j(N_b)} = - \frac{\partial F_i (N_a)}{\partial x_j (N_b)}, with i, j = x, y, z. Or in words: \varphi_{i,j}(N_a , N_b) represents the derivative of the force felt by atom N_a due to the displacement of atom N_b. Due to Newton’s third law (the mixed second derivatives of the potential commute), the dynamical matrix is expected to be symmetric.

When the system under study is no longer a molecule or a finite cluster but an infinite solid, things get a bit more complicated. For such a solid, we only consider the symmetry-inequivalent atoms (in practice this is often a unit-cell). Because the first Brillouin zone is no longer a single point, one needs to sample multiple different points to get the phonon density-of-states. The role of the q-point is introduced in the dynamical matrix through a factor e^{iq \cdot (r_{N_a} - r_{N_b}) }, creating a dynamical matrix for a single unit-cell containing n atoms:
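In the same schematic notation as before (again a sketch, with mass factors omitted), every block simply picks up the corresponding phase factor:

D(q) = \begin{pmatrix} \varphi(1,1)\, e^{iq \cdot (r_1 - r_1)} & \cdots & \varphi(1,n)\, e^{iq \cdot (r_1 - r_n)} \\ \vdots & \ddots & \vdots \\ \varphi(n,1)\, e^{iq \cdot (r_n - r_1)} & \cdots & \varphi(n,n)\, e^{iq \cdot (r_n - r_n)} \end{pmatrix}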

Because a real solid contains more than a single unit-cell, one should also take into account the interactions of the atoms of one unit-cell with those of all other unit-cells in the system, and as such the dynamical matrix becomes a sum of matrices like the one above:
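Written compactly, element by element (same caveats as above), with the sum running over all unit-cells R:

D_{ab}(q) = \sum_{R} \varphi(N_a, M_{R_b})\, e^{iq \cdot (r_{N_a} - r_{M_{R_b}})}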

Where the sum runs over all unit-cells in the system, N_a indicates an atom in a specific reference unit-cell, and M_{R_b} an atom in the R-th unit-cell, where the reference unit-cell gets index 1. As the forces decay with the distance between the atoms, the infinite sum can be truncated. For a Metal-Organic Framework a single unit-cell will quite often suffice. For diamond, however, a larger cell is needed.

An interesting aspect to the dynamical matrix above is that all matrix-elements for a sum over n unit-cells are also present in a single dynamical matrix for a supercell containing these n unit-cells. It becomes even more interesting if one notices that due to translational symmetry one does not need to calculate all elements of the entire supercell dynamical matrix to construct the full supercell dynamical matrix.

Assume a 2D 2×2 supercell with a single atom per unit-cell, which we represent as in the figure on the right. A single periodic copy of the supercell is added in each direction. The dynamical matrix for the supercell can now be constructed as follows: calculate the elements of the first column (i.e. the gradient of the force felt by the atom in the reference unit-cell, in black, due to the atoms in each of the unit-cells in the supercell). Due to Newton’s third law (action = reaction), this first column and the first row will have the same elements (middle panel).

Translational symmetry, on the other hand, allows us to determine all other elements. The simplest are the diagonal elements, which represent the self-interaction (so all are black squares). The others you can just as easily determine by looking at the schematic representation of the supercell under periodic boundary conditions. For example, to find the derivative of the force on the second cell (= second column, green square in the supercell) due to the third cell (third row, blue square in the supercell), we look at the square whose position relative to the black square is the same as that of the blue square relative to the green square: which is the red square. (If you read this a couple of times it will start to make sense.) Like this, the dynamical matrix of the entire supercell can be constructed; a small code sketch of this lookup is given below.
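To make the bookkeeping concrete, here is a minimal sketch for the simplest possible case: a 1D chain of M unit-cells with one atom per cell, under periodic boundary conditions. Only the interaction of the reference cell with each cell (the first column) is actually calculated; the rest follows from the cyclic structure. (The names and the 1D simplification are mine, not taken from the HIVE code.)

  subroutine build_supercell_matrix(col, D, M)
          ! col(k): interaction of the reference cell (cell 1) with cell k,
          !         i.e. the explicitly calculated first column
          ! D     : the full M x M supercell dynamical matrix
          implicit none
          integer, intent(in) :: M
          double precision, intent(in)  :: col(M)
          double precision, intent(out) :: D(M,M)
          integer :: i, j

          do j = 1, M
             do i = 1, M
                ! translational symmetry: element (i,j) only depends on the
                ! relative position of cell i with respect to cell j (mod M)
                D(i,j) = col(modulo(i - j, M) + 1)
             end do
          end do
  end subroutine build_supercell_matrix

For the real 3D case with n atoms per unit-cell, the scalar col(k) becomes a 3n×3n block and the single modulo index becomes one per supercell direction, but the principle is identical.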

This final supercell dynamical matrix can, with the same ease, be folded back into the sum of unit-cell dynamical matrices (it becomes an extended lookup-table). The resulting unit-cell dynamical matrix can then be used to create a band structure, which in my case was nicely converged for a 4x4x4 supercell. The bandstructure along high symmetry lines is shown below, but remember that these are actually 3D surfaces. A nice video of the evolution of the first acoustic band (i.e. lowest band) as function of its energy can be found here.

The phonon density of states can also be obtained in two ways, which should, in contrast to the band structure, give the exact same result: (for an M \times M \times M supercell with n atoms per unit-cell)

  1. Generate the density of states for the supercell and corresponding Brillouin zone. This has the advantage that the smaller Brillouin zone can be sampled with fewer q-points, as each q-point acts as M³ q-points in a unit-cell approach. The drawback here is the fact that for each q-point a (3nM³)×(3nM³) dynamical matrix needs to be solved. This solution scales approximately as O(N³) ~ (3nM³)³ = (3n)³M⁹. Using linear algebra packages such as LAPACK, this may be done slightly more efficiently (but you will not get O(N²), for example).
  2. Generate the density of states for the unit-cell and corresponding Brillouin zone. In this approach, the dynamical matrix to solve is more complex to construct (due to the sum which needs to be taken), but much smaller: 3n×3n. However, to get the same q-point density, you will need to calculate M³ times as many q-points as for the supercell.

In the end, the choice will be based on whether you are limited by the accessible memory (when running a 32-bit application, the sheer number of q-points will be the limiting factor) or by CPU time (solving the large dynamical matrix quickly becomes very expensive).

 

To x64 or not to x64: Installing a 64-bit fortran compiler

Current day computers generally have 64-bit processors, and most even have 64-bit operating systems. On such systems, 32-bit programs will run fine, but 64-bit programs can make more efficient use of the underlying system. When we installed a fortran compiler and the Code::Blocks IDE, the default fortran compiler generated 32-bit programs. This generally is not an issue, unless you need a large amount of memory, for example to store a temporary array with 400³ double precision coordinates (as I did for a project I’m currently working on). You may first start to look for ways of increasing the stack size of your program, but you will soon discover that the problem is more profound: a 32-bit program cannot access address space beyond 4 GB. (In practice, you will generally run into problems even before reaching 4 GB.) This is because the memory address of your data is stored as a 32-bit value (2³² = 4 294 967 296 bytes = 4 GB), so the only way out of this predicament is a “larger address”, aka 64-bit. So you need to install a new compiler capable of producing 64-bit programs.
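A quick back-of-the-envelope check (assuming a “coordinate” here means 3 double precision numbers, which is my assumption): 400³ × 3 × 8 bytes ≈ 1.5 GB for a single such temporary array. Two of those, plus the rest of the program, and the 4 GB ceiling is already in sight; the default stack is typically only a few MB, orders of magnitude smaller still.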

  1. Installing minGW64 for code::blocks
    1. Installing the compiler
    2. Setting the PATH-variable (win10)
    3. Adding the compiler to code::blocks
  2. Upgrading Lapack to 64-bit

Continue reading

One more digit of importance

Over the past few weeks I have bumped into several issues, each tracing back to numerical accuracy. Although I have been programming for almost two decades, I never had to worry much about this, making these events seem as if the universe is trying to tell me something.

Now, let me try to give a proper start to this story; Computational (materials) research is generally perceived as a subset of theoretical (materials) research, and it is true that such a case can be made. It is, however, also true that such thinking can trap us (i.e. the average computational physicist/chemist/mathematician/… programming his/her own code) with numerical accuracy problems. While theoretical equations use exact values for numbers, a computer program is limited by the numerical precision of the variables (e.g. single, double or quadruple precision for real numbers) used in the program. This means that actual numbers with a larger precision are truncated or rounded to the precision of the variable (e.g. 1/3 becomes 0.3333333 instead of 0.333… with an infinite series of 3’s). Most of the time, this is sufficient, and nothing strange will happen. Even more, most of the time, the additional digits would only increase the computational cost while not improving the results in a significant fashion.

Interstellar disc

To understand the importance, or the lack thereof, of additional significant digits, let us first have a look at the precision of \pi and the circumference and surface area of a disc. We will be looking at a rather large disc: one with a radius equal to the distance between the sun and the nearest star, Alpha Centauri, which is 39 900 000 000 000 km away. The circumference of this disc is given by 2r\pi (or 2.5 \times 10^{14} km). Stored as a single precision variable, \pi will have about 7-8 significant digits, which means the calculated circumference will have an accuracy of about 1 000 000 km (or a few times the distance between the earth and the moon). Using a double precision \pi variable, which has a precision of 16 decimal digits, the circumference will be accurate to within a few meters. At quadruple precision, the \pi variable would have 34 significant decimal digits, and we would even be able to calculate the surface of the disc ( r^2\pi or 5.0 \times 10^{33} m² ) to within 1 m². Even the surface of a disc the size of our milky way could be calculated with an accuracy of a few hundred square km (or ± the size of Belgium).
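You can see this at work in fortran with a tiny test program (a sketch; the radius is the rounded Alpha Centauri distance used above):

  program disc_precision
          implicit none
          integer, parameter :: sp = selected_real_kind(6)   ! single precision
          integer, parameter :: dp = selected_real_kind(15)  ! double precision
          real(sp), parameter :: pi_sp = 3.1415927_sp
          real(dp), parameter :: pi_dp = 3.141592653589793_dp
          real(sp), parameter :: r_sp = 3.99e13_sp           ! radius in km
          real(dp), parameter :: r_dp = 3.99e13_dp

          ! the single precision result carries only ~7-8 significant digits,
          ! so the circumference is uncertain by roughly a million km
          write(*,*) "circumference (single):", 2.0_sp*pi_sp*r_sp, " km"
          write(*,*) "circumference (double):", 2.0_dp*pi_dp*r_dp, " km"
  end program disc_precision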

Knowing this, our mind is quickly put at ease regarding possible issues with numerical accuracy. However, once in a while, we run into an exceptional case (or three, in my case).

1. Infinitesimal finite elements

Temperature profile in the insulating layer of a cylindrical wire.

While looking into the theory behind finite elements, I had some fun implementing a simple program which calculated the temperature distribution due to heat transport in an insulating layer. The finite element approach performed rather nicely, leading to good approximate results already for a few dozen elements. However, I wanted to push the implementation a bit (the limit of infinitely many elements should give the exact solution). Since the set of equations was solved by a LAPACK subroutine, using 10 000 elements instead of 10 barely impacted the required time (writing the results took most of the 2-3 seconds anyway). The results, on the other hand, were quite funny, as you can see in the picture. The initial implementation, with single precision variables, broke down even worse, already at 1000 elements. Apparently the elements had become so small that the variations of the properties in the stiffness matrix became too small, resulting in the LAPACK subroutine returning nonsense.

So it turns out that you can have too many elements in a finite element method.

2. Small volumes: A few more digits please

Optimized volume in the Equation of State fit, as function of the range of the fitting data and the step size between data points. Green diamonds, blue triangles, and black discs: 1%, 0.5%, and 0.25% volume steps, respectively.

Recently, I started working at the Wide Band Gap Materials group at the University of Hasselt. So in addition to MOFs I am also working on diamond-based materials. While setting up a series of reference calculations, using scripts which already served me well during my work on MOFs, I was trying to figure out for which volume range and step size I would get sufficient convergence in my Equation-of-State fitting procedure. For the MOFs this is a computationally rather expensive (and tedious) exercise, which, fortunately, gives clear results. For the 2-atom diamond unit cell the calculations are ridiculously fast (in comparison), but the results were confusing. As you can see in the picture, the values I obtained from the different fits seem to oscillate. Checking my E(V) data showed nothing out of the ordinary. All energies and volumes were clearly distinguishable, with the energies given with a precision of 0.001 meV and the volumes with a precision of 0.01 Å³. However, as you can see in the figure, the volume oscillations are of the order of 0.001 Å³, ten times smaller than the input precision. Calculating the volumes based on the lattice parameters, to get a precision of 10⁻⁶ Å³ for the input volumes, stabilizes the convergence behavior of the fits (open symbols in the figure). This problem was not present for the MOFs, since these have a unit cell volume which is one hundred times larger, so a precision of 0.01 Å³ makes the relative error on the volumes one hundred times smaller than was the case for diamond.

In essence, I was trying to get more accurate output than the input I provided, which will never give sensible results (even if they actually look sensible).

3. Many grains of sand really start to pile up after a while

The last one is a bit embarrassing, as it led to a bug in the HIVE toolbox, which has been fixed in the meantime.

One of the HIVE toolbox users informed me that the dosgrabber routine had crashed because it could not find the Fermi level in the output of a VASP calculation. Although VASP itself gives a value for the Fermi level, I do not use it in this sub-program, since this value tends to be incorrect for spin-polarized systems with different minority and majority spins. However, in an attempt to be smart (and efficient) I ended up in trouble. The basic idea behind my Fermi-level search is simply running through the entire density-of-states spectrum until you have counted all the electrons in the system. Because the VASP estimate for the Fermi level is not that far off, you do not need to run through the entire list of several thousand entries; you can just take a subset centered around the estimated Fermi level and search in that subset, speeding this up by a factor of 10 to 100. Unfortunately, I calculated the energy step size between density-of-states entries as the difference between the first two entries, which are given with an accuracy of only 0.001 eV. I guess you already have a feeling of what the problem will be. When the index of the estimated Fermi level is 1000, the accumulated error will be of the order of 1 eV, which is much larger than the range I took into account. Fortunately, the problem is easily solved by calculating the energy step size as the difference between the first and the last entry, divided by the number of steps, making the error in this particular case more than a thousand times smaller (see the sketch below).
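In code, the fix boils down to the following (a minimal sketch with hypothetical names; this is not the actual HIVE routine):

  ! energies(1:n): uniform energy grid of the DOS, with each value
  ! only known to an accuracy of about 0.001 eV
  double precision function energy_step(energies, n)
          implicit none
          integer, intent(in) :: n
          double precision, intent(in) :: energies(n)

          ! bad : energies(2) - energies(1)
          !       the ~0.001 eV rounding error grows linearly with the index
          ! good: average over all n-1 steps, which shrinks the error on
          !       the step size by a factor of n-1
          energy_step = (energies(n) - energies(1)) / dble(n - 1)
  end function energy_step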

So, when trying to be smart, you always need to make sure you really are being smart, and remember that small numbers can become very big when there are a lot of them.

Annual Meeting of the Belgian Physical Society 2016


Wednesday May 18th was a good day for our little family. Since my girlfriend and I are both physicists by training, we attended the annual meeting of the Belgian Physical Society in Ghent together. What made this event even more special was the fact that both of us had an oral presentation at the same conference, which had never happened before. 🙂

Sylvia talked about an example of indeterminism in Newtonian mechanics, and showed how the indeterminism can be clarified using non-standard analysis. The example considers the Norton Dome, a hill with a specifically designed shape ( y(x)=-2/3(1-(1-3/2|x|)^{2/3})^{3/2} ). When considering a point mass experiencing only the gravitational force, there are two solutions for the equation of motion: (1) the mass is at the top and remains there forever ( r(t)=0 ), and (2) the mass was rolling uphill with a non-zero speed which becomes exactly zero at the top, and continues over the top ( r(t)=\frac{1}{144} (t-T)^4, with T the time at which the top is reached). Here, r refers to the arc length as measured along the dome (0 at the top). In addition, there also exists a family of solutions taking the first solution at t<T and the second solution at t>T. (As the first and second derivatives of these latter solutions are continuous, Newton will not complain.) This leads to indeterminism in a Newtonian system: for instance, you start with a mass on the top of the hill, and at a random point in time it starts to roll off without any external agent putting it into motion. Using infinitesimals, Sylvia showed that the probability for the mass to start rolling off the dome immediately is infinitesimally close to one.

My own talk was on the use of computational materials science as a means for understanding and explaining experimental observations. I presented results on the pressure-induced breathing of the MIL-47(V) MOF, showing how the experimentally observed S-shape of the transition-pressure curve can be explained by the spin interactions of the unpaired vanadium d-electrons: it turns out that regions with only ferromagnetic chains compress already at 85 MPa, while the addition of higher and higher percentages of anti-ferromagnetic chains increases the pressure at which the pores collapse, up to 125 MPa for the regions containing 100% anti-ferromagnetic chains. As a second topic, I showed how the electronic band structure of the linker-functionalized UiO-66(Zr) MOF changes. When one or two -OH or -SH groups are added to the benzene ring of the linker, part of the valence band is split off and moves into the band gap. In semiconductors this would be called a gap state; however, in this case, since every linker in the material contributes a single electron state to this gap state, it practically becomes the valence band top. As a consequence, the color of such functionalized MOFs changes from white to yellow and orange. As a third topic, I discussed the COK-69(Ti) MOF. In this MOF the electrons in the titanium-oxide clusters are strongly correlated, just as in pure titanium oxide. Because such systems are poorly described with standard DFT, we used the DFT+U approach, which allowed us to distinguish between Ti³⁺ and Ti⁴⁺ ions. The latter was practically done by partitioning the electron density using the Hirshfeld-I scheme.

Belgian Physical Society Meeting 2016

Top left: I am presenting computational results on MOFs. Top right: Sylvia presents the Norton Dome. Bottom: Group picture in the central garden of “Het Pand”. (Photos courtesy of Sylvia Wenmackers (TL), Philippe Smet (TR), and Michael Tytgat (B).)

Next to our own talks, the BPS meeting started with two very interesting plenary lectures on the two big machines/facilities of the physics community: ITER (a fusion reactor under construction) and the LHC (a circular collider, under constant upgrade) at CERN. Prof. Jean Jacquinot presented the progress in fusion research (among which simulations of plasma instabilities) and the actual building progress of the ITER facility. Prof. Sergio Bertolucci, on the other hand, informed us about the latest results obtained with the LHC at CERN, but also about future plans (the Future Circular Collider, with a circumference of about 100 km!!). He also showed us the amount of data involved in running the CERN experiments, putting it into perspective: in 2012 the LHC produced about 15 Petabyte of data per year (15 000 Terabyte), which is the same as the amount of data added to YouTube on a yearly basis. At that time the ATLAS experiment had a dataset of 140 Petabyte (compare this to the 100 Petabyte of Google’s search index, or the 180 Petabyte of Facebook uploads per year). The presenters, both excellent and enthusiastic speakers, reminded us that these projects thrive on the enthusiasm of young researchers with open minds. But they also noted, something that is rather often forgotten, that it is the journey, not the goal, which is most important. Of course, ITER is the next step on the road to commercial fusion power, but along the way much more is learned as a result of tackling practical problems. This is even more so for the CERN experiments, where the “goal” is not as related to our daily lives (keeping the lights on) but focuses on understanding the world. This is at the core of what it means to be a physicist: the need and drive to understand the world. This is also what should drive research, but it becomes increasingly hampered by the funding question: how much profit will it generate in the “real world”? Remember the transistor which makes your computer and smartphone as powerful as they are, the laser in CD/DVD players, the internet allowing you to read this post, and so many more.

Following these plenary presentations, four young scientists competed for the young speaker award, presenting their PhD research. Two presentations (1),(2) focused on vortices in superconductors, a third one discussed the use of plasmons in graphene nanoribbons to enhance telecommunication, while the fourth talk introduced us to the world of string theory.

In the afternoon, there were six parallel sessions, of which I mainly attended the Condensed Matter and Nanostructure Physics session (since I had my own talk there) and the Biological, Medical, Statistical and Mathematical Physics session, rooting for Sylvia. During the condensed matter session I was mainly fascinated by the presentation of Prof. Sara Bals on coloring atoms in 3 dimensions. She showed how, using energy-dispersive X-ray (EDX) mapping, it is possible to create a 3D atomic lattice of nano-materials and clusters. This is a more direct approach than the usual X-ray diffraction (XRD) approach for identifying a crystal structure. Unfortunately, I am afraid this technique may not be well suited for the MOFs I’m working on, since they contain mainly light elements and no heavy metals (although it may be interesting to try once the technique is optimized further). It is, however, definitely a technique to remember for future projects, and to suggest to experimental collaborators.


Call for Abstracts: Condensed Matter Science in Porous Frameworks: On Zeolites, Metal- and Covalent-Organic Frameworks

Together with Ionut Tranca (TU Eindhoven, The Netherlands) and Bartłomiej Szyja (Wrocław University of Technology, Poland) I am organizing a colloquium “Condensed Matter Science in Porous Frameworks: On Zeolites, Metal- and Covalent-Organic Frameworks” which will take place during the 26th biennial Conference & Exhibition CMD26 – Condensed Matter in Groningen (September 4th – 9th, 2016). During our colloquium, we hope to bring together experimental and theoretical researchers working in the field of porous frameworks, providing them with the opportunity to present and discuss their latest work and discoveries.

Zeolites, Metal-Organic Frameworks, and Covalent-Organic Frameworks are an interesting class of hybrid materials. They are situated at the boundary of research fields, with properties akin to both molecules and solids. In addition, their porosity puts them at the boundary between surfaces and bulk materials, while their modular nature provides a rich playground for materials design.

We invite you to submit your abstract for oral or poster contributions to our colloquium. Poster contributions participate in a Best Poster Prize competition.

The deadline for abstract submission is April 30th, 2016.

The extended deadline for abstract submission is May 14th, 2016.

 

CMD26 – Condensed Matter in Groningen is an international conference, organized by the Condensed Matter Division of the European Physical Society, covering all aspects of condensed matter physics, including soft condensed matter, biophysics, materials science, quantum physics and quantum simulators, low temperature physics, quantum fluids, strongly correlated materials, semiconductor physics, magnetism, surface and interface physics, electronic, optical and structural properties of materials. The scientific programme will consist of a series of plenary and semi-plenary talks and Mini-colloquia. Within each Mini-colloquium, there will be invited lectures, oral contributions and posters.

 

Feel free to distribute this call for abstracts and our flyer and we hope to see you in Groningen!

Animated Primes

Optimus Prime

Optimus Prime, the most prime of all autobots.[source]

2, 3, 5, 7, 11, 13, 17, 19, 23, …and on toward infinity. Prime numbers are a part of mathematics which seduce most of us at some point in our lives to try and “solve them”. There is structure in them, but too little to be easily understandable, and too much to let go and consider them chaos.

Recently, my girlfriend got sucked into this madness. Luckily she is not trying to generate the next largest prime (which would need to have more than 22 million digits, although I might have jinxed it now). Instead she is looking for order in chaos, leading to pen-and-paper graphs, evolving into Excel worksheets being abused as ordinary bitmap pictures, and finally Mathematica applications for even more heavy lifting.

These last exercises came to a somewhat grinding halt when Mathematica refused to generate a gif animation (it just failed to do the job, without providing any warning that something might have gone awry). However, because it could generate the separate frames without any issues, I was tempted to dig up an old fortran module which allowed me to generate gif images and animations. I remembered correctly that it was able to both read and write gif images, and so, after 10 minutes of writing a small program (9 of which were spent remembering how things worked) and another 10 linking in all dependencies and fixing some inconsistencies, I had a working program for generating a gif animation from separate gif images.

Animated gif, generated in fortran.

The initial test went smoothly. Using some simple gif images I created in Paint, the resulting 4-frame animation is the one on the left (which is probably giving you a headache as you read this ;-).

The real test, with the single-frame gifs Sylvia had prepared, was rather interesting: it failed miserably. Fortunately, it also taught us the origin of the problem: Mathematica was generating frames of different dimensions. And although this does not give rise to problems in the workbook or the generated avi, it does kill an animated gif. After fixing this, Mathematica also successfully generated the animated gif Sylvia wanted to see. My own small program was now able to perform the intended trick as well, although it turned out that Mathematica created a different color map for each frame, something my own implementation did not handle well (we’ll fix that somewhere in the future).

The executable (including prerequisite dll’s) can be downloaded here.

If you do not suffer from epileptic seizures (or not yet), you could check out the results in the spoiler below.

Spoiler: gif-animations

SBDD XXI

Today was the first day of the three-day diamond conference at the University of Hasselt. And although this sounds as if it were a mere small-scale local conference, it is actually one of the two main international conferences in the field. The Surface and Bulk Defects in Diamond (SBDD) workshop grew in twenty years from a small event with only a few dozen participants to the current event with over 200 participants. As such, it is the place to be for someone like me, dipping into a new field of materials.

One thing that already became quite clear today is the fact that there are many opportunities in this field for the computational materials scientist, as the large majority of the researchers are experimentalists. Of the >120 posters presented, I discovered only about 5 theoretical ones. Having had very nice chats with their presenters, I already learned a lot about what I will have to keep in mind when studying diamond. But so far, I have not come across any issues that are impossible to resolve, which is good news :-).

New sidekick

“One ring to rule them all, one ring to find them, one ring to bring them all, and in the darkness bind them” – J.R.R. Tolkien

With the start of my new chapter, I also decided to finally buy a new laptop. It’s intended to replace the machine I used at work for the past six years, my desktop at home, and my current little netbook. Of these, the latter has been my true sidekick for the better part of the last decade. Although nearly all my calculations have been performed on various supercomputers in Belgium and The Netherlands, my little Asus Eee-pc 1000H, being one of the last vestiges of Windows XP, was the one on which I wrote nearly all of my publications, a PhD in Physics and one in Chemistry, and my FWO project proposal on MOFs, developed most of the HIVE toolbox, wrote the better part of this blog and website, developed and tested the agent-tutorials, and much more. In short, I did (and still do) protect it with my life.

With the unfortunate and, in the end, terminal shutdown of Windows XP (i.e. some people discovered how to detect the OS and link this to a kill event… just because XP no longer receives updates and has therefore suddenly become as leaky as a sieve 🙄 ), a new OS was needed. In addition, I also wanted some more resources for development (and for just running a browser… it feels like programming skills are quickly deteriorating these days if you see how memory-intensive even trivial applications are), so the new OS ended up being a new computer altogether. After some searching, I found the laptop which I hope to be using for the next decade: an Acer Predator 15. It has a nice Intel Skylake CPU (a dual-threaded quad-core with 6 MB of L3 cache; the 8 MB version was not to be found with the 32 GB of RAM I was looking for). The contrast with my netbook could not be greater, so I’ll be checking and comparing the performance of my HIVE code on this new machine to that on my old Asus netbook (a dual-threaded Atom CPU with 1 GB of RAM). This should give some interesting results.

Not having installed a new Windows OS since 2007, Windows 10 came as quite a shock (and the aftershocks are still coming). It seems like privacy is a thing of the past. Anything you say, watch, type, or draw is by default sent to the home office, and distributed to third parties which may be interested in providing you with “user specific content” (i.e. commercials 😥 ). This is in stark contrast to the days when people often installed several antivirus programs and firewalls to protect their data from hackers… now we just seem to throw that same information out for grabs. And then there are people who believe we are on the verge of the end of capitalism. I’m afraid they are either misreading the signs, or some are reading the signs very accurately and are collecting the new gold before it has been declared gold. I feel like I’m getting old, or just old-school.

One of the advantages of having lived through those early years, where fear of internet hackers and piracy was the default, is that we also learned to get around all the same protections. Maybe you remember the irony of having to rip a CD to be able to simply listen to it, because of all the protection software (and hardware, when you rented it at the local library)? It seems those days are back. While installing some old games, I ran into piracy-protection software (no longer supported due to security leaks) which blocked both playing and even installing the game. So you end up either buying the game through Steam (as many will suggest… why should I do that if I already bought the game… twice?), or installing it on an older machine, just brute-force copying the entire installed game, copying/installing missing dll’s and finding a no-cd crack (yes, this is claimed to be illegal, but since most machines these days come without optical drives, I beg to differ), or figuring out yet another way to play old games. In the end, you start with a feeling of victory before even playing a new round of Civilization 4… which is nice and sad at the same time (it should not have been necessary).

Another interesting experience was the transfer of my emails from my UGent address to the UHasselt one. Mail clients like Outlook and Thunderbird seem not that well adapted to handle such exercises easily (missing folders and emails upon copy actions, which needed to be fixed manually), especially if you have a very extensive folder structure. The nastiest problem I encountered when setting up the accounts was the fact that the new UHasselt (gmail) account could not be linked to the Thunderbird installation (even though the intermediate gmail account for transferring my UGent mail was not that problematic). Apparently there was also a cookies option embedded in Thunderbird itself, which should be switched on, oops.

Two weeks after the start of my new chapter, most hurdles have been overcome. Nearly all necessary software has been installed (with or without the cooperation of the Windows 10 OS 👿 ), and my e-mail has been transferred, as well as 4 TB of data on the HPC systems (for which I am infinitely grateful to both the HPC teams of UGent and UHasselt). Now it is time to start working again. Tomorrow, the diamond conference starts at UHasselt, giving me the opportunity to quickly get involved in a new, additional field of materials.