Category: blog

Modern art in research.

Which combination to take?

Although it looks a bit like a modern piece of art, it is one more attempt at finding an optimal combination of parameters.

I’m currently trying to find “the best choice” of U and J for a DFT+U-based project… DFT??? Density Functional Theory. This is an approximate method used in computational materials science to calculate the quantum mechanical behavior of electrons in matter. Instead of solving the Schrödinger equation, known from any quantum mechanics course, one solves the Hohenberg-Kohn-Sham equations. In these equations it is not the electrons that play the central role (as they do in the Schrödinger equation) but the electron density. Hohenberg, Kohn and Sham were able to show that their equations give exactly the same results as the Schrödinger equation. There is, however, one small caveat: you need to have the “exact” exchange-correlation functional (a functional is just a function of a function). Unfortunately there is no known analytic form for this functional, so one needs to use approximate functionals. As you probably guessed, with these approximate functionals the solution of the Hohenberg-Kohn-Sham equations is no longer exact.

For some molecules or solids the error due to the approximate exchange-correlation functional is much larger than average. These systems are therefore called “strongly correlated” systems. Over the years, several ways have been devised to address this problem within DFT. One of them is called DFT+U. It entails adding an additional Coulomb interaction (the Hubbard U potential) between the “strongly interacting electrons”. This additional interaction, however, depends on the system at hand, so one always needs to fit this parameter against one or more properties one is interested in. The law of conservation of misery, however, makes sure that improving one property goes hand in hand with the deterioration of another.

Since the full DFT+U formalism has two independent parameters (U and J, though for many systems they can be combined into a single effective parameter), I had quite some fun running calculations for a 21×21 grid of possible pairs. Afterward, collecting the data I wanted to use for fitting purposes took my script about 2h! 😯 Unfortunately, the 10 properties of interest I wanted to fit give optimal (U,J)-pairs all over the grid. The picture above shows my most recent attempt at dealing with them: for the entire grid, it indicates how many of the 10 properties are fit reasonably well. There are two regions which fit 6 properties: one around (U,J)=(5,10) and another around (U,J)=(8.5,17.5). More work will be needed before this gives a satisfactory result, but the show will go on.

BrENIAC: the new Flemish TIER-1 Supercomputer.

Yesterday was a good day for computational scientists in Flanders. The new TIER-1 machine, named BrENIAC and located at the university of Leuven, was inaugurated and is now officially open to all users of the Flemish university associations: UAntwerpen, VUB, UGhent, UHasselt, and KULeuven. The name refers to one of the first (super)computers ever built: ENIAC. This new machine takes over the task of the first TIER-1 machine (muk, located at the university of Ghent), which will be decommissioned at the end of this year. BrENIAC is ranked 196th in the current TOP500 of supercomputers, and cost 5.5 M€. This is of course without the annual cost of power usage and of the technical personnel who will maintain the machine and provide support for the scientists running calculations. With its 580 compute nodes of 28 cores each (two 14-core Broadwell E5-2680v4 CPUs), the number of available cores has roughly doubled. Memory access should also have improved, which gives rise to a theoretical threefold increase in peak performance.

However, this peak performance is measured with benchmark tests, which tend to behave much better than real-life programs. This is because the average scientific programmer doesn’t write optimally tuned code (OK, “commercial” programs these days may even behave worse :p ) for various reasons, time constraints being one of them. So my first task, before I start running my simulations on the new TIER-1 machine, will be to benchmark VASP and my own HIVE-code.

Two videos of my new sidekick:

 

You can see me in my front-row position in this picture taken during the non-academic part of the inauguration.

tUL Life Sciences Research Day 2016

Yesterday was the tUL Life Sciences Research Day 2016, a conference event built around finding collaboration possibilities between the University of Hasselt (Belgium) and the University of Maastricht (The Netherlands). After all, tUL is the “transnational University Limburg”, which brings together two universities separated by only some 26 km, albeit across a national border.

Although life sciences is not my personal niche, I went to look for opportunities, as the nano-particles used for drug delivery often consist of metals or oxides. Those materials, on the other hand, are my niche. I used my current work on MOFs as a means to show what is possible from the ab-initio point of view, and presented this as a poster.

tUL Life Science Research Day 2016 Poster

Poster presented at the tUL Life Sciences Research Day, depicting my work on the unfunctionalized and the functionalized MIL-47(V) MOF.

Colloquium on Porous Frameworks: Day 2

On Monday, we had the second day of our colloquium on Porous Frameworks, containing no less than 4 full sessions, covering all types of frameworks. We started the day with the invited presentation of Prof. Dirk De Vos of the KU Leuven, who discussed the breathing behavior of Zr- and Ti-containing MOFs, including the work on the COK-69 in which I was involved myself. In the MOFs presented, the breathing behavior was shown to originate from the folding of the linkers, in contrast to breathing due to the hinging motion of the chains in MIL-47/53 MOFs.

After the transition metals, things were stepped up even further by Dr. Stefania Tanase, who talked about the use of lanthanide ions in MOFs. These lanthanides give rise to coordinated water molecules, which appear to be crucial for their luminescence. Prof. Donglin Jiang, of JAIST in Japan, changed the subject to the realm of COFs, consisting of 2D porous sheets which form 3D structures through Van der Waals interactions (similar to graphite). The tunability of these materials would make them well suited as photoconductors and for photoenergy conversion (i.e. solar cells).

With Prof. Rochus Schmid of the University of Bochum we delved into the nitty-gritty details of developing force fields for MOFs. He noted that such force fields can provide good first approximations for the structure determination of new MOFs, and that if structure-related terms are missing in the force field, they will show up as missing phonon frequencies.

Prof. Monique Van der Veen showed us how non-polar guest molecules can make a MOF polar, while Agnes Szecsenyi bravely tackled the activity of iron-based MIL-53 MOFs from the DFT point of view. The row of three TU Delft contributions was closed by the invited presentation of Prof. Jorge Gascon, who provided an overview of the work in his group and discussed how the active sites in MOFs can be improved through cooperative effects.

Prof. Jaroslaw Handzlik provided the last invited contribution, with a comparative theoretical study of Cr adsorption on various silicate-based materials (from amorphous silicates to zeolites). The final session was then closed by the presentations of Dr. Katrine Svane (Bath University), who discussed the effect of defects in UiO-66 MOFs in further detail, and Marcus Rose, who presented his findings on hyper-crosslinked polymers, a type of COF with an amorphous structure and a wide distribution of pore sizes.

This brought us to the happy end of a successful colloquium, which was celebrated with a drink in the city center of Groningen. On Tuesday we traveled back home, so that on Wednesday Sylvia could start the third part of the conference-holiday roller coaster by leaving for Salzburg.

Colloquium on Porous Frameworks: Day 1

Today the CMD26 conference started in Groningen, and with its kick-off our own 2-day colloquium on porous frameworks (aka MOFs, COFs and zeolites) was launched. During the two sessions of the day, the focus went mainly to the zeolites, with Prof. Emiel Hensen of the Technical University of Eindhoven introducing us to the subject and discussing how new zeolites could be designed in a more rational way. He showed us how the template used during synthesis plays a crucial role in the final growth and structure. Dr. Nakato explained how alkali-metal nanoclusters can undergo insulator-to-metal transitions when incorporated in zeolites (due to the competition between electron-electron repulsion and electron-phonon coupling), while Dr. De Wijs informed us how Al T-sites need to be ordered and assigned in zeolites to allow for the prediction of NMR parameters.

After the coffee break, Dr. Palcic, from the Rudjer Boskovic Institute in Croatia, taught us about the role of heteroatoms in zeolites. She told us that even though more than 2 million theoretical structures exist, only 231 have officially been recognized as synthesized, so there is a lot more work to be done. She also showed that to get stable zeolites with pores larger than 7-8 Angstrom, one needs 3- and 4-membered rings in the structure, since these lead to more rigid configurations. Unfortunately these rings are themselves less stable, and need to be stabilized by different atoms at the T-sites.

Dr. Vandichel, still blushing from his tight traveling schedule, changed the subject from zeolites to MOFs, providing new understanding of the role defects in MOFs play in their catalytic performance. Dr. Liu changed the subject even further with the introduction of COFs, showing us how hydrogen atoms migrate through these materials. Using the wisdom of Bruce Lee:

You must be shapeless, formless, like water. When you pour water in a cup, it becomes the cup. When you pour water in a bottle, it becomes the bottle. When you pour water in a teapot, it becomes the teapot.

he clarified how water behaves inside these porous materials. Our first colloquium day was closed by Ir. Rohling, who took us back to the zeolite scene (although he was comparing the zeolites to enzymes). He discussed how reactivity in zeolites can be tweaked by the confinement of the reacting agents, and how this can be used for molecule identification. More importantly, he showed how multiple active sites collaborate, making chemical reactions much easier than one would expect from single-active-site models.

After all was said and done, it was time to relax a little during the conference welcome reception. And now time to prepare for tomorrow, day 2 of our colloquium on porous frameworks.

 

Holiday-Conference roller coaster

Visit to Stockholm. The knight at the Medeltidsmuseet (top left), brown bear in Skansen (top right), visiting the Royal palace (bottom left) and local entertainment in the old city center (bottom right).

Summertime is a time of rest for most people. For our little academic family, last summer was a bit of a roller coaster, alternating holidays with hard work that had been postponed too long. The last vestige of my start of a new chapter (moving the remaining stuff from the apartment to our house) was finally bested. Now the conference roller coaster has started with Sylvia’s plenary lecture on conceptual spaces in Stockholm.

As neither of us had ever visited Sweden before, we decided to turn it into a semi-family-holiday as well. Our 4-year-old son enjoyed his first ever plane flight (he wasn’t really convinced something impressive was going on). And while Sylvia was off to the conference, the two of us went to explore Stockholm: finding the knight in the Medeltidsmuseet (at the left in the back of this beautiful museum 🙂 ) and searching for the king and queen at their palace (they weren’t there 🙁 ). Or visiting one of the oldest open-air museums, Skansen (similar to Bokrijk in Belgium), where we saw old professions at work (making cheese, for example) and native Scandinavian farm and wild animals (from peacocks to brown bears).

Next weekend the next episode of the conference roller coaster starts, with me hosting a 2-day colloquium on porous frameworks together with Bartek Szyja and Ionut Tranca at the CMD-26 conference in Groningen. We have a nicely packed colloquium with about 20 presentations (8 invited and 12 contributed) covering the whole realm of porous materials, from zeolites to COFs and MOFs. The program of the colloquium can be downloaded below: Program Porous Frameworks Colloquium

Simple Parallelization in Fortran: OpenMP

The first PC we got at our home was a Pentium II. My dad got it because I was going to university, and I would be able to do something “useful” with it. (Yup, I survived my entire high school career searching stuff in the library and the home encyclopedia. Even more, Google didn’t even exist before we got our computer, as the company was only founded in 1998 🙂 ). The machine was advertised as state of the art, with a clock rate of a whopping 233 MHz! During the decade that followed, the evolution of clock rates kept going at a steady pace, until it saturated at about 3-4 GHz (15 times faster than the 233 MHz) around 2005. Since then, the clock rate has not increased a bit; if anything, the average clock rate has even decreased to the range of 2-3 GHz. As power consumption grows quadratically with the clock rate, this means that (1) much more heat is produced, which needs to be transported away from your CPU (otherwise it gets destroyed), and (2) reducing the clock rate by a factor of 2 allows you to power 4 CPUs at half the clock rate, effectively doubling your calculation power. (There are even more tricks involved in modern CPUs which crank up performance, such that the clock rate isn’t a real measure for performance any longer, and sales people need to learn ever more buzzwords to sell your computer/laptop 👿 )

Where in 2005 you bought a single CPU with a high clock rate, you now get a machine with multiple cores. Most machines you can get these days have a minimum of 2 cores, with quad-core machines becoming more and more common. But, there is always a but: even though you now have access to multiple times the processing power of 2005, this does not mean that your own code will be able to use it. Unfortunately there is no simple compiler switch that makes your code parallel (like the -m64 switch which makes your code 64-bit); you have to do this yourself (the free lunch is over). Two commonly used frameworks for this task are OpenMP and MPI. The former mainly focuses on shared-memory configurations (laptops, desktops, single nodes in a cluster), while the latter focuses on large distributed-memory setups (multi-node clusters) and is thus well suited for creating codes that need to run on hundreds or even thousands of CPUs. The two frameworks differ significantly in their complexity; fortunately for us, OpenMP is both the easier one and the one best suited for a modern multi-core computer. The OpenMP framework consists of pragmas (or directives) which can be added to an existing code as comment lines, and which tell a compiler knowledgeable of OpenMP how to parallelize the code. (It is interesting to note that MPI and OpenMP are intended for parallel programming in C, C++ or Fortran… a hint at what the important programming languages are.)

OpenMP in Fortran: Basics

A. Compiler-options and such

As most modern Fortran compilers are well aware of OpenMP (you can check which version of OpenMP is supported here), you generally will not need to install a new compiler to write parallel Fortran code. You only need to add a single compiler flag: -fopenmp (gcc/gfortran), -openmp (Intel compiler), or -mp (Portland Group). In Code::Blocks you will find this option under Settings > Compiler > Compiler Settings tab > Compiler Flags tab (if the option isn’t present, try adding it to “other compiler options” and hope your compiler recognizes one of the flags).

Secondly, you need to link in the OpenMP library. In Code::Blocks, go to Settings > Compiler > Linker Settings tab > Link Libraries: add, where you add the libgomp.dll.a library (generally found in the folder of your compiler… in case of 64-bit compilers, make sure you get the 64-bit version).

Finally, you may want to get access to OpenMP functions inside your code. This can be achieved by a use statement: use omp_lib.

B. Machine properties

OpenMP provides several functions which allow you to query and set a number of environment variables (check out these cheat-sheets for OpenMP v3.0 and v4.0).

  • omp_get_num_procs() : returns the number of processors your code sees (on hyper-threaded CPUs this will be double the actual number of processor cores).
  • omp_get_num_threads() : returns the number of threads available in a specific section of the code.
  • omp_set_num_threads(I) : sets the number of threads for the OpenMP parallel sections to I.
  • omp_get_thread_num() : returns the index of the specific thread you are in, in the range [0..I[ (i.e. 0 up to I-1).

 

subroutine OpenMPTest1()
        use omp_lib;

        write(*,*) "Running OpenMP Test 1: Environment variables"
        write(*,*) "Number of threads :",omp_get_num_threads()
        write(*,*) "Number of CPU's available:",omp_get_num_procs()
        call omp_set_num_threads(8) ! set the number of threads to 8
        write(*,*) "#Threads outside the parallel section:",omp_get_num_threads()
        !below we start a parallel section
        !$OMP PARALLEL
        write(*,*) "Number of threads in a parallel section :",omp_get_num_threads()
        write(*,*) "Currently in thread with ID = ",omp_get_thread_num()
        !$OMP END PARALLEL

end subroutine OpenMPTest1

 

Notice in the example code above that outside the parallel section, delimited by the directives !$OMP PARALLEL and !$OMP END PARALLEL, the program only sees a single thread, while inside the parallel section 8 threads will run (independent of the number of cores available).

C. Simple parallelization

The OpenMP framework consists of a set of directives which can be used to manage the parallelization of your code (cheat-sheets for OpenMP v3.0 and v4.0). I will not describe them in detail, as several very well written and complete tutorials on the subject already exist; we’ll just have a look at a quick and easy parallelization of a big do-loop. As said, OpenMP makes use of directives (or pragmas) which are placed as comments inside the code. As such they will not interfere with your code when it is compiled as a serial code (i.e. without the -fopenmp compiler flag). The directives are preceded by what is called a sentinel (!$OMP). In the above example code, we already saw a first directive: PARALLEL. Only inside blocks delimited by this directive can your code run in parallel.

  1. subroutine OMPTest2()
  2.         use omp_lib;
  3.  
  4.         integer :: IDT, NT,nrx,nry,nrz
  5.         doubleprecision, allocatable :: A(:,:,:)
  6.         doubleprecision :: RD(1:1000)
  7.         doubleprecision :: startT, TTime, stt
  8.  
  9.         call random_seed()
  10.         call random_number(RD(1:1000))
  11.         IDT=500 ! we will make a 500x500x500 matrix
  12.         allocate(A(1:IDT,1:IDT,1:IDT))
  13.  
  14.         write(*,'(A)') "Number of preferred threads:"
  15.         read(*,*) NT
  16.         call omp_set_num_threads(NT)
  17.         startT=omp_get_wtime()
  18.         !$OMP PARALLEL PRIVATE(stt)
  19.         stt=omp_get_wtime()
  20.        
  21.         !$OMP DO
  22.         do nrz=1,IDT
  23.            do nry=1,IDT
  24.               do nrx=1,IDT
  25.               A(nrx,nry,nrz)=RD(modulo(nrx+nry+nrz,1000)+1)
  26.               end do
  27.            end do
  28.         end do
  29.         !$OMP END DO
  30.         write(*,*) "time=",(omp_get_wtime()-stt)/omp_get_wtick()," ticks for thread ",omp_get_thread_num()
  31.         !$OMP END PARALLEL
  32.         TTime=(omp_get_wtime()-startT)/omp_get_wtick()
  33.         write(*,*)" CPU-resources:",Ttime," ticks."
  34.  
  35.         deallocate(A)
  36.     end subroutine OMPTest2

The program above fills a large 3D array with random values taken from a predetermined list. The user is asked to set the number of threads (lines 14-16), and the function omp_get_wtime() is used to obtain the wall-clock time in seconds (measured from some arbitrary reference point in the past), while the function omp_get_wtick() gives the number of seconds between two successive clock ticks. These functions can be used to get some timing data for each thread, but also for the entire program. For each thread, the starting time is stored in the variable stt. To protect this variable from being overwritten by each separate thread, it is declared as private to the thread (line 18: PRIVATE(stt)). As a result, each thread will have its own private copy of the stt variable.

The DO directive on line 21 tells the compiler that the following loop needs to be parallelized. Putting the !$OMP DO pragma around the outer do-loop has the advantage that it minimizes the overhead produced by the parallelization (i.e. the resources required to make local copies of variables, to calculate the distribution of the workload over the different threads at the start of the loop, and to combine the results at the end of the loop).

As you can see, parallelizing a loop is rather simple. It takes only 4 additional comment lines (!$OMP PARALLEL, !$OMP DO, !$OMP END DO and !$OMP END PARALLEL) and some time figuring out which variables should be private to each thread, i.e. which variables get updated during each cycle of the loop. Loop counters you can even ignore, as these are by default considered private. In addition, the number of threads is set on another line, giving us 5 new lines of code in total. It is of course possible to go much further, but this is the basis of what you generally need.

Unfortunately, the presented example is not that computationally demanding, so it will be hard to see the full effect of the parallelization. Simply increasing the array size will not resolve this, as you will quickly run out of memory. Only with more complex operations in the loop will you clearly see the effect of the parallelization. An example of a more complex piece of code is given below (it is part of the phonon-subroutine in HIVE):

  1. !setup work space for lapack
  2.         N = this%DimDynMat
  3.         LWORK = 2*N - 1
  4.         call omp_set_num_threads(this%nthreads)
  5.         chunk=(this%nkz)/(this%nthreads*5)
  6.         chunk=max(chunk,1)
  7.         !$OMP PARALLEL PRIVATE(WORK, RWORK, DM, W, RPart,IO)
  8.         allocate(DM(N,N))
  9.         allocate( WORK(2*LWORK), RWORK(3*N-2), W(N) )
  10.         !the write statement only needs to be done by a single thread, and the other threads do not need to wait for it
  11.         !$OMP SINGLE
  12.         write(uni,'(A,I0,A)') " Loop over all ",this%nkpt," q-points."
  13.         !$OMP END SINGLE NOWAIT
  14.         !we have to loop over all q-points
  15.         !$OMP DO SCHEDULE(DYNAMIC,chunk)
  16.         do nrz=1,this%nkz
  17.             do nry=1,this%nky
  18.                 do nrx=1,this%nkx
  19.                     if (this%kpointListBZ1(nrx,nry,nrz)) then
  20.                         !do nrk=1,this%nkpt
  21.                         WORK = 0.0_R_double
  22.                         RWORK = 0.0_R_double                                                
  23.                         DM(1:this%DimDynMat,1:this%DimDynMat)=this%dynmatFIpart(1:this%DimDynMat,1:this%DimDynMat) ! make a local copy
  24.                         do nri=1,this%poscar%nrions
  25.                             do nrj=1,this%poscar%nrions
  26.                                 Rpart=cmplx(0.0_R_double,0.0_R_double)
  27.                                 do ns=this%vilst(1,nri,nrj),this%vilst(2,nri,nrj)
  28.                                     Rpart=Rpart + exp(i*(dot_product(this%rvlst(1:3,ns),this%kpointList(:,nrx,nry,nrz))))
  29.                                 end do
  30.                                 Rpart=Rpart/this%mult(nri,nrj)
  31.                                 DM(((nri-1)*3)+1:((nri-1)*3)+3,((nrj-1)*3)+1:((nrj-1)*3)+3) = &
  32.                                     & DM(((nri-1)*3)+1:((nri-1)*3)+3,((nrj-1)*3)+1:((nrj-1)*3)+3)*Rpart
  33.                             end do
  34.                         end do
  35.                         call MatrixHermitianize(DM,IOS=IO)
  36.                         call ZHEEV( 'V', 'U', N, DM, N, W, WORK, LWORK, RWORK, IO )
  37.                         this%FullPhonFreqList(:,nrx,nry,nrz)=sign(sqrt(abs(W)),W)*fac
  38.                     end if
  39.                 end do
  40.             end do
  41.         end do
  42.         !$OMP END DO
  43.         !$OMP SINGLE
  44.         write(uni,'(A)') " Freeing lapack workspace."
  45.         !$OMP END SINGLE NOWAIT
  46.         deallocate( WORK, RWORK,DM,W )
  47.         !$OMP END PARALLEL

In the above code, a set of equations is solved using the LAPACK eigenvalue solver ZHEEV to obtain the energies of the phonon modes in each point of the Brillouin zone. As the calculation of the eigenvalue spectrum in one point is independent of that in all other points, this is extremely well suited for parallelization, so we add !$OMP PARALLEL and !$OMP END PARALLEL on lines 7 and 47. Inside this parallel section there are several variables which are recycled for every grid point, so we make them PRIVATE (cf. line 7; most of them are work arrays for the ZHEEV subroutine).

Lines 12 and 44 both contain a write statement. Without further action, each thread would perform this write action, and we would end up with multiple copies of the same line (although this would not break your code, it would look very sloppy to any user of the code). To circumvent this problem we make use of the !$OMP SINGLE directive, which makes sure only one thread (the first to arrive) performs the write action. Unfortunately, the SINGLE block creates an implicit barrier at which all other threads wait. To prevent this, the NOWAIT clause is added at the end of the block. In this specific case the NOWAIT clause has only very limited impact, due to the location of the write statements, but this need not always be the case.

On line 15, the !$OMP DO pragma indicates that a loop follows which should be parallelized. Again we choose the outer loop, so as to reduce the overhead of the parallelization procedure. We also tell the compiler how the work should be distributed, using the SCHEDULE(TYPE,CHUNK) clause. There are three types of scheduling (a toy example follows the list below):

  1. STATIC: best suited for homogeneous workloads. The loop is split into equal pieces (of size given by the optional parameter CHUNK; otherwise equal pieces with size = total size/#threads).
  2. DYNAMIC: better suited if the workload is not homogeneous (in this case the central if-clause on line 19 complicates things). CHUNK can again be used to define the size of the workload blocks.
  3. GUIDED: a bit like DYNAMIC, but with decreasing block sizes.
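To make the SCHEDULE clause concrete, here is a minimal toy sketch (a hypothetical example written for this post, not a fragment of HIVE). The work per iteration grows with the loop index, so a STATIC split would leave the threads that received the cheap early iterations waiting for the thread stuck with the expensive tail, while DYNAMIC hands out new chunks of 10 iterations to whichever thread finishes first:

subroutine ScheduleDemo()
        use omp_lib

        integer :: i, n
        doubleprecision :: s

        s = 0.0d0
        call omp_set_num_threads(4) ! just for the demo
        !$OMP PARALLEL DO SCHEDULE(DYNAMIC,10) REDUCTION(+:s) PRIVATE(n)
        do i = 1, 1000
            ! the inner loop length grows with i -> inhomogeneous workload
            do n = 1, i
                s = s + 1.0d0/(dble(n)*dble(i))
            end do
        end do
        !$OMP END PARALLEL DO
        write(*,*) "total =", s

end subroutine ScheduleDemo

Swapping DYNAMIC for STATIC or GUIDED in the SCHEDULE clause is all it takes to compare the three types; the REDUCTION(+:s) clause gives each thread a private partial sum and adds them up at the end.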

From this real-life example, it is again clear that OpenMP parallelization in Fortran can be very simple.

D. Speedup?

On my loyal sidekick (with a hyper-threaded quad-core Core i7) I was able to get the following speedups for the phonon code (the run was limited to performing only a phonon-DOS calculation):

Speedup of the entire phonon-subroutine due to parallelization of the main-phonon-DOS loop.

The above graph shows the speedup results for the two different modes of calculating the phonon DOS. The reduced mode (DM red) uses a spectrum reduced to that of a unit-cell, but needs a much denser sampling of the Brillouin zone (the second of the two approaches discussed in the Folding Phonons post), and is shown by the black line. The serial calculation in this specific case took only 96 seconds, and the maximum speedup obtained was about x1.84. The red and green curves give the speedup of the calculation mode which makes use of the super-cell spectrum (DM nored, i.e. a much larger matrix to solve), and show for increasing grid sizes maximum speedups of x2.74 (serial time: 45 seconds) and x3.43 (serial time: 395 seconds), respectively. The reason none of the setups reaches a speedup of 4 (or more) is twofold:

  1. Amdahl’s law puts an upper limit on the global speedup of a calculation by taking into account that only part of the code is parallelized (e.g. sections writing output to a single file cannot be parallelized); a worked example follows below this list.
  2. There need to be sufficient blocks of work for all threads (indicated by nkz in the plot).
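As a worked illustration of Amdahl’s law (the parallel fraction of 85% below is chosen for illustration, not measured for HIVE): if a fraction $p$ of the serial runtime is parallelized over $n$ threads, the overall speedup is

$S(n) = \frac{1}{(1-p) + p/n}$,

so for $p = 0.85$ on $n = 4$ threads, $S(4) = 1/(0.15 + 0.2125) \approx 2.76$, close to the x2.74 observed for the smaller DM nored calculation. Even with infinitely many threads such a code could never exceed $S(\infty) = 1/0.15 \approx 6.7$.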

In the case of the DM nored calculations, the parallelized loop clearly takes the largest part of the calculation time, while for the DM red calculation the section generating the q-point grid also consumes a large fraction of the calculation time, limiting the effect of the parallelization. An improvement here would be to also parallelize the subroutine generating the grid, but that will be future work. For now, the expensive DM nored calculations show an acceptable speedup.

 

I have a Question: about thermal expansion

“I have a question” (ik heb een vraag). This is the name of a Belgian (Flemish) website aimed at bringing Flemish scientists and the general public together through scientific or science-related questions. The basic idea is rather simple: someone poses a scientific question on the website, and a scientist provides an answer. It is an excellent opportunity for the latter to hone his/her science communication skills (and do some outreach) and for the former to get a good answer to his/her question.

All questions and answers are collected in a searchable database, which currently contains about fifteen thousand questions answered by a (growing) group of nearly one thousand scientists. This is rather impressive for a region of about 6.5 million people. I recently joined the group of scientists providing answers.

An interesting materials-related question was posed by Denis (my translation of his question and context):

What is the relation between the density of a material and its thermal expansion?

I was wondering if there exists a relation between the density of a material and its thermal expansion (at the same temperature)? In general, gases expand more than solids, so can I extend this to the following: materials with a small density will expand more, because the particles are separated more and thus experience a smaller cohesive force. If this statement is true, then it would imply that a volume of alcohol should expand more than the same volume of air, which I find puzzling. Can you explain this to me?

Answer (somewhat extended compared to the Dutch original):

Unfortunately there exists no simple relation between the density of a material and its thermal expansion coefficient.

Let us first correct something in the example given: the molar mass of alcohol (ethanol) is 46.07 g/mol (methanol would be 32.04 g/mol), which is significantly more than the average molar mass of air, 28.96 g/mol. So following the suggested assumption, air should expand more. If we look at liquids, it is better to compare ethanol (0.789 g/cm³) with water (1 g/cm³), as liquid air (0.87 g/cm³) requires cooling below -196 °C (77 K). The thermal expansion coefficients of water and ethanol are 207×10⁻⁶/°C and 750×10⁻⁶/°C, respectively. So in this case we see that alcohol will indeed expand more than water (at 20 °C), supporting Denis’ statement.
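To put these coefficients in perspective (a simple linear estimate, treating them as volumetric coefficients and ignoring their own temperature dependence): heating one liter from 20 °C to 30 °C expands

$\Delta V = V_0 \, \beta \, \Delta T = 1\,\text{L} \times 750\times10^{-6}\,\text{°C}^{-1} \times 10\,\text{°C} = 7.5\,\text{mL}$ for ethanol,

versus only about 2.1 mL for the same liter of water.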

Unfortunately, these are just two simple materials at one specific temperature for which the statement holds. In reality, there are many interesting aspects complicating life. A few things to keep in mind:

  • A gas (in contrast to a liquid or solid) has no boundary of its own. So if you do not put it in some type of container, it will just keep expanding. The change in volume observed when a gas is heated is due to an increase in pressure (the higher kinetic energy of the gas molecules makes them bounce harder off the walls of the container, which can make a piston move or a balloon grow). In a liquid or a solid, on the other hand, the expansion is rather a stretching of the material itself.
  • Furthermore, in the case of the expansion of an ideal gas, the density plays no role at all, since p·V = n·R·T. From this it follows that 1 mole of H2 gas, at 20 °C and a pressure of 1 atmosphere, has exactly the same volume as 1 mole of O2 gas at 20 °C and 1 atmosphere, even though the latter has a density 16 times higher (see the worked example below this list).
  • There are quite a lot of materials which show a negative thermal expansion in a certain temperature region (i.e. they shrink when you increase the temperature). One well-known example is water: the density of liquid water at 0 °C is lower than that of water at 4 °C. This is the reason why some liquid water remains at the bottom of a pond when it is frozen over.
  • There are also materials which show “breathing” behavior (these are reversible volume changes in solids, which made the originators of the term think of human breathing: inhaling expands our lungs and chest, while exhaling contracts them again). One specific class of these materials are the breathing Metal-Organic Frameworks (MOFs). Some of these look like wine racks (see figure here) which can open and close due to temperature variations. These volume variations can be 50% or more! 😯
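As a worked illustration of the ideal-gas point above: at 20 °C and 1 atmosphere, one mole of any ideal gas occupies

$V = \frac{nRT}{p} = \frac{1\,\text{mol} \times 8.314\,\text{J mol}^{-1}\text{K}^{-1} \times 293.15\,\text{K}}{101325\,\text{Pa}} \approx 24.1\,\text{L}$,

for H2 and O2 alike; the factor 16 between their densities comes entirely from their molar masses, not from a difference in volume.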

The way a material expands due to temperature variations is a rather complex combination of different aspects. It depends on how thermal vibrations (or phonons) propagate through the material, but also on the possible presence of phase transitions. In some materials there are even phase transitions between solid phases with a different crystal structure. These, just like solid/liquid phase transitions, can lead to very sudden jumps in volume during heating or cooling, and the different crystal phases can also have very different physical properties. During the Middle Ages, tin pest was a large source of worries for organ builders. At temperatures below 13 °C, α-tin is more stable than β-tin, which is what was used in organ pipes. However, a high activation energy prevents the phase transformation from β-tin to α-tin from happening too readily; only at temperatures of -30 °C and lower is this barrier more easily overcome. The phase transition goes hand in hand with a volume change of some 27% (α-tin is considerably less dense than β-tin). In addition, α-tin is a brittle material which easily disintegrates. During the Middle Ages this led to the rapid deterioration and collapse of organ pipes in church organs during strong winters. It is also said to have caused the buttons of the clothing of Napoleon’s troops to disintegrate during his Russian campaign. As a result, the troops’ clothing fell apart during the cold Russian winter, letting many of them freeze to death.

 

 

Folding Phonons

About a year ago, I discussed the possibility of calculating phonons (the collective vibrations of atoms) in the entire Brillouin zone for Metal-Organic Frameworks. Now, one year later, I return to this topic, but this time the subject matter is diamond. In contrast to Metal-Organic Frameworks, the unit-cell of diamond is very small (only 2 atoms). Because a phonon spectrum is calculated through the gradients of the forces felt by one atom due to all other atoms, it is clear that within a single diamond unit-cell these forces will not be converged. As such, a supercell is needed to make sure the contributions of the most distant atoms to the experienced forces are negligible.

Using such a supercell has the unfortunate drawback that the dynamical matrix (which is $3N \times 3N$, for $N$ atoms) explodes in size, and, more importantly, that the number of eigenvalues, or phonon frequencies, also increases ($3N$), whereas we only want 6 frequencies ($3 \times 2$ atoms) for diamond. For an $M \times M \times M$ supercell we end up with $24M^3 - 6$ additional phonon bands, which are the result of band-folding. Put differently, $24M^3 - 6$ phonon bands coming from the other unit-cells in the supercell; for the 4×4×4 supercell used below, that is already 1530 extra bands. This is not a problem when calculating the phonon density of states. It is, however, a problem when one is interested in the phonon band structure.

The phonon spectrum at a specific q-point in the first Brillouin zone is given by the square root of the eigenvalues of the dynamical matrix of the system. For simplicity, we first assume a finite system of $n$ atoms (a molecule). In that case, the first Brillouin zone reduces to a single point $q=(0,0,0)$ and the dynamical matrix looks more or less like the hessian:
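(a sketch in standard notation, omitting the usual mass-weighting factors $1/\sqrt{m_a m_b}$, in the spirit of “more or less like the hessian”)

$D = \begin{pmatrix} \varphi(1,1) & \varphi(1,2) & \cdots & \varphi(1,n) \\ \varphi(2,1) & \varphi(2,2) & \cdots & \varphi(2,n) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi(n,1) & \varphi(n,2) & \cdots & \varphi(n,n) \end{pmatrix}$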

With $\varphi(N_a,N_b) = [\varphi_{i,j}(N_a,N_b)]$ the $3 \times 3$ matrices built from $\varphi_{i,j}(N_a,N_b) = \frac{\partial^2 \varphi}{\partial x_i(N_a)\,\partial x_j(N_b)} = -\frac{\partial F_i(N_a)}{\partial x_j(N_b)}$, with $i,j = x,y,z$. In words, $\varphi_{i,j}(N_a,N_b)$ represents the derivative of the force felt by atom $N_a$ due to the displacement of atom $N_b$. Due to Newton’s third law (action = reaction), the dynamical matrix is expected to be symmetric.

When the system under study is no longer a molecule or a finite cluster, but an infinite solid, things get a bit more complicated. For such a solid, we only consider the symmetry-inequivalent atoms (in practice this is often a unit-cell). Because the first Brillouin zone is no longer a single point, one needs to sample multiple different points to get the phonon density of states. The role of the q-point is introduced in the dynamical matrix through a factor $e^{iq \cdot (r_{N_a} - r_{N_b})}$, creating a dynamical matrix for a single unit-cell containing $n$ atoms:
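(again a sketch in standard notation)

$D(q) = \left[\, \varphi(N_a,N_b)\; e^{iq \cdot (r_{N_a} - r_{N_b})} \,\right]_{a,b = 1 \ldots n}$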

Because a real solid contains more than a single unit-cell, one should also take into account the interactions of the atoms of one unit-cell with those of all other unit-cells in the system, and as such the dynamical matrix becomes a sum of matrices like the one above:
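(a sketch, in the notation explained in the next paragraph)

$D(q) = \sum_{R} \left[\, \varphi(N_i, M_{R_j})\; e^{iq \cdot (r_{N_i} - r_{M_{R_j}})} \,\right]_{i,j = 1 \ldots n}$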

Where the sum runs over all unit-cells $R$ in the system, $N_i$ indicates an atom in a specific reference unit-cell, and $M_{R_j}$ an atom in the $R$-th unit-cell, with index 1 given to the reference unit-cell. As the forces decay with the distance between the atoms, the infinite sum can be truncated. For a Metal-Organic Framework a single unit-cell will quite often suffice. For diamond, however, a larger cell is needed.

An interesting aspect of the dynamical matrix above is that all matrix elements of the sum over n unit-cells are also present in the single dynamical matrix of a supercell containing these n unit-cells. It becomes even more interesting when one notices that, due to translational symmetry, one does not need to calculate all elements of the supercell dynamical matrix to construct it in full.

Assume a 2D 2×2 supercell with only a single atom per unit-cell, which we represent as in the figure on the right. A single periodic copy of the supercell is added in each direction. The dynamical matrix for the supercell can now be constructed as follows: calculate the elements of the first column (i.e. the gradient of the force felt by the atom in the reference unit-cell, in black, due to the atoms in each of the unit-cells in the supercell). Due to Newton’s third law (action = reaction), the first column and the first row will have the same elements (middle panel).

Translational symmetry, on the other hand, allows us to determine all other elements. The simplest are the diagonal elements, which represent the self-interaction (so all are black squares). The others you can just as easily determine by looking at the schematic representation of the supercell under periodic boundary conditions. For example, to find the derivative of the force on the second cell (= second column, green square in the supercell) due to the third cell (third row, blue square in the supercell), we look at the square that has the same position relative to the black square as the blue square has relative to the green square: the red square (if you read this a couple of times it will start to make sense). Like this, the dynamical matrix of the entire supercell can be constructed.
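The lookup described above is easy to put into code. The sketch below (a hypothetical 1D illustration written for this post, not a fragment of HIVE) does it for a ring of ncell identical unit-cells with one atom each: translational symmetry makes the supercell dynamical matrix circulant, so the whole matrix follows from its first column by index arithmetic under periodic boundary conditions.

subroutine BuildSupercellDM(col, D, ncell)
        integer, intent(in) :: ncell
        doubleprecision, intent(in)  :: col(0:ncell-1)           ! first column: coupling of cell 0 with cell k
        doubleprecision, intent(out) :: D(0:ncell-1,0:ncell-1)
        integer :: i, j

        do j = 0, ncell-1
            do i = 0, ncell-1
                ! element (i,j) depends only on the position of cell j
                ! relative to cell i, wrapped by the periodic boundaries
                D(i,j) = col(modulo(j-i, ncell))
            end do
        end do

end subroutine BuildSupercellDM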

This final supercell dynamical matrix can, with the same ease, be folded back into the sum of unit-cell dynamical matrices (it becomes an extended lookup table). The resulting unit-cell dynamical matrix can then be used to create a band structure, which in my case was nicely converged for a 4×4×4 supercell. The band structure along high-symmetry lines is shown below, but remember that these are actually 3D surfaces. A nice video of the evolution of the first acoustic band (i.e. the lowest band) as a function of its energy can be found here.

The phonon density of states can also be obtained in two ways, which, in contrast to the band structure, should give exactly the same result (for an $M \times M \times M$ supercell with $n$ atoms per unit-cell):

  1. Generate the density of states for the supercell and corresponding Brillouin zone. This has the advantage that the smaller Brillouin zone can be sampled with fewer q-points, as each q-point acts as $M^3$ q-points in a unit-cell approach. The drawback is that for each q-point a $(3nM^3) \times (3nM^3)$ dynamical matrix needs to be solved. This solution scales approximately as $O(N^3) \sim (3nM^3)^3 = (3n)^3 M^9$. Using linear algebra packages such as LAPACK, this may be done slightly more efficiently (but you will not get $O(N^2)$, for example).
  2. Generate the density of states for the unit-cell and corresponding Brillouin zone. In this approach, the dynamical matrix to solve is more complex to construct (due to the sum which needs to be taken) but much smaller: $3n \times 3n$. However, to get the same q-point density, you will need to calculate $M^3$ times as many q-points as for the supercell. A comparison of the two costs is sketched below this list.
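Comparing the diagonalization cost of the two approaches at equal q-point density (a back-of-the-envelope estimate that ignores the cost of constructing the matrices):

$\frac{\text{cost}_{\text{supercell}}}{\text{cost}_{\text{unit-cell}}} \approx \frac{(3nM^3)^3}{M^3\,(3n)^3} = M^6$,

which for the 4×4×4 supercell mentioned above amounts to a factor $4^6 = 4096$ in favor of the unit-cell approach, at the price of a more involved matrix construction and $M^3$ times as many (cheap) q-points.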

In the end, the choice will be based on whether you are limited by the available memory (when running a 32-bit application, the large number of q-points can become the bottleneck) or by CPU time (solving the supercell dynamical matrix quickly becomes very expensive).

 

To x64 or not to x64: Installing a 64-bit fortran compiler

Current-day computers generally have 64-bit processors, and most even have 64-bit operating systems. On such systems, 32-bit programs will run fine, but 64-bit programs can make more efficient use of the underlying hardware. When we installed a Fortran compiler and the Code::Blocks IDE, the default Fortran compiler generated 32-bit programs. This generally is not an issue, unless you need a large amount of memory, for example to store a temporary array with 400³ double-precision coordinates (as I did for a project I’m currently working on). You may first start to look for ways of increasing the stack size of your program, but you will soon discover that the problem is more profound: a 32-bit program cannot access address space beyond 4 GB. (In practice, you will generally run into problems well before reaching 4 GB.) This is because the memory address of your data is stored as a 32-bit value (2³² = 4 294 967 296 bytes = 4 GB), so the only way out of this predicament is a “larger address”, aka 64-bit. So you need to install a new compiler capable of producing 64-bit programs.
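To see how quickly that limit is reached, take the 400³ array as an example (assuming one 8-byte double per value; if each entry holds three coordinates, triple the result):

$400^3 \times 8\,\text{bytes} = 5.12 \times 10^8\,\text{bytes} \approx 0.5\,\text{GB}$,

so a handful of such arrays, on top of the rest of the program, exhausts the 4 GB address space in no time.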

  1. Installing minGW64 for code::blocks
    1. Installing the compiler
    2. Setting the PATH-variable (win10)
    3. Adding the compiler to code::blocks
  2. Upgrading Lapack to 64-bit

Continue reading