Tag: HIVE-code

Creating online forms and catching spam-bots

Recently, I decided to add a custom registration form  to my website, as part of an effort to improve and streamline the “HIVE-STM tool experience” 😉 . Up until now, potential users had to directly send me an e-mail, telling me a bit more about themselves and their work. I would then e-mail them the program, and add their information to a user list for future reference (i.e., support and some statistics for my personal entertainment).

This has the drawback that any future user needs to wait until I find the time to reply. To improve on the user-friendliness, I thought it would be nice to automate this a bit. A first step in this process entails making the application a bit more uniform: using an online registration form.

The art of learning something new: Do it from scratch

What started out with the intention of being an almost trivial exercise in building a web-form, turned into a steep learning curve about web-development and cyber-security. I am aware there exists many tools which generate forms for websites or even provide you a platform which hosts the form (e.g., google-forms, which I used in the past), but I wanted to do implement it myself (…something to do with pride 😉 ). Having build websites using HTML and CSS in the past, and having some basic experience with Javascript, this looked like a fun afternoon project. The HTML for the form was easily created using the tutorials found on w3schools.com and an old second edition “Handboek HTML5 en CSS3“, I picked up a few years ago browsing a second hand bookshop. Trouble, however, started rearing its ugly head as soon as I wanted to integrate this form in this WordPress website. Just pasting this into a page or post doesn’t really work, as WordPress wants to “help” you, and prevent you from hurting yourself. This is a fantastic feature if you have no clue about HTML/CSS/… or don’t want to care about it. Unfortunately, if you want to do something slightly more  advanced you are in for a hell of a ride, as you find out the relevant bits get redacted or disabled.

Searching for specific solutions with regard to creating a custom form in WordPress I was astounded at how often the default suggestion is: “use plugin XXX” or “use tool YYY”. Are we loosing the ability to want to craft something ourselves? Yes of course, there are professional tools available which can be better than anything you yourself can build in a short amount of time…but should it discourage you of trying, and feeling the satisfaction of having created something? I digress.

In the end, I discovered a good quality tutorial (once you get past the reasons why not to do it) and I started a long uphill battle trying to bend WordPress to my will:

  1. Paste form-code in postWP countermove: remove relevant tags essentially killing the form.
  2. Solution: put the form in a dedicated template ⇒ WP countermove: hard to integrate in existing theme, will be removed upon update of the theme
  3. Solution: create a child-theme ⇒ WP countermove: interesting exercise is getting the CSS style-sheet to work together with that of the parent theme. (wp_enqueue_style, wp_enqueue_scripts, get_template_directory_uri() and get_stylesheet_directory_uri() saved the day.)
  4. Add PHP back-end to the form…and deal with the idiosyncrasies of this scripting language. Crashed the website a few time due to missing “;”… error messages would be nice, instead of the blank web-page.

 

Trying not to torture future users

At this point, the form accepted input, and collected it via the PHP $_POST global variable. En route to this point, I read quite a few warnings about Cross-Site Request Forgery (CSRF) and that one should protect against it. Luckily, the tutorial practically showed how to do this in WordPress using nonces…in contrast to WordPress theme handbook which gives in formation, but not easy to understand if you are new to the subject.

With a basic sense of security, I was aiming at making things user-friendly, i.e., if something goes wrong it would be nice if you do not need to again fill out the form entirely. Searching for ways to keep this information I came across a lot of options, none of which seemed to work (cookies, PHP variables, global variables, etc). The problem appeared to originate from the fact that the information was not persistent. Once the web page started reloading, everything got erased. It was only at this point that I learned about “transients” in WordPress, and using get_transient() and set_transient() resolved all the issues instantaneously. There is only one caveat at this point: If two potential users submit their registration at almost the same time one may end up seeing the registration information of the other. (However, at this time the program is far from famous enough to present any issues, so statistics will save us from this).

Only one thing remained to be done: put all relevant information into two e-mail messages, one to be sent to myself, and one to be sent to the potential user. For this, I made use of the PHP mail() function. It works quite nicely, and after playing around with it for a bit (and convincing myself a nice HTML formatted layout will not work for example in gmail) the setup was complete. That evening, I went to bed, happy with the accomplishment: I had created something.

Too popular for comfort

Bot Activity on the HIVE registration form during February and March of 2021.

Bot Activity on the HIVE registration form during February and March of 2021.

The next morning, I was amazed to find already several applications for the HIVE-STM program in my mailbox (that is, in addition to my own test runs). These were not sent by real humans, but appeared to be the work of bots just filling out the form and sending it off. This left me a bit puzzled, and I have been looking for the reason why anyone would actually bother writing a bot for this purpose. So far I’ve seen the suggestion that this is to improve the SEO of websites, generate spam-email (to yourself or with you as middleman), DOS-attacks, get access to your SQL database via code injection,…and after all my searches, I start to get the impression this may also be a means of promoting all the plugins, tools, frameworks that block these bots? In roughly each discussion you find, there will be at least one person promoting such a foolproof perfect tool 😯 🙄 …but might just be me.

So how do we deal with these bots, preferably without driving potential users crazy? Reading all the suggestions (which unfortunately provide extremely little information on the actual working and logic of spam-bots themselves) I added, in several rounds, some tricks to block/catch the bots, and have been tracking the submits since the form went live. As you can see there is a steady stream of some 50 bots weekly trying to fill out the form. The higher number in the first week is due to any submission being redirected to the original form page, as such the same bots performed multiple attempts within the time-range of a few minutes. In about two months, I collected the results of 400 registration attempts by bots (and 4 by humans).

Analyzing the results, I learned learned some interesting things.

How to catch a bot? I track 4 different signals which may be indicative of bot behavior.

How to catch a bot? I track 4 different signals which may be indicative of bot behavior.

1. To Captcha or not to Captcha?

One of the first things to add, from a human perspective is “a captcha”. The captcha is manually implemented simple random sum/product/subtraction. It should be easy for humans, but it is annoying as they need to fill out an extra field (and may fill it out incorrectly). Interestingly, 56% of the bots fill out the Captcha correctly. Of course more complicated versions could be implemented or used…but the bottom line is simple: it generally does not do the job, and annoys the actual human being.

2. Bot Trapping for furs?

Going beyond captcha’s, a lot of tutorials suggest the use of a honeypot. One can either make use of automated options of existing frameworks, plugins or …implement these oneself. This option appears to be very successful in targeting bots. The 1% successful cases coincided with the only human submissions. At this point we appear to have a “fool-proof” method for distinguishing between humans and bots.

3. Dropping the bot down the box?

Interestingly, drop-down menu’s with not generally used topics seem to throw off bots as well. The seniority drop-down menu shows failure rates even better than the captcha.

Conclusion

Writing your own form from scratch is a very interesting exercise, and well worth the time if you want to learn more about web-security as well as the inner workings of the framework used for your website. Bots are an interesting nuisance, and captcha’s just bother your user as most bots can easily deal with them. Logging the inputs of the bots does show a wide range in quality of these bots. Some just fill out garbage, while others appear to be quite smart, filling out reasonable answers. Other bots clearly have malignant purposes, which becomes clear from the code they try to plug into the form fields.

For now, the registration form seems to be able to distinguish between human-and bot-users. As such, we have successfully completed another step in upgrading the STM-program

Partitioning the vibrational spectrum: Fingerprinting defects in solids

Authors:  Danny E. P. Vanpoucke
Journal: Computational Materials Science 181, 109736 (2020)
doi: 10.1016/j.commatsci.2020.109736
IF(2019): 2.863
export: bibtex
pdf: <ComputMaterSci>   (Open Access)
github: <Hive-toolbox>

 

Graphical abstract Computational Materials Science 181, 109736 (2020)
Graphical Abstract: Finger printing defects in diamond through the creation of the vibrational spectrum of a defect.

Abstract

Vibrational spectroscopy techniques are some of the most-used tools for materials
characterization. Their simulation is therefore of significant interest, but commonly
performed using low cost approximate computational methods, such as force-fields.
Highly accurate quantum-mechanical methods, on the other hand are generally only used
in the context of molecules or small unit cell solids. For extended solid systems,
such as defects, the computational cost of plane wave based quantum mechanical simulations
remains prohibitive for routine calculations. In this work, we present a computational scheme
for isolating the vibrational spectrum of a defect in a solid. By quantifying the defect character
of the atom-projected vibrational spectra, the contributing atoms are identified and the strength
of their contribution determined. This method could be used to systematically improve phonon
fragment calculations. More interestingly, using the atom-projected vibrational spectra of the
defect atoms directly, it is possible to obtain a well-converged defect spectrum at lower
computational cost, which also incorporates the host-lattice interactions. Using diamond as
the host material, four point-defect test cases, each presenting a distinctly different
vibrational behaviour, are considered: a heavy substitutional dopant (Eu), two intrinsic
point-defects (neutral vacancy and split interstitial), and the negatively charged N-vacancy
center. The heavy dopant and split interstitial present localized modes at low and high
frequencies, respectively, showing little overlap with the host spectrum. In contrast, the
neutral vacancy and the N-vacancy center show a broad contribution to the upper spectral range
of the host spectrum, making them challenging to extract. Independent of the vibrational behaviour,
the main atoms contributing to the defect spectrum can be clearly identified. Recombination of
their atom-projected spectra results in the isolated spectrum of the point-defect.

Predicting Partial Atomic Charges in Siliceous Zeolites

Authors: Jarod J. Wolffis, Danny E. P. Vanpoucke, Amit Sharma, Keith V. Lawler, and Paul M. Forster
Journal: Microporous Mesoporous Mater. 277, 184-196 (2019)
doi: 10.1016/j.micromeso.2018.10.028
IF(2019): 4.551
export: bibtex
pdf: <MicroporousMesoporousMater>

 

Partial charges in zeolites for force fields.
Graphical Abstract: Partial charges in zeolites for force fields.

Abstract

Partial atomic charge, which determines the magnitude of the Coulombic non-bonding interaction, represents a critical parameter in molecular mechanics simulations. Partial charges may also be used as a measure of physical properties of the system, i.e. covalency, acidic/catalytic sites, etc. A range of methods, both empirical and ab initio, exist for calculating partial charges in a given solid, and several of them are compared here for siliceous (pure silica) zeolites. The relationships between structure and the predicted partial charge are examined. The predicted partial charges from different methods are also compared with related experimental observations, showing that a few of the methods offer some guidance towards identifying the T-sites most likely to undergo substitution or for proton localization in acidic framework forms. Finally, we show that assigning unique calculated charges to crystallographically unique framework atoms makes an appreciable difference in simulating predicting N2 and O2 adsorption with common dispersion-repulsion parameterizations.

Review of 2017

Happy New Year

2017 has come and gone. 2018 eagerly awaits getting acquainted. But first we look back one last time, trying to turn this into a old tradition. What have I done during the last year of some academic merit.

Publications: +4

Completed refereeing tasks: +8

  • The Journal of Physical Chemistry (2x)
  • Journal of Physics: Condensed Matter (3x)
  • Diamond and Related Materials (3x)

Conferences & workshops: +5 (Attended) 

  • Int. Conference on Diamond and Carbon Materials (DCM) 2017, Gothenburg, Sweden, September 3rd-7th, 2017 [oral presentation]
  • Summerschool: “Upscaling techniques for mathematical models involving multiple scales”, Hasselt, Belgium, June 26th-29th, 2017 [poster presentation]
  • VSC-user day, Brussels, Belgium, June 2nd, 2017 [poster presentation]
  • E-MRS 2017 Spring Meeting, Strasbourg, France, May 22nd-26th, 2017 [1 oral + 2 poster presentations]
  • SBDD XXII, Hasselt University, Belgium, March 8th-10th, 2017 [poster presentation]

PhD-students: +1

  • Mohammadreza Hosseini (okt.-… ,Phd student physical chemistry, Tarbiat Modares University, Teheran, Iran)

Bachelor-students: +2

Current size of HIVE:

  • 48.5K lines of program (code: 70 %)
  • ~70 files
  • 45 (command line) options

Hive-STM program:

And now, upward and onward, a new year, a fresh start.

Review of 2016

2016 has come and gone. 2017 eagerly awaits getting acquainted. But first we look back one last time, trying to turn this into a tradition. What have I done during the last year of some academic merit.

Publications: +4

Completed refereeing tasks: +5

  • ACS Sustainable Chemistry & Engineering
  • The Journal of Physical Chemistry
  • Journal of Physics: Condensed Matter (2x)
  • Diamond and Related Materials

Conferences: +4 (Attended) & + 1 (Organized)

PhD-students: +2

  • Arthur De Vos : (Jan.-Mar., Ghent University, Ghent, Belgium )
  • Mohammadreza Hosseini (okt.-… ,Phd student physical chemistry, Tarbiat Modares University, Teheran, Iran)

Current size of HIVE:

  • 47K lines of program (code: 70 %)
  • 70 files
  • 44 (command line) options

Hive-STM program:

And now, upward and onward, a new year, a fresh start.

Simple Parallelization in Fortran: OpenMP

Pentium2speedThe first PC we got at our home was a Pentium II. My dad got it, because I was going to university, and I would be able to do something “useful” with it. (Yup, I survived my entire high school career searching stuff in the library and the home encyclopedia. Even more, Google didn’t even exist before we got our computer, as the company was only founded in 1998 🙂 ). The machine was advertised as state of the art with a clock rate of a whooping 233 MHz! During the decade that followed, the evolution of the clock rates kept going at a steady pace, until it saturated at about 3-4 GHz(15 times faster than the 233 MHz) around 2005. Since then, the clock rate has not increased a bit. If anything, the average clock rate has even decreased to the range 2-3 GHz. As power-consumption grows quadratically with with the clock rate, this means that (1) there is much more heat produced, that needs to be transported away from your CPU (otherwise it get’s destroyed), (2) reducing the clock rate by a factor 2, allows you to power 4 CPU’s at half the clock rate, effectively doubling your calculation power. (There are even more tricks involved in modern CPU’s which crack up performance such that the clock rate isn’t a real measure for performance any longer, and sales people need to learn more new buzzword to sell your computer/laptop 👿 )

intelCoreCloneWhere in 2005 you bought a single CPU with a high clock rate, you now get a machine with multiple cores. Most machines you can get these days have a minimum of 2 cores, with quad-core machines becoming more and more common. But, there is always a but, even though you now have access to multiple times the processing power of 2005, this does not mean that your own code will be able to use it. Unfortunately there is no simple compiler switch which makes your code parallel (like the -m64 switch which makes your code 64-bit), you have to do this yourself (the free lunch is over). Two commonly used frameworks for this task are OpenMP and MPI. The former mainly focuses on shared memory configurations (laptops, desktops, single nodes in a cluster), while the latter focuses of large distributed memory setups (multi-node clusters) and is thus well-suited for creating codes that need to run on hundreds or even thousands of CPU’s. The two frameworks differ significantly in their complexity, fortunately for us, OpenMP is both the easier one, and the one most suited for a modern multi-core computer. The OpenMP framework consists of pragma’s (or directives) which can be added in an existing code as comment lines, and tell a compiler knowledgeable of OpenMP how to parallelize the code. (It is interesting to note that MPI and OpenMP are inteded for parallel programming in either C, C++ or fortran … a hint that what the important programming languages are.)

OpenMP in Fortran: Basics

A. Compiler-options and such

As most modern fortran compilers also are well aware of openMP (you can check which version of openMP is supported here), you generally will not need to install a new compiler to write parallel fortran code. You only need to add a single compiler flag: -fopenmp (gcc/gfortran), -openmp (intel compiler), or -mp (Portland Group). In Code::Blocks you will find this option under Settings > Compiler > Compiler Settings tab > Compiler Flags tab (If the option isn’t present try adding it to “other compiler options” and hope your compiler recognizes one of the flags). 

Secondly, you need to link in the OpenMP library. In Code::Blocks go to Settings > Compiler > Linker Settings tab > Link Libraries: add. Where you add the libgomp.dll.a library (generally found in the folder of your compiler…in case of 64 bit compilers, make sure you get the 64 bit version)

Finally, you may want to get access to OpenMP functions inside your code. This can be achieved by a use statement: use omp_lib.

B. Machine properties

OpenMP contains several functions which allow you to query and set several environment variables (check out these cheat-sheets for OpenMP v3.0 and v4.0).

  • omp_get_num_procs() : returns the number of processors your code sees (in hyper-threaded CPU’s this will be double of the actual number of processor cores).
  • omp_get_num_threads() : returns the number of threads available in a specific section of the code.
  • omp_set_num_threads(I): Sets the number of threads for the openMP parallel section to I
  • omp_get_thread_num() : returns the index of the specific thread you are in [0..I[

 

  1. subroutine OpenMPTest1()
  2.         use omp_lib;
  3.  
  4.         write(*,*) "Running OpenMP Test 1: Environment variables"
  5.         write(*,*) "Number of threads :",omp_get_num_threads()
  6.         write(*,*) "Number of CPU's available:",omp_get_num_procs()
  7.         call omp_set_num_threads(8) ! set the number of threads to 8
  8.         write(*,*) "#Threads outside the parallel section:",omp_get_num_threads()
  9.         !below we start a parallel section
  10.         !$OMP PARALLEL
  11.         write(*,*) "Number of threads in a parallel section :",omp_get_num_threads()
  12.         write(*,*) "Currently in thread with ID = ",omp_get_thread_num()
  13.         !$OMP END PARALLEL
  14.  
  15. end subroutine OpenMPTest1

 

Notice in the example code above that outside the parallel section indicated with the directives $OMP PARALLEL and $OMP END PARALLEL, the program only sees a single thread, while inside the parallel section 8 threads will run (independent of the number of cores available). 

C. Simple parallelization

The OpenMP frameworks consists of an set of directives which can be used to manage the parallelization of your code (cheat-sheets for OpenMP v3.0 and v4.0). I will not describe them in detail as there exists several very well written and full tutorials on the subject, we’ll just have a look at a quick and easy parallelization of a big for-loop. As said, OpenMP makes use of directives (or Pragma’s) which are placed as comments inside the code. As such they will not interfere with your code when it is compiled as a serial code (i.e. without the -fopenmp compiler flag). The directives are preceded by what is called a sentinel ( $OMP ). In the above example code, we already saw a first directive: PARALLEL. Only inside blocks delimited by this directive, can your code be parallel.

  1. subroutine OMPTest2()
  2.         use omp_lib;
  3.  
  4.         integer :: IDT, NT,nrx,nry,nrz
  5.         doubleprecision, allocatable :: A(:,:,:)
  6.         doubleprecision :: RD(1:1000)
  7.         doubleprecision :: startT, TTime, stt
  8.  
  9.         call random_seed()
  10.         call random_number(RD(1:1000))
  11.         IDT=500 ! we will make a 500x500x500 matrix
  12.         allocate(A(1:IDT,1:IDT,1:IDT))
  13.  
  14.         write(*,'(A)') "Number of preferred threads:"
  15.         read(*,*) NT
  16.         call omp_set_num_threads(NT)
  17.         startT=omp_get_wtime()
  18.         !$OMP PARALLEL PRIVATE(stt)
  19.         stt=omp_get_wtime()
  20.        
  21.         !$OMP DO
  22.         do nrz=1,IDT
  23.            do nry=1,IDT
  24.               do nrx=1,IDT
  25.               A(nrx,nry,nrz)=RD(modulo(nrx+nry+nrz,1000)+1)
  26.               end do
  27.            end do
  28.         end do
  29.         !$OMP END DO
  30.         write(*,*) "time=",(omp_get_wtime()-stt)/omp_get_wtick()," ticks for thread ",omp_get_thread_num()
  31.         !$OMP END PARALLEL
  32.         TTime=(omp_get_wtime()-startT)/omp_get_wtick()
  33.         write(*,*)" CPU-resources:",Ttime," ticks."
  34.  
  35.         deallocate(A)
  36.     end subroutine RunTest2

The program above fills up a large 3D array with random values taken from a predetermined list. The user is asked to set the number of threads (lines 14-16), and the function omp_get_wtime() is used to obtain the number of seconds since epoch, while the function omp_get_wtick() gives the number of seconds between ticks. These functions can be used to get some timing data for each thread, but also for the entire program. For each thread, the starting time is stored in the variable stt. To protect this variable of being overwritten by each separate thread, this variable is declared as private to the thread (line 18: PRIVATE(stt) ). As a result, each thread will have it’s own private copy of the stt variable.

The DO directive on line 21, tells the compiler that the following loop needs to be parallelized. Putting the !$OMP DO pragma around the outer do-loop has the advantage that it minimizes the overhead produced by the parallelization (i.e. resources required to make local copies of variables, calculating the distribution of the workload over the different threads at the start of the loop, and combining the results at the end of the loop).

As you can see, parallelizing a loop is rather simple. It takes only 4 additional comment lines (!$OMP PARALLEL , !$OMP DO, !$OMP END DO and !$OMP END PARALLEL) and some time figuring out which variables should be private for each thread, i.e. which are the variables that get updated during each cycle of a loop. Loop counters you can even ignore as these are by default considered private. In addition, the number of threads is set on another line giving us 5 new lines of code in total. It is of course possible to go much further, but this is the basis of what you generally need.

Unfortunately, the presented example is not that computationally demanding, so it will be hard to see the full effect of  the parallelization. Simply increasing the array size will not resolve this as you will quickly run out of memory. Only with more complex operations in the loop will you clearly see the parallelization. An example of a more complex piece of code is given below (it is part of the phonon-subroutine in HIVE):

  1. !setup work space for lapack
  2.         N = this%DimDynMat
  3.         LWORK = 2*N - 1
  4.         call omp_set_num_threads(this%nthreads)
  5.         chunk=(this%nkz)/(this%nthreads*5)
  6.         chunk=max(chunk,1)
  7.         !$OMP PARALLEL PRIVATE(WORK, RWORK, DM, W, RPart,IO)
  8.         allocate(DM(N,N))
  9.         allocate( WORK(2*LWORK), RWORK(3*N-2), W(N) )
  10.         !the write statement only needs to be done by a single thread, and the other threads do not need to wait for it
  11.         !$OMP SINGLE
  12.         write(uni,'(A,I0,A)') " Loop over all ",this%nkpt," q-points."
  13.         !$OMP END SINGLE NOWAIT
  14.         !we have to loop over all q-points
  15.         !$OMP DO SCHEDULE(DYNAMIC,chunk)
  16.         do nrz=1,this%nkz
  17.             do nry=1,this%nky
  18.                 do nrx=1,this%nkx
  19.                     if (this%kpointListBZ1(nrx,nry,nrz)) then
  20.                         !do nrk=1,this%nkpt
  21.                         WORK = 0.0_R_double
  22.                         RWORK = 0.0_R_double                                                
  23.                         DM(1:this%DimDynMat,1:this%DimDynMat)=this%dynmatFIpart(1:this%DimDynMat,1:this%DimDynMat) ! make a local copy
  24.                         do nri=1,this%poscar%nrions
  25.                             do nrj=1,this%poscar%nrions
  26.                                 Rpart=cmplx(0.0_R_double,0.0_R_double)
  27.                                 do ns=this%vilst(1,nri,nrj),this%vilst(2,nri,nrj)
  28.                                     Rpart=Rpart + exp(i*(dot_product(this%rvlst(1:3,ns),this%kpointList(:,nrx,nry,nrz))))
  29.                                 end do
  30.                                 Rpart=Rpart/this%mult(nri,nrj)
  31.                                 DM(((nri-1)*3)+1:((nri-1)*3)+3,((nrj-1)*3)+1:((nrj-1)*3)+3) = &
  32.                                     & DM(((nri-1)*3)+1:((nri-1)*3)+3,((nrj-1)*3)+1:((nrj-1)*3)+3)*Rpart
  33.                             end do
  34.                         end do
  35.                         call MatrixHermitianize(DM,IOS=IO)
  36.                         call ZHEEV( 'V', 'U', N, DM, N, W, WORK, LWORK, RWORK, IO )
  37.                         this%FullPhonFreqList(:,nrx,nry,nrz)=sign(sqrt(abs(W)),W)*fac
  38.                     end if
  39.                 end do
  40.             end do
  41.         end do
  42.         !$OMP END DO
  43.         !$OMP SINGLE
  44.         write(uni,'(A)') " Freeing lapack workspace."
  45.         !$OMP END SINGLE NOWAIT
  46.         deallocate( WORK, RWORK,DM,W )
  47.         !$OP END PARALLEL

In the above code, a set of equations is solved using the LAPACK eigenvalue solver ZHEEV to obtain the energies of the phonon-modes in each point of the Brillouin zone. As the calculation of the eigenvalue spectrum for each point is independent of all other points, this is extremely well-suited for parallelization, so we can add !$OMP PARALLEL and !$OMP END PARALLEL on lines 7 and 47. Inside this parallel section there are several variables which are recycled for every grid point, so we will make them PRIVATE (cf. line 7, most of them are work-arrays for the ZHEEV subroutine).

Lines 12 and 44 both contain a write-statement. Without further action, each thread will perform this write action, and we’ll end up with multiple copies of the same line (Although this will not break your code it will look very sloppy to any user of the code). To circumvent this problem we make use of the !$OMP SINGLE directive. This directive makes sure only 1 thread (the first to arrive) will perform the write action. Unfortunately, the SINGLE block will create an implicit barrier at which all other threads will wait. To prevent this from happening, the NOWAIT clause is added at the end of the block. In this specific case, the NOWAIT clause will have only very limited impact due to the location of the write-statements. But this need not always to be the case.

On line 15 the !$OMP DO pragma indicates a loop will follow that should be parallelized. Again we choose for the outer loop, as to reduce the overhead due to the parallelization procedure. We also tell the compiler how the work should be distributed using the SCHEDULE(TYPE,CHUNK) clause. There are three types of scheduling:

  1. STATIC: which is best suited for homogeneous workloads. The loop is split in equal pieces (size given by the optional parameter CHUNK, else equal pieces with size=total size/#threads)
  2. DYNAMIC: which is better suited if the workload is not homogeneous.(in this case the central if-clause on line 19 complicates things). CHUNK can again be used to define the sizes of the workload blocks.
  3. GUIDED: which is a bit like dynamic but with decreasing block-sizes.

From this real life example, it is again clear that OpenMP parallelization in fortran can be very simple.

D. Speedup?

On my loyal sidekick (with hyper-threaded quad-core core i7) I was able to get following speedups for the phonon-code (the run was limited to performing only a phonon-DOS calculation):

speedup due to openMP parallelization

Speedup of the entire phonon-subroutine due to parallelization of the main-phonon-DOS loop.

The above graph shows the speed-up results for the two different modes for calculating the phonon-DOS. The reduced mode (DM red), uses a spectrum reduced to that of a unit-cell, but needs a much denser sampling of the Brillouin zone (second approach), and is shown by the black line. The serial calculation in this specific case only took 96 seconds, and the maximum speedup obtained was about x1.84. The red and green curves give the speedup of the calculation mode which makes use of the super-cell spectrum (DM nored, i.e. much larger matrix to solve), and shows for increasing grid sizes a maximum speedup of x2.74 (serial time: 45 seconds) and x3.43 (serial time 395 seconds) respectively. The reason none of the setups reaches a speedup of 4 (or more) is twofold:

  1. Amdahl’s law puts an upper limit to the global speedup of a calculation by taking into account that only part of the code is parallelized (e.g. write sections to a single file can not be parallelized.)
  2. There needs to be sufficient blocks of work for all threads (indicated by nkz in the plot)

In case of the DM nored calculations, the parallelized loop clearly takes the biggest part of the calculation-time, while for the DM red calculation, also the section generating the q-point grid takes a large fraction of the calculation time limiting the effect of parallelization. An improvement here would be to also parallelize  the subroutine generating the grid, but that will be for future work. For now, the expensive DM nored calculations show an acceptable speedup.

 

One more digit of importance

Over the past few weeks I have bumped into several issues each tracing back to numerical accuracy. Although I have been  programming for almost two decades I never had to worry much about this, making these events seem as-if the universe is trying to tell me something.

Now, let me try to give a proper start to this story; Computational (materials) research is generally perceived as a subset of theoretical (materials) research, and it is true that such a case can be made. It is, however, also true that such thinking can trap us (i.e. the average computational physicist/chemist/mathematician/… programming his/her own code) with numerical accuracy problems. While theoretical equations use exact values for numbers, a computer program is limited by the numerical precision of the variables (e.g. single, double or quadruple precision for real numbers) used in the program. This means that actual numbers with a larger precision are truncated or rounded to the precision of the variable (e.g. 1/3 becomes 0.3333333 instead of 0.333… with an infinite series of 3’s). Most of the time, this is sufficient, and nothing strange will happen. Even more, most of the time, the additional digits would only increase the computational cost while not improving the results in a significant fashion.

Interstellar disc

To understand the importance, or the lack thereof, of additional significant digits, let us first have a look at the precision of \pi and the circumference and surface area of a disc. We will be looking at a rather large disc, one with a radius equal to the distance between the sun, and the nearest star, Alpha Centauri, which is 39 900 000 000 000 km away. The circumference of this disc is given by  2r\pi (or  2.5 \times 10^{14} km ). As a single precision variable \pi will have about 7-8 significant digits. This means the calculated circumference will have an accuracy of about 1 000 000 km (or a few times the distance between the earth and the moon). Using a double precision \pi variable, which has a precision of 16 decimal digits, the circumference will be accurately calculated to within a few meters. At quadrupal precision, the \pi variable would have 34 significant decimal digits, and we would even be able to calculate the surface of the disc ( r^2\pi or  5.0 \times 10^{33} m² ) to within 1 m². Even the surface of a disc the size of our milky way could be calculated with an accuracy of a few hundred square km (or ± the size of Belgium ).

Knowing this, our mind is quickly put at easy regarding possible issues regarding numerical accuracy. However, once in a while we run into one exceptional case (or three, in my case).

1. Infinitesimal finite elements

Temperature profile in the insulating layer of a cylindrical wire.

Temperature profile in the insulating layer of a cylindrical wire.

While looking into the theory behind finite elements, I had some fun implementing a simple program which calculated the temperature distribution due to heat transport in an insulating layer. The finite element approach performed rather nicely, leading to good approximate results, already for a few dozen elements. However, I wanted to push the implementation a bit (the limit of infinite elements should give the exact solution). Since the set of equations was solved by a LAPACK subroutine, using 10 000 elements instead of 10 barely impacted the required time (writing the results took most of 2-3 seconds anyway). The results on the other hand were quite funny as you can see in the picture. The initial implementation, with single precision variables, breaks down even worse already at 1000 elements. Apparently the elements had become too small leading to too small variations of the properties in the stiffness-matrix, resulting in the LAPACK subroutine returning nonsense.

So it turns out that you can have too many elements in a finite elements method.

2. Small volumes: A few more digits please

Optimized volume in Equation of State fit, as function of the range of the fitting data, and step size between data-points. green diamonds, blue triangles and black discs: 1% , 0.5% and 0.25% volume steps respectively.

Optimized volume in Equation of State fit, as function of the range of the fitting data, and step size between data-points. green diamonds, blue triangles and black discs: 1% , 0.5% and 0.25% volume steps respectively.

Recently, I started working at the Wide Band Gap Materials group at the University of Hasselt. So in addition to MOFs I am also working on diamond based materials. While setting up a series of reference calculations, using scripts which already suited me well during my work on MOFs, I was trying to figure out for which volume range, and step size I would get a sufficient convergence in my Equation-of-States Fitting procedure. For the MOFs this is a computationally rather expensive (and tedious) exercise, which, fortunately, gives clear results. For the 2-atom diamond unit cell the calculations are ridiculously fast (in comparison), but the results were confusing. As you can see in the picture, the values I obtained from the different fits seem to oscillate. Checking my E(V) data showed nothing out of the ordinary. All energies and volumes were clearly distinguishable, with the energies given with a precision of 0.001 meV, and the volumes with a precision of 0.01 Å3. However, as you can see in the figure, the volume-oscillations are of the order of 0.001 Å3, ten times smaller than our input precision. Calculating the volumes based on the lattice parameters to get a precision of 10-6 Å3 for the input volumes stabilizes the convergence behavior of the fits (open symbols in the figure). This problem was not present with the MOFs since these have a unit cell volume which is one hundred times larger, so a precision of 0.01 Åmakes the relative error on the volumes one hundred times smaller than was the case for diamond.

In essence, I was trying to get more accurate output than the input I provided, which will never give sensible results (even if they actually look sensible).

3. Many grains of sand really start to pile up after a while

The last one is a bit embarrassing as it lead to a bug in the HIVE-toolbox, which is fixed in the mean time.

One of the HIVE-toolbox users informed me that the dosgrabber routine had crashed because it could not find the Fermi-level in the output of a VASP calculation. Although VASP itself gives a value for the Fermi-level, I do not use it in the above sub-program, since this value tend to be incorrect for spin-polarized systems with different minority and majority spins. However, in an attempt to be smart (and efficient) I ended up in trouble. The basic idea behind my Fermi-level search is just running through the entire Density of States-spectrum until you have counted for all the electrons in the system. Because the VASP estimate for the Fermi-level is not that far of, you do not need to run through the entire list of several thousand entries, but you could just take a subset-centered around the estimated Fermi-level and check in that subset, speeding this up by a factor of 10 to 100. Unfortunately I calculated the energy step size between density of states entries as the difference between the first two entries, which are given to with an accuracy of 0.001 eV. I guess you already have a feeling what will be the problem. When the index of the estimated Fermi-level is 1000, the error will be of the order of 1 eV, which is much larger than the range I took into account. Fortunately, the problem is easily solved by calculating the energy step size as the difference between the first and last index, and divide by the number of steps, making the error in the particular case more than a thousand times smaller.

So, trying to be smart, you always need to make sure you really are being smart, and remember that small number can become very big when there are a lot of them. 

Review of 2015

With 2015 having past on moving quickly toward oblivion, and 2016 freshly knocking at our door, it is time to look back and contemplate what we have done over the course of the previous year.

Publications: +5

 

Journal covers:+1Cover image CrystEngComm 2015 Vol 17 Issue 45

 

Completed refereeing tasks: +11

  • ACS Catalysis
  • Frontiers in Physics (2x)
  • Journal of Physics: Condensed Matter
  • Proceedings for 39th International Conference & Exposition on Advanced Ceramics & Composites
  • Applied Physics Letters (2x)
  • Materials Science in Semiconductor Processing
  • Journal of Superconductivity and Novel Magnetism (2x)
  • Surface Science

 

Conferences: +3 (Attended) & + 1 (Organized)

 

Master-students: +1

  • Arthur De Vos : Combined theoretical-experimental study of chromium doped zinc gallate phosphor

 

Jury member of PhD-thesis committee: +1

  • Ir. Yuanyuan Guan
    Title: Development of a method to determine the solubility ranges of intermetallic compounds in metal-metal connections
    PhD candidate at KU Leuven with Prof. Dr. Ir. Nele Moelans
    Department of Materials Engineering

Current size of HIVE:

  • 44K lines of program (code: 71 %)
  • 64 files
  • 40 (command line) options

Hive-STM program:

 

And now, upward and onward, a new year, a fresh start.

HIVE-STM: A simple post-processing tool for simulating STM

While I was working on my PhD-thesis on Pt nanowires at the university of Twente, one of the things I needed was a method for simulating scanning-tunneling microscopy (STM) images in a quick and easy way. This was because the main experimental information on on these nanowires was contained in STM-images.

Because I love programming, I ended up writing a Delphi-program for this task. Delphi, being an Object Oriented version of the Pascal-programming language containing a Visual Components Library, was ideally suited for writing an easy to use program with a graphical user interface (GUI). The resulting STM-program was specifically designed for my personal needs and the system I was working on at that time.

In August 2008, I was contacted by two German PhD students, with the request if it would be possible for them to use my STM program. In October, an American post-doc and a South-Korean graduate student followed with similar requests, from which point onward I started getting more and more requests from researchers from all over the world. Now, seven years later, I decided to put all “HIVE-users” in a small data-base just to keep track of their number and their affiliation. I already knew I send the program to quite a lot of people, but I was still amazed to discover that it were 225 people from 34 countries.

Hive Requests December 2015

Bar-graph showing the evolution in requests for the HIVE-STM program.

There is a slow but steady increase in requests over the years, with currently on average about one request every week. It is also funny to see there was a slight setback in requests both times I started in a new research-group. For 2015, the data is incomplete, as it does not include all requests of the month December. Another way to distribute the requests is by the month of the year. This is a very interesting graph, since it clearly shows the start of the academic year (October). There are two clear minima (March and September), for which the later is probably related due to the fact that it is the last month of before the start of the academic year (much preparation for new courses) and, in case of the solid state community, this month is also filled with conferences. The reason why there is a minimum in March, however, escapes me ( 💡 all suggestions are welcome 💡 ).

Hive requests per month.

Distribution of requests for the HIVE-STM program on a monthly basis.

The geographic distribution of affiliations of those requesting the STM-program shows Europe, Azia and America to take roughly equal shares, while African affiliations are missing entirety. Hopefully this will change after the workshop on visualization and analysis of VASP outputs delivered at the Center for High Performance Computing‘s 9th National Meeting in South Africa by Dr. David Carballal. By far the most requests come from the USA (57), followed by China(23) and then Germany(15). South-Korea(14) unexpectedly takes the fourth place, while the fifth place is a tie between the UK, Spain and India(12 each).

Hive requests demographics 2015

Distribution of Hive requests per country and continent.

All in all, the STM program seems to be of interest to many more researchers than I would have ever expected, and has currently been cited about 25 times, so it is time to add a page listing these papers as examples of what can be done with HIVE(which has in the mean time been done, check out useful link n°2).

Happy Hiving to all of you, and thank you for your trust.

 

Useful link:
[1] More information on the HIVE-STM program and how to acquire it.

[2] List of publications using and citing the HIVE-STM program.

HIVE reaches 40K lines

HIVE 3.x BannerCode statistics for HiveWith the inclusion of the phonon-module and some minor fixes and extensions to existing code, the hive 3.x program now counts over 40.000 lines, spread over 60 files. 8% of all lines are blank lines, 71% of all lines contain code, and 21% contain comments, an indication of the extent of the documentation of the code.

The code now provides access to 33 command line options of varying complexity. The simplest options (extracting a geometry, or creating a cif-file) are nearly instantaneous, while more complex options (such as Hirshfeld-I calculations) can take up to several hours.

Also from this point onward, HIVE will require to be linked with a lapack-library during compilation, to allow for the efficient solution of eigenvalue-problems.

Time for a little celebration.