Tag: tools

Gnuplot animated gifs: Visualizing Machine-Learning models

One of the most important aspects in machine-learning—in addition to the modeling itself—is undoubtedly visualization. This can be of either the data set itself or the resulting model. When dealing with small or sparse data sets and a limited number of features, visualization can be extremely helpful to get a feel for your model and data. In this tutorial, we show how you can use gnuplot to generate interesting animations of your data, such as the example above.

What do you need?

  • Install Gnuplot  version 5.2.8 (or higher) for your OS (under windows you can also install it under your Cygwin installation)
  • A data set as a simple multi-column text-file data.dat .
  • A similar text-file, model.dat, with your model calculated on a grid .

1. Starting simple: a static image

The main difference between an animation and a static image is the fact the former is just a series of such static images shown one after the other.

1.a. Basic image

Gnuplot allows both interactive and scripted command-line usage. The commands used in interactive mode can simply be placed in a text-file (e.g., myplot.gpl) and run using  the command:

> gnuplot myplot.gpl

Comments can be added in such a file by preceding them with a single “#“. In the examples below, I’m using “###” as a personal choice. It shows clearly the location of the comments, and also gives me an easy way to distinguish with script lines I commented out for testing purposes, in which case I use a single #. In the following I also indicate gnuplot commands in red, while options are indicated in turquise. Let u start by plotting the data set in a simple png:

### Set the output to a png file
set termopt enhanced
set terminal pngcairo size 300,300 font "Helvetica-Bold,6"
### The file to write to
set output 'modelplot_v1.png'
### The Title label
set title 'ML model tutor' font "Helvetica-Bold,10"

splot "data.dat" u 1:2:3
model plot v1

model plot v1

With this we set the output to be a png image of 300×300 pixels. (Note: pngcairo also provides png-functionality using the cairo-library. For more complex plotting, it gives much nicer images.) The default font for text is set to “Helvetical-Bold” with a font-size of 6pt. The enhanced option further allows us to use LaTeX type strings, for example indicating subscripts as A_n to print An. The resulting image is stored as ‘modelplot_v1.png‘.

The last two commands are used to create the actual plot. With set title a title can be added to the graph. The default font is replaced in this case by a slightly larger version of 10pt. The splot command  allows you to plot 3D surfaces using the same basic information as the gnuplot plot command for 2D plots. In this case, I used the first 3 columns of the data.dat file to plot 3D data, with the x:y:z giving the respective column numbers. The result is shown on the right.

NOTE: An important point to consider is the fact that the font size is absolute. So if you decide later-on to change your image size to say 500×500 pixels, your text labels may look rather small, and you will have to tweak the font-size to compensate of this behavior. Therefore, it is important to make sure you start with the right image size straight away. The 300×300 pixels used in this tutorial are too small for any scientific quality image, it was chosen to be a suitable image size to incorporate in this blog.

1.b. Pimp the axis

With the basics for the graph set up, we can start setting up the graph to our liking.

###settings for the boxplot
set xlabel "M_{n polyX} (g/mol)" offset 0,-1,0 font "Helvetica-Bold, 8" rotate parallel
set xrange[1000:10000] noreverse writeback
set xtics 2000,2000,10000 out scale 1.0 nomirror offset 0,-0.5,0

set ylabel "Graft (%)" offset 2,0,0 font "Helvetica-Bold, 8" rotate parallel
set yrange[0:30] noreverse writeback
set ytics 0,10,30 out scale 1.0 nomirror offset 0,-0.5,0

set zlabel "Particle size (nm)" offset 1,1,0 font "Helvetica-Bold, 8" rotate parallel
set zrange[0:300] noreverse writeback
set ztics 0,50,300 out scale 1.0 nomirror offset 0,-0.5,0

set xyplane at 0
set border lw 3

 

The label of each of the three axes can be modified individually using set {x/y/z}label followed by the same options available to any other string (such as the graph title earlier). Here you can see how the enhanced mode allows the use of a subscript using standard LaTeX formatting. The offset makes sure the axis-label does not overlap with the tick-labels. Gnuplot also allows you to define the range to be plotted using set {x/y/z}range[min:max], while set {x/y/z}tics gives you access to the specifics of the individual tics. The latter can be very useful to manually add specific tics, or, as in the current case, manually set the splitting between the different tics. The out option places the tic-marks at the outside of the graph, and their size is set by the scale option.

The command set xyplane can be used to set the intercept of the xy-plane and the z-axis, and set border gives access to the axis-line properties. Here I have set the line-width (lw) to 3.

1.c. Add the (machine-learning) model

Now that the basics properties of the 3D graph are alright*, let us add the model to the plot. This can easily be done by just adding additional input for the splot command.

splot \
     "data.dat" using 1:2:3 with points pointtype 7 pointsize 1 linecolor rgb "brown4" notitle, \
     "model.dat" u 1:2:3 w line lc rgb "sea-green" notitle

model plot v3

The “\” can be used to split the command to multiple lines. In this case, each curve/surface/data set is set on a separate line. Since the command for a single plot can become very long, gnuplot also has a shorthand for most common keywords/options it uses.  For the data the extended keywords are shown, and the shorthand is used for the model. The length of the command becomes significantly shorter, but at the same time harder to read. (Note that both shorthand as longhand keywords can be mixed in a single command.)

The data set is now being shown as points, using the 7th pointtype (which are discs). The size of these symbols is set to 1 and the linecolor is a predefined color used by gnuplot.  Finally, the notitle option removes the legend entry of this curve. The model data is presented as a line-surface. The end result is shown on the right.

1.d. A better surface-plot: Multi-plot

model plot v4

As you can see in the previous version of the plot, the model data is plotted as a surface, but this is not a very nice surface. This is because gnuplot just connected the sequential points in the file as a single very long and very complex curve. If you would rotate this plot, it would become clear, several things are very wrong. Luckily there is a very simple solution. Gnuplot has the ability to transform a point-cloud into a surface. This is done by setting a 3D grid using set dgrid3d X,Y. This creates a 3D surface for which the nodes are interpolated between the points of your point-cloud. When you set this option, it is applied on all data curves you plot (i.e., including the set of data-points, which we would like to avoid.). Using the multiplot option of gnuplot the two curves can be drawn separately, using different settings. In the script the splot command is replaced by:

### switch to a multiplot
set multiplot
set dgrid3d 26, 26 
splot \
     "model.dat" u 1:2:3 w line lc rgb "sea-green" notitle

unset dgrid3d
splot \
     "data.dat" using 1:2:3 with points pointtype 7 pointsize 1 linecolor rgb "brown4" notitle

unset multiplot

By setting the multiplot environment, we can unset dgrid3d before drawing the second data set. At the end of script we also unset multiplot to switch of the multiplot environment. At this point it become interesting to see the impact of the terminal pngcairo over png.

1.e. Surface coloring

Drawing a surface is nice, but you can also give it some color. Either by using the z-value as a color scale, or by using another metric/feature to color the surface.

###settings for the color scale
set colorbox vertical
set cblabel "Colormap\n (RGB)" font "Helvetica-Bold, 8" offset -6.75,8 rotate by 0
set pm3d at s explicit

splot \
      "model.dat" u 1:2:3 w pm3d notitle

model plot v5

Surface coloring is switched on via the command set pm3d which is set at the surface, and is used in splot at the with option. In addition to surface coloring also a color scale is added, with the label formatted using the same options as for other labels. To get the label above the color scale it needs to be shifted using the offset and the rotate option.

The result is rather fancy, but for practical purposes, the surface may actually block the view of the data points. This can be avoided by projecting the color on the xy-plane and retaining a grid representation of the surface. This is done by setting pm3d at the bottom.

model plot v6

 

 

In addition, we  also need to plot the model surface twice, once to generate the color map and once to generate the model surface as a grid-image.

set pm3d at b explicit

splot \
    "model.dat" u 1:2:3 w pm3d notitle,\
    "model.dat" u 1:2:3 w line lc rgb "sea-green" notitle

 

 

2. Creating an animation

With gnuplot it is quite easy to generate stunning 3D animated gif images. Some nice examples can be found all over the web, such as this animated Bessel function, my own (very old) molecular d-and f-orbitals, or this collection. Once you finish creating a script to generate a single image, creating an animation requires only some minor modifications. First of all we need to select the correct terminal (i.e., gif instead of png)

set terminal gif transparent animate nooptimize delay 10 size 300,300 font "Helvetica-Bold,10"
set output 'modelplot_v7.gif'

This generates a transparent animated gif with a delay of 10 ms between frames, and stores it in a gif image. In addition, a change in time/image frame needs to be implemented. This can easily be done by a simple for loop, which is wrapped around the plotting section.

n=60
do for [i=1:n]{
    set view 60, i*360/n

    ### do the multiplot plotting section 
    set multiplot 
    ...all the other plotting stuff of before
    unset multiplot
}
set output

model plot v7: animated

In the example, the 3D graph is rotated. This is done by changing the view via the set view command which takes two angles in degrees. As you can see from “i*360/n“, gnuplot also accepts simple mathematical equations.

Once the loop is finished we need to close our gif animation. This is done via (a side-effect of) the command set output. The set output {filename} command sets the output to a file with name filename, or if a filename is omitted to the standard output. As a side-effect it closes the current output file, c.q. our animated gif.

An alternative method for creating an animation would be creating a series of images (in the image-format of your choice, e.g. pngcairo and then create an apng) and combining them yourself or via additional scripting into an animated image format using additional software, such as is done here.

3. Animated surfaces and coloring 

The example above is rather trivial in regard to animations. The ability to perform math inside a gnuplot script provides you the ability to make things a lot more interesting. In the following, we are going to construct a small imaginary solar system, to present some of the things which are possible.

The basic script for the solar system above can be downloaded here.

set termopt enhanced
set terminal gif animate nooptimize delay 10 size 300,300 font "Helvetica,10"
set output 'Solarplot_v1.gif'
set title 'Magic Solar System' font "Helvetica-Bold,10"

maxl=10
set xrange[-maxl:maxl] noreverse writeback
set yrange[-maxl:maxl] noreverse writeback
set zrange[-maxl:maxl] noreverse writeback
set xyplane at 0
set border lw 1.5

###Use parametric coordinated for plotting spheres
set parametric # enable parametric mode with angles (u,v)
set urange [0:2*pi]
set vrange [-pi/2.0:pi/2.0]
set isosample 360,180
fx(u,v)=cos(u)*cos(v)
fy(u,v)=sin(u)*cos(v)
fz(v)=sin(v)

### Surface coloring
set colorbox vertical
set cblabel "Planet\n colors" font "Helvetia, 8" offset -4.75,5 rotate by 0
set pm3d depthorder base nohidden3d
unset hidden3d

### The animated drawing
n = 60
do for [i=1:n]{
    # The star
    x=0.0
    y=0.0
    z=0.0
    r=3.0

    # The first planet
    x1=5.0*sin(i*2*pi/n)
    y1=5.0*cos(i*2*pi/n)
    z1=0.0
    r1=0.50
    color1(u,v)=0.5

    # The second planet
    x2=6.5*sin(i*2*pi/n)
    y2=7.0*cos(i*2*pi/n)
    z2=1.0*cos(i*2*pi/n)
    r2=1.0
    color2(u,v)=sin(v)*2*pi

    splot \
         "++" using (x+r*fx(u,v)):(y+r*fy(u,v)):(z+r*fz(v)):(6.5) w pm3d notitle,\
         "++" using (x1+r1*fx(u,v)):(y1+r1*fy(u,v)):(z1+r1*fz(v)):(color1(u,v)) w pm3d notitle,\
         "++" using (x2+r2*fx(u,v)):(y2+r2*fy(u,v)):(z2+r2*fz(v)):(color2(u,v)) w pm3d notitle 
}
set output

Most of the commands and options have already been covered above. To draw our spherical planets, we introduce a set of parametric coordinates (u,v) via the command set parametric. Next we set their ranges, just as you would do for the x,y, and z coordinates via set {u|v}range. The command set isosample is used to define the grid over the parametric space.  With this setup, you can now define any parametric surface you want. In our case, we want to have a sphere. For this we define three transformation functions. With u and v representing the θ and φ angles of a sphere, the transformation to Cartesian coordinates is given by the functions fx(u,v),fy(u,v), and fz(v).

In the main loop of the gif we define the center position (x,y,z) and the radius (r) of our planets and star, with the latter nicely fixed at the origin, and our planets having an orbit around it. To have a nice periodic gif, you should make sure that any periodic behavior ends up where it started, hence the 2pi factor in the sines and cosines.  Everything is drawn with a single splot command where we use a pseudo 4-column input style:

(x-coord):(y-coord):(z-coord):(color-coord)

Note that the brackets ‘(‘ & ‘)‘ are important to include as gnuplot will throw errors otherwise. The selected color, can be either a real value in the color scale or a function. The resulting solar system is shown below on the left.

Solar v1, no depth

Solar v1, depth

Not that bad for a first attempt. There is however a small snag: the 3D effect is somehow off when the planets move behind the star. This is due to the depth-buffering. The newest versions of gnuplot (≥5.2.8) provide the option depthorder for the set pm3d command. Using the value base for the depthorder option results in the depthorder to be decided based on the z-projected position of the object. This is sufficient to fix our little solar system, as you can see on the right.

3.a. Some cleanup work: removing the box and complex coloring 

As we are creating an imaginary (magical) solar system, we should maybe get rid of the x-,y-, and z-axis. This is done via the commands unset border to get rid of the axis-bars and unset {x|y|z}tics to remove the tic-marks and-labels.

unset border
unset xtics
unset ytics
unset ztics
set palette defined (0 "red", 1 "yellow",1 "brown", 2 "brown4",3 "dark-green", 4 "blue", 5 "white")
set cbrange[0:10]
#unset cbtics

Solar v2

And although the color palette provided is nice, if we want different color schemes on each of the planets, we quickly run into a small problem: you can only have 1 color palette per splot. In case of a static image, you might be able to get around this problem by using a multiplot (as before) and have overlapping splots with each their own palette. But…in that case you will also be responsible for getting the 3D order of your objects correct yourself. And although this may be doable for a single frame, in case of an animated 3D solar system this will be a hassle nearly impossible to overcome**. For this you need to tackle the problem in a different way: create your own color palette consisting of sub-palettes. This can be done via the command set palette defined. The pairs give the color at the endpoints of gradient ranges, with the overall range (here 0-5) representing the entire color scale. The intermediate points are placed equidistant, so for a color range from 0:10 the red-to-yellow gradient is linked to color values in the range 0:2, while the blue-to-white gradient is linked to the color values in the range 8:10. Applying this to our solar system we can give nice individual color palettes to each of the objects. I changed the size and position of the three objects a bit, and as you can see, the outer planet moves outside of the x/y/z-ranges. Now that we know how to add different color palettes to each of our planets, we can also remove the color-bar on the right using the command unset colorbox, remove the tics via unset cbtics, and remove the label via unset cblabel.

3.b. time-dependent colors 

We have motion of objects and different color palettes, what about changing the colors during the animation? As we saw earlier, the color component can be defined as a function, which means we can make this time-dependent as well. Let’s imagine that our outer planet is traveling on a rather elliptic orbit, making it heat up when it approaches our star.

# The first planet
    color1(u,v)=0.5*(cos(u)**5+sin(v)**3)+sin(i*6*pi/n) +8.0 
	
# The second planet
    x2=8+15*sin((i+20)*2*pi/n)
    y2=6.0*cos((i+20)*2*pi/n)
    z2=3.0*cos((i+20)*2*pi/n)
    color2(u,v)=0.99*sin((i+20)*2*pi/n+pi)+1

By making the color dependent on the frame number, the (uniform) coloring of our second planet will now cycle through the red-yellow gradient. The first planet experiences a variation at 3x the speed but have a non-uniform surface coloring.

Solar v3

3.c. time-dependent colors and shapes 

Once you have time dependent coloring, and time dependent motion, you can also have time dependent shapes and combine all three. This is all possible within the same basic framework set up above. For example, we can make our star a bit more active, letting it bulge and swirl. Adding another planet and some moons the magic solar system below is created by this gnuplot script.

Solar v4

4. Conclusion 

Gnuplot provides a versatile tool for creating animated gifs of your machine learning data and models, or anything else you could imagine. It has an extensive number of options which allow you to tweak each single property of your graph. The ability to perform simple arithmetic within a gnuplot-script further increases the potential.

 


* Ignore the rather crummy quality of the embedded images. This is an artifact of only having a 300×300 pixel image, the animation at the top of the page has an 1000×1000 pixel resolution and shows a much better quality.

** Of course with enough persistence you may find a way to  get it done…but there are less sadomasochistic ways of doing this 😉

Partitioning the vibrational spectrum: Fingerprinting defects in solids

Authors:  Danny E. P. Vanpoucke
Journal: Computational Materials Science 181, 109736 (2020)
doi: 10.1016/j.commatsci.2020.109736
IF(2018): 2.644
export: bibtex
pdf: <ComputMaterSci>   (Open Access)
github: <Hive-toolbox>

 

Graphical abstract Computational Materials Science 181, 109736 (2020)
Graphical Abstract: Finger printing defects in diamond through the creation of the vibrational spectrum of a defect.

Abstract

Vibrational spectroscopy techniques are some of the most-used tools for materials
characterization. Their simulation is therefore of significant interest, but commonly
performed using low cost approximate computational methods, such as force-fields.
Highly accurate quantum-mechanical methods, on the other hand are generally only used
in the context of molecules or small unit cell solids. For extended solid systems,
such as defects, the computational cost of plane wave based quantum mechanical simulations
remains prohibitive for routine calculations. In this work, we present a computational scheme
for isolating the vibrational spectrum of a defect in a solid. By quantifying the defect character
of the atom-projected vibrational spectra, the contributing atoms are identified and the strength
of their contribution determined. This method could be used to systematically improve phonon
fragment calculations. More interestingly, using the atom-projected vibrational spectra of the
defect atoms directly, it is possible to obtain a well-converged defect spectrum at lower
computational cost, which also incorporates the host-lattice interactions. Using diamond as
the host material, four point-defect test cases, each presenting a distinctly different
vibrational behaviour, are considered: a heavy substitutional dopant (Eu), two intrinsic
point-defects (neutral vacancy and split interstitial), and the negatively charged N-vacancy
center. The heavy dopant and split interstitial present localized modes at low and high
frequencies, respectively, showing little overlap with the host spectrum. In contrast, the
neutral vacancy and the N-vacancy center show a broad contribution to the upper spectral range
of the host spectrum, making them challenging to extract. Independent of the vibrational behaviour,
the main atoms contributing to the defect spectrum can be clearly identified. Recombination of
their atom-projected spectra results in the isolated spectrum of the point-defect.

Defensive Programming and Debugging

Last few months, I finally was able to remove something which had been lingering on my to-do list for a very long time: studying debugging in Fortran. Although I have been programming in Fortran for over a decade, and getting quite good at it, especially in the more exotic aspects such as OO-programming, I never got around to learning how to use decent debugging tools. The fact that I am using Fortran was the main contributing factor. Unlike other languages, everything you want to do in Fortran beyond number-crunching in procedural code has very little documentation (e.g., easy dll’s for objects), is not natively supported (e.g., find a good IDE for fortran, which also supports modern aspects like OO, there are only very few who attempt), or you are just the first to try it (e.g., fortran programs for android :o, definitely on my to-do list). In a long bygone past I did some debugging in Delphi (for my STM program) as the debugger was nicely integrated in the IDE. However, for Fortran I started programming without an IDE and as such did my initial debugging with well placed write statements. And I am a bit ashamed to say, I’m still doing it this way, because it can be rather efficient for a large code spread over dozens of files with hundreds of procedures.

However, I am trying to repent for my sins. A central point in this penance was enlisting for the online MOOC  “Defensive Programming and Debugging“. Five weeks of intense study followed, in which I was forced to use command line gdb and valgrind. During these five weeks I also sharpened my skills at identifying possible sources of bugs (and found some unintentional bugs in the course…but that is just me). Five weeks of hard study, and taking tests, I successfully finished the course, earning my certificate as defensive programmer and debugger. (In contrast to my sometimes offensive programming and debugging skills before 😉 .)

Merry Christmas & Happy New Year

Having fun with my xmgrace-fortran library and fractal code!

A Spectre and Meltdown victim: VASP

Over the last weekend, two serious cyber security issues were hot news: Meltdown and Spectre [more links, and links](not to be mistaken for a title of a bond-movie). As a result, also academic HPC centers went into overdrive installing patches as fast as possible. The news of the two security issues went hand-in-hand with quite a few belittling comments toward the chip-designers ignoring the fact that no-one (including those complaining now) discovered the problem for over decade. Of course there was also the usual scare-mongering (cyber-criminals will hack our devices by next Monday, because hacks using these bugs are now immediately becoming their default tools etc.) typical since the beginning of the  21st century…but now it is time to return back to reality.

One of the big users on scientific HPC installations is the VASP program(an example), aimed at the quantum mechanical simulation of materials, and a program central to my own work. Due to an serendipitous coincidence of a annoyingly hard to converge job I had the opportunity to see the impact of the Meltdown and Spectre patches on the performance of VASP: 16% performance loss (within the range of the expected 10-50% performance loss for high performance applications [1][2][3]).

The case:

  • large HSE06 calculation of a 71 atom defective ZnO supercell.
  • 14 irreducible k-points (no reduction of the Hartree-Fock k-points)
  • 14 nodes of 24 cores, with KPAR=14, and NPAR=1 (I know NPAR=24 is the recommended option)

The calculation took several runs of about 10 electronic steps (of each about 5-6 h wall-time, about 2.54 years of CPU-time per run) . The relative average time is shown below (error-bars are the standard deviation of the times within a single run). As the final step takes about 50% longer it is treated separately. As you can see, the variation in time between different electronic steps is rather small (even running on a different cluster only changes the time by a few %). The impact of the Meltdown/Spectre patch gives a significant impact.

Impact of Meltdown/Spectre patch on VASP performance

Impact of Meltdown/Spectre patch on VASP performance for a 336 core MPI job.

 

The HPC team is currently looking into possible workarounds that could (partially) alleviate the problem. VASP itself is rather little I/O intensive, and a first check by the HPC team points toward MPI (the parallelisation framework required for multi-node jobs) being ‘a’ if not ‘the’ culprit. This means that also an impact on other multi-node programs is to be expected. On the bright side, finding a workaround for MPI would be beneficial for all of them as well.

So far, tests I performed with the HPC team not shown any improvements (recompiling VASP didn’t help, nor an MPI related fix). Let’s keep our fingers crossed, and hope the future brings insight and a solution.

 

Resource management on HPC infrastructures.

Computational as a third pillar of science (next to experimental and theoretical) is steadily developing in many fields of science. Even some fields you would less expect it, such as sociology or psychology. In other fields such as physics, chemistry or biology it is much more widespread, with people pushing the boundaries of what is possible. Larger facilities provide access to larger problems to tackle. If a computational physicist is asked if larger infrastructures would not become too big, he’ll just shrug and reply: “Don’t worry, we will easily fill it up, even a machine 1000x larger than that.” An example is given by a pair of physicists who recently published their atomic scale study of the HIV-1 virus. Their simulation of a model containing more than 64 million atoms used force fields, making the simulation orders of magnitude cheaper than quantum mechanical calculations. Despite this enormous speedup, their simulation of 1.2 µs out of the life of an HIV-1 virus (actually it was only the outer skin of the virus, the inside was left empty) still took about 150 days on 3880 nodes of 16 cores on the Titan super computer of Oak Ridge National Laboratory (think about 25 512 years on your own computer).

In Flanders, scientist can make use of the TIER-1 facilities provided by the Flemish Super Computer (VSC). The first Tier-1 machine was installed and hosted at Ghent University. At the end of it’s life cycle the new Tier-1 machine (Breniac) was installed and is hosted at KULeuven. Although our Tier-1 supercomputer is rather modest compared to the Oak Ridge supercomputer (The HIV-1 calculation I mentioned earlier would require 1.5 years of full time calculations on the entire machine!) it allows Flemish scientists (including myself) to do things which are not possible on personal desktops or local clusters. I have been lucky, as all my applications for calculation time were successful (granting me between 1.5 and 2.5 million hours of CPU time every year). With the installment of the new supercomputer accounting of the requested resources has become fully integrated and automated. Several commands are available which provide accounting information, of which mam-balance is the most important one, as it tells how much credits are still available. However, if you are running many calculations you may want to know how many resources you are actually asking and using in real-time. For this reason, I wrote a small bash-script that collects the number of requested and used resources for running jobs:

Output of the Bash Script.

Currently, the last part, on the completed jobs, only provides data based on the most recent jobs. Apparently the full qstat information of older jobs is erased. However, it still provides an educated guess of what you will be using for the still queued jobs.

 

Scaling of VASP 5.4.1 on TIER-1b BrENIAC

When running programs on HPC infrastructure, one of the first questions to ask yourself is: “How well does this program scale?

In applications for HPC resources, this question plays a central role, often with the additional remark: “But for your specific system!“. For some software packages this is an important remark, for other packages this remark has little relevance, as the package performs similarly for all input (or for given classes of input). The VASP package is one of the latter. For my current resource application at the Flemish TIER-1 I set out to do a more extensive scaling test of the VASP package. This is for 2 reasons. The first being the fact that I will be using a newer version of VASP: vasp 5.4.1 (I am currently using my own multiply patched version 5.3.3.). The second being the fact that I will be using a brand new TIER-1 machine (second Flemish TIER-1, as our beloved muk retired the end of 2016).

Why should I put in the effort to get access to resources on such a TIER-1 supercomputer? Because such machines are the life blood of the computational materials scientist. They are their sidekick in the quest for understanding of materials. Over the past 4 years, I was granted (and used) 20900 node-days of calculation time (i.e. over 8 Million hours of CPU time, or 916 years of calculation time) on the first TIER-1 machine.

Now back to the topic. How well does VASP 5.4.1 behave? That depends on the system at hand, and how good you choose the parallelization settings.

1. Parallelization in VASP

VASP provides several parameters which allow for straightforward parallelization of the simulation:

  • NPAR : This parameter can be set to parallelize over the electronic bands. As a consequence, the number of bands included in a calculation by VASP will be a multiple of NPAR. (Note : Hybrid calculations are an exception as they require NPAR to be set to the number of cores used per k-point.)
  • NCORE : The NCORE parameter is related to NPAR via NCORE=#cores/NPAR, so only one of these can be set.
  • KPAR : This parameter can be set to parallelize over the set of irreducible k-points used to integrate over the first Brillouin zone. KPAR should therefore best be a divisor of the number of irreducible k-points.
  • LPLANE : This boolean parameter allows one to switch on parallelization over plane waves. In general this will give rise to a small but consistent speedup (observed in previous scaling tests). As such we have chosen to set this parameter = .TRUE. for all calculations.
  • NSIM : Sets up a blocked mode for the RMM-DIIS algorithm. (cf. manual, and further tests by Peter Larsson). As our tests do not involve the RMM-DIIS algorithm, this parameter was not set.

In addition, one needs to keep the architecture of the HPC-system in mind as well: NPAR, KPAR and their product should be divisors of the number of nodes (and cores) used.

2. Results

Both NPAR and KPAR parameters can be set simultaneously and will have a serious influence on the speed of your calculations. However, not all possible combinations give the same speedup. Even worse, not all combinations are beneficial with regard to speed. This is best seen for a small 2 atom diamond primitive cell.

Timing primitive uni cell diamond.

Timing results for various combinations of the NPAR and KPAR parameters for a 2 atom primitive unit cell of diamond.

Two things are clear. First of all, switching on a parallelization parameter does not necessarily mean the calculation will speed up. In some case it may actually slow you down. Secondly, the best and worst performance is consistently obtained using the same settings. Best: KPAR = maximal, and NPAR = 1, worst: KPAR = 1 and NPAR = maximal.

This small system shows us what you can expect for systems with a lot of k-points and very few electronic bands ( actually, any real calculation on this system would only require 8 electronic bands, not 56 as was used here to be able to asses the performance of the NPAR parameter.)

In a medium sized system (20-100 atoms), the situation will be different. There, the number of k-points will be small (5-50) while the natural number of electronic bands will be large (>100). As a test-case I looked at my favorite Metal-Organic Framework: MIL-47(V).

scaling of MIL-47(V) at PBE level

Timing results for several NPAR/KPAR combinations for the MIL-47(V) system.

This system has only 12 k-points to parallelize over, and 224 electronic bands. The spread per number of nodes is more limited than for the small system. In contrast, the general trend remains the same: KPAR=high, NPAR=low, with an optimum performance when KPAR=#nodes. Going beyond standard DFT, using hybrid functionals, also retains the same picture, although in some cases about 10% performance can be gained when using one half node per k-point. Unfortunatly, as we have very few k-points to start from, this will only be an advantage if the limiting factor is the number of nodes available.

An interesting behaviour is seen when one keeps the k-points/#nodes ratio constant:

scaling at constant k-point/node ratio

Scaling behavior for either a constant number of k-points (for dense k-point grid in medium sized system) or constant k-point/#nodes ratio.

As you can see, VASP performs really well up to KPAR=#k-points (>80% efficiency). More interestingly, if the k-point/#node ratio is kept constant, the efficiency (now calculated as T1/(T2*NPAR) with T1 timing for a single node, and T2 for multiple nodes) is roughly constant. I.e. if you know the walltime for a 2-k-point/2-nodes job, you can use/expect the same for the same system but now considering 20-k-points/20-nodes (think Density of States and Band-structure calculations, or just change of #k-points due to symmetry reduction or change in k-point grid.)  😆

3. Conclusions and Guidelines

If one thing is clear from the current set of tests, it is the fact that good scaling is possible. How it is attained, however, depends greatly on the system at hand. More importantly, making a poor choice of parallelization settings can be very detrimental to the obtained speed-up and efficiency. Unfortunately when performing calculations on an HPC system, one has externally imposed limitations to work with:

  • Memory available per core
  • Number of cores per node[1]
  • Size of your system: #atoms, #k-points, & #bands

Here are some guidelines (some open doors as well):

  • Wherever possible k-point parallelization should be driven to a maximum (KPAR as large as possible). The limiting factor here is the actual number of k-points and the amount of memory available. The latter due to the fact that a higher value of KPAR leads to a higher memory requirement.[2]
  • Use the Γ-version of VASP for Γ-point only calculations. It reduces memory usage significantly (3.7Gb→ 2.8Gb/core for a 512 atom diamond system) and increases the computational efficiency, sometimes even by a factor of 2.
  • NPAR parallelization can be used to reduce the memory load for high KPAR calculations, but increasing KPAR will always outperform the same increase of NPAR.
  • In case only NPAR parallelization is available, due to too few k-points, and working with large systems, NPAR parallelization is your last resort, and will perform reasonably well, up to a point.
  • Electronic steps show very consistent timing, so scaling tests can be performed with only a 5-10 electronic steps, with standard deviations comparable in absolute size for PBE and HSE06.

 

In short:

K-point parallelism will save you, wherever possible !

 

[1] 28 is a lousy number in that regard as its prime-decomposition is 2x2x7, leaving little overlap with prime-decompositions of the number of k-points, which more often than you wish end up being prime numbers themselves 😥
[2] The small system’s memory requirements varied from 0.15 to 1.09 Gb/core for the different combinations.

One more digit of importance

Over the past few weeks I have bumped into several issues each tracing back to numerical accuracy. Although I have been  programming for almost two decades I never had to worry much about this, making these events seem as-if the universe is trying to tell me something.

Now, let me try to give a proper start to this story; Computational (materials) research is generally perceived as a subset of theoretical (materials) research, and it is true that such a case can be made. It is, however, also true that such thinking can trap us (i.e. the average computational physicist/chemist/mathematician/… programming his/her own code) with numerical accuracy problems. While theoretical equations use exact values for numbers, a computer program is limited by the numerical precision of the variables (e.g. single, double or quadruple precision for real numbers) used in the program. This means that actual numbers with a larger precision are truncated or rounded to the precision of the variable (e.g. 1/3 becomes 0.3333333 instead of 0.333… with an infinite series of 3’s). Most of the time, this is sufficient, and nothing strange will happen. Even more, most of the time, the additional digits would only increase the computational cost while not improving the results in a significant fashion.

Interstellar disc

To understand the importance, or the lack thereof, of additional significant digits, let us first have a look at the precision of \pi and the circumference and surface area of a disc. We will be looking at a rather large disc, one with a radius equal to the distance between the sun, and the nearest star, Alpha Centauri, which is 39 900 000 000 000 km away. The circumference of this disc is given by  2r\pi (or  2.5 \times 10^{14} km ). As a single precision variable \pi will have about 7-8 significant digits. This means the calculated circumference will have an accuracy of about 1 000 000 km (or a few times the distance between the earth and the moon). Using a double precision \pi variable, which has a precision of 16 decimal digits, the circumference will be accurately calculated to within a few meters. At quadrupal precision, the \pi variable would have 34 significant decimal digits, and we would even be able to calculate the surface of the disc ( r^2\pi or  5.0 \times 10^{33} m² ) to within 1 m². Even the surface of a disc the size of our milky way could be calculated with an accuracy of a few hundred square km (or ± the size of Belgium ).

Knowing this, our mind is quickly put at easy regarding possible issues regarding numerical accuracy. However, once in a while we run into one exceptional case (or three, in my case).

1. Infinitesimal finite elements

Temperature profile in the insulating layer of a cylindrical wire.

Temperature profile in the insulating layer of a cylindrical wire.

While looking into the theory behind finite elements, I had some fun implementing a simple program which calculated the temperature distribution due to heat transport in an insulating layer. The finite element approach performed rather nicely, leading to good approximate results, already for a few dozen elements. However, I wanted to push the implementation a bit (the limit of infinite elements should give the exact solution). Since the set of equations was solved by a LAPACK subroutine, using 10 000 elements instead of 10 barely impacted the required time (writing the results took most of 2-3 seconds anyway). The results on the other hand were quite funny as you can see in the picture. The initial implementation, with single precision variables, breaks down even worse already at 1000 elements. Apparently the elements had become too small leading to too small variations of the properties in the stiffness-matrix, resulting in the LAPACK subroutine returning nonsense.

So it turns out that you can have too many elements in a finite elements method.

2. Small volumes: A few more digits please

Optimized volume in Equation of State fit, as function of the range of the fitting data, and step size between data-points. green diamonds, blue triangles and black discs: 1% , 0.5% and 0.25% volume steps respectively.

Optimized volume in Equation of State fit, as function of the range of the fitting data, and step size between data-points. green diamonds, blue triangles and black discs: 1% , 0.5% and 0.25% volume steps respectively.

Recently, I started working at the Wide Band Gap Materials group at the University of Hasselt. So in addition to MOFs I am also working on diamond based materials. While setting up a series of reference calculations, using scripts which already suited me well during my work on MOFs, I was trying to figure out for which volume range, and step size I would get a sufficient convergence in my Equation-of-States Fitting procedure. For the MOFs this is a computationally rather expensive (and tedious) exercise, which, fortunately, gives clear results. For the 2-atom diamond unit cell the calculations are ridiculously fast (in comparison), but the results were confusing. As you can see in the picture, the values I obtained from the different fits seem to oscillate. Checking my E(V) data showed nothing out of the ordinary. All energies and volumes were clearly distinguishable, with the energies given with a precision of 0.001 meV, and the volumes with a precision of 0.01 Å3. However, as you can see in the figure, the volume-oscillations are of the order of 0.001 Å3, ten times smaller than our input precision. Calculating the volumes based on the lattice parameters to get a precision of 10-6 Å3 for the input volumes stabilizes the convergence behavior of the fits (open symbols in the figure). This problem was not present with the MOFs since these have a unit cell volume which is one hundred times larger, so a precision of 0.01 Åmakes the relative error on the volumes one hundred times smaller than was the case for diamond.

In essence, I was trying to get more accurate output than the input I provided, which will never give sensible results (even if they actually look sensible).

3. Many grains of sand really start to pile up after a while

The last one is a bit embarrassing as it lead to a bug in the HIVE-toolbox, which is fixed in the mean time.

One of the HIVE-toolbox users informed me that the dosgrabber routine had crashed because it could not find the Fermi-level in the output of a VASP calculation. Although VASP itself gives a value for the Fermi-level, I do not use it in the above sub-program, since this value tend to be incorrect for spin-polarized systems with different minority and majority spins. However, in an attempt to be smart (and efficient) I ended up in trouble. The basic idea behind my Fermi-level search is just running through the entire Density of States-spectrum until you have counted for all the electrons in the system. Because the VASP estimate for the Fermi-level is not that far of, you do not need to run through the entire list of several thousand entries, but you could just take a subset-centered around the estimated Fermi-level and check in that subset, speeding this up by a factor of 10 to 100. Unfortunately I calculated the energy step size between density of states entries as the difference between the first two entries, which are given to with an accuracy of 0.001 eV. I guess you already have a feeling what will be the problem. When the index of the estimated Fermi-level is 1000, the error will be of the order of 1 eV, which is much larger than the range I took into account. Fortunately, the problem is easily solved by calculating the energy step size as the difference between the first and last index, and divide by the number of steps, making the error in the particular case more than a thousand times smaller.

So, trying to be smart, you always need to make sure you really are being smart, and remember that small number can become very big when there are a lot of them. 

Animated Primes

Optimus Prime

Optimus Prime, the most prime of all autobots.[source]

2, 3, 5, 7, 11, 13, 17, 19, 23, …and on toward infinity. Prime numbers are a part of mathematics which seduce most of us at some point in our lives to try and “solve them”. There is structure in them, but too little to be easily understandable, and too much to let go and consider them chaos.

Recently, my girlfriend got sucked into this madness. Luckily she is not trying to generate the next largest prime (which would need to have more than 22 Million digits, although I might have jinxed it now). Instead she is looking for order into chaos; leading to pen and paper graphs, evolving into excel worksheets being abused as ordinary bitmap pictures and finally mathematica applications for even more heavy lifting.

These last exercises came to a somewhat grinding halt when mathematica refused to generate a gif-animation (it just failed to do the job without providing any warning that something might have gone awry). However, because it could generate the separate frames without any issues, I was tempted to dig up an old fortran module which allowed me to generate gif images and animations. I remembered correctly that it was able to both read and write gif-images, and so, after 10 minutes of writing a small program (9 minutes remembering how things worked) and another 10 linking in all dependencies and fixing some inconsistencies, I had a working program for generating a gif animation from separate gif images.

Animated gif, generated in fortran.

The initial test went smoothly. Using some simple gif images I created in paint, the resulting 4 frame animation is the one on the left (which is probably giving you a headache as you read this ;-).

The real test, with the single frame gifs Sylvia had prepared, was rather interesting: It failed miserably. Fortunately it also taught us the origin of the problem: mathematica was generating frames of different dimensions. And although this does not give rise to problems in the workbook or generated avi, it does kill an animated gif. After fixing this, also mathematica successfully generated the animated gif Sylvia wanted to see. Also my own small program was now able to perform the intended trick, although it showed that a different color-map was created for each frame by mathematica, something my own implementation did not handle well (we’ll fix that somewhere in the future).

The executable (including prerequisite dll’s) can be downloaded here.

If you do not suffer from epileptic seizures (or not yet), you could check out the results in the spoiler below.

Spoiler Inside: gif-animations SelectShow

Virtual Winterschool 2016: Computational Solid State Physics & Chemistry

In just an hour, I’ll be presenting my talk at the virtual winterschool 2016. In an attempt to tempt fate as much as possible I will try to give/run real-time examples on our HPC in Gent, however at this moment no nodes are available yet to do so. Let’s keep our fingers crossed and see if it all works out.

Abstract

Modern materials research has evolved to the point where it is now common practice to manipulate materials at nanometer scale or even at the atomic scale (e.g. Intel’s skylake architecture with 14nm features, atomic layer deposition and surface structure manipulations with an STM-tip). At these scales, quantum mechanical effects become ever more relevant, making their prediction important for the field of materials science.

In this session, we will discuss how advanced quantum mechanical calculations can be performed for solids and indicate some differences with standard quantum chemical approaches. We will touch upon the relevant concepts for performing such calculations (plane-wave basis-sets, pseudo-potentials, periodic boundary conditions,…) and show how the basic calculations are performed with the VASP-code. You will familiarize yourself with the required input files and we will discuss several of the most important output-files and the data they contain.

At the end of this session you should be able to set up a single-point calculation, a structure optimization, a density of states and band structure calculation.

Additional Files/Info