Danny Vanpoucke

Most commented posts

  1. Phonons: shake those atoms — 3 comments
  2. Start to Fortran — 1 comment

Author's posts

Gnuplot animated gifs: Visualizing Machine-Learning models

One of the most important aspects in machine-learning—in addition to the modeling itself—is undoubtedly visualization. This can be of either the data set itself or the resulting model. When dealing with small or sparse data sets and a limited number of features, visualization can be extremely helpful to get a feel for your model and data. In this tutorial, we show how you can use gnuplot to generate interesting animations of your data, such as the example above.

What do you need?

  • Install Gnuplot  version 5.2.8 (or higher) for your OS (under windows you can also install it under your Cygwin installation)
  • A data set as a simple multi-column text-file data.dat .
  • A similar text-file, model.dat, with your model calculated on a grid .

1. Starting simple: a static image

The main difference between an animation and a static image is the fact the former is just a series of such static images shown one after the other.

1.a. Basic image

Gnuplot allows both interactive and scripted command-line usage. The commands used in interactive mode can simply be placed in a text-file (e.g., myplot.gpl) and run using  the command:

> gnuplot myplot.gpl

Comments can be added in such a file by preceding them with a single “#“. In the examples below, I’m using “###” as a personal choice. It shows clearly the location of the comments, and also gives me an easy way to distinguish with script lines I commented out for testing purposes, in which case I use a single #. In the following I also indicate gnuplot commands in red, while options are indicated in turquise. Let u start by plotting the data set in a simple png:

### Set the output to a png file
set termopt enhanced
set terminal pngcairo size 300,300 font "Helvetica-Bold,6"
### The file to write to
set output 'modelplot_v1.png'
### The Title label
set title 'ML model tutor' font "Helvetica-Bold,10"

splot "data.dat" u 1:2:3
model plot v1

model plot v1

With this we set the output to be a png image of 300×300 pixels. (Note: pngcairo also provides png-functionality using the cairo-library. For more complex plotting, it gives much nicer images.) The default font for text is set to “Helvetical-Bold” with a font-size of 6pt. The enhanced option further allows us to use LaTeX type strings, for example indicating subscripts as A_n to print An. The resulting image is stored as ‘modelplot_v1.png‘.

The last two commands are used to create the actual plot. With set title a title can be added to the graph. The default font is replaced in this case by a slightly larger version of 10pt. The splot command  allows you to plot 3D surfaces using the same basic information as the gnuplot plot command for 2D plots. In this case, I used the first 3 columns of the data.dat file to plot 3D data, with the x:y:z giving the respective column numbers. The result is shown on the right.

NOTE: An important point to consider is the fact that the font size is absolute. So if you decide later-on to change your image size to say 500×500 pixels, your text labels may look rather small, and you will have to tweak the font-size to compensate of this behavior. Therefore, it is important to make sure you start with the right image size straight away. The 300×300 pixels used in this tutorial are too small for any scientific quality image, it was chosen to be a suitable image size to incorporate in this blog.

1.b. Pimp the axis

With the basics for the graph set up, we can start setting up the graph to our liking.

###settings for the boxplot
set xlabel "M_{n polyX} (g/mol)" offset 0,-1,0 font "Helvetica-Bold, 8" rotate parallel
set xrange[1000:10000] noreverse writeback
set xtics 2000,2000,10000 out scale 1.0 nomirror offset 0,-0.5,0

set ylabel "Graft (%)" offset 2,0,0 font "Helvetica-Bold, 8" rotate parallel
set yrange[0:30] noreverse writeback
set ytics 0,10,30 out scale 1.0 nomirror offset 0,-0.5,0

set zlabel "Particle size (nm)" offset 1,1,0 font "Helvetica-Bold, 8" rotate parallel
set zrange[0:300] noreverse writeback
set ztics 0,50,300 out scale 1.0 nomirror offset 0,-0.5,0

set xyplane at 0
set border lw 3

 

The label of each of the three axes can be modified individually using set {x/y/z}label followed by the same options available to any other string (such as the graph title earlier). Here you can see how the enhanced mode allows the use of a subscript using standard LaTeX formatting. The offset makes sure the axis-label does not overlap with the tick-labels. Gnuplot also allows you to define the range to be plotted using set {x/y/z}range[min:max], while set {x/y/z}tics gives you access to the specifics of the individual tics. The latter can be very useful to manually add specific tics, or, as in the current case, manually set the splitting between the different tics. The out option places the tic-marks at the outside of the graph, and their size is set by the scale option.

The command set xyplane can be used to set the intercept of the xy-plane and the z-axis, and set border gives access to the axis-line properties. Here I have set the line-width (lw) to 3.

1.c. Add the (machine-learning) model

Now that the basics properties of the 3D graph are alright*, let us add the model to the plot. This can easily be done by just adding additional input for the splot command.

splot \
     "data.dat" using 1:2:3 with points pointtype 7 pointsize 1 linecolor rgb "brown4" notitle, \
     "model.dat" u 1:2:3 w line lc rgb "sea-green" notitle

model plot v3

The “\” can be used to split the command to multiple lines. In this case, each curve/surface/data set is set on a separate line. Since the command for a single plot can become very long, gnuplot also has a shorthand for most common keywords/options it uses.  For the data the extended keywords are shown, and the shorthand is used for the model. The length of the command becomes significantly shorter, but at the same time harder to read. (Note that both shorthand as longhand keywords can be mixed in a single command.)

The data set is now being shown as points, using the 7th pointtype (which are discs). The size of these symbols is set to 1 and the linecolor is a predefined color used by gnuplot.  Finally, the notitle option removes the legend entry of this curve. The model data is presented as a line-surface. The end result is shown on the right.

1.d. A better surface-plot: Multi-plot

model plot v4

As you can see in the previous version of the plot, the model data is plotted as a surface, but this is not a very nice surface. This is because gnuplot just connected the sequential points in the file as a single very long and very complex curve. If you would rotate this plot, it would become clear, several things are very wrong. Luckily there is a very simple solution. Gnuplot has the ability to transform a point-cloud into a surface. This is done by setting a 3D grid using set dgrid3d X,Y. This creates a 3D surface for which the nodes are interpolated between the points of your point-cloud. When you set this option, it is applied on all data curves you plot (i.e., including the set of data-points, which we would like to avoid.). Using the multiplot option of gnuplot the two curves can be drawn separately, using different settings. In the script the splot command is replaced by:

### switch to a multiplot
set multiplot
set dgrid3d 26, 26 
splot \
     "model.dat" u 1:2:3 w line lc rgb "sea-green" notitle

unset dgrid3d
splot \
     "data.dat" using 1:2:3 with points pointtype 7 pointsize 1 linecolor rgb "brown4" notitle

unset multiplot

By setting the multiplot environment, we can unset dgrid3d before drawing the second data set. At the end of script we also unset multiplot to switch of the multiplot environment. At this point it become interesting to see the impact of the terminal pngcairo over png.

1.e. Surface coloring

Drawing a surface is nice, but you can also give it some color. Either by using the z-value as a color scale, or by using another metric/feature to color the surface.

###settings for the color scale
set colorbox vertical
set cblabel "Colormap\n (RGB)" font "Helvetica-Bold, 8" offset -6.75,8 rotate by 0
set pm3d at s explicit

splot \
      "model.dat" u 1:2:3 w pm3d notitle

model plot v5

Surface coloring is switched on via the command set pm3d which is set at the surface, and is used in splot at the with option. In addition to surface coloring also a color scale is added, with the label formatted using the same options as for other labels. To get the label above the color scale it needs to be shifted using the offset and the rotate option.

The result is rather fancy, but for practical purposes, the surface may actually block the view of the data points. This can be avoided by projecting the color on the xy-plane and retaining a grid representation of the surface. This is done by setting pm3d at the bottom.

model plot v6

 

 

In addition, we  also need to plot the model surface twice, once to generate the color map and once to generate the model surface as a grid-image.

set pm3d at b explicit

splot \
    "model.dat" u 1:2:3 w pm3d notitle,\
    "model.dat" u 1:2:3 w line lc rgb "sea-green" notitle

 

 

2. Creating an animation

With gnuplot it is quite easy to generate stunning 3D animated gif images. Some nice examples can be found all over the web, such as this animated Bessel function, my own (very old) molecular d-and f-orbitals, or this collection. Once you finish creating a script to generate a single image, creating an animation requires only some minor modifications. First of all we need to select the correct terminal (i.e., gif instead of png)

set terminal gif transparent animate nooptimize delay 10 size 300,300 font "Helvetica-Bold,10"
set output 'modelplot_v7.gif'

This generates a transparent animated gif with a delay of 10 ms between frames, and stores it in a gif image. In addition, a change in time/image frame needs to be implemented. This can easily be done by a simple for loop, which is wrapped around the plotting section.

n=60
do for [i=1:n]{
    set view 60, i*360/n

    ### do the multiplot plotting section 
    set multiplot 
    ...all the other plotting stuff of before
    unset multiplot
}
set output

model plot v7: animated

In the example, the 3D graph is rotated. This is done by changing the view via the set view command which takes two angles in degrees. As you can see from “i*360/n“, gnuplot also accepts simple mathematical equations.

Once the loop is finished we need to close our gif animation. This is done via (a side-effect of) the command set output. The set output {filename} command sets the output to a file with name filename, or if a filename is omitted to the standard output. As a side-effect it closes the current output file, c.q. our animated gif.

An alternative method for creating an animation would be creating a series of images (in the image-format of your choice, e.g. pngcairo and then create an apng) and combining them yourself or via additional scripting into an animated image format using additional software, such as is done here.

3. Animated surfaces and coloring 

The example above is rather trivial in regard to animations. The ability to perform math inside a gnuplot script provides you the ability to make things a lot more interesting. In the following, we are going to construct a small imaginary solar system, to present some of the things which are possible.

The basic script for the solar system above can be downloaded here.

set termopt enhanced
set terminal gif animate nooptimize delay 10 size 300,300 font "Helvetica,10"
set output 'Solarplot_v1.gif'
set title 'Magic Solar System' font "Helvetica-Bold,10"

maxl=10
set xrange[-maxl:maxl] noreverse writeback
set yrange[-maxl:maxl] noreverse writeback
set zrange[-maxl:maxl] noreverse writeback
set xyplane at 0
set border lw 1.5

###Use parametric coordinated for plotting spheres
set parametric # enable parametric mode with angles (u,v)
set urange [0:2*pi]
set vrange [-pi/2.0:pi/2.0]
set isosample 360,180
fx(u,v)=cos(u)*cos(v)
fy(u,v)=sin(u)*cos(v)
fz(v)=sin(v)

### Surface coloring
set colorbox vertical
set cblabel "Planet\n colors" font "Helvetia, 8" offset -4.75,5 rotate by 0
set pm3d depthorder base nohidden3d
unset hidden3d

### The animated drawing
n = 60
do for [i=1:n]{
    # The star
    x=0.0
    y=0.0
    z=0.0
    r=3.0

    # The first planet
    x1=5.0*sin(i*2*pi/n)
    y1=5.0*cos(i*2*pi/n)
    z1=0.0
    r1=0.50
    color1(u,v)=0.5

    # The second planet
    x2=6.5*sin(i*2*pi/n)
    y2=7.0*cos(i*2*pi/n)
    z2=1.0*cos(i*2*pi/n)
    r2=1.0
    color2(u,v)=sin(v)*2*pi

    splot \
         "++" using (x+r*fx(u,v)):(y+r*fy(u,v)):(z+r*fz(v)):(6.5) w pm3d notitle,\
         "++" using (x1+r1*fx(u,v)):(y1+r1*fy(u,v)):(z1+r1*fz(v)):(color1(u,v)) w pm3d notitle,\
         "++" using (x2+r2*fx(u,v)):(y2+r2*fy(u,v)):(z2+r2*fz(v)):(color2(u,v)) w pm3d notitle 
}
set output

Most of the commands and options have already been covered above. To draw our spherical planets, we introduce a set of parametric coordinates (u,v) via the command set parametric. Next we set their ranges, just as you would do for the x,y, and z coordinates via set {u|v}range. The command set isosample is used to define the grid over the parametric space.  With this setup, you can now define any parametric surface you want. In our case, we want to have a sphere. For this we define three transformation functions. With u and v representing the θ and φ angles of a sphere, the transformation to Cartesian coordinates is given by the functions fx(u,v),fy(u,v), and fz(v).

In the main loop of the gif we define the center position (x,y,z) and the radius (r) of our planets and star, with the latter nicely fixed at the origin, and our planets having an orbit around it. To have a nice periodic gif, you should make sure that any periodic behavior ends up where it started, hence the 2pi factor in the sines and cosines.  Everything is drawn with a single splot command where we use a pseudo 4-column input style:

(x-coord):(y-coord):(z-coord):(color-coord)

Note that the brackets ‘(‘ & ‘)‘ are important to include as gnuplot will throw errors otherwise. The selected color, can be either a real value in the color scale or a function. The resulting solar system is shown below on the left.

Solar v1, no depth

Solar v1, depth

Not that bad for a first attempt. There is however a small snag: the 3D effect is somehow off when the planets move behind the star. This is due to the depth-buffering. The newest versions of gnuplot (≥5.2.8) provide the option depthorder for the set pm3d command. Using the value base for the depthorder option results in the depthorder to be decided based on the z-projected position of the object. This is sufficient to fix our little solar system, as you can see on the right.

3.a. Some cleanup work: removing the box and complex coloring 

As we are creating an imaginary (magical) solar system, we should maybe get rid of the x-,y-, and z-axis. This is done via the commands unset border to get rid of the axis-bars and unset {x|y|z}tics to remove the tic-marks and-labels.

unset border
unset xtics
unset ytics
unset ztics
set palette defined (0 "red", 1 "yellow",1 "brown", 2 "brown4",3 "dark-green", 4 "blue", 5 "white")
set cbrange[0:10]
#unset cbtics

Solar v2

And although the color palette provided is nice, if we want different color schemes on each of the planets, we quickly run into a small problem: you can only have 1 color palette per splot. In case of a static image, you might be able to get around this problem by using a multiplot (as before) and have overlapping splots with each their own palette. But…in that case you will also be responsible for getting the 3D order of your objects correct yourself. And although this may be doable for a single frame, in case of an animated 3D solar system this will be a hassle nearly impossible to overcome**. For this you need to tackle the problem in a different way: create your own color palette consisting of sub-palettes. This can be done via the command set palette defined. The pairs give the color at the endpoints of gradient ranges, with the overall range (here 0-5) representing the entire color scale. The intermediate points are placed equidistant, so for a color range from 0:10 the red-to-yellow gradient is linked to color values in the range 0:2, while the blue-to-white gradient is linked to the color values in the range 8:10. Applying this to our solar system we can give nice individual color palettes to each of the objects. I changed the size and position of the three objects a bit, and as you can see, the outer planet moves outside of the x/y/z-ranges. Now that we know how to add different color palettes to each of our planets, we can also remove the color-bar on the right using the command unset colorbox, remove the tics via unset cbtics, and remove the label via unset cblabel.

3.b. time-dependent colors 

We have motion of objects and different color palettes, what about changing the colors during the animation? As we saw earlier, the color component can be defined as a function, which means we can make this time-dependent as well. Let’s imagine that our outer planet is traveling on a rather elliptic orbit, making it heat up when it approaches our star.

# The first planet
    color1(u,v)=0.5*(cos(u)**5+sin(v)**3)+sin(i*6*pi/n) +8.0 
	
# The second planet
    x2=8+15*sin((i+20)*2*pi/n)
    y2=6.0*cos((i+20)*2*pi/n)
    z2=3.0*cos((i+20)*2*pi/n)
    color2(u,v)=0.99*sin((i+20)*2*pi/n+pi)+1

By making the color dependent on the frame number, the (uniform) coloring of our second planet will now cycle through the red-yellow gradient. The first planet experiences a variation at 3x the speed but have a non-uniform surface coloring.

Solar v3

3.c. time-dependent colors and shapes 

Once you have time dependent coloring, and time dependent motion, you can also have time dependent shapes and combine all three. This is all possible within the same basic framework set up above. For example, we can make our star a bit more active, letting it bulge and swirl. Adding another planet and some moons the magic solar system below is created by this gnuplot script.

Solar v4

4. Conclusion 

Gnuplot provides a versatile tool for creating animated gifs of your machine learning data and models, or anything else you could imagine. It has an extensive number of options which allow you to tweak each single property of your graph. The ability to perform simple arithmetic within a gnuplot-script further increases the potential.

 


* Ignore the rather crummy quality of the embedded images. This is an artifact of only having a 300×300 pixel image, the animation at the top of the page has an 1000×1000 pixel resolution and shows a much better quality.

** Of course with enough persistence you may find a way to  get it done…but there are less sadomasochistic ways of doing this 😉

Partitioning the vibrational spectrum: Fingerprinting defects in solids

Authors:  Danny E. P. Vanpoucke
Journal: Computational Materials Science 181, 109736 (2020)
doi: 10.1016/j.commatsci.2020.109736
IF(2018): 2.644
export: bibtex
pdf: <ComputMaterSci>   (Open Access)
github: <Hive-toolbox>

 

Graphical abstract Computational Materials Science 181, 109736 (2020)
Graphical Abstract: Finger printing defects in diamond through the creation of the vibrational spectrum of a defect.

Abstract

Vibrational spectroscopy techniques are some of the most-used tools for materials
characterization. Their simulation is therefore of significant interest, but commonly
performed using low cost approximate computational methods, such as force-fields.
Highly accurate quantum-mechanical methods, on the other hand are generally only used
in the context of molecules or small unit cell solids. For extended solid systems,
such as defects, the computational cost of plane wave based quantum mechanical simulations
remains prohibitive for routine calculations. In this work, we present a computational scheme
for isolating the vibrational spectrum of a defect in a solid. By quantifying the defect character
of the atom-projected vibrational spectra, the contributing atoms are identified and the strength
of their contribution determined. This method could be used to systematically improve phonon
fragment calculations. More interestingly, using the atom-projected vibrational spectra of the
defect atoms directly, it is possible to obtain a well-converged defect spectrum at lower
computational cost, which also incorporates the host-lattice interactions. Using diamond as
the host material, four point-defect test cases, each presenting a distinctly different
vibrational behaviour, are considered: a heavy substitutional dopant (Eu), two intrinsic
point-defects (neutral vacancy and split interstitial), and the negatively charged N-vacancy
center. The heavy dopant and split interstitial present localized modes at low and high
frequencies, respectively, showing little overlap with the host spectrum. In contrast, the
neutral vacancy and the N-vacancy center show a broad contribution to the upper spectral range
of the host spectrum, making them challenging to extract. Independent of the vibrational behaviour,
the main atoms contributing to the defect spectrum can be clearly identified. Recombination of
their atom-projected spectra results in the isolated spectrum of the point-defect.

Tutorial OOP(V): Documenting Fortran 2003 Classes

In the previous sessions of this tutorial on Object Oriented Programming in Fortran 2003, the basics of OO programming, including the implementation of constructors and destructors as well as operator overloading were covered. The resulting classes have already become quite extended (cf. github source). Although at this point it is still very clear what each part does and why certain choices were made, memory fades. One year from now, when you revisit your work, this will no longer be the case. Alternately, when sharing code, you don’t want to have to dig through every line of code to figure out how to use it. These are just some of the reasons why code documentation is important. This is a universal habit of programming which should be adopted irrespective of the programming-language and-paradigm, or size of the code base (yes, even small functions should be documented).

In Fortran, comments can be included in a very simple fashion: everything following the “!” symbol (when not used in a string) is considered a comment, and thus ignored by the compiler. This allows for quick and easy documentation of your code, and can be sufficient for single functions. However, when dealing with larger projects retaining a global overview and keeping track of interdependencies becomes harder. This is where automatic documentation generation software comes into play.  These tools parse specifically formatted comments to construct API documentation and user-guides. Over the years, several useful tools have been developed for the Fortran language directly, or as a plugin/extension to a more general tool:

  • ROBODoc : A tool capable of generating documentation (many different formats) for any programming/script language which has comments.  The latest update dates from 2015.
  • Doctran : This tool is specifically aimed at free-format (≥ .f90 ) fortran, and notes explicitly the aim to deal with object oriented f2003. It only generates html documentation, and is currently proprietary with license costs of 30£ per plugin. Latest update 2016.
  • SphinxFortran : This extension to SphinxFortran generates automatic documentation for f90 source (no OO fortran) and generates an html manual. This package is written in python and requires you to construct your config file in python as well.
  • f90doc / f90tohtml : Two tools written in Perl, which transform f90 code into html webpages.
  • FotranDOC : This tool (written in Fortran itself) aims to generate documentation for f95 code, preferably in a single file, in latex. It has a simple GUI interface, and the source of the tool itself is an example of how the fortran code should be documented. How nice is that?
  • FORD : Ford is a documentation tool written in python, aimed at modern fortran (i.e. ≥ f90).
  • Doxygen :  A multi-platform automatic documentation tool developed for C++, but extended to many other languages including fortran. It is very flexible, and easy to use and can produce documentation in html, pdf, man-pages, rtf,… out of the box.

As you can see, there is a lot to choose from, all with their own quirks and features. One unfortunate aspect is the fact that most of these tools use different formatting conventions, so switching from one to the another is not an exercise to perform lightly. In this tutorial, the doxygen tool is used, as it provides a wide range of options, is multi-platform,  supports multiple languages and multiple output formats.

As you might already expect, Object Oriented Fortran (f2003) is a bit more complicated to document than  procedural Fortran, but with some ingenuity doxygen can be made to provide nice documentation even in this case.

1. Configuring Doxygen

Before you can start you will need to install doxygen:

  1. Go the the doxygen-download page and find the distribution which is right for you (Windows-users: there are binary installers, no hassle with compilations 🙂 ).
  2. Follow the installation instructions, also install GraphViz, this will allow you to create nicer graphics using the dot-tool.
  3. Also get a pdf version of the manual (doxygen has a huge number of options)

With a nicely installed doxygen, you can make use of the GUI to setup a configuration suited to your specific needs and generate the documentation for your code automatically. For Object Oriented Fortran there are some specific settings you should consider:GUI interface of doxygen.

  1.  Wizard tab

    • Project Topic : Fill out the different fields. In a multi-file project, with source stored in a folder structure, don’t forget to select the tick-box “Scan recursively” .
    • Mode Topic : Select “Optimize for Fortran output”.
    • Output Topic : Select one or more output formats you wish to generate: html, Latex (pdf), map-pages, RTF, and XML
    • Diagrams Topic: Select which types of diagrams you want to generate.
  2. Expert tab

    (Provides access each single configuration option to set in doxygen, so I will only highlight a few. Look through them to get a better idea of the capabilities of doxygen.)

    • Project Topic :
      • EXTENSION_MAPPING: You will have to tell doxygen which fortran extensions you are using by adding them, and identifying it as free format fortran: e.g. f03=FortranFree (If you are also including text-files to provide additional documentation, it is best to add them here as well as free format fortran).
    • Build Topic:
      • CASE_SENSE_NAMES: Even though Fortran itself is not case sensitive, it may be nice to keep the type of casing you use in your code in your documentation. Note, however, that even though the output may have upper-case names, the documentation itself will require lower-case names in references.
    • Messages Topic:
      • WARN_NO_PARAMDOC: Throw a warning if documentation is missing for a function variable. This is useful to make sure you have a complete documentation.
    • Source Browser Topic:
      • SOURCE_BROWSER: Complete source files are included in the documentation.
      • INLINE_SOURCES: Place the source body with each function directly in the documentation.
    • HTML Topic:
      • FORMULA_FONTSIZE: The fontsize used for generated formulas. If 10 pts is too small to get a nice effect of formulas embedded in text.
    • Dot Topic:
      • HAVE_DOT & DOT_PATH: If you installed GraphViz
      • DOT_GRAPH_MAX_NODES: Maximum number of nodes to draw in a relation graph. In case of larger projects, 50 may be too small.
      • CALL_GRAPH & CALLER_GRAPH: Types of relation graphs to include.
  3. Run tab

    • Press “Run doxygen” and watch how your documentation is being generated. For larger projects this may take some time. Fortunately, graphics are not generated anew if they are present from a previous run, speeding things up. (NOTE: If you want to generate new graphics (and equations with larger font size), make sure to delete the old versions first.) Any warnings and errors are also shown in the main window.
    • Once doxygen was run successfully, pressing the button “Show HTML output” will open a browser and take you to the HTML version of the documentation.

 

Once you have a working configuration for doxygen, you can save this for later use. Doxygen allows you to load an old configuration file and run immediately. The configuration file for the Timer-class project is included in the docs folder, together with the pdf-latex version of the generated documentation.  Doxygen generates all latex files required for generating the pdf. To generate the actual pdf, a make.bat file needs to be run (i.e. double-click the file, and watch it run) in a Windows environment.

2. Documenting Fortran (procedural)

Let us start with some basics for documenting Fortran code in a way suitable for doxygen. Since doxygen has a very extensive set of options and features, not all of them can be covered. However, the manual of more than 300 pages provides all the information you may need.

With doxygen, you are able to document more or less any part of your code: entire files, modules, functions or variables. In each case, a similar approach can be taken. Let’s consider the documentation of the TimeClass module:

  1. !++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  2. !> \brief The <b>TimeClass module</b> contains the
  3. !! \link ttime TTime class\endlink used by the
  4. !! \link timerclass::ttimer TTimer class\endlink for practical timing.
  5. !!
  6. !! @author  Dr. Dr. Danny E. P. Vanpoucke
  7. !! @version 2.0-3  (upgrades deprecated timing module)
  8. !! @date    19-03-2020
  9. !! @copyright https://dannyvanpoucke.be
  10. !!
  11. !! @warning Internally, Julian Day Numbers are used to compare dates. As a
  12. !! result, *negative* dates are not accepted. If such dates are created
  13. !! (*e.g.*, due to a subtraction), then the date is set to zero.
  14. !!
  15. !! This module makes use of:
  16. !! - nothing; this module is fully independent
  17. !<-----------------------------------------------------------------------
  18. module TimeClass
  19.     implicit none
  20.     private

The documentation is placed in a standard single or multi-line fortran comment.  In case of multi-line documentation, I have the personal habit turning it into a kind of banner starting with a “!+++++++++” line and closing with a “!<——————-” line.  Such choices are your own, and are not necessary for doxygen documentation. For doxygen, a multi-line documentation block starts with “!>” and ends with “!<“ . The documentation lines in between can be indicated with “!!”. This is specifically for fortran documentation in doxygen. C/C++ and other languages will have slightly different conventions, related to their comment section conventions.

In the block above, you immediately see certain words are preceded by an “@”-symbol or a “\”, this indicates these are special keywords. Both the “@” and “\” can be used interchangeably for most keywords, the preference is again personal taste.  Furthermore, doxygen supports both html and markdown notation for formatting, providing a lot of flexibility. The multi-line documentation is placed before the object being documented (here an entire module).

Some keywords:

  • \brief : Here you can place a short description of the object. This description is shown in parts of the documentation that  provide an overview. Note that this is also the first part of the full documentation of the object itself. After a blank line, the \details(this keyword does not need to provided explicitly) section starts, providing further details on the object. This information is only visible in the documentation of the object itself.
  • \link … \endlink, or \ref : These are two option to build links between parts of your documentation. You can either use \ref nameobject or \link nameobject FormattedNameObject \endlink. Note that for fortran, doxygen uses an all non-capitalized namespace, so YourObject needs to be referenced as \ref yourobject or you will end up with an error and a missing link. So if you want your documentation to show YourObject as a link instead of yourobject, you can use the \link … \endlink construction.
  • “::”  : Referring to an element of an object can be done by linking the element and the object via two colons:  object::element . Here it is important to remember that your module is an object, so linking to an element of a module from outside that module requires you to refer to it in this way.
  • @author : Provide information on the author.
  • @version : Provide version information.
  • @date : Provide information on the date.
  • @copyright : Provide information on the copyright.
  • @warning : Provides a highlighted section with warning information for the user of your code (e.g., function kills the program when something goes wrong).
  • @todo[not shown] If you still have some things to do with regard to this object you can use this keyword. More interestingly, doxygen will also create a page where all to-do’s of the entire project are gathered, and link back to the specific code fragments.

 

  1. !++++++++++++++++++++++++++++++++++++++++++++++
  2. !>\brief Function to subtract two \link ttime TTime\endlink instance
  3. !! via the "-" operator. This is the function
  4. !! performing the actual operator overloading.
  5. !!
  6. !! \b usage:
  7. !! \code{.f03}
  8. !! Total = this - that
  9. !! \endcode
  10. !! This line also calls the \link copy assignment operator\endlink.
  11. !!
  12. !! \note The result should remain a positive number.
  13. !!
  14. !! @param[in] this The \link ttime TTime\endlink instance before
  15. !!                 the "-" operator.
  16. !! @param[in] that The \link ttime TTime\endlink instance after
  17. !!                 the "-" operator.
  18. !! \return Total The \link ttime TTime\endlink instance representing
  19. !!               the difference.
  20. !<---------------------------------------------
  21.     pure function subtract(this,that) Result(Total)
  22.         class(TTime), intent(in) :: this, that
  23.         Type(TTime) :: total

When documenting functions and subroutines there are some addition must-have keywords.

  • @param[in] , @param[out] ,or@param[in,out] : Provide a description for each of the function parameters, including their  intent: “in”, “out”, or “in,out” (note the comma!).
  • \return : Provides information on the return value of the function.
  • \b, \i : The next word is bold or italic
  • \n : Start a newline, without starting a new paragraph.
  • \note : Add a special note in your documentation. This section will be high lighted in a fashion similar to @warning.
  • \code{.f03}…\endcode :  This environment allows you to have syntax highlighted code in your documentation. The language can be indicated via the “extension” typical for said language. In this case: fortran-2003.
  • \f$ … \f$, or \f[ … \f] : Sometimes equations are just that much easier to convey your message. Doxygen also supports latex formatting for equations. These tags can be used to enter a latex $…$ or \[ \] math environments. The equations are transformed into small png images upon documentation generation, to be included in the html of your documentation. There are two important aspects to consider when using this option:
    1. Font size of the equation: Check if this is sufficient and don’t be afraid to change the font size to improve readability.
    2. Compilation is not halted upon an error: If the latex compiler encounters an error in your formula it just tries to continue. In case of failure, the end result may be missing or wrong. Debugging latex equations in doxygen documentation can be quite challenging as a result. So if you are using large complex equations, it may be advised to run them in a pure latex environment, and only past them in the documentation once you are satisfied with the result.

 

3. Documenting Fortran Classes

With the knowledge of the previous section, it is relatively easy to document most fortran code. Also the type of object orientation available in fortran 95, in which a fortran module is refurbished as a class. True fortran classes in contrast tend to give a few unexpected issues to deal with. Lets have a look at the documentation of the TTime class of the TimeClass module:

  1. !+++++++++++++++++++++++++++++++++++++++
  2. !> @class ttime
  3. !! \brief The TTime class contains all time functionality
  4. !! with regard to a single time stamp.
  5. !<-------------------------------------
  6.     type, public :: TTime
  7.       private
  8.         integer :: year    !< @private The year
  9.         integer :: month   !< @private The month (as integer).
  10.         ...
  11.     contains
  12.       private
  13.         procedure, pass(this),public :: SetTime       !<          @copydoc timeclass::settime
  14.         procedure, pass(this)        :: CalculateJDN  !< @private @copydoc timeclass::calculatejdn
  15.         procedure, pass(this)        :: SetJDN        !< @private @copydoc timeclass::setjdn
  16.         ...
  17.         procedure, pass(this)        :: copy          !< @private @copydoc timeclass::copy
  18.         ...
  19.         generic, public :: assignment(=) => copy      !<          @copydoc timeclass::copy
  20.         !> @{ @protected
  21.         final :: destructor !< @copydoc timeclass::destructor
  22.         !> @}
  23.     end type TTime
  24.  
  25.     ! This is the only way a constructor can be created,
  26.     ! as no "initial" exists, emulates the C++ constructor behavior
  27.     interface TTime
  28.         module procedure constructor
  29.     end interface TTime

To make sure doxygen generates a class-like documentation for our fortran class, it needs to be told it is a class. This can be done by documenting the class itself and using the keyword @class nameclass, with nameclass the name doxygen will use for this class (so you can choose something different from the actual class name). Unfortunately, doxygen will call this a “module” in the documentation (just poor luck in nomenclature). On the module page for the ttime class a listing is provided of all elements given in the class definition. The documentation added to each member (e.g.,:

  1. integer :: year !< @private The year

is shown as “\brief” documentation. By default all members of our function are considered as public. Adding the @private, @public, or @protected keyword instructs doxygen explicitly to consider these members as private, public or protected. (I used protected in the ttime code not as it should be used in fortran, but as a means of indicating the special status of the final subroutine (i.e. protected in a C++ way).)

However, there seems to be something strange going on. When following the links in the documentation, we do not end up with the documentation provided for the functions/subroutines in the body of our timeclass module. Doxygen seems to consider these two distinct things. The easiest way to link the correct information is by using the keyword @copydoc functionreference . The documentation is (according to doxygen) still for two distinctly different objects, however, this time they have the exact same documentation (unless you add more text on the member documentation line). In this context, it interesting to know there is also @copybrief and @copydetails which can be used to only copy the brief/details section.

In this example, the constructor interface is not documented, as this created confusion in the final  documentation since doxygen created a second ttime module/object linked to this interface. However, not documenting this specific instance of the constructor does not create such a large issue, as the module(the fortran module) function itself is documented already.

Conclusion

Documenting fortran classes can be done quite nicely with doxygen. It provides various modes of output: from a fully working website with in-site search engine to a hyperlinked pdf or RTF document. The flexibility and large number of options may be a bit daunting at first, but you can start simple, and work your way up.

As Fortran is supported as an extension, you will need to play around with the various options to find which combination gives the effect you intended. This is an aspect present in all automated code documentation generation tools, since object oriented Fortran is not that widely used. Nonetheless, doxygen provides a very powerful tool worth your time and effort.

PS:

SBDD 25 (aka the COVID19 edition)

Last Wednesday, the 25th edition of the Hasselt Diamond workshop started. The central topic of this celebratory edition was focused on surfaces, perfectly suited to present some of my more recent diamond based work.[1][2] Just as the previous years, the program was packed with interesting talks on anything diamond. Phosphorous doped diamond seemed to be the “new thing” this year, but I could be biased, as I was speaking on phosphorous adsorption myself. Due to a cancellation, I found myself being asked on Monday afternoon to present my work as a talk 😎 , on Wednesday morning 😯 . Because I had been a bit too ambitious in my conference abstract, this talk ended up being nicely complementary to my poster.

Poster SBDD 25 conference, Hasselt 2020

Unfortunately, this celebratory edition also fell victim to the COVID-19 crisis. In addition to being the most popular conversation topic—a close second to diamond research—, it also had a very real impact on the conference itself. The COVID-19 crisis resulted in a drop of attendance from 238 people in 2019 to 143 this year.  In addition, the quickly changing situation worldwide lead to last minute cancellations due to travel restrictions. On Thursday evening, the conference site went into lock down. Furthermore, that evening, the Belgian federal government also decided that schools and higher education should be closed, as well as pubs and restaurants, until April 3rd. There was also the urgent request for people to work from home as much as possible. (Consider this a good example of acting NOW aimed at saving people.)

Consider this computational scientist in lock down in his home lab until further notice.

UV-Curable Biobased Polyacrylates Based on a Multifunctional 2 Monomer Derived from Furfural

Authors: Jules Stouten, Danny E. P. Vanpoucke, Guy Van Assche, and Katrien V. Bernaerts
Journal: Macromolecules 53(4), 1388-1404 (2020)
doi: 10.1021/acs.macromol.9b02659
IF(2018): 5.997
export: bibtex
pdf: <Macromolecules> (Open Access)

 

 

Grapgical abstract ACS Macromolecules 2020
Graphical Abstract: The formation of biobased polyacrylates.

Abstract

The controlled polymerization of a new biobased monomer, 4-oxocyclopent-2-en-1-yl acrylate (4CPA), was
established via reversible addition−fragmentation chain transfer (RAFT) (co)polymerization to yield polymers bearing pendent cyclopentenone units. 4CPA contains two reactive functionalities, namely, a vinyl group and an internal double bond, and is an unsymmetrical monomer. Therefore, competition between the internal double bond and the vinyl group eventually leads to gel formation. With RAFT polymerization, when aiming for a degree of polymerization (DP) of 100, maximum 4CPA conversions of the vinyl group between 19.0 and 45.2% were obtained without gel formation or extensive broadening of the dispersity. When the same conditions were applied in the copolymerization of 4CPA with lauryl acrylate (LA), methyl acrylate (MA), and isobornyl acrylate, 4CPA conversions of the vinyl group between 63 and 95% were reached. The additional functionality of 4CPA in copolymers was demonstrated by model studies with 4-oxocyclopent-2-en-1-yl acetate (1), which readily dimerized under UV light via [2 + 2] photocyclodimerization. First-principles quantum mechanical simulations supported the experimental observations made in NMR. Based on the calculated energetics and chemical shifts, a mixture of head-to-head and head-to-tail dimers of (1) were identified. Using the dimerization mechanism, solvent-cast LA and MA copolymers containing 30 mol % 4CPA were cross-linked under UV light to obtain thin films. The cross-linked films were characterized by dynamic scanning calorimetry, dynamic mechanical analysis, IR, and swelling experiments. This is the first case where 4CPA is described as a monomer for functional biobased polymers that can undergo additional UV curing via photodimerization.

Influence of diamond crystal orientation on the interaction with biological matter

Authors: Viraj Damle, Kaiqi Wu, Oreste De Luca, Natalia Ortí-Casañ, Neda Norouzi, Aryan Morita, Joop de Vries, Hans Kaper, Inge Zuhorn, Ulrich Eisel, Danny E.P. Vanpoucke, Petra Rudolf, and Romana Schirhagl,
Journal: Carbon 162, 1-12 (2020)
doi: 10.1016/j.carbon.2020.01.115
IF(2018): 7.466
export: bibtex
pdf: <Carbon> (Open Access)

 

Graphical Abstract Carbon paper with Romana
Graphical Abstract: The preferential adsorption of biological matter on oriented diamond surfaces.

Abstract

Diamond has been a popular material for a variety of biological applications due to its favorable chemical, optical, mechanical and biocompatible properties. While the lattice orientation of crystalline material is known to alter the interaction between solids and biological materials, the effect of diamond’s crystal orientation on biological applications is completely unknown. Here, we experimentally evaluate the influence of the crystal orientation by investigating the interaction between the <100>, <110> and <111> surfaces of the single crystal diamond with biomolecules, cell culture medium, mammalian cells and bacteria. We show that the crystal orientation significantly alters these biological interactions. Most surprising is the two orders of magnitude difference in the number of bacteria adhering on <111> surface compared to <100> surface when both the surfaces were maintained under the same condition. We also observe differences in how small biomolecules attach to the surfaces. Neurons or HeLa cells on the other hand do not have clear preferences for either of the surfaces. To explain the observed differences, we theoretically estimated the surface charge for these three low index diamond surfaces and followed by the surface composition analysis using x-ray photoelectron spectroscopy (XPS). We conclude that the differences in negative surface charge, atomic composition and functional groups of the different surface orientations lead to significant variations in how the single crystal diamond surface interacts with the studied biological entities.

Investigation of structural, electronic and magnetic properties of breathing metal–organic framework MIL-47(Mn): a first principles approach

Authors: Mohammadreza Hosseini, Danny E. P. Vanpoucke, Paolo Giannozzi, Masoud Berahman  and Nasser Hadipour
Journal: RSC Adv. 10, 4786-4794 (2020)
doi: 10.1039/C9RA09196C
IF(2018): 3.049
export: bibtex
pdf: <RSC Adv.> (Open Access)

 

Graphical abstract: MIL-47(Mn) paper
Graphical Abstract: The breathing MIL-47(Mn) Metal-Organic Framework. Upon breathing, the electronic structure of this MOF undergoes a transition from an anti-ferromagnetic semiconductor, to a ferromagnetic semi-metal.

Abstract

The structural, electronic and magnetic properties of the MIL-47(Mn) metal–organic framework are investigated using first principles calculations. We find that the large-pore structure is the ground state of this material. We show that upon transition from the large-pore to the narrow-pore structure, the magnetic ground-state configuration changes from antiferromagnetic to ferromagnetic, consistent with the computed values of the intra-chain coupling constant. Furthermore, the antiferromagnetic and ferromagnetic configuration phases have intrinsically different electronic behavior: the former is semiconducting, the latter is a metal or half-metal. The change of electronic properties during breathing posits MIL-47(Mn) as a good candidate for sensing and other applications. Our calculated electronic band structure for MIL-47(Mn) presents a combination of flat dispersionless and strongly dispersive regions in the valence and conduction bands, indicative of quasi-1D electronic behavior. The spin coupling constants are obtained by mapping the total energies onto a spin Hamiltonian. The inter-chain coupling is found to be at least one order of magnitude smaller than the intra-chain coupling for both large and narrow pores. Interestingly, the intra-chain coupling changes sign and becomes five times stronger going from the large pore to the narrow pore structure. As such MIL-47(Mn) could provide unique opportunities for tunable low-dimensional magnetism in transition metal oxide systems.

Review of 2019

Happy New Year

2019 has come and gone. 2020 eagerly awaits getting acquainted. But first we look back one last time, trying to turn this into a old tradition. What have I done during the last year of some academic merit.

Publications: +3 (and currently +5 submitted)

 

Completed refereeing tasks: +9

  • Applied Physics Letters
  • Journal of Physics Communication
  • Super Conducting Science and Technology
  • Crystals
  • Journal of Physics: Condensed Matter (2x)
  • Diamond and Related Materials (3x)

 

Conferences & workshops: +7 (Attended) 

  • Consortium meeting D-NL-HIT, Hochschule Niederrhein, Krefeld, Germany, September 19th 2019
  • Workshop: Coatings Technology & Application of Machine Learning, Hochschule Niederrhein, Krefeld, Germany, September 2nd-6th , 2019
  • Summer School: “Let’s Talk Science”, Antwerp, Belgium, July 2nd, 2019 [invited plenary talk]
  • Summer School on Data Science, Maastricht University, The Netherlands, June 26th-28th,  2019
  • VSC-user day, Brussels, Belgium, June 4th, 2019 [poster presentation]
  • Belgian Physical Society annual meeting 2019, ULB, Brussels, May 22nd, 2019 [poster presentation]
  • SBDD XXIV, Hasselt University, Belgium, March 13th-15th, 2019

 

Science Communication Events: +3  

  • Casting Keynotes TEDxUHasselt:”The Virtual Lab”, November 26th, 2019 [first prize, TEDx talk 2020]
  • Summer School: “Let’s Talk Science”, Antwerp, Belgium, July 2nd, 2019 [invited plenary talk]
  • Universiteit van Vlaanderen: “Kan jij met je computer een snellere smartphone ontwikkelen”, February 19th, 2019 [Live presentation at UvV, Online April 1st]

 

Research Stay: +1           With Prof. Klauss-Uwe Koch, Westfälishe Hochschule, Recklinghausen, Germany, July 29th – August 2nd, 2019

PhD-students: +1             Guillaume Emerick (September 2019-August 2023,PhD student UHasselt-UNamur Project, Belgium, Awarded grant for this project)

Bachelor-students: +1   Siebe Frederix (3rd Bach. Phys., Project: Atoms in Molecules based on force partitioning)

Positions: +1                         Started working on Machine Learning at AMIBM of Maastricht University

 

Current size of HIVE:

  • Finally started a public version of HIVE at github: HIVE 4.x   (3.5K lines, 6 commands available)
  • 60K lines of program (code: 70 %)
  • ~90 files
  • 49 (command line) options

Hive-STM program:

And now, upward and onward, a new year, a fresh start.

Parallel Python?

As part of my machine learning research at AMIBM, I recently ran into the following challenge: “Is it possible to do parallel computation using python.” It sent me on a rather long and arduous journey, with the final answer being something like: “very reluctantly“.

Python was designed with one specific goal in mind; make it easy to implement small test programs to see if an idea is worth pursuing. This gave rise to a scripting language with a lot of flexibility, but also with significant limitations, most of which the “intended” user would never meet. However, as a consequence of its success, many are using it going far beyond this original scope (yours truly as well 🙂 ).

Python offers various libraries to parallelize your scripts…most of them wrappers adding minor additional functionality. However, digging down to the bottom one generally ends up at one of the following two libraries: the threading module and the multiprocessing module.

Of course, as with many things python, there is a huge amount of tutorials available with many of great quality.

import threading

Programmers experienced in a programming language such as C/C++, Pascal, or Fortran, may be familiar with the concept of multi-threading. With multi-threading, a CPU allows a program to distribute its work over multiple program-threads which can be performed in parallel by the different cores of the CPU (or while a core is idle, e.g., since a thread is waiting for data to be fetched).  One of the most famous API’s for writing multi-threaded applications is OpenMP. In the past I used it to parallelize my Hirshfeld-I implementation and the phonon-module of HIVE.

For Python, there is no implementation of the OpenMP API, instead there is the threading module. This provides access to the creation of multiple threads, each able to perform their own tasks while sharing data-objects. Unfortunately, python has also the Global Interpreter Lock, GIL for short, which allows only a single thread to access the interpreter at a time. This effectively reduces thread-based parallelization to a complex way of running a code in a serial way.

For more information on “multi-threading” in python, you can look into this tutorial.

import multiprocessing

In addition to the threading module, there is also the multiprocessing module. This module side-steps the GIL by creating multiple processes, each having its own interpreter. This however comes at a cost. Firstly, there is a significant computational cost starting the different processes. Secondly, objects are not shared between processes, so additional work is needed to collect and share data.

Using the “Pool” class, things are somewhat simplified, as can be seen in the code-fragment below.  With the pool class one creates a set of threads/processes available for your program. Then through the function apply_async function it is possible to run processes in parallel. (Note that you need to use the “async” version of the function, as otherwise you end up with running things serial …again)

  1. import multiprocessing as mp
  2.  
  3. def doOneRun(id:int): #trivial function to run in parallel
  4. return id**3
  5.  
  6.  
  7.  
  8. num_workers=10 #number of processes
  9. NRuns=1000 #number of runs of the function doOneRun
  10.  
  11. pool=mp.Pool(processes=num_workers) # create a pool of processes
  12. drones=[pool.apply_async(doOneRun, args=nr) for nr in range(NRuns)] #and run things in parallel
  13.  
  14. for drone in drones: #and collect the data
  15. Results.collectData(drone.get()) #Results.collectData is a function you write to recombine the separate results into a single result and is not given here.
  16.  
  17. pool.close() #close the pool...no new tasks can be run on any of the processes
  18. pool.join() #collapse all threads back into the main thread

 

how many cores does my computer have?

If you are used to HPC applications, you always want to get as much out of your machine as possible. With regard to parallelization this often means making sure no CPU cycle is left unused. In the example above we manually selected the number of processes to spawn. However, would it not be nice if the program itself could just set this value to be equal to the number of physical cores accessible?

Python has a large number of functions claiming to do just that. A few of them are given below.

  •  multiprocessing.cpu_count(): returns the number of logical cores it can find. So if you have a modern machine with hyper-threading technology, this will return a multiple of the number of physical cores (and you will be over-subscribing your CPU.
  • os.cpu_count(): same as multiprocessing.cpu_count().
  • psutil.cpu_count(logical=False): This implementation gives the same default behavior, however, the parameter logical allows for this function to return the correct number of cores in a single CPU. Indeed a single CPU. HPC architectures which contain multiples CPUs per node will again return an incorrect number, as the implementation makes use of a python “set”, and as such doesn’t increment for the same index core on a different CPU.

In conclusion, there seems to be no simple way to obtain the correct number of physical cores using python, and one is forced to provide this number manually. (If you do have knowledge of such a function which works in both windows and unix environments and both desktop and HPC architectures feel free to let me know in the comments.)

All in all, it is technically possible to run code in parallel using python, but you have to deal with a lot of python quirks such as GIL.

Casting Keynotes: The Virtual Lab

Last Tuesday? I had the pleasure of competing in the casting keynotes competition of the TEDx UHasselt chapter. An evening filled with interesting talks on subjects ranging from the FAIR principles of open-data (by Liebet Peeters)  to the duty not stay silent in the face of “bad ideas” and leading a life of purpose. An interesting presentation was the one by Ann Bessemans on visual prosody to improve reading skills in young children as well as reading experience, more specifically the transfer of non-literal-content, for non-native speakers. There was also time for some humor, with the dangerous life of Tim Biesmans, who suffers from peanut-allergies. For him, death lurks around every corner, even in a first-date’s kiss. During my talk, I traced the evolution of computational research as the third paradigm of scientific discovery, showing you can find computational research in every field, and why it is evolving at its break-neck speed.

During the event, both the public and a jury voted on the best presentation, which would then have to present at the TEDx UHasselt in 2020.

And the Winner is …drum roll… Danny Vanpoucke!

So this story will continue during the 2020 TEDx event at UHasselt, and I hope to see you there 🙂

Casting Keynotes

top: Full action shots of my presentation. Moore’s Law as driving force behind computational research, and pondering the meaning of Artificial Intelligence. Bottom: Yes, I won 🙂