# Tag: tools

## Defensive Programming and Debugging

Last few months, I finally was able to remove something which had been lingering on my to-do list for a very long time: studying debugging in Fortran. Although I have been programming in Fortran for over a decade, and getting quite good at it, especially in the more exotic aspects such as OO-programming, I never got around to learning how to use decent debugging tools. The fact that I am using Fortran was the main contributing factor. Unlike other languages, everything you want to do in Fortran beyond number-crunching in procedural code has very little documentation (e.g., easy dll’s for objects), is not natively supported (e.g., find a good IDE for fortran, which also supports modern aspects like OO, there are only very few who attempt), or you are just the first to try it (e.g., fortran programs for android :o, definitely on my to-do list). In a long bygone past I did some debugging in Delphi (for my STM program) as the debugger was nicely integrated in the IDE. However, for Fortran I started programming without an IDE and as such did my initial debugging with well placed write statements. And I am a bit ashamed to say, I’m still doing it this way, because it can be rather efficient for a large code spread over dozens of files with hundreds of procedures.

However, I am trying to repent for my sins. A central point in this penance was enlisting for the online MOOC  “Defensive Programming and Debugging“. Five weeks of intense study followed, in which I was forced to use command line gdb and valgrind. During these five weeks I also sharpened my skills at identifying possible sources of bugs (and found some unintentional bugs in the course…but that is just me). Five weeks of hard study, and taking tests, I successfully finished the course, earning my certificate as defensive programmer and debugger. (In contrast to my sometimes offensive programming and debugging skills before 😉 .)

## Merry Christmas & Happy New Year

Having fun with my xmgrace-fortran library and fractal code!

## A Spectre and Meltdown victim: VASP

Over the last weekend, two serious cyber security issues were hot news: Meltdown and Spectre [more links, and links](not to be mistaken for a title of a bond-movie). As a result, also academic HPC centers went into overdrive installing patches as fast as possible. The news of the two security issues went hand-in-hand with quite a few belittling comments toward the chip-designers ignoring the fact that no-one (including those complaining now) discovered the problem for over decade. Of course there was also the usual scare-mongering (cyber-criminals will hack our devices by next Monday, because hacks using these bugs are now immediately becoming their default tools etc.) typical since the beginning of the  21st century…but now it is time to return back to reality.

One of the big users on scientific HPC installations is the VASP program(an example), aimed at the quantum mechanical simulation of materials, and a program central to my own work. Due to an serendipitous coincidence of a annoyingly hard to converge job I had the opportunity to see the impact of the Meltdown and Spectre patches on the performance of VASP: 16% performance loss (within the range of the expected 10-50% performance loss for high performance applications [1][2][3]).

The case:

• large HSE06 calculation of a 71 atom defective ZnO supercell.
• 14 irreducible k-points (no reduction of the Hartree-Fock k-points)
• 14 nodes of 24 cores, with KPAR=14, and NPAR=1 (I know NPAR=24 is the recommended option)

The calculation took several runs of about 10 electronic steps (of each about 5-6 h wall-time, about 2.54 years of CPU-time per run) . The relative average time is shown below (error-bars are the standard deviation of the times within a single run). As the final step takes about 50% longer it is treated separately. As you can see, the variation in time between different electronic steps is rather small (even running on a different cluster only changes the time by a few %). The impact of the Meltdown/Spectre patch gives a significant impact.

Impact of Meltdown/Spectre patch on VASP performance for a 336 core MPI job.

The HPC team is currently looking into possible workarounds that could (partially) alleviate the problem. VASP itself is rather little I/O intensive, and a first check by the HPC team points toward MPI (the parallelisation framework required for multi-node jobs) being ‘a’ if not ‘the’ culprit. This means that also an impact on other multi-node programs is to be expected. On the bright side, finding a workaround for MPI would be beneficial for all of them as well.

So far, tests I performed with the HPC team not shown any improvements (recompiling VASP didn’t help, nor an MPI related fix). Let’s keep our fingers crossed, and hope the future brings insight and a solution.

## Resource management on HPC infrastructures.

Computational as a third pillar of science (next to experimental and theoretical) is steadily developing in many fields of science. Even some fields you would less expect it, such as sociology or psychology. In other fields such as physics, chemistry or biology it is much more widespread, with people pushing the boundaries of what is possible. Larger facilities provide access to larger problems to tackle. If a computational physicist is asked if larger infrastructures would not become too big, he’ll just shrug and reply: “Don’t worry, we will easily fill it up, even a machine 1000x larger than that.” An example is given by a pair of physicists who recently published their atomic scale study of the HIV-1 virus. Their simulation of a model containing more than 64 million atoms used force fields, making the simulation orders of magnitude cheaper than quantum mechanical calculations. Despite this enormous speedup, their simulation of 1.2 µs out of the life of an HIV-1 virus (actually it was only the outer skin of the virus, the inside was left empty) still took about 150 days on 3880 nodes of 16 cores on the Titan super computer of Oak Ridge National Laboratory (think about 25 512 years on your own computer).

In Flanders, scientist can make use of the TIER-1 facilities provided by the Flemish Super Computer (VSC). The first Tier-1 machine was installed and hosted at Ghent University. At the end of it’s life cycle the new Tier-1 machine (Breniac) was installed and is hosted at KULeuven. Although our Tier-1 supercomputer is rather modest compared to the Oak Ridge supercomputer (The HIV-1 calculation I mentioned earlier would require 1.5 years of full time calculations on the entire machine!) it allows Flemish scientists (including myself) to do things which are not possible on personal desktops or local clusters. I have been lucky, as all my applications for calculation time were successful (granting me between 1.5 and 2.5 million hours of CPU time every year). With the installment of the new supercomputer accounting of the requested resources has become fully integrated and automated. Several commands are available which provide accounting information, of which mam-balance is the most important one, as it tells how much credits are still available. However, if you are running many calculations you may want to know how many resources you are actually asking and using in real-time. For this reason, I wrote a small bash-script that collects the number of requested and used resources for running jobs:

 Source code

Output of the Bash Script.

Currently, the last part, on the completed jobs, only provides data based on the most recent jobs. Apparently the full qstat information of older jobs is erased. However, it still provides an educated guess of what you will be using for the still queued jobs.

## Scaling of VASP 5.4.1 on TIER-1b BrENIAC

When running programs on HPC infrastructure, one of the first questions to ask yourself is: “How well does this program scale?

In applications for HPC resources, this question plays a central role, often with the additional remark: “But for your specific system!“. For some software packages this is an important remark, for other packages this remark has little relevance, as the package performs similarly for all input (or for given classes of input). The VASP package is one of the latter. For my current resource application at the Flemish TIER-1 I set out to do a more extensive scaling test of the VASP package. This is for 2 reasons. The first being the fact that I will be using a newer version of VASP: vasp 5.4.1 (I am currently using my own multiply patched version 5.3.3.). The second being the fact that I will be using a brand new TIER-1 machine (second Flemish TIER-1, as our beloved muk retired the end of 2016).

Why should I put in the effort to get access to resources on such a TIER-1 supercomputer? Because such machines are the life blood of the computational materials scientist. They are their sidekick in the quest for understanding of materials. Over the past 4 years, I was granted (and used) 20900 node-days of calculation time (i.e. over 8 Million hours of CPU time, or 916 years of calculation time) on the first TIER-1 machine.

Now back to the topic. How well does VASP 5.4.1 behave? That depends on the system at hand, and how good you choose the parallelization settings.

### 1. Parallelization in VASP

VASP provides several parameters which allow for straightforward parallelization of the simulation:

• NPAR : This parameter can be set to parallelize over the electronic bands. As a consequence, the number of bands included in a calculation by VASP will be a multiple of NPAR. (Note : Hybrid calculations are an exception as they require NPAR to be set to the number of cores used per k-point.)
• NCORE : The NCORE parameter is related to NPAR via NCORE=#cores/NPAR, so only one of these can be set.
• KPAR : This parameter can be set to parallelize over the set of irreducible k-points used to integrate over the first Brillouin zone. KPAR should therefore best be a divisor of the number of irreducible k-points.
• LPLANE : This boolean parameter allows one to switch on parallelization over plane waves. In general this will give rise to a small but consistent speedup (observed in previous scaling tests). As such we have chosen to set this parameter = .TRUE. for all calculations.
• NSIM : Sets up a blocked mode for the RMM-DIIS algorithm. (cf. manual, and further tests by Peter Larsson). As our tests do not involve the RMM-DIIS algorithm, this parameter was not set.

In addition, one needs to keep the architecture of the HPC-system in mind as well: NPAR, KPAR and their product should be divisors of the number of nodes (and cores) used.

### 2. Results

Both NPAR and KPAR parameters can be set simultaneously and will have a serious influence on the speed of your calculations. However, not all possible combinations give the same speedup. Even worse, not all combinations are beneficial with regard to speed. This is best seen for a small 2 atom diamond primitive cell.

Timing results for various combinations of the NPAR and KPAR parameters for a 2 atom primitive unit cell of diamond.

Two things are clear. First of all, switching on a parallelization parameter does not necessarily mean the calculation will speed up. In some case it may actually slow you down. Secondly, the best and worst performance is consistently obtained using the same settings. Best: KPAR = maximal, and NPAR = 1, worst: KPAR = 1 and NPAR = maximal.

This small system shows us what you can expect for systems with a lot of k-points and very few electronic bands ( actually, any real calculation on this system would only require 8 electronic bands, not 56 as was used here to be able to asses the performance of the NPAR parameter.)

In a medium sized system (20-100 atoms), the situation will be different. There, the number of k-points will be small (5-50) while the natural number of electronic bands will be large (>100). As a test-case I looked at my favorite Metal-Organic Framework: MIL-47(V).

Timing results for several NPAR/KPAR combinations for the MIL-47(V) system.

This system has only 12 k-points to parallelize over, and 224 electronic bands. The spread per number of nodes is more limited than for the small system. In contrast, the general trend remains the same: KPAR=high, NPAR=low, with an optimum performance when KPAR=#nodes. Going beyond standard DFT, using hybrid functionals, also retains the same picture, although in some cases about 10% performance can be gained when using one half node per k-point. Unfortunatly, as we have very few k-points to start from, this will only be an advantage if the limiting factor is the number of nodes available.

An interesting behaviour is seen when one keeps the k-points/#nodes ratio constant:

Scaling behavior for either a constant number of k-points (for dense k-point grid in medium sized system) or constant k-point/#nodes ratio.

As you can see, VASP performs really well up to KPAR=#k-points (>80% efficiency). More interestingly, if the k-point/#node ratio is kept constant, the efficiency (now calculated as T1/(T2*NPAR) with T1 timing for a single node, and T2 for multiple nodes) is roughly constant. I.e. if you know the walltime for a 2-k-point/2-nodes job, you can use/expect the same for the same system but now considering 20-k-points/20-nodes (think Density of States and Band-structure calculations, or just change of #k-points due to symmetry reduction or change in k-point grid.)  😆

### 3. Conclusions and Guidelines

If one thing is clear from the current set of tests, it is the fact that good scaling is possible. How it is attained, however, depends greatly on the system at hand. More importantly, making a poor choice of parallelization settings can be very detrimental to the obtained speed-up and efficiency. Unfortunately when performing calculations on an HPC system, one has externally imposed limitations to work with:

• Memory available per core
• Number of cores per node[1]
• Size of your system: #atoms, #k-points, & #bands

Here are some guidelines (some open doors as well):

• Wherever possible k-point parallelization should be driven to a maximum (KPAR as large as possible). The limiting factor here is the actual number of k-points and the amount of memory available. The latter due to the fact that a higher value of KPAR leads to a higher memory requirement.[2]
• Use the Γ-version of VASP for Γ-point only calculations. It reduces memory usage significantly (3.7Gb→ 2.8Gb/core for a 512 atom diamond system) and increases the computational efficiency, sometimes even by a factor of 2.
• NPAR parallelization can be used to reduce the memory load for high KPAR calculations, but increasing KPAR will always outperform the same increase of NPAR.
• In case only NPAR parallelization is available, due to too few k-points, and working with large systems, NPAR parallelization is your last resort, and will perform reasonably well, up to a point.
• Electronic steps show very consistent timing, so scaling tests can be performed with only a 5-10 electronic steps, with standard deviations comparable in absolute size for PBE and HSE06.

In short:

### K-point parallelism will save you, wherever possible !

[1] 28 is a lousy number in that regard as its prime-decomposition is 2x2x7, leaving little overlap with prime-decompositions of the number of k-points, which more often than you wish end up being prime numbers themselves 😥
[2] The small system’s memory requirements varied from 0.15 to 1.09 Gb/core for the different combinations.

## One more digit of importance

Over the past few weeks I have bumped into several issues each tracing back to numerical accuracy. Although I have been  programming for almost two decades I never had to worry much about this, making these events seem as-if the universe is trying to tell me something.

Now, let me try to give a proper start to this story; Computational (materials) research is generally perceived as a subset of theoretical (materials) research, and it is true that such a case can be made. It is, however, also true that such thinking can trap us (i.e. the average computational physicist/chemist/mathematician/… programming his/her own code) with numerical accuracy problems. While theoretical equations use exact values for numbers, a computer program is limited by the numerical precision of the variables (e.g. single, double or quadruple precision for real numbers) used in the program. This means that actual numbers with a larger precision are truncated or rounded to the precision of the variable (e.g. 1/3 becomes 0.3333333 instead of 0.333… with an infinite series of 3’s). Most of the time, this is sufficient, and nothing strange will happen. Even more, most of the time, the additional digits would only increase the computational cost while not improving the results in a significant fashion.

### Interstellar disc

To understand the importance, or the lack thereof, of additional significant digits, let us first have a look at the precision of $\pi$ and the circumference and surface area of a disc. We will be looking at a rather large disc, one with a radius equal to the distance between the sun, and the nearest star, Alpha Centauri, which is 39 900 000 000 000 km away. The circumference of this disc is given by $2r\pi$ (or $2.5 \times 10^{14}$ km ). As a single precision variable $\pi$ will have about 7-8 significant digits. This means the calculated circumference will have an accuracy of about 1 000 000 km (or a few times the distance between the earth and the moon). Using a double precision $\pi$ variable, which has a precision of 16 decimal digits, the circumference will be accurately calculated to within a few meters. At quadrupal precision, the $\pi$ variable would have 34 significant decimal digits, and we would even be able to calculate the surface of the disc ($r^2\pi$ or $5.0 \times 10^{33}$ m² ) to within 1 m². Even the surface of a disc the size of our milky way could be calculated with an accuracy of a few hundred square km (or ± the size of Belgium ).

Knowing this, our mind is quickly put at easy regarding possible issues regarding numerical accuracy. However, once in a while we run into one exceptional case (or three, in my case).

### 1. Infinitesimal finite elements

Temperature profile in the insulating layer of a cylindrical wire.

While looking into the theory behind finite elements, I had some fun implementing a simple program which calculated the temperature distribution due to heat transport in an insulating layer. The finite element approach performed rather nicely, leading to good approximate results, already for a few dozen elements. However, I wanted to push the implementation a bit (the limit of infinite elements should give the exact solution). Since the set of equations was solved by a LAPACK subroutine, using 10 000 elements instead of 10 barely impacted the required time (writing the results took most of 2-3 seconds anyway). The results on the other hand were quite funny as you can see in the picture. The initial implementation, with single precision variables, breaks down even worse already at 1000 elements. Apparently the elements had become too small leading to too small variations of the properties in the stiffness-matrix, resulting in the LAPACK subroutine returning nonsense.

So it turns out that you can have too many elements in a finite elements method.

### 2. Small volumes: A few more digits please

Optimized volume in Equation of State fit, as function of the range of the fitting data, and step size between data-points. green diamonds, blue triangles and black discs: 1% , 0.5% and 0.25% volume steps respectively.

Recently, I started working at the Wide Band Gap Materials group at the University of Hasselt. So in addition to MOFs I am also working on diamond based materials. While setting up a series of reference calculations, using scripts which already suited me well during my work on MOFs, I was trying to figure out for which volume range, and step size I would get a sufficient convergence in my Equation-of-States Fitting procedure. For the MOFs this is a computationally rather expensive (and tedious) exercise, which, fortunately, gives clear results. For the 2-atom diamond unit cell the calculations are ridiculously fast (in comparison), but the results were confusing. As you can see in the picture, the values I obtained from the different fits seem to oscillate. Checking my E(V) data showed nothing out of the ordinary. All energies and volumes were clearly distinguishable, with the energies given with a precision of 0.001 meV, and the volumes with a precision of 0.01 Å3. However, as you can see in the figure, the volume-oscillations are of the order of 0.001 Å3, ten times smaller than our input precision. Calculating the volumes based on the lattice parameters to get a precision of 10-6 Å3 for the input volumes stabilizes the convergence behavior of the fits (open symbols in the figure). This problem was not present with the MOFs since these have a unit cell volume which is one hundred times larger, so a precision of 0.01 Åmakes the relative error on the volumes one hundred times smaller than was the case for diamond.

In essence, I was trying to get more accurate output than the input I provided, which will never give sensible results (even if they actually look sensible).

### 3. Many grains of sand really start to pile up after a while

The last one is a bit embarrassing as it lead to a bug in the HIVE-toolbox, which is fixed in the mean time.

One of the HIVE-toolbox users informed me that the dosgrabber routine had crashed because it could not find the Fermi-level in the output of a VASP calculation. Although VASP itself gives a value for the Fermi-level, I do not use it in the above sub-program, since this value tend to be incorrect for spin-polarized systems with different minority and majority spins. However, in an attempt to be smart (and efficient) I ended up in trouble. The basic idea behind my Fermi-level search is just running through the entire Density of States-spectrum until you have counted for all the electrons in the system. Because the VASP estimate for the Fermi-level is not that far of, you do not need to run through the entire list of several thousand entries, but you could just take a subset-centered around the estimated Fermi-level and check in that subset, speeding this up by a factor of 10 to 100. Unfortunately I calculated the energy step size between density of states entries as the difference between the first two entries, which are given to with an accuracy of 0.001 eV. I guess you already have a feeling what will be the problem. When the index of the estimated Fermi-level is 1000, the error will be of the order of 1 eV, which is much larger than the range I took into account. Fortunately, the problem is easily solved by calculating the energy step size as the difference between the first and last index, and divide by the number of steps, making the error in the particular case more than a thousand times smaller.

So, trying to be smart, you always need to make sure you really are being smart, and remember that small number can become very big when there are a lot of them.

## Animated Primes

Optimus Prime, the most prime of all autobots.[source]

2, 3, 5, 7, 11, 13, 17, 19, 23, …and on toward infinity. Prime numbers are a part of mathematics which seduce most of us at some point in our lives to try and “solve them”. There is structure in them, but too little to be easily understandable, and too much to let go and consider them chaos.

Recently, my girlfriend got sucked into this madness. Luckily she is not trying to generate the next largest prime (which would need to have more than 22 Million digits, although I might have jinxed it now). Instead she is looking for order into chaos; leading to pen and paper graphs, evolving into excel worksheets being abused as ordinary bitmap pictures and finally mathematica applications for even more heavy lifting.

These last exercises came to a somewhat grinding halt when mathematica refused to generate a gif-animation (it just failed to do the job without providing any warning that something might have gone awry). However, because it could generate the separate frames without any issues, I was tempted to dig up an old fortran module which allowed me to generate gif images and animations. I remembered correctly that it was able to both read and write gif-images, and so, after 10 minutes of writing a small program (9 minutes remembering how things worked) and another 10 linking in all dependencies and fixing some inconsistencies, I had a working program for generating a gif animation from separate gif images.

 Small program for reading single gif images and storing them as gif-animation.

The initial test went smoothly. Using some simple gif images I created in paint, the resulting 4 frame animation is the one on the left (which is probably giving you a headache as you read this ;-).

The real test, with the single frame gifs Sylvia had prepared, was rather interesting: It failed miserably. Fortunately it also taught us the origin of the problem: mathematica was generating frames of different dimensions. And although this does not give rise to problems in the workbook or generated avi, it does kill an animated gif. After fixing this, also mathematica successfully generated the animated gif Sylvia wanted to see. Also my own small program was now able to perform the intended trick, although it showed that a different color-map was created for each frame by mathematica, something my own implementation did not handle well (we’ll fix that somewhere in the future).

If you do not suffer from epileptic seizures (or not yet), you could check out the results in the spoiler below.

Spoiler Inside: gif-animations SelectShow

## Virtual Winterschool 2016: Computational Solid State Physics & Chemistry

In just an hour, I’ll be presenting my talk at the virtual winterschool 2016. In an attempt to tempt fate as much as possible I will try to give/run real-time examples on our HPC in Gent, however at this moment no nodes are available yet to do so. Let’s keep our fingers crossed and see if it all works out.

### Abstract

Modern materials research has evolved to the point where it is now common practice to manipulate materials at nanometer scale or even at the atomic scale (e.g. Intel’s skylake architecture with 14nm features, atomic layer deposition and surface structure manipulations with an STM-tip). At these scales, quantum mechanical effects become ever more relevant, making their prediction important for the field of materials science.

In this session, we will discuss how advanced quantum mechanical calculations can be performed for solids and indicate some differences with standard quantum chemical approaches. We will touch upon the relevant concepts for performing such calculations (plane-wave basis-sets, pseudo-potentials, periodic boundary conditions,…) and show how the basic calculations are performed with the VASP-code. You will familiarize yourself with the required input files and we will discuss several of the most important output-files and the data they contain.

At the end of this session you should be able to set up a single-point calculation, a structure optimization, a density of states and band structure calculation.

## HIVE-STM: A simple post-processing tool for simulating STM

While I was working on my PhD-thesis on Pt nanowires at the university of Twente, one of the things I needed was a method for simulating scanning-tunneling microscopy (STM) images in a quick and easy way. This was because the main experimental information on on these nanowires was contained in STM-images.

Because I love programming, I ended up writing a Delphi-program for this task. Delphi, being an Object Oriented version of the Pascal-programming language containing a Visual Components Library, was ideally suited for writing an easy to use program with a graphical user interface (GUI). The resulting STM-program was specifically designed for my personal needs and the system I was working on at that time.

In August 2008, I was contacted by two German PhD students, with the request if it would be possible for them to use my STM program. In October, an American post-doc and a South-Korean graduate student followed with similar requests, from which point onward I started getting more and more requests from researchers from all over the world. Now, seven years later, I decided to put all “HIVE-users” in a small data-base just to keep track of their number and their affiliation. I already knew I send the program to quite a lot of people, but I was still amazed to discover that it were 225 people from 34 countries.

Bar-graph showing the evolution in requests for the HIVE-STM program.

There is a slow but steady increase in requests over the years, with currently on average about one request every week. It is also funny to see there was a slight setback in requests both times I started in a new research-group. For 2015, the data is incomplete, as it does not include all requests of the month December. Another way to distribute the requests is by the month of the year. This is a very interesting graph, since it clearly shows the start of the academic year (October). There are two clear minima (March and September), for which the later is probably related due to the fact that it is the last month of before the start of the academic year (much preparation for new courses) and, in case of the solid state community, this month is also filled with conferences. The reason why there is a minimum in March, however, escapes me ( 💡 all suggestions are welcome 💡 ).

Distribution of requests for the HIVE-STM program on a monthly basis.

The geographic distribution of affiliations of those requesting the STM-program shows Europe, Azia and America to take roughly equal shares, while African affiliations are missing entirety. Hopefully this will change after the workshop on visualization and analysis of VASP outputs delivered at the Center for High Performance Computing‘s 9th National Meeting in South Africa by Dr. David Carballal. By far the most requests come from the USA (57), followed by China(23) and then Germany(15). South-Korea(14) unexpectedly takes the fourth place, while the fifth place is a tie between the UK, Spain and India(12 each).

Distribution of Hive requests per country and continent.

All in all, the STM program seems to be of interest to many more researchers than I would have ever expected, and has currently been cited about 25 times, so it is time to add a page listing these papers as examples of what can be done with HIVE(which has in the mean time been done, check out useful link n°2).

## Fortran dll’s and libraries: a Progress bar

In the previous fortran tutorials, we learned the initial aspects of object oriented programming (OOP) in fortran 2003. And even though our agent-based opinion-dynamics-code is rather simple, it can quickly take several minutes for a single run of the program to finish. Two tools which quickly become of interest for codes that need more than a few minutes to run are: (1) a progress bar, to track the advance of the “slow” part of the code and prevent you from killing the program 5 seconds before it is to finish, and (2) a timer, allowing you to calculate the time needed to complete certain sections of code, and possibly make predictions of the expected total time of execution.

In this tutorial, we will focus on the progress bar. Since our (hypothetical) code is intended to run on High-Performance Computing (HPC) systems and is written in the fortran language, there generally is no (or no easy) access to GUI’s. So we need our progress bar class to run in a command line user interface. Furthermore, because it is such a widely useful tool we want to build it into a (shared) library (or dll in windows).

# The progress bar class

What do we want out of our progress bar? It needs to be easy to use, flexible and smart enough to work nicely even for a lazy user. The output it should provide is formatted as follows: <string> <% progress> <text progress bar>, where the string is a custom character string provided by the user, while ‘%progress’ and ‘text progress bar’ both show the progress. The first shows the progress as an updating number (fine grained), while the second shows it visually as a growing bar (coarse grained).

 TProgressBar Class
1. type, public :: TProgressBar
2. private
3. logical :: init
4. logical :: running
5. logical :: done
6. character(len=255) :: message
7. character(len=30) :: progressString
8. character(len=20) :: bar
9. real :: progress
10. contains
11. private
12. procedure,pass(this),public :: initialize
13. procedure,pass(this),public :: reset
14. procedure,pass(this),public :: run
15. procedure,pass(this),private:: printbar
16. procedure,pass(this),private:: updateBar
17. end type TProgressBar

All properties of the class are private (data hiding), and only 3 procedures are available to the user: initialize, run and reset. The procedures, printbar and updatebar are private, because we intend the class to be smart enough to decide if a new print and/or update is required. The reset procedure is intended to reset all properties of the class. Although one might consider to make this procedure private as well, it may be useful to allow the user to reset a progress bar in mid progress.(The same goes for the initialize procedure.)

 Run procedure of the TProgressBar class
1. subroutine run(this,pct,Ix,msg)
2. class(TProgressBar) :: this
3. real::pct
4. integer, intent(in), optional :: Ix
5. character(len=*),intent(in),optional :: msg
6.
7. if (.not. this%init) call this%initialize(msg)
8. if (.not. this%done) then
9. this%running=.true.
10. this%progress=pct
11. call this%updateBar(Ix)
12. call this%printbar()
13. if (abs(pct-100.0)<1.0E-6) then
14. this%done=.true.
15. write(*,'(A6)') "] done"
16. end if
17. end if
18.
19. end subroutine run

In practice, the run procedure is the heart of the class, and the only procedure needed in most applications. It takes 3 parameters: The progress (pct), the number of digits to print of pct (Ix),and the <string> message (msg). The later two parameters are even optional, since msg may already have been provided if the initialize procedure was called by the user. If the class was not yet initialized it will be done at the start of the procedure. And while the progress bar has not yet reached 100% (within 1 millionth of a %) updates and prints of the bar are performed. Using a set of Boolean properties (init, running, done), the class keeps track of its status. The update and print procedures just do this: update the progress bar data and print the progress bar. To print the progress bar time and time again on the same line, we need to make use of the carriage return character (character 13 of the ASCII table):

The advance=’NO‘ option prevents the write statement to move to the next line. This can sometimes have the unwanted side-effect that the write statement above does not appear on the screen. To force this, we can use the fortran 2003 statement flush(OUTPUT_UNIT), where “OUTPUT_UNIT” is a constant defined in the intrinsic fortran 2003 module iso_fortran_env. For older versions of fortran, several compilers provided a (non standard) flush subroutine that could be called to perform the same action. As such, we now have our class ready to be used. The only thing left to do is to turn it into a dll or shared library.

# How to create a library and use it

There are two types of libraries: static and dynamic.

Static libraries are used to provide access to functions/subroutines at compile time to the library user. These functions/subroutines are then included in the executable that is being build. In linux environments these will have the extension “.a”, with the .a referring to archive. In a windows environment the extension is “.lib”, for library.

Dynamic libraries are used to provide access to functions/subroutines at run time. In contrast to static libraries, the functions are not included in the executable, making it smaller in size. In linux environments these will have the extension “.so”, with the .so referring to shared object. In a windows environment the extension is “.dll”, for dynamically linked library.

In contrast to C/C++, there is relatively little information to be found on the implementation and use of libraries in fortran. This may be the reason why many available fortran-“libraries” are not really libraries, in the sense meant here. Instead they are just one or more files of fortran code shared by their author(s), and there is nothing wrong with that. These files can then be compiled and used as any other module file.

So how do we create a library from our Progressbar class? Standard examples start from a set of procedures one wants to put in a library. These procedures are put into a .f or .f90 file. Although they are not put into a module (probably due to the idea of having compatibility with fortran 77) which is required for our class, this is not really an issue. The same goes for the .f03 or .f2003 extension for our file containing a fortran 2003 class. To have access to our class and its procedures in our test program, we just need to add the use progressbarsmodule clause. This is because our procedures and class are incorporated in a module (in contrast to the standard examples). Some of the examples I found online also include compiler dependent pragmas to export and import procedures from a dll. Since I am using gfortran+CB for development, and ifort for creating production code, I prefer to avoid such approaches since it hampers workflow and introduces another possible source of bugs.

The compiler setups I present below should not be considered perfect, exhaustive or fool-proof, they are just the ones that work fine for me. I am, however, always very interested in hearing other approaches and fixes in the comments.

## Windows

The windows approach is very easy. We let Code::Blocks do all the hard work.

### shared library: PBar.dll

Creating the dll : Start a new project, and select the option “Fortran DLL“. Follow the instructions, which are similar to the setup of a standard fortran executable. Modify/replace/add the fortran source you wish to include into your library and build your code (you can not run it since it is a library).

Creating a user program : The program in which you will be using the dll is setup in the usual way. And to get the compilation running smoothly the following steps are required:

• Add the use myspecificdllmodule clause where needed, with myspecificdllmodule the name of the module included in the dll you wish to use at that specific point.
• If there are modules included in the dll, the *.mod files need to be present for the compiler to access upon compilation of the user program. (Which results in a limitation with regard to distribution of the dll.)
• Upon running the program you only need the program executable and the dll.

### static library

The entire setup is the same as for the shared library. This time, however, choose the “Fortran Library” option instead of Fortran dll. As the static library is included in the executable, there is no need to ship it with the executable, as is the case for the dll.

## Unix

For the unix approach we will be working on the command line, using the intel compiler, since this compiler is often installed at HPC infrastructures.

### static library: PBar.a

After having created the appropriate fortran files you wish to include in your library (in our example this is always a single file: PBar.f03, but for multiple files you just need to replace PBar.f03 with the list of files of interest.)

1. Create the object files:
ifort -fpic -c -free -Tf Pbar.f03

Where -fpic tells the compiler to generate position independent code, typical for use in a shared object/library, while -c tells the compiler to create an object file. The -free and -Tf compiler options are there to convince the compiler that the f03 file is actual fortran code to compile and that it is free format.

2. Use the GNU ar tool to combine the object files into a library:
ar rc PBarlib.a PBar.o
3. Compile the program with the library
ifort TestProgram.f90 PBarlib.a -o TestProgram.exe

Note that also here the .mod file of our Progressbarsmodule needs to be present for the compilation to be successful.

### shared library: PBar.so

For the shared library the approach does not differ that much.

1. Create the object files:
ifort -fpic -c -free -Tf Pbar.f03

In this case the fpic option is not optional in contrast to the static library above. The other options are the same as above.

2. Compile the object files into a shared library:
ifort -shared PBar.o -o libPBar.so

The compiler option -shared creates a shared library, while the -o option allows us to set the name of the library.

3. Compile the program with the library
ifort TestProgram.f90 libPBar.so -o TestProgram.exe

Note that also here the .mod file of our Progressbarsmodule needs to be present for the compilation to be successful. To run the program you also need to add the location of the library file libPBar.so to the environment variable LD_LIBRARY_PATH

# One small pickle

HPC systems may perform extensive buffering of data before output, to increase the efficiency of the machine (disk-writes are the slowest memory access option)…and as a result this can sometimes overrule our flush command. The progressbar in turn will not show much progress until it is actually finished, at which point the entire bar will be shown at once. There are options to force the infrastructure not to use this buffering (and the system administrators in general will not appreciate this), for example by setting the compiler flag -assume nobuffered_stdout. So the best solution for HPC applications will be the construction of a slightly modified progress bar, where the carriage return is not used.

Special thanks also to the people of stack-exchange for clarifying some of the issues with the modules.