
Review of 2017

Happy New Year

2017 has come and gone. 2018 eagerly awaits getting acquainted. But first we look back one last time, trying to turn this into an old tradition. What have I done during the last year that is of some academic merit?

Publications: +4

Completed refereeing tasks: +8

  • The Journal of Physical Chemistry (2x)
  • Journal of Physics: Condensed Matter (3x)
  • Diamond and Related Materials (3x)

Conferences & workshops: +5 (Attended) 

  • Int. Conference on Diamond and Carbon Materials (DCM) 2017, Gothenburg, Sweden, September 3rd-7th, 2017 [oral presentation]
  • Summer school: “Upscaling techniques for mathematical models involving multiple scales”, Hasselt, Belgium, June 26th-29th, 2017 [poster presentation]
  • VSC-user day, Brussels, Belgium, June 2nd, 2017 [poster presentation]
  • E-MRS 2017 Spring Meeting, Strasbourg, France, May 22nd-26th, 2017 [1 oral + 2 poster presentations]
  • SBDD XXII, Hasselt University, Belgium, March 8th-10th, 2017 [poster presentation]

PhD-students: +1

  • Mohammadreza Hosseini (Oct.-…, PhD student in physical chemistry, Tarbiat Modares University, Teheran, Iran)

Bachelor-students: +2

Current size of HIVE:

  • 48.5K lines of program (code: 70 %)
  • ~70 files
  • 45 (command line) options

Hive-STM program:

And now, upward and onward, a new year, a fresh start.

Permanent link to this article: https://dannyvanpoucke.be/review-of-2017-en/

Audioslides tryout.

One of the new features provided by Elsevier upon publication is the creation of audioslides. This is a kind of short presentation of the publication by one of the authors. I have been itching to try this since our publication on the neutral C-vacancy was published. The interface is quite intuitive, although the Adobe Flash plugin tends to have a hard time finding the microphone. However, once it succeeds, things go quite smoothly. The resolution of the slides is a bit low, which is unfortunate (but this is only for the small-scale version, the large-scale version is quite nice as you can see in the link below). Maybe I’ll make a high-resolution video version and put it on YouTube later.

The result is available here (since the embedding doesn’t play nicely with WP).

And a video version can be found here.
 

Permanent link to this article: https://dannyvanpoucke.be/audioslides-tryout-en/

Exa-scale computing future in Europe?

As a computational materials scientist with a main research interest in the ab initio simulation of materials, computational resources are the life-blood of my research. Over the last decade, I have seen my resource usage grow from less than 100.000 CPU-hours per year to several million CPU-hours per year. To satisfy this need for computational resources I have to make use of HPC facilities, like the TIER-2 machines available at the Flemish universities and the Flemish TIER-1 supercomputer, currently hosted at KU Leuven. At the international level, computational scientists have access to so-called TIER-0 machines, something I will no doubt make use of in the future. Before I continue, let me first explain a little what this TIER-X business actually means.

The TIER-X notation is used to give an indication of the size of the computer/supercomputer indicated. There are 4 sizes:

  •  TIER-3: This is your personal computer (laptop/desktop) or even a small local cluster of a research group. It can contain from one (desktop) up to a few hundred CPUs (local cluster). Within materials research, this is sufficient for quite a few tasks: post-processing of data, simple force-field based calculations, or even small quantum chemical or solid state calculations. A large fraction of the work during my first Ph.D. was performed on the local cluster of the CMS.
  • TIER-2: This is a supercomputer hosted by an institute or university. It generally contains over 1000 CPUs and has a peak performance of >10 TFLOPS (10¹² Floating Point Operations Per Second; compare this to the 1-25 GFLOPS of an average personal computer). The TIER-2 facilities of the VUB and UAntwerp both have a peak performance of about 75 TFLOPS, while the machines at Ghent University and the KU Leuven/UHasselt facilities both have a peak performance of about 230 TFLOPS. Using these machines I was able to perform the calculations necessary for my study of dopant elements in cerates (and obtain my second Ph.D.).
  • TIER-1: Moving up one more step, there are the national/regional supercomputers. These generally contain over 10.000 CPUs and have a peak performance of over 100 TFLOPS. In Flanders, the Flemish Supercomputer Center (VSC) manages the TIER-1 machine (which is funded by the 5 Flemish universities). The first TIER-1 machine was hosted at Ghent University, while the second and current one is hosted at KU Leuven, and has a peak performance of 623 TFLOPS (more than all TIER-2 machines combined) at a cost of about 5.5 million € (one of the reasons it is a regional machine). Over the last 5 years, I was granted over 10 million hours of CPU time, sufficient for my study of Metal-Organic Frameworks and defects in diamond.
  • TIER-0: These are international-level supercomputers. These machines contain over 100.000 CPUs, and have a peak performance in excess of 1 PFLOPS (1 PetaFLOPS = 1000 TFLOPS). In Europe, the TIER-0 facilities are available to researchers via the PRACE network (access to 7 TIER-0 machines, with an accumulated 43.49 PFLOPS).

This is roughly the status of what is available today for Flemish scientists at the various levels. With the constantly growing demand for more processing power, the European Union, in the name of EuroHPC, decided in March of this year that Europe will host two Exa-scale computers. These machines will have a peak performance of at least 1 EFLOPS, or 1000 PFLOPS, and are expected to be built by 2024-2025. In June, Belgium signed up to EuroHPC as the eighth participating country, in addition to the initial 7 (Germany, France, Spain, Portugal, Italy, Luxembourg and The Netherlands).

This is very good news for all involved in computational research in Flanders. There is a plan to build these machines, there is a deadline, …there just isn’t an idea yet of what these machines should look like (except: they will be big, massively power-consuming, and have a target peak performance). To get an idea of what users expect of such a machine, Tier-1 and HPC users have been asked to put forward requests/suggestions of what they want.

From my personal experience as a user, and extrapolating from my own usage, I can see myself easily using 20 million hours of CPU time each year by the time these Exa-scale machines are built. Leading a computational group would multiply this value. And then we are only talking about simple production calculations for “standard” problems.

The claim that an Exa-scale machine runs 1000x faster than a peta-scale machine is not entirely justified, at least not for the software I generally encounter. As software seldom scales linearly, the speed gain from Exa-scale machinery mainly comes from the ability to perform many more calculations in parallel. (There are some exceptions which will gain within the single-job area, but this type of job is limited.) Within my own field, the quantum mechanical calculation of the electronic structure of periodic atomic systems, all required resources tend to grow with the size of the problem. As such, a larger system (= more atoms) requires more CPU time, but also more memory. This means that compute nodes with many cores are welcome and desired, but these cores need the associated memory: doubling the cores would require the memory on a node to be doubled as well. Communication between the nodes should be fast as well, as this will be the main limiting factor on the scaling performance. If all this is implemented well, then the time to solution of a project (not a single calculation) will improve significantly with access to Exa-scale resources. The factor will not be 100x compared to a PFLOPS system, but it could be much better than 10x. This factor of 10 also takes into account that projects will have access to much more demanding calculations as a default (hybrid-functional structure optimization instead of simple density functional theory structure optimization, the latter being ~1000x cheaper for plane-wave methods but less accurate).

At this scale, parallelism is very important, and implementing this into a program is far from a trivial task. Although most physicists/mathematicians/chemists/engineers may have the skills to write scientifically sound software, we are not computer scientists, and our available time and skills are limited in this regard. For this reason, it will become more important for the HPC facility to provide parallelization of software as a service, i.e. to have a group of highly skilled computer scientists available to assist with or even perform this task.

Next to having the best implementation of the software available, it should also be possible to actually get access to these machines. This should not be limited to a happy few through a peer review process which just wastes human research potential. Instead, access should be a mix of guaranteed access and peer review:

  • Guaranteed access: For standard production projects (5-25 million CPU hours/year) university researchers should have a guaranteed access model. This would allow them to perform state-of-the-art research without too much overhead. To prevent access by people without the proven necessary need/skills, a user database could be created and appended upon each application. Upon first application, a local HPC team (country/region/university Tier-1 infrastructure) would have to provide a recommendation with regard to the user, including a statement of the applicant’s resource usage at that facility. Getting resources in a guaranteed access project would also require a limited project proposal (max. 2 pages, including user credentials, requested resources, and a small description of the project).
  • Peer review access: This would be for special projects, in which the researcher requires a huge chunk of resources to perform highly specialized calculations or large high-throughput exercises (on the order of 250-1000 million CPU hours, e.g. Nature Communications 8, 15959 (2017)). In this case a full project proposal with serious peer review (including a rebuttal stage, or the possibility to resubmit after addressing the indicated problems) would be required. The goal of this peer review system should not be to limit the number of accepted projects, but to make sure the accepted projects run successfully.
  • Pay per use: This should be the option for industrial/commercial users.

What could an HPC user like myself do to contribute to the success of EuroHPC? This is rather simple: run the machine as a pilot user (I have experience on most of the TIER-2 clusters of Ghent University and on both Flemish Tier-1 machines. During pilot testing I successfully crashed the programs I use by pushing them beyond their limits, and ran into rather unfortunate issues. 🙂 That is the job of a pilot user: use the machine/software in unexpected ways, such that problems can be resolved/fixed by the time the bulk of the users get access.) and perform peer review of the larger specialized projects.

Now the only thing left to do is wait. Wait for the Exa-scale supercomputers to be built…7 years to go…about 92 node-days on BrENIAC…a starting grant…one long weekend of calculations.

Appendix

For simplicity I use the term CPU to indicate a single compute core, even though technically, nowadays, a single CPU will contain multiple cores (desktop/laptop: 2-8 cores, HPC compute node: 2-20 cores per CPU, or more). This makes the comparison a bit easier.

Furthermore, modern computer systems start more and more to rely on GPU performance as well, which is also a possible road toward Exa-scale computing.

Orders of magnitude:

  • G = Giga = 10⁹
  • T = Tera = 10¹²
  • P = Peta = 10¹⁵
  • E = Exa = 10¹⁸

Permanent link to this article: https://dannyvanpoucke.be/exa-scale-computing/

Bachelor Projects Completed: 2 new computational materials scientists initialised

The black arts of computational materials science.

Just over half a year ago, I mentioned that I presented two computational materials science projects to the third-year bachelor physics students at UHasselt. Both projects ended up being chosen by a bachelor student, so I had the pleasure of guiding two eager young minds in their first steps into the world of computational materials science. They worked very hard, cursed their machine or code (as any good computational scientist should do once in a while, just to make sure that he/she is still at the forefront of science) and survived. They actually did quite a bit more than “just surviving”: they grew as scientists and they grew in self-confidence…given time, I believe they may even thrive within this field of research.

One week ago, they presented their results in a final presentation for their classmates and supervisors. The self-confidence of Giel and the clarity of his story were impressive. Giel has a knack for storytelling (a true Pan Narrans, as Terry Pratchett would praise him). His report included an introduction to various topics of solid state physics and computational materials science in which you never notice how complicated the topic actually is. He just takes you along for the ride, and the story unfolds in a very natural fashion. This shows how well he understands what he is writing about.

This in no way means his project was simple or easy. Quite soon after the start of his project, Giel actually ran into a previously unknown VASP bug. He had to play with spin configurations of defects and of course bumped into a handful of rookie mistakes, which he only made once *thumbs-up*. (I could have warned him about them, but I believe people learn more if they bump their heads themselves. This project provided the perfect opportunity to do so in a safe environment. 😎 ) His end report was impressive and his results on the Ge defect in diamond are of very good quality.

The second project was brought to a successful completion by Asja. This very eager student actually had to learn how to program in Fortran before he could even start. He had to implement code to calculate partial phonon densities of states within the existing HIVE code. Along the way he also discovered some minor bugs (thank you very much 🙂 ) and crashed into a rather unexpected hard one near the end of the project. For some time, things looked very bleak indeed: the partial density of equivalent atoms was different, and the partial densities did not sum to the total density. As a result, doubts arose as to whether it would even be possible to fulfill the goal of the project. Luckily, Asja never gave up and stayed positive, and after half a day of debugging on my part the culprit was found (in my part of the code as well). With this fixed, he quickly started torturing his own laptop, calculating partial phonon densities of states for Metal-Organic Frameworks and, later on, also for the Ge defect in diamond, using data provided by Giel. These results are also very promising and will require some further digging, but they will definitely be very interesting.

For me, it has been an interesting experience, and I count myself lucky with these two brave and very committed students. I wish them all the best of luck for the future, and maybe we meet again.

Permanent link to this article: https://dannyvanpoucke.be/bachelor-projects-completed/

VSC-user day 2017: The Poster Edition

Last Friday, the HPC infrastructure in Flanders was celebrated at the VSC user day. Being one of the Tier-1 supercomputer users at UHasselt, I was asked if I could present a poster at the meeting, showcasing the things I do here. Although I was very interested in this event, educational obligations (the presentations of the bachelor projects, on which I will post later) prevented me from attending the meeting.

As a compromise, I created a poster for the meeting, which Geert Jan Bex, our local VSC/HPC support, was so kind as to put up at the event. The poster session was preceded by a set of 1-minute presentations of the posters, for which a slide had to be made. As I could not be physically present, I provided the organizers with a slide containing a short description that could be used as the 1-minute presentation. Unfortunately, things got a little mixed up, as Geert Jan accidentally printed this slide as the poster (which gave rise to some difficulties in the printing process 🙄 ). So for those who might have had an interest in the actual poster, let me put it up here:

This poster presents my work on the linker functionalisation of MIL-47, which was recently published in the Journal of Physical Chemistry C, and the diamond work on the C-vacancy, which is currently submitted. Clicking on the poster above will give you the full-size image. The 1-minute slide, which erroneously got printed as the poster:

Permanent link to this article: https://dannyvanpoucke.be/vsc-user-day-2017-the-poster-edition/

VASP tutor: Structure optimization through Equation-of-State fitting


Materials properties, such as the electronic structure, depend on the atomic structure of a material. For this reason it is important to optimize the atomic structure of the material you are investigating. Generally you want your system to be in the global ground state, which, for some systems, can be very hard to find. This can be due to large barriers between different conformers, making it easy to get stuck in a local minimum. However, a very shallow energy surface will be problematic as well, since optimization algorithms can get stuck wandering these plains forever, hopping between different local minima (Metal-Organic Frameworks (MOFs) and other porous materials like Covalent-Organic Frameworks and Zeolites are nice examples).

VASP, as well as other ab initio software, provides multiple settings and possibilities to perform structure optimization. Let’s give a small overview, which I also present in my general VASP introductory tutorial, in order of increasing workload on the user:

  1. Experimental Structure: This is the laziest option, as it entails just taking an experimentally obtained structure and not optimizing it at all. This should be avoided unless you have a very specific reason why you want to use specifically this geometry. (In this regard, force-field optimized structures fall into the same category.)
  2. Simple VASP Optimization: You can let VASP do the heavy lifting. There are several parameters which help with this task.
    1. IBRION: determines how the ions are moved during relaxation. IBRION = 1 (RMM-DIIS, good close to a minimum), 2 (conjugate gradient, safe for difficult problems, should always work), or 3 (damped molecular dynamics, useful if you start from a bad initial guess).
    2. ISIF: determines how the stress tensor is calculated and which degrees of freedom can change during a relaxation. ISIF = 2 (ions only, fixed shape and volume), 4 (ions and cell shape, fixed volume), or 3 (ions, shape and volume relaxed).
    3. ENCUT = 1.3×max(ENMAX): to reduce Pulay stresses, it is advised to increase the plane-wave cutoff to 1.3× the default value, which is the largest ENMAX value for the atoms used in your system.
  3. Volume Scan (Quick and dirty): For many systems, especially simple ones, the internal coordinates of the ions are often well represented in the available structure files. The main parameter which needs optimization is the lattice parameter. This is also often the main change if different functionals are used. In a quick-and-dirty volume scan, one performs a set of static calculations in which only the volume of the cell is changed; the shape of the cell and the internal atom coordinates are kept fixed. Fitting a polynomial to the resulting energy-volume data can then be used to obtain the optimum volume. This option is mainly useful as an initial guess and should either be followed by option 2, or improved to option 4.
  4. Equation of state fitting to fixed-volume optimized structures: This approach is the most accurate (and expensive) method. Because you make use of fixed-volume optimizations (ISIF = 4), the errors due to Pulay stresses are removed. They are still present for each separate fixed-volume calculation, but the equation-of-state fit will average out the basis-set incompleteness, as long as you take a large enough volume range: 5-10%. Note that the 5-10% volume range is generally true for small systems. In case of porous materials, like MOFs, ±4% can already cover a large volume range of over 100 Å³. Below you can see a pseudo-code algorithm for this setup. Note that the relaxation part is split up into several consecutive relaxations. This is done to further reduce basis-set incompleteness errors: although the cell volume does not change, the shape does, and the original sphere of G-vectors is transformed into an ellipse. At each restart this is corrected to again give a sphere of G-vectors. For many systems the effect may be very small, but this is not always the case, and it can be recognized as jumps in the energy going from one relaxation calculation to the next. The convergence is set the usual way for a relaxation in VASP (EDIFF and EDIFFG parameters), and a threshold on the number of ionic steps should be set as well (5-10 is reasonable for normal systems, while for porous/flexible materials you may prefer a higher value). There exist several possible equations of state which can be used for the fit of the E(V) data. The EOSfit option of HIVE-4 implements 3:
    1. Birch-Murnaghan third order isothermal equation of state
    2. Murnaghan equation of state
    3. Rose-Vinet equation of state (very well suited for (flexible) MOFs)

    Using the obtained equilibrium volume, a final round of fixed-volume relaxation should be done to get the fully optimized structure.

For (set of volumes: equilibrium volume ±5%){
	Step 1      : Fixed volume relaxation
	              (IBRION = 2, ISIF = 4, ENCUT = 1.3×ENMAX, LCHARG = .TRUE., NSW = 100)
	Step 2→n-1  : Second and following fixed volume relaxations (until a threshold is crossed
	              and the structure relaxes in fewer than N ionic steps)
	              (IBRION = 2, ISIF = 4, ENCUT = 1.3×ENMAX, ICHARG = 1, LCHARG = .TRUE., NSW = 100)
	Step n      : Static calculation
	              (IBRION = -1, no ISIF parameter, ENCUT = 1.3×ENMAX, ICHARG = 1, LCHARG = .TRUE., NSW = 0)
}
Fit the Volume-Energy data to an equation of state.
Fixed volume relaxation at the equilibrium volume (with continuations if too many ionic steps are required).
Static calculation at the equilibrium volume.
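
If you prefer to do the equation-of-state fit yourself rather than through the EOSfit option of HIVE-4, the minimal Python sketch below shows what this can look like for the third-order Birch-Murnaghan form mentioned above. It is only a sketch: the volume and energy arrays are illustrative placeholders which you would replace by the values extracted from the final static calculations, and the Murnaghan or Rose-Vinet forms can be fitted in the same way by swapping the model function.

import numpy as np
from scipy.optimize import curve_fit

def birch_murnaghan(V, E0, V0, B0, B0p):
    # Third-order Birch-Murnaghan isothermal equation of state E(V).
    eta = (V0 / V) ** (2.0 / 3.0)
    return E0 + 9.0 * V0 * B0 / 16.0 * ((eta - 1.0) ** 3 * B0p
                                        + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta))

# Illustrative placeholder data (volumes in A^3, energies in eV); replace these
# with the values from your own static calculations.
volumes  = np.array([10.80, 11.00, 11.20, 11.40, 11.60, 11.80])
energies = np.array([-18.1833, -18.2051, -18.2173, -18.2197, -18.2124, -18.1953])

# Initial guesses: the minimum-energy data point, a bulk modulus of ~2.5 eV/A^3,
# and the typical B0' = 4.
guess = (energies.min(), volumes[np.argmin(energies)], 2.5, 4.0)
(E0, V0, B0, B0p), _ = curve_fit(birch_murnaghan, volumes, energies, p0=guess)

# 1 eV/A^3 = 160.2176634 GPa
print(f"V0 = {V0:.3f} A^3,  E0 = {E0:.4f} eV,  B0 = {B0 * 160.2176634:.1f} GPa,  B0' = {B0p:.2f}")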
EOS-fitting Diamond and Graphite

Top-left: Volume scan of Diamond. Top-right: comparison of volume scan and equation of state fitting to fixed volume optimizations, showing the role of van der Waals interactions. Bottom: Inter-layer binding in graphite for different functionals.

Some examples

Let us start with a simple and well behaved system: diamond. This material has a very simple internal structure. As a result, the internal coordinates should not be expected to change with reasonable volume variations. As such, a simple volume scan (option 3) will allow for a good estimate of the equilibrium volume. The obtained bulk modulus is off by about 2%, which is very good.

Switching to graphite makes things a lot more interesting. A simple volume scan gives an equilibrium volume which seriously overestimates the experimental volume (about 35 Å³), mainly due to the overestimation of the c-axis. The bulk modulus is calculated to be 233 GPa, a factor of 7 too large. Allowing the structure to relax at fixed volume changes the picture dramatically. The bulk modulus drops by two orders of magnitude (now it is about 24x too small) and the equilibrium volume becomes even larger. We are facing a serious problem for this system. The origin lies in the van der Waals interactions. These weak forces are not included in standard DFT; as a result, the distance between the graphene sheets in graphite is grossly overestimated. Luckily, several schemes exist to include these van der Waals forces; the Grimme D3 corrections are one of them. Including these, the correct behavior of graphite can be predicted using an equation-of-state fit to fixed-volume optimizations. (Note that the energy curve was shifted upward to make the data point at 41 Å³ coincide with that of the other calculations.) In this case the equilibrium volume is correctly estimated to be about 35 Å³ and the bulk modulus is 28.9 GPa, a mere 15% off from the experimental one, which is near perfect compared to the standard DFT values we had before.

In the case of graphite, the simple volume scan approach can also be used for something else. As this approach is well suited to check the behaviour of a single internal parameter, we use it to investigate the inter-layer interaction. Keeping the a and b lattice vectors fixed, the c lattice vector is scanned. Interestingly, the LDA functional, which is known to overbind, finds the experimental lattice spacing, while both PBE and HSE06 overestimate it significantly. Introducing D3 corrections for these functionals fixes the problem and gives a stronger binding than LDA.

EOS-fitting for MIL53-MOFs

Comparison of a volume scan and an EOS-fit to fixed volume optimizations for a Metal-Organic Framework with MIL53/47 topology.

We just saw that for simple systems, the simple volume scan can already be too simple. For more complex systems like MOFs, similar problems can be seen. The simple volume scan, as for graphite, gives too sharp a potential (with a very large bulk modulus). In addition, internal reordering of the atoms gives rise to very large changes in the energy, and the equilibrium volume can move quite a lot. It even depends on the spin configuration.

In conclusion: the safest way to get a good equilibrium volume is unfortunately also the most expensive way. By means of an equation of state fit to a set of fixed volume structure optimizations the ground state (experimental) equilibrium volume can be found. As a bonus, the bulk modulus is obtained as well.

Permanent link to this article: https://dannyvanpoucke.be/vasp-tutor-eosfit-en/

Review of 2016

2016 has come and gone. 2017 eagerly awaits getting acquainted. But first we look back one last time, trying to turn this into a tradition. What have I done during the last year that is of some academic merit?

Publications: +4

Completed refereeing tasks: +5

  • ACS Sustainable Chemistry & Engineering
  • The Journal of Physical Chemistry
  • Journal of Physics: Condensed Matter (2x)
  • Diamond and Related Materials

Conferences: +4 (Attended) & + 1 (Organized)

PhD-students: +2

  • Arthur De Vos : (Jan.-Mar., Ghent University, Ghent, Belgium )
  • Mohammadreza Hosseini (Oct.-…, PhD student in physical chemistry, Tarbiat Modares University, Teheran, Iran)

Current size of HIVE:

  • 47K lines of program (code: 70 %)
  • 70 files
  • 44 (command line) options

Hive-STM program:

And now, upward and onward, a new year, a fresh start.

Permanent link to this article: https://dannyvanpoucke.be/review-of-2016-en/

Scaling of VASP 5.4.1 on TIER-1b BrENIAC


When running programs on HPC infrastructure, one of the first questions to ask yourself is: “How well does this program scale?”

In applications for HPC resources, this question plays a central role, often with the additional remark: “But for your specific system!“. For some software packages this is an important remark; for other packages it has little relevance, as the package performs similarly for all input (or for given classes of input). The VASP package is one of the latter. For my current resource application at the Flemish TIER-1 I set out to do a more extensive scaling test of the VASP package. This is for two reasons: first, I will be using a newer version of VASP, vasp 5.4.1 (I am currently using my own multiply patched version 5.3.3); second, I will be using a brand new TIER-1 machine (the second Flemish TIER-1, as our beloved muk retired at the end of 2016).

Why should I put in the effort to get access to resources on such a TIER-1 supercomputer? Because such machines are the lifeblood of the computational materials scientist. They are our sidekick in the quest for understanding materials. Over the past 4 years, I was granted (and used) 20900 node-days of calculation time (i.e. over 8 million hours of CPU time, or 916 years of calculation time) on the first TIER-1 machine.

Now back to the topic. How well does VASP 5.4.1 behave? That depends on the system at hand, and on how well you choose the parallelization settings.

1. Parallelization in VASP

VASP provides several parameters which allow for straightforward parallelization of the simulation:

  • NPAR : This parameter can be set to parallelize over the electronic bands. As a consequence, the number of bands included in a calculation by VASP will be a multiple of NPAR. (Note : Hybrid calculations are an exception as they require NPAR to be set to the number of cores used per k-point.)
  • NCORE : The NCORE parameter is related to NPAR via NCORE=#cores/NPAR, so only one of these can be set.
  • KPAR : This parameter can be set to parallelize over the set of irreducible k-points used to integrate over the first Brillouin zone. KPAR should therefore best be a divisor of the number of irreducible k-points.
  • LPLANE : This boolean parameter allows one to switch on parallelization over plane waves. In general this will give rise to a small but consistent speedup (observed in previous scaling tests). As such we have chosen to set this parameter = .TRUE. for all calculations.
  • NSIM : Sets up a blocked mode for the RMM-DIIS algorithm. (cf. manual, and further tests by Peter Larsson). As our tests do not involve the RMM-DIIS algorithm, this parameter was not set.

In addition, one needs to keep the architecture of the HPC-system in mind as well: NPAR, KPAR and their product should be divisors of the number of nodes (and cores) used.
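
Since these divisibility constraints quickly narrow down the combinations worth testing, it can be handy to enumerate them before starting a scaling test. The short Python sketch below does just that; the node count, cores per node and number of irreducible k-points are merely example inputs (28 cores per node corresponds to the machine discussed in footnote [1] below), not a recommendation for any particular system.

# Enumerate the (KPAR, NPAR) pairs compatible with the constraints above.
# All input numbers are examples only.

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

nodes = 4
cores_per_node = 28          # cf. footnote [1]
nkpts = 12                   # irreducible k-points (cf. the MIL-47(V) example below)
total_cores = nodes * cores_per_node

combos = []
for kpar in divisors(total_cores):
    if nkpts % kpar != 0:    # KPAR should best divide the number of k-points
        continue
    for npar in divisors(total_cores // kpar):   # then KPAR*NPAR divides the core count
        combos.append((kpar, npar))

# KPAR-heavy settings (e.g. KPAR = number of nodes) tend to perform best, so list those first.
for kpar, npar in sorted(combos, key=lambda c: (-c[0], c[1])):
    print(f"KPAR = {kpar:2d}, NPAR = {npar:3d}")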

2. Results

Both NPAR and KPAR parameters can be set simultaneously and will have a serious influence on the speed of your calculations. However, not all possible combinations give the same speedup. Even worse, not all combinations are beneficial with regard to speed. This is best seen for a small 2 atom diamond primitive cell.

Timing primitive unit cell diamond.

Timing results for various combinations of the NPAR and KPAR parameters for a 2 atom primitive unit cell of diamond.

Two things are clear. First of all, switching on a parallelization parameter does not necessarily mean the calculation will speed up. In some cases it may actually slow you down. Secondly, the best and worst performance is consistently obtained using the same settings. Best: KPAR = maximal and NPAR = 1; worst: KPAR = 1 and NPAR = maximal.

This small system shows us what you can expect for systems with a lot of k-points and very few electronic bands (actually, any real calculation on this system would only require 8 electronic bands, not the 56 used here to be able to assess the performance of the NPAR parameter).

In a medium sized system (20-100 atoms), the situation will be different. There, the number of k-points will be small (5-50) while the natural number of electronic bands will be large (>100). As a test-case I looked at my favorite Metal-Organic Framework: MIL-47(V).

scaling of MIL-47(V) at PBE level

Timing results for several NPAR/KPAR combinations for the MIL-47(V) system.

This system has only 12 k-points to parallelize over, and 224 electronic bands. The spread per number of nodes is more limited than for the small system. In contrast, the general trend remains the same: KPAR high, NPAR low, with optimum performance when KPAR = #nodes. Going beyond standard DFT, using hybrid functionals, also retains the same picture, although in some cases about 10% performance can be gained when using half a node per k-point. Unfortunately, as we have very few k-points to start from, this will only be an advantage if the limiting factor is the number of nodes available.

An interesting behaviour is seen when one keeps the k-points/#nodes ratio constant:

scaling at constant k-point/node ratio

Scaling behavior for either a constant number of k-points (for dense k-point grid in medium sized system) or constant k-point/#nodes ratio.

As you can see, VASP performs really well up to KPAR = #k-points (>80% efficiency). More interestingly, if the k-point/#nodes ratio is kept constant, the efficiency (now calculated as T1/(T2*NPAR), with T1 the timing for a single node and T2 the timing for multiple nodes) is roughly constant. I.e., if you know the walltime for a 2-k-point/2-node job, you can expect the same for the same system with 20 k-points on 20 nodes (think density-of-states and band-structure calculations, or just a change in the number of k-points due to symmetry reduction or a change of the k-point grid). 😆

3. Conclusions and Guidelines

If one thing is clear from the current set of tests, it is the fact that good scaling is possible. How it is attained, however, depends greatly on the system at hand. More importantly, making a poor choice of parallelization settings can be very detrimental to the obtained speed-up and efficiency. Unfortunately when performing calculations on an HPC system, one has externally imposed limitations to work with:

  • Memory available per core
  • Number of cores per node[1]
  • Size of your system: #atoms, #k-points, & #bands

Here are some guidelines (some open doors as well):

  • Wherever possible k-point parallelization should be driven to a maximum (KPAR as large as possible). The limiting factor here is the actual number of k-points and the amount of memory available. The latter due to the fact that a higher value of KPAR leads to a higher memory requirement.[2]
  • Use the Γ-version of VASP for Γ-point-only calculations. It reduces memory usage significantly (3.7 GB → 2.8 GB/core for a 512-atom diamond system) and increases the computational efficiency, sometimes even by a factor of 2.
  • NPAR parallelization can be used to reduce the memory load for high KPAR calculations, but increasing KPAR will always outperform the same increase of NPAR.
  • If only NPAR parallelization is available, due to too few k-points, and you are working with large systems, NPAR parallelization is your last resort; it will perform reasonably well, up to a point.
  • Electronic steps show very consistent timing, so scaling tests can be performed with only 5-10 electronic steps, with standard deviations comparable in absolute size for PBE and HSE06 (see the short sketch below for one way to collect these timings).
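
Collecting those per-step timings is easy to script. The minimal Python sketch below is one way to do it, assuming the usual “LOOP:” timing lines that VASP writes to the OUTCAR for every electronic step; “OUTCAR” is simply a placeholder path.

import statistics

# Gather the wall time of each electronic (SCF) step from a VASP OUTCAR.
# Lines of the form "LOOP:  cpu time ...: real time ..." are assumed;
# the "LOOP+:" lines (full ionic steps) are not matched.
loop_times = []
with open("OUTCAR") as f:
    for line in f:
        if "LOOP:" in line:
            loop_times.append(float(line.split()[-1]))   # last number = wall time

if len(loop_times) > 1:
    print(f"{len(loop_times)} electronic steps: "
          f"{statistics.mean(loop_times):.2f} +/- {statistics.stdev(loop_times):.2f} s per step")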

 

In short:

K-point parallelism will save you, wherever possible !

 

[1] 28 is a lousy number in that regard as its prime-decomposition is 2x2x7, leaving little overlap with prime-decompositions of the number of k-points, which more often than you wish end up being prime numbers themselves 😥
[2] The small system’s memory requirements varied from 0.15 to 1.09 GB/core for the different combinations.

Permanent link to this article: https://dannyvanpoucke.be/scaling-vasp-breniac-en/

Bachelor projects @ UHasselt/IMO

Black arts of computational materials science.

Today the projects for the third-year bachelor students in physics were presented at UHasselt. I also contributed two projects, giving the students the opportunity to choose a computational materials science project. During these projects, I hope to introduce them to the modern (black) arts of High-Performance Computing and materials modelling beyond empirical models.

The two projects each focus on a different aspect of what it is to be a computational materials scientist. One project focuses on performing quantum mechanical calculations using the VASP program, and analyzing the obtained results with existing software. This student will investigate the NV-defect complex in diamond in all its facets. The other project focuses on the development of new tools to investigate the data generated by simulation software like VASP. This student will extend the existing phonon module in the HIVE-toolbox and use it to analyse a whole range of materials, varying from my favourite Metal-Organic Framework to a girl’s best friend: diamond.

Calculemus solidi

 

A description of the projects in Dutch can be found here.

Permanent link to this article: https://dannyvanpoucke.be/baproj2017-en/

VASP-tutor: Creating a primitive unit cell from a conventional unit cell…for a MOF.

Ball-and-stick representation of diamond.

Ball-and-stick representation of diamond in two different unit cells. Left: primitive unit cell containing two atoms. All atoms at the vertices are periodic copies of the same one. Right: Conventional cubic unit cell containing eight atoms. Atoms at opposing faces are periodic copies, while all atoms at the vertices are periodic copies of the same atom.

When performing electronic structure calculations on complex systems, you prefer to do this on systems with as few atoms as possible. Such periodic cells are called unit cells. There are, however, two types of unit cells: Primitive unit cells and conventional unit cells.

A primitive unit cell is the smallest possible periodic cell of a crystalline material, making it extremely well suited for calculations. Unfortunately, it is not always the nicest unit cell to work with, as it may be difficult to recognize its symmetry (cf. the example of diamond on the right). The conventional unit cell, on the other hand, shows the symmetry more clearly, but is not (always) the smallest possible unit cell. To make matters complicated and confusing, people often refer to both types as simply “unit cell”, which is not wrong, but the term unit cell is for many uniquely associated with only one of the two types.

When you are performing calculations on diamond, the conventional cell isn’t so large that standard calculations become impossible, even on a personal laptop or desktop. On the other hand, when you are studying a Metal-Organic Framework like UiO-66(Zr), which contains 456 atoms in its conventional unit cell, you will be very happy to use the primitive unit cell with ‘merely’ 114 atoms. Also the MIL-47/53 topology, which is generally studied using a conventional unit cell containing 72/76 atoms, can be reduced to a smaller primitive unit cell of only 36/38 atoms. Just as for the diamond primitive unit cell, this MIL-47/53 primitive unit cell is not a nice cubic cell. Instead you end up with a lattice having lattice angles of seventy-something degrees.

Reduction of the MIL-53 conventional cell to the primitive cell

Reduction of the MIL-53 conventional cell to the primitive cell. The conventional cell is shown, extending slightly into the periodic copies. The primitive lattice vectors are shown as colored arrows. The folded primitive cell shows there was some symmetry-breaking in the hydroxy groups of the metal-oxide chain. Introducing some additional symmetry fixes this in the final primitive cell.

How to reduce a conventional unit cell to a primitive unit cell?

Before you start, and if you are using VASP, make sure you have the POSCAR file giving the atomic positions as Cartesian coordinates. (Using the HIVE-4 toolbox: Option TF, suboption 2 (Dir->Cart).)
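
If you do not have the HIVE-4 toolbox at hand, the conversion itself is just a matrix product: each Cartesian position is the fractional coordinate triplet multiplied by the matrix of lattice vectors. The Python sketch below is a minimal way of doing this, assuming a VASP 5.x POSCAR in “Direct” coordinates without a selective-dynamics block; the file names are hypothetical.

import numpy as np

# Convert a VASP 5.x POSCAR from Direct (fractional) to Cartesian coordinates.
with open("POSCAR") as f:
    lines = f.read().splitlines()

comment = lines[0]
scale   = float(lines[1])
lattice = scale * np.array([[float(x) for x in lines[i].split()] for i in (2, 3, 4)])
species = lines[5].split()                    # element symbols (VASP 5.x format)
counts  = [int(x) for x in lines[6].split()]
natoms  = sum(counts)
assert lines[7].strip().lower().startswith("d"), "expected Direct coordinates"

frac = np.array([[float(x) for x in lines[8 + i].split()[:3]] for i in range(natoms)])
cart = frac @ lattice                         # row i: x_i*a + y_i*b + z_i*c

with open("POSCAR_cart.vasp", "w") as f:
    f.write(comment + "\n1.0\n")
    for vec in lattice:
        f.write("  {:20.14f} {:20.14f} {:20.14f}\n".format(*vec))
    f.write(" ".join(species) + "\n" + " ".join(map(str, counts)) + "\nCartesian\n")
    for pos in cart:
        f.write("  {:20.14f} {:20.14f} {:20.14f}\n".format(*pos))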

If you do not use VASP, you can still make use of the scheme below.

  1. Open your structure using VESTA, and save it as a “VASP” file: POSCAR.vasp (File → Export Data → choose “VASP” as file type, and select Cartesian Coordinates; don’t select “Convert to Niggli reduced cell”, as this only works for perfect crystal symmetry).
  2. Open the file you just saved in a text editor (e.g. notepad or notepad++). The file format is quite straightforward. The first line is a comment line, while the second is a general scale factor, which for our current purpose can be ignored. What is important to know is that the 3rd, 4th and 5th lines give the lattice vectors (a, b, and c). The 6th and 7th lines give the order, type and number of atoms for each atomic species (in VASP 5.x; the older VASP 4.x format does not have the 6th line). The 8th line should say “Cartesian”. From the 9th line onward you get the atomic coordinates.
  3. Choose 1 atom in your conventional cell which you are going to use as reference point.
  4. Get the primitive unit cell lattice vectors by generating vectors from the reference atom. (cf. figure above) Using VESTA this can be done as follows:
    1. Open your conventional cell in VESTA (if you closed it after step 1).
    2. Use the distance selector (5th symbol from the top in the left-hand-side menu) and, for each of the primitive lattice vectors, select the reference atom and its primitive copy.
      VESTA: Select distance VESTA: Fractional Coordinates
    3. Subtract the “fractional coordinates” of the selected atoms provided by VESTA to get a “fractional” primitive vector (the primitive a vector will be called a_prim,frac).
    4. Multiply each of the conventional lattice vectors (a_conv, b_conv, and c_conv) by the corresponding component of the fractional primitive vector, and add the resulting vectors to obtain the new primitive vector (see also the short numerical sketch after this list):

      ⇒  a_prim = x·a_conv + y·b_conv + z·c_conv,   with (x, y, z) = a_prim,frac

      So imagine that the conventional lattice vectors of the MOF above are a_conv = (20, 0, 0), b_conv = (0, 15, 0), and c_conv = (0, 0, 5), and the primitive fractional a vector is found to be a_prim,frac = (0.5, -0.5, 0.5). In this case the a_prim vector becomes: a_prim = (10, 0, 0) + (0, -7.5, 0) + (0, 0, 2.5) = (10, -7.5, 2.5).

  5. Replace the conventional lattice vectors in the POSCAR.vasp file (cf. step 2) with the new primitive lattice vectors. Save the file.
  6. Open POSCAR.vasp in VESTA. If everything went well, and the conventional cell wasn’t the real primitive cell already, you should see a nice new primitive cell with the equivalent atoms perfectly overlapping one another. This is also the reason to have your starting geometry in Cartesian coordinates: if you had your atomic positions as fractional coordinates, this first check would not work at all. Furthermore, you would need to calculate the new fractional coordinates of the atoms in the primitive unit cell. If all is well, you can close POSCAR.vasp in VESTA. (If something is wrong, either you did something wrong, and you should start again, or what you started from wasn’t actually a supercell of a primitive cell.)
  7. Get the atoms of the primitive unit cell.
    1. Because our atomic positions are in Cartesian coordinates in our initial geometry file, we now just need to make a list of single copies of the equivalent atoms. Using VESTA (the original structure file you still have open from step 1) you can click on each atom you wish to keep and write down its index (this is the first number you find on the line with the Cartesian coordinates) … For example, in the case of the MIL-53 MOF you can select all metal and oxygen atoms of one chain, and two linker molecules.
    2. Remove all superfluous atoms (i.e. those of which you didn’t write down the index) from the POSCAR.vasp (using your text-editor). You may want to make a backup of this file before you start :-).
    3. Update the number of atoms on the 7th line of the POSCAR.vasp file, and check that the number is correct. The conventional cell should have had an integer multiple of the number of atoms in the primitive cell. Save the final structure as POSCAR_final.vasp .
    4. The POSCAR_final.vasp should contain both the new lattice vectors and a list of atoms for a single primitive unit cell. Check this by opening the file using VESTA, and make sure you didn’t remove too many or too few atoms. If not, go back to step (a) and double check. (If you are using the HIVE4-toolbox you can first transform POSCAR_final.vasp back to direct coordinates as this may make atoms visible which nicely overlap in Cartesian coordinates: Option TF, suboption 1 (Cart->Dir)  )
  8. Congratulations, you have constructed a primitive cell from a conventional cell.
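
The vector arithmetic of step 4 is easy enough to do by hand, but if you have to repeat it often, the short numpy sketch below (referenced in step 4) does the same job. The conventional lattice vectors and the fractional a vector are the numbers from the worked example above; the fractional b and c vectors are hypothetical placeholders.

import numpy as np

# Conventional lattice vectors as rows (the MOF example from step 4).
A_conv = np.array([[20.0,  0.0, 0.0],    # a_conv
                   [ 0.0, 15.0, 0.0],    # b_conv
                   [ 0.0,  0.0, 5.0]])   # c_conv

# Fractional primitive vectors obtained with the VESTA distance selector.
# The first row is the worked example above; the other two are placeholders.
frac_prim = np.array([[ 0.5, -0.5, 0.5],   # a_prim,frac
                      [ 0.5,  0.5, 0.5],   # b_prim,frac (placeholder)
                      [-0.5,  0.5, 0.5]])  # c_prim,frac (placeholder)

# Each primitive vector is a linear combination of the conventional ones:
# a_prim = x*a_conv + y*b_conv + z*c_conv, i.e. a single matrix product.
A_prim = frac_prim @ A_conv
print(A_prim)   # first row: [10.  -7.5   2.5], as in the worked example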

As you can see, the method is quite simple and straightforward, albeit a bit tedious if you need to do this many times.

Enjoy your primitive unit cell!

 

PS: A small remark for those new to VESTA. You can delete atoms in VESTA and store your structure again. This is useful if you want to play with a molecule. Unfortunately, for a solid you also need new lattice vectors, which you do not get this way. As a result you end up with some atoms floating around in a periodically repeated box with the original lattice parameters. Steps 1-5 given above provide a simple way of not ending up in this situation, but require some typing on your part.

PS 2: The opposite transformation, from a primitive unit cell to a conventional unit cell, using VESTA, is shown in this YouTube video.

Permanent link to this article: https://dannyvanpoucke.be/vasp-tutor-creating-a-unit-cell-from-a-super-cell-en/