Compilation notes for Rocks cluster

This post is mainly to assist me when upgrading software on our Rocks cluster, but may be useful to others compiling similar software. Our cluster only has AMD nodes, so we use the PGI compilers. The backbone is only 1 Gb/s Ethernet, so I don’t need to enable any InfiniBand (IB) options for OpenMPI. I use Modules as a contrib module, with a compute-node.xml file that adds a /share/apps/modulefiles path to the local config. I have copies of my ModuleFiles synced with Git here.

Also, because the head node, where compilation occurs, uses an AMD Barcelona CPU and the work nodes use an older model, I need to add -tp k8-64 to CFLAGS so that the binaries also run on the older nodes.
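Before choosing a -tp target it helps to confirm which CPU models are actually in the cluster. The following is a minimal sketch, assuming the usual Linux /proc/cpuinfo layout; the function name cpu_model is made up for illustration and is not part of Rocks or PGI.

```shell
#!/bin/sh
# cpu_model [FILE]: print the first "model name" entry from a cpuinfo-style
# file (defaults to /proc/cpuinfo). Hypothetical helper, not from any package.
cpu_model() {
    awk -F': *' '/^model name/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}
```

Run cpu_model on the head node, and something like ssh compute-0-0 cat /proc/cpuinfo | cpu_model /dev/stdin for a work node (the host name here is only an example).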

PGI Compilers

PGI have some info on common compilation optimisation at PGI Compilation

  1. Download from Portland Group and save to ~/install/pgi. To download over SSH I use elinks, as it can be left open in a disconnected screen session
  2. Create a new directory and tar xzvf from there as it is a messy tar
  3. Root: run ./install – 1. Single System Install
  4. install dir: /share/apps/opt/pgi
  5. add ACML, CUDA
  6. update 2011 links
  7. MPICH1, ssh
  8. License keys, /share/apps/opt/pgi/license.dat
  9. Copy an existing /share/apps/modulefiles/pgi version and update it for new version. Also update .version to new version.
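The "copy an existing modulefile and update .version" step recurs for every package in these notes, so it can be scripted. This is only a sketch: bump_module is a hypothetical helper, and it assumes the modulefile embeds the version string literally (for example in its install path) and that .version uses the standard "set ModulesVersion" form. Check the generated file by hand before relying on it.

```shell
#!/bin/sh
# bump_module DIR OLD NEW
# Copy modulefile DIR/OLD to DIR/NEW, replacing the old version string inside
# it, then point DIR/.version at NEW. Assumes the version appears literally
# in the modulefile; hypothetical helper, not part of Environment Modules.
bump_module() {
    dir=$1 old=$2 new=$3
    # Escape dots so that e.g. "11.10" is matched literally by sed.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed "s/$old_re/$new/g" "$dir/$old" > "$dir/$new" || return 1
    printf '#%%Module1.0\nset ModulesVersion "%s"\n' "$new" > "$dir/.version"
}
```

For example, bump_module /share/apps/modulefiles/pgi 11.10 13.4 would create the 13.4 modulefile and make it the default (the version numbers here are illustrative).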


OpenMPI

  1. Download latest from OpenMPI and extract in ~/install/openmpi with tar xjvf
  2. mkdir build;cd build
  3. module load pgi
  4. configure with
    env CC=pgcc FC=pgfortran F77=pgfortran CXX=pgcpp CFLAGS='-fast -tp=k8-64' FCFLAGS='-fast -tp=k8-64' FFLAGS='-fast -tp=k8-64' CXXFLAGS='-fast -tp=k8-64' \
    ../configure --prefix=/share/apps/opt/openmpi/1.4.4-pgi-11.10 --with-gnu-ld --with-sge --enable-static \
    --without-openib --disable-openib-ibcm --disable-ipv6 --disable-openib-connectx-xrc --disable-openib-rdmacm --disable-io-romio

    Compiling with Torque on Ethernet Cluster

    export MYINSTALLDIR=/share/apps/mpi/openmpi/1.6.4-pgi-13.04
    export CC=pgcc
    export CXX=pgcpp
    export F77=pgfortran
    export FC=${F77}
    export CFLAGS='-tp=bulldozer-64,barcelona-64,nehalem-64'
    export CXXFLAGS=${CFLAGS}
    export FFLAGS=${CFLAGS}
    export FCFLAGS=${FFLAGS}
    ../configure \
    --prefix=${MYINSTALLDIR} \
    --without-openib \
    --with-tm=/opt/torque \
    --disable-ipv6 \
    --enable-static \
    --with-gnu-ld \
    --disable-io-romio \
    2>&1 | tee configure-`date +%y%m%d-%H%M`.log
  5. make -j 8
  6. make check
  7. root: module load pgi; make install
  8. Copy an existing /share/apps/modulefiles/openmpi/ version and update it for new version. Also update .version to new version and recreate symlink.
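One thing to watch when scripting the configure step: a pipeline such as "../configure ... 2>&1 | tee logfile" returns the exit status of tee, so a failed configure can go unnoticed. The sketch below keeps the log but returns the command's own status. run_logged is a hypothetical helper name, and it assumes mktemp is available.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND, copy its stdout and stderr to LOGFILE via tee, and return
# COMMAND's own exit status rather than tee's.
run_logged() {
    log=$1; shift
    status_file=$(mktemp) || return 1
    # The brace group records the command's status before tee can mask it.
    { "$@" 2>&1; echo "$?" > "$status_file"; } | tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}
```

Usage would look like run_logged "configure-$(date +%y%m%d-%H%M).log" ../configure --prefix="${MYINSTALLDIR}" --without-openib && make -j 8, so that make only starts if configure succeeded.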


MPICH2

These instructions are for PGI and Torque.

  1. Download latest MPICH2 and extract
  2. module load pgi
  3. Configure with
    export MYINSTALLDIR=/share/apps/mpi/mpich2/3.0.3-pgi-13.04
    export CC=pgcc
    export CXX=pgcpp
    export F77=pgfortran
    export FC=pgf90
    unset F90
    export CFLAGS='-tp=bulldozer-64,barcelona-64,nehalem-64'
    export CXXFLAGS=${CFLAGS}
    export FFLAGS=${CFLAGS}
    export FCFLAGS=${FFLAGS}
    unset F90FLAGS
    ./configure --prefix=${MYINSTALLDIR}  \
    --with-pbs=/opt/torque \
    2>&1 | tee configure-`date +%y%m%d-%H%M`.log
  4. make -j 40
  5. root: module load pgi; make install; make install-examples
  6. Copy an existing /share/apps/modulefiles/mpich2/ version and update it for new version.


FFTW

  1. Download latest stable from
  2. module load pgi openmpi
  3. ./configure --enable-openmp --enable-mpi --prefix=/share/apps/opt/fftw/3.3.3-pgi-13.04 \
    CFLAGS="-O3 -fastsse -Mvect=sse -tp=k8-64e,bulldozer-64" F90=pgf90 CC=pgcc FC=pgfortran F77=pgfortran CXX=pgcpp
  4. make -j 8
  5. root: make install
  6. Copy an existing /share/apps/modules/other/fftw version and update it for new version. Also update .version to new version.
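VASP and Quantum Espresso later link against -lfftw3 and -lfftw3_mpi, so it is worth confirming that the install produced those libraries. A minimal sketch follows; check_libs is a hypothetical helper, and it assumes static lib<NAME>.a files in the lib directory under the install prefix.

```shell
#!/bin/sh
# check_libs LIBDIR NAME...
# Report any lib<NAME>.a missing from LIBDIR on stderr; succeed only if all
# of the named static libraries are present.
check_libs() {
    libdir=$1; shift
    missing=0
    for name in "$@"; do
        if [ ! -f "$libdir/lib$name.a" ]; then
            echo "missing: $libdir/lib$name.a" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the build above this would be check_libs /share/apps/opt/fftw/3.3.3-pgi-13.04/lib fftw3 fftw3_mpi fftw3_omp, where fftw3_mpi and fftw3_omp correspond to the --enable-mpi and --enable-openmp options.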


OpenBLAS

This replaces GotoBLAS2 as the BLAS library, as GotoBLAS2 is no longer maintained.

To keep the source up to date I use a Git mirror, created with git clone git://. I have also modified Makefile.rule and committed the change to the local mirror. In Makefile.rule I also needed to set the target architecture, as the head node is slightly different to the compute nodes.

  1. cd ~/install/openblas/OpenBLAS
  2. git pull to update source
  3. module load pgi
  4. make
  5. root: make install PREFIX=/share/apps/opt/OpenBLAS/0.1-pgi-11.10
  6. Copy an existing /share/apps/modulefiles/openblas version and update it for new version. Also update .version to new version.
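Because a git pull can touch Makefile.rule, it is worth checking that the architecture setting is still in place before running make. The sketch below prints the active TARGET value; blas_target is a hypothetical helper, and it assumes the setting is written as an uncommented "TARGET = NAME" line.

```shell
#!/bin/sh
# blas_target [FILE]: print the value of the active (uncommented) TARGET
# line in an OpenBLAS Makefile.rule. Prints nothing if TARGET is not set,
# in which case the build would detect the head node's CPU instead.
blas_target() {
    sed -n 's/^[[:space:]]*TARGET[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "${1:-Makefile.rule}"
}
```

Run it from ~/install/openblas/OpenBLAS after the pull; an empty result means the local change to Makefile.rule needs to be reapplied.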


VASP

Download the latest version from the VASP FTP server.


VASP Library

VASP uses a shared library that is common across multiple versions. This only needs to be compiled once, and the makefile does not need any modification. I added a -tp k8-64 to FFLAGS and a clean section, but they are not required.

  1. extract vasp.5.lib for a new install and cd vasp.5.lib
  2. module load pgi
  3. make

VASP Executable

The VASP executable makefile has two sections, one for a standalone runtime and one for an MPI runtime. In each of these sections you can build a complex version and a gamma-point-only version, so if you compile them all there are 4 combinations. I use the same makefile but comment out the MPI section when compiling standalone. Alternatively you can have separate makefiles and later use make -f filename.

.SUFFIXES: .inc .f .f90 .F
# Makefile for Portland Group F90/HPF compiler release 3.0-1, 3.1
# and release 1.7
# ( &, you need
#  to order the HPF/F90 suite)
#  we have found no noticable performance differences between
#  any of the releases, even Athlon or PIII optimisation does
#  not seem to improve performance
# The makefile was tested only under Linux on Intel platforms
# (Suse X,X)
# it might be required to change some of library pathes, since
# LINUX installation vary a lot
# Hence check ***ALL**** options in this makefile very carefully
# Mind that some Linux distributions (Suse 6.1) have a bug in
# libm causing small errors in the error-function (total energy
# is therefore wrong by about 1meV/atom). The recommended
# solution is to update libc.
# BLAS must be installed on the machine
# there are several options:
# 1) very slow but works:
#   retrieve the lapackage from
#   and compile the blas routines (BLAS/SRC directory)
#   please use g77 or f77 for the compilation. When I tried to
#   use pgf77 or pgf90 for BLAS, VASP hang up when calling
#   ZHEEV  (however this was with lapack 1.1 now I use lapack 2.0)
# 2) most desirable: get an optimized BLAS
#   for a list of optimized BLAS try
# the two most reliable packages around are presently:
# 3a) Intels own optimised BLAS (PIII, P4, Itanium)
#   this is really excellent when you use Intel CPU's
# 3b) or obtain the atlas based BLAS routines
#   you certainly need atlas on the Athlon, since the  mkl
#   routines are not optimal on the Athlon.

# all CPP processed fortran files have the extension .f

# fortran compiler and linker
# fortran linker

# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3

CPP_ =  ./preprocess $*$(SUFFIX)

# possible options for CPP:
# NGXhalf             charge density   reduced in X direction
# wNGXhalf            gamma point only reduced in X direction
# avoidalloc          avoid ALLOCATE if possible
# IFC                 work around some IFC bugs
# CACHE_SIZE          1000 for PII,PIII, 5000 for Athlon, 8000 P4
# RPROMU_DGEMV        use DGEMV instead of DGEMM in RPRO (usually  faster)
# RACCMU_DGEMV        use DGEMV instead of DGEMM in RACC (faster on P4)
#  **** definitely use -DRACCMU_DGEMV if you use the mkl library

CPP    = $(CPP_) -DHOST=\"LinuxPgi\" \
          -DNGXhalf -DCACHE_SIZE=2000 -DPGF90 -Davoidalloc \
          -DRPROMU_DGEMV  \
#		  -DwNGXhalf

# general fortran flags  (there must a trailing blank on this line)
# the -Mx,119,0x200000 is required if you use older pgf90 versions
# on a more recent LINUX installation
# the option will not do any harm on other 3.X pgf90 distributions

#FFLAGS =  -Mfree -Mx,119,0x200000  -tp k8-64 -I/share/apps/opt/fftw/3.3.0-pgi-11.10/include
#Need k8-64 to allow it to run on older Opterons
#FFLAGS =  -Mfree  -tp k8-64,barcelona-64   # doubles exec size (36M->50M)
FFLAGS =  -Mfree  -tp k8-64

# optimization,
# we have tested whether higher optimisation improves
# the performance, and found no improvements with -O3-5 or -fast
# (even on Athlon system, Athlon specific optimistation worsens performance)

#OFLAG  = -O0 -g -DDEBUG
#OFLAG = -O3 -fastsse
OFLAG = -O2  -fastsse

OBJ_HIGH = nonlr.o nonl.o

OBJ_LOW = broyden.o

#DEBUG  = -g -O0
DEBUG  = -O0

# the following lines specify the position of BLAS  and LAPACK
# what you chose is very system dependent
# P4: VASP works fastest with Intels mkl performance library
# Athlon: Atlas based BLAS are presently the fastest
# P3: no clue

#BLAS=   -lacml
# Following allows linking with modules
#BLAS=  -lopenblas -L/share/apps/opt/OpenBLAS/0.1-pgi-11.10/lib
# Hard code link
BLAS=  /share/apps/opt/OpenBLAS/0.1-pgi-11.10/lib/libopenblas.a
# use specific libraries (default library path points to other libraries)

# use the mkl Intel libraries for p4 (
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4  -lpthread

# LAPACK, simplest use vasp.5.lib/lapack_double
#LAPACK= ../vasp.5.lib/lapack_double.o

# use atlas optimized part of lapack
#LAPACK= ../vasp.5.lib/lapack_atlas.o  -llapack -lblas -lacml
#LAPACK= ../vasp.5.lib/lapack_atlas.o  -llapack -lacml
#LAPACK= ../vasp.5.lib/lapack_atlas.o

# use the mkl Intel lapack
#LAPACK= -lmkl_lapack


LIB  = -L../vasp.5.lib -ldmy \
     ../vasp.5.lib/linpack_double.o $(LAPACK) \
     $(BLAS) -lfftw3 -L/share/apps/opt/fftw/3.3.0-pgi-11.10/lib

# options for linking (none required)
#LINK    =  -tp k8-64

# fft libraries:
# VASP.4.5 can use FFTW (
# since the FFTW is very slow for radices 2^n the fft3dlib is used
# in these cases
# if you use fftw3d you need to insert -lfftw in the LIB line as well
# please do not send us any querries reltated to FFTW (no support)
# if it fails, use fft3dlib

FFT3D   = fft3dfurth.o fft3dlib.o  /share/apps/opt/fftw/3.3.0-pgi-11.10/lib/libfftw3.a
#FFT3D   = fftw3d+furth.o fft3dlib.o

# MPI section, uncomment the following lines
# one comment for users of mpich or lam:
# You must *not* compile mpi with g77/f77, because f77/g77
# appends *two* underscores to symbols that contain already an
# underscore (i.e. MPI_SEND becomes mpi_send__).  The pgf90
# compiler however appends only one underscore.
# Precompiled mpi version will also not work !!!
# We found that mpich.1.2.1 and lam-6.5.X are stable
# mpich.1.2.1 was configured with
#  ./configure -prefix=/usr/local/mpich_nodvdbg -fc="pgf77 -Mx,119,0x200000"  \
# -f90="pgf90 -Mx,119,0x200000" \
# --without-romio --without-mpe -opt=-O \
# lam was configured with the line
#  ./configure  -prefix /usr/local/lam-6.5.X --with-cflags=-O -with-fc=pgf90 \
# --with-f77flags=-O --without-romio
# lam was generally faster and we found an average communication
# band with of roughly 160 MBit/s (full duplex)
# please note that you might be able to use a lam or mpich version
# compiled with f77/g77, but then you need to add the following
# options: -Msecond_underscore (compilation) and -g77libs (linking)
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
# fortran linker for mpi: if you use LAM and compiled it with the options
# suggested above,  you can use the following lines


# additional options for CPP in parallel version (see also above):
# NGZhalf               charge density   reduced in Z direction
# wNGZhalf              gamma point only reduced in Z direction
# scaLAPACK             use scaLAPACK (usually slower on 100 Mbit Net)

CPP    = $(CPP_) -DMPI  -DHOST=\"LinuxPgi\" \
     -DNGZhalf -DCACHE_SIZE=2000 -DPGF90 -Davoidalloc -DRPROMU_DGEMV \
#     -DwNGZhalf \

# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA


# libraries for mpi

LIB     = -L../vasp.5.lib -ldmy  \
      ../vasp.5.lib/linpack_double.o $(LAPACK) \
      $(SCA) $(BLAS) -L/share/apps/opt/fftw/3.3.0-pgi-11.10/lib -lfftw3 -lfftw3_mpi

# FFT: only option  fftmpi.o with fft3dlib of Juergen Furthmueller

FFT3D   = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o

# general rules and compile lines
BASIC=   symmetry.o symlib.o   lattlib.o  random.o

SOURCE=  base.o     mpi.o      smart_allocate.o      xml.o  \
         constant.o jacobi.o   main_mpi.o  scala.o   \
         asa.o      lattice.o  poscar.o   ini.o  mgrid.o  xclib.o  vdw_nl.o  xclib_grad.o \
         radial.o   pseudo.o   gridq.o     ebs.o  \
         mkpoints.o wave.o     wave_mpi.o  wave_high.o  \
         $(BASIC)   nonl.o     nonlr.o    nonl_high.o dfast.o    choleski2.o \
         mix.o      hamil.o    xcgrad.o   xcspin.o    potex1.o   potex2.o  \
         constrmag.o cl_shift.o relativistic.o LDApU.o \
         paw_base.o metagga.o  egrad.o    pawsym.o   pawfock.o  pawlhf.o   rhfatm.o  paw.o   \
         mkpoints_full.o       charge.o   Lebedev-Laikov.o  stockholder.o dipol.o    pot.o \
         dos.o      elf.o      tet.o      tetweight.o hamil_rot.o \
         steep.o    chain.o    dyna.o     sphpro.o    us.o  core_rel.o \
         aedens.o   wavpre.o   wavpre_noio.o broyden.o \
         dynbr.o    rmm-diis.o reader.o   writer.o   tutor.o xml_writer.o \
         brent.o    stufak.o   fileio.o   opergrid.o stepver.o  \
         chgloc.o   fast_aug.o fock.o     mkpoints_change.o sym_grad.o \
         mymath.o   internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o \
         hamil_high.o nmr.o    pead.o     mlwf.o     subrot.o   subrot_scf.o \
         force.o    pwlhf.o  gw_model.o optreal.o   davidson.o  david_inner.o \
         electron.o rot.o  electron_all.o shm.o    pardens.o  paircorrection.o \
         optics.o   constr_cell_relax.o   stm.o    finite_diff.o elpol.o    \
         hamil_lr.o rmm-diis_lr.o  subrot_cluster.o subrot_lr.o \
         lr_helper.o hamil_lrf.o   elinear_response.o ilinear_response.o \
         linear_optics.o linear_response.o   \
         setlocalpp.o  wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o \
         ratpol.o screened_2e.o wave_cacher.o chi_base.o wpot.o local_field.o \
         ump2.o bse_te.o bse.o acfdt.o chi.o sydmat.o dmft.o \
         rmm-diis_mlr.o  linear_response_NMR.o

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
	rm -f vasp
	$(FCL) -o vasp main.o  $(SOURCE)   $(FFT3D) $(LIB) $(LINK)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
	$(FCL) -o makeparam  $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
	$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
	$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
	$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
	$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
	-rm -f *.g *.f *.o *.L *.mod vasp ; touch *.F

main.o: main$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
	$(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
	$(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch_dep must be called
# with the corresponding name of the structure
base.o: base.F
mgrid.o: mgrid.F
constant.o: constant.F
lattice.o: lattice.F
setex.o: setex.F
pseudo.o: pseudo.F
poscar.o: poscar.F
mkpoints.o: mkpoints.F
wave.o: wave.F
nonl.o: nonl.F
nonlr.o: nonlr.F


	$(FC) $(FFLAGS) $(OFLAG_LOW) $(INCS) -c $*$(SUFFIX)

	$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
	$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
	$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

  1. extract vasp.5.2.12 for the current version
  2. module load pgi openmpi openblas fftw
  3. Add makefile above or customise your own
  4. In the makefile, correct the library paths and choose the features to enable.
  5. make
  6. Copy the compiled vasp executable to destination. I use vasp and vasp.g for gamma point.

Gamma Point

  1. Use the same makefile as above but add -DwNGXhalf to the CPP options (-DwNGZhalf in the MPI section)
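Since the gamma-point build differs only in that one CPP define, a second makefile can be generated instead of edited by hand. This is a sketch under the assumption that the define sits on a commented-out continuation line starting with '#', as in the makefile listed above; make_gamma_makefile is a hypothetical helper name.

```shell
#!/bin/sh
# make_gamma_makefile SRC DST
# Write DST as a copy of SRC with the commented-out -DwNGXhalf / -DwNGZhalf
# continuation lines enabled. Other comment lines are left untouched.
make_gamma_makefile() {
    sed 's/^#\([[:space:]]*-DwNG[XZ]half\)/ \1/' "$1" > "$2"
}
```

A possible sequence is to copy the standard vasp binary away first, then run make_gamma_makefile makefile makefile.gamma, make -f makefile.gamma clean, make -f makefile.gamma, and save the result as vasp.g. The clean step matters because the existing object files were preprocessed without the gamma-point define.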

Quantum Espresso

  1. Download latest version from as well as the examples
  2. tar xzvf both the src and examples. They end up in same directory
  3. module load pgi openmpi openblas fftw3
  4. ./configure F90=pgf90 FC=pgf90 F77=pgf77 CC=pgcc MPIF90=mpif90 \
    FFLAGS="-fast -O2 -tp=k8-64e,bulldozer-64" CFLAGS="-fast -O3 -tp k8-64e,bulldozer-64" \
    FFT_LIBS="-L/share/apps/opt/fftw/3.3.3-pgi-13.04/lib -lfftw3 -lfftw3_mpi" \
    BLAS_LIBS="-L/share/apps/opt/OpenBLAS/0.2.6-pgi-13.04/lib -lopenblas" \
    LAPACK_LIBS="-L/share/apps/opt/OpenBLAS/0.2.6-pgi-13.04/lib -lopenblas" \
    LIBDIRS="/share/apps/opt/fftw/3.3.3-pgi-13.04/lib /share/apps/opt/OpenBLAS/0.2.6-pgi-13.04/lib"
  5. make pw pp -j8
  6. make all -j8
  7. to test: cd examples, modify environment_variables (e.g. tmp dir and mpirun -np ),  ./run_all_examples, then compare example??/results to reference
  8. root: copy entire espresso-4.3.2 directory to /share/apps/opt/espresso/4.3.2/pgi-openmpi
  9. Copy an existing /share/apps/modulefiles/espresso module and update it for new version. Also update .version to new version.
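Comparing results to the reference by eye is tedious, and a plain diff always differs because of timings. The sketch below compares only the final total energy. It assumes each example directory holds results and reference subdirectories with matching file names, and that the energy appears on a line starting with '!' with the value as the second-to-last field, as pw.x prints it. Both function names are hypothetical.

```shell
#!/bin/sh
# final_energy FILE: print the last total energy found on a line starting
# with '!'. Prints nothing if the file has no such line.
final_energy() {
    awk '/^!/ { e = $(NF - 1) } END { if (e != "") print e }' "$1"
}

# compare_example DIR: for each file in DIR/results that also exists in
# DIR/reference, report whether the final energies match exactly as text.
compare_example() {
    for out in "$1"/results/*; do
        ref="$1/reference/$(basename "$out")"
        [ -f "$ref" ] || continue
        if [ "$(final_energy "$out")" = "$(final_energy "$ref")" ]; then
            echo "OK   $(basename "$out")"
        else
            echo "DIFF $(basename "$out")"
        fi
    done
}
```

The comparison is textual and therefore strict. Small differences in the last digits are normal between compilers and libraries, so a DIFF line is a prompt to look at that output, not proof of a broken build.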

About James Rudd

Network Administrator at Sydney Boys High School