
HPC - Cluster Configuration and Software Installation


Preface

This is to document the setup of a cluster for high-performance parallel computing.

The goal is to

  • minimize maintenance;
  • minimize security risk;
  • maximize performance (especially, inter-node communication); and
  • implement a well-defined use policy.


Hardware Specification

The cluster consists of a head/login node (frontend) and 10 compute nodes which are inter-connected through the Intel Omni-Path Fabric. They are housed in a 40U-rack cabinet.

Hardware specs are as follows.

  • Head/login node (hostname: node0)

    • Platform
      • Intel 2U Server System R2308WFTZSR
    • Processor
      • Intel Xeon Gold Processor 6226R (16Core, 2.9GHz, 22M Cache, 150w) 2EA (total 32 cores)
    • Memory
      • 16GB DDR4 (PC4-23400, ECC, Registered, DIMM, 2933MT/s) 12EA (total 192GB)
    • RAID Controller
      • Intel Integrated 12Gb RAID Module RMS3CC080 (1GB Cache/RAID Lv.0, 1, 5, 6, 10, 50, 60) (Documentation is available here.)
    • Storage
      • Intel SSD S4510 Series 480GB (2.5in SATA 3D TLC Nand 1DWPD) 2EA -> Configured as RAID1, providing 480GB of storage
      • SATA HDD 6TB (3.5", 7200rpm, 6Gb, Enterprise) 4EA -> Configured as RAID6, providing 16TB of storage
      • SATA HDD 6TB (3.5", 7200rpm, 6Gb, Enterprise) 2EA -> Configured as RAID1, providing 6TB of storage
    • Network
      • Intel Omni-Path Host Fabric Interface Adapter 100 Series 1Port PCIe x16 100HFA016LS 1EA
      • Onboard Dual Port RJ45 10GbE
    • Management
      • Intel Remote Management Module AXXRMM4LITE2 1EA
  • Compute node: node[1-8]

    • Platform
      • Intel 1U Server System R1304WFTYSR
    • Processor
      • Intel Xeon Gold Processor 6238R (28Core, 2.2GHz, 38.5M Cache, 165w) 2EA (total 56 cores)
    • Memory
      • 16GB DDR4 (PC4-23400, ECC, Registered, DIMM, 2933MT/s) 12EA (total 192GB)
    • Storage
      • Intel SSD S4510 Series 240GB (2.5in SATA 3D TLC Nand 1DWPD) 1EA for node[1-2]
      • Samsung PM883 2.5" SATA 240GB 1EA for node[3-7]
      • Samsung PM893 2.5" SATA 240GB 1EA for node8
    • Network
      • Intel Omni-Path Host Fabric Interface Adapter 100 Series 1Port PCIe x16 100HFA016LS 1EA
      • Onboard Dual Port RJ45 10GbE
    • Management
      • Intel Remote Management Module AXXRMM4LITE2 1EA
  • Compute node: node[9-10]

    • Platform
      • Intel 1U Server Systems M50CYP1UR204
    • Processor
      • Intel Xeon Gold 6348 Processor (28core, 2.6GHz, 42MB, 205W) 2EA (total 56 cores)
    • Memory
      • 32GB DDR4 ECC Registered DIMM 3200MT/s(25600) 8EA (total 256GB)
    • Storage
      • Samsung PM893 480GB SATA 6Gb/s 2.5-Inch Enterprise SSD 1EA
    • Network
      • Intel Omni-Path Host Fabric Interface Adapter 100 Series 1Port PCIe x16 100HFA016LSN
      • Intel Ethernet Network Adapter X710-T2L for OCP 3.0 (RJ45 10/1G)
    • Management
      • Intel Remote Management Module AXXRMM4LITE2
  • Network switch

    • Intel Omni-Path Fabric Switch
      • Intel Omni-Path Edge Switch 100 Series 48Port Forward 2PSU 100SWE24UF2 (100Gbps = 25Gbps x 4 channels)
    • Gigabit Switch
      • HPE OfficeConnect 1420 Switch Series (24 RJ-45 autosensing 10/100/1000 ports)
  • Misc.

    • Rack Cabinet
      • 40U 19" Rack Cabinet 700-SR
    • KVM Console
      • OXCA 17" LCD 1Port KVM Console KLB-101

Guide for troubleshooting


Cluster Configuration

The block diagram below depicts the cluster configuration.

A router provides a firewall between the cluster frontend and the worldwide web (WWW) as well as a private network independent of the university's network.

The head node plays many roles: frontend to the cluster, user accounting, job scheduling, shared storage, software repository, cluster networking, and OPA fabric manager. It can also serve as a compute node.

Currently, there are 8 first-gen compute nodes of the same specs and 2 second-gen compute nodes of the same specs. Their sole job is to execute compute requests dispatched by the job scheduler running on the head node.

Cluster Block Diagram


Head Node RAID Setup

The RAID configuration is found in the "5.2 Integrated RAID M BIOS Configuration Utility for 12 Gb/s Intel(R) RAID Controllers" section of the document found here.


Base System Setup

This section describes the base system setup for all nodes, which requires occasional visits to the server room.

BIOS Settings

For performance tuning, see Intel Omni-Path Fabric Performance Tuning.

OS Installation

NOTE: See Kickstart Installations at https://docs.centos.org/en-US/centos/install-guide/Kickstart2/.

As far as configuring an HPC system is concerned, CentOS already ships with the OPA drivers right out of the box and supports cluster server systems. CentOS 8 has been released recently, but it didn't seem to have OPA driver support and is missing several development packages necessary for this configuration. So, CentOS 7 it is. NOTE: Support for CentOS 7 will be discontinued in 2024. An alternative is Oracle Linux, which is much like CentOS.

The archive vault is found at

https://vault.centos.org
http://archive.kernel.org/centos-vault/centos/
http://linuxsoft.cern.ch/centos-vault/
http://mirror.nsc.liu.se/centos-store/

Installing packages using dnf after a fresh install of CentOS 8 will fail with the error No URLs in mirrorlist. See CentOS through a VM - no URLs in mirrorlist or CentOS 8: No URLs in mirrorlist error for a workaround.

An easy way to install a fresh OS is to make a bootable USB stick from a CentOS installation iso file. On macOS, follow the steps at How To Make a Bootable USB Stick From an ISO File on an Apple Mac OS X.

  1. Download the desired file.

  2. Open the Terminal.

  3. Convert the .iso file to .img using the convert option of hdiutil:

    $ hdiutil convert -format UDRW -o /path/to/target.img /path/to/source.iso
    

    NOTE: OS X tends to put the .dmg ending on the output file automatically. Rename the file by typing:

    $ mv /path/to/target.img.dmg /path/to/target.img
    
  4. Run diskutil list to get the current list of devices.

  5. Insert your flash media.

  6. Run diskutil list again and determine the device node assigned to your flash media (e.g. /dev/disk2).

  7. Run diskutil unmountDisk /dev/diskN (replace N with the disk number from the last command - in the previous example, N would be 2).

  8. Execute sudo dd if=/path/to/downloaded.img of=/dev/rdiskN bs=1m (replace /path/to/downloaded.img with the path where the image file is located; for example, ./ubuntu.img or ./ubuntu.dmg).

    NOTE: Using /dev/rdisk instead of /dev/disk may be faster. NOTE: If you see the error dd: Invalid number '1m', you are using GNU dd. Use the same command but replace bs=1m with bs=1M. NOTE: If you see the error dd: /dev/diskN: Resource busy, make sure the disk is not in use. Start the 'Disk Utility.app' and unmount (don't eject) the drive.

  9. Run diskutil eject /dev/diskN and remove your flash media when the command completes.
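The steps above can be collected into a small dry-run helper. This is a sketch for macOS: every command is echoed rather than executed, so nothing is written until the device node has been verified; the ISO filename and disk number in the final call are hypothetical examples.

```shell
#!/bin/sh
# Dry-run sketch of steps 3-9 above (macOS). Commands are echoed, not
# executed; remove the echo prefixes (or pipe the output to sh) only
# after double-checking the disk number with `diskutil list`.
make_boot_usb() {
  iso="$1"; disk="$2"
  img="${iso%.iso}.img"
  echo "hdiutil convert -format UDRW -o $img $iso"
  echo "mv $img.dmg $img"   # OS X appends .dmg to the output automatically
  echo "diskutil unmountDisk /dev/$disk"
  echo "sudo dd if=$img of=/dev/r$disk bs=1m"
  echo "diskutil eject /dev/$disk"
}
make_boot_usb CentOS-7-x86_64-NetInstall-2009.iso disk2
```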

The installation is straightforward and only involves a few button clicks. For the Netinstall version, a URL to the CentOS mirror repository is needed:

http://mirror.centos.org/centos-7/7.9.2009/os/x86_64/

Put this into the Installation Source section in the Installation Summary screen.

Choose sensible packages in the Compute Node category. Do not choose the High Availability-something package (which seems to install a GUI). Do not choose the InfiniBand package; the Intel-provided driver and software package will be installed later. Check out the installation report: anaconda-ks.cfg on compute nodes and anaconda-ks.cfg on the head node.

Choose the manual partitioning option. Choose the standard partition table for the partitions where the CentOS will be installed. For UEFI booting, a /boot/efi partition is required. Partitions are configured as follows:

  • For compute nodes (a single 240GB or 480GB SSD),

    /SSD/boot/efi   - 200 MiB
    /SSD/boot       - 500 MiB
    /SSD/           - remaining space
    
  • And, for the head node (two 480GB SSD configured as RAID1),

    /SSD(RAID1)/boot/efi   - 200 MiB
    /SSD(RAID1)/boot       - 500 MiB
    /SSD(RAID1)/           - remaining space
    

The head node has additional storage:

  • 4 x 6TB SATA HDD configured as RAID6, providing a total of 16TB storage; and
  • 2 x 6TB SATA HDD configured as RAID1, providing a total of 6TB storage.

These will be partitioned later as LVM volumes.

Disable KDUMP.

Enable network for installing packages over the net and be sure to set the hostname (see configure-internet-conection).

node0.hpc.kyungguk.com
node[1-10].hpc.kyungguk.com

Here, node0 is the head node.

If the NetInstall iso is used, the head node must be set up first to forward the local cluster network traffic. Then, the compute nodes' network needs to be configured manually.

The Security Policy doesn't have any items to choose and turning it off doesn't seem to do anything.

After all is done, begin installation. While the installation is in progress, set the root password. Do not create a user account at this point.

Network Settings

After first boot, the next item on the agenda is to make sure that the public network is up and running and to configure the local cluster network. Configuring a local network is explained at https://devops.ionos.com/tutorials/deploy-outbound-nat-gateway-on-centos-7/.

On the head node

As shown in Cluster Configuration, the head node is connected to the public network through the ethernet interface eno2. It acts as a gateway for the local cluster network and forwards all local network traffic requested by the compute nodes. The eno2 ethernet port is connected to the router and receives its DHCP configuration. To do so (if not already done during the installation process), edit /etc/sysconfig/network-scripts/ifcfg-eno2 to include

BOOTPROTO="dhcp"
ONBOOT="yes"

The eno1 ethernet port is connected to the local network switch and acts as a gateway for local traffic. Edit /etc/sysconfig/network-scripts/ifcfg-eno1 to have something like this:

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=eno1
DEVICE=eno1
ONBOOT=yes
IPADDR=192.168.1.254
PREFIX=24

Restart the network service:

$ systemctl restart NetworkManager

Enable IP forwarding:

$ sysctl -w net.ipv4.ip_forward=1
$ echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.d/ip_forward.conf

Enable NAT:

$ firewall-cmd --permanent --zone=public --direct --passthrough ipv4 -t nat -I POSTROUTING -o eno2 -j MASQUERADE -s 192.168.1.0/24
$ firewall-cmd --permanent --zone=trusted --change-interface=eno1
$ firewall-cmd --reload

NOTE: Had it not been assigned a trusted zone, the following would have been needed:

firewall-cmd --permanent --zone=public --direct --passthrough ipv4 -I FORWARD -i eno1 -j ACCEPT.

Confirm the configuration with:

$ firewall-cmd --list-all-zones

Log into one of compute nodes and check if network traffic goes through the firewall:

$ ping 8.8.8.8
$ dig kyungguk.com
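The head-node gateway steps above (IP forwarding plus NAT) can be collected into one reviewable script. Below is a hedged dry-run sketch: the run helper echoes each command instead of executing it, so the whole sequence can be inspected first; interface names and the subnet are the ones used in this cluster.

```shell
#!/bin/sh
# Dry-run sketch of the head-node gateway setup above. run() echoes
# each command; change it to run() { "$@"; } to apply for real.
run() { echo "$@"; }
run sysctl -w net.ipv4.ip_forward=1
run firewall-cmd --permanent --zone=public --direct --passthrough ipv4 \
    -t nat -I POSTROUTING -o eno2 -j MASQUERADE -s 192.168.1.0/24
run firewall-cmd --permanent --zone=trusted --change-interface=eno1
run firewall-cmd --reload
```

(The persistent sysctl drop-in from above still needs to be written separately.)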

On compute nodes

To setup a static IP,

  • Edit /etc/sysconfig/network-scripts/ifcfg-$DEVICE to have something like this:

    TYPE=Ethernet
    PROXY_METHOD=none
    BROWSER_ONLY=no
    BOOTPROTO=static
    DEFROUTE=yes
    IPV4_FAILURE_FATAL=no
    IPV6INIT=no
    IPV6_AUTOCONF=yes
    IPV6_DEFROUTE=yes
    IPV6_FAILURE_FATAL=no
    IPV6_ADDR_GEN_MODE=stable-privacy
    NAME=$DEVICE
    DEVICE=$DEVICE
    ONBOOT=yes
    IPADDR=192.168.1.[1-10]
    PREFIX=24
    GATEWAY=192.168.1.254
    DNS1=192.168.0.1
    

    DEVICE=eno1 for node[1-8], and DEVICE=ens259f0 for node[9-10].
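Rather than hand-editing ten files, the per-node configuration above can be generated in one pass. A sketch follows; it writes the files into a local staging directory (./ifcfg-staging is an arbitrary choice, not a system path) for review before copying each one to /etc/sysconfig/network-scripts/ on the matching node.

```shell
#!/bin/sh
# Sketch: generate the ten per-node ifcfg files described above into a
# staging directory for review. Only the essential fields are emitted.
OUT=./ifcfg-staging
mkdir -p "$OUT"
for i in $(seq 1 10); do
  # node[1-8] use eno1; node[9-10] use ens259f0
  if [ "$i" -le 8 ]; then dev=eno1; else dev=ens259f0; fi
  cat > "$OUT/node$i-ifcfg-$dev" <<EOF
TYPE=Ethernet
BOOTPROTO=static
DEFROUTE=yes
IPV6INIT=no
NAME=$dev
DEVICE=$dev
ONBOOT=yes
IPADDR=192.168.1.$i
PREFIX=24
GATEWAY=192.168.1.254
DNS1=192.168.0.1
EOF
done
```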

Restart the network service:

$ systemctl restart NetworkManager

Check if network traffic goes through the head node:

$ ping 8.8.8.8
$ dig kyungguk.com

SSH Settings

Next on the agenda is to enhance the SSH security and enable password-less login from the head node to the compute nodes.

On the head node, generate a public/private rsa key pair for root:

$ ssh-keygen -t rsa -b 4096 -C "root@hpc.kyungguk.com"

Copy the public key to the compute nodes (? must be replaced with an appropriate node IP):

$ ssh 192.168.1.? "mkdir -p ~/.ssh"
$ cat ~/.ssh/id_rsa.pub | ssh 192.168.1.? "cat >> ~/.ssh/authorized_keys"

On all compute nodes, change the file permission of authorized_keys:

$ chmod 0600 ~/.ssh/authorized_keys

On all compute nodes, edit /etc/ssh/sshd_config to include

PermitRootLogin without-password
PubkeyAuthentication yes
PermitEmptyPasswords no
PasswordAuthentication no
ChallengeResponseAuthentication yes

Restart the sshd service

$ systemctl restart sshd

At this point, it is a good idea to enhance the SSH security for access to the head node from outside networks. Follow the same steps and make sure that the public key(s) are listed in authorized_keys on the head node.

Package Update & Enable Extra Packages for Enterprise Linux (EPEL) Repository

NOTE: Check out the IUS repository.

Now that the network is up and running, it is a good time to update all packages installed during OS installation.

$ yum update

NOTE: The above action results in a kernel update, which is often not desired.

Enable Extra Packages for Enterprise Linux (EPEL) Repository

$ yum install epel-release
$ yum update

NOTE: The above action results in a kernel update, which is often not desired.

Install a couple of extra handy packages

$ yum install htop

On the head node, the following packages as well

$ yum install screen neovim

NOTE: Make screen runnable only by the root user.

Clock Sync

It is important to synchronize the clocks on all nodes.

By default, chronyd should be running after OS installation. On every node, verify that chronyd is running

$ systemctl status chronyd
$ chronyc tracking

Intel OPA Drivers and Software

Articles about OPA: Intel OmniPath network fabric, https://wiki.archlinux.org/index.php/InfiniBand, and https://blog.exxactcorp.com/understanding-intel-omni-path-architecture-intel-opa/.

Intel OPA package download option

Download Intel driver here. Under Software and Drivers section, download version RHEL* 7.* of IntelOPA-IFS* for the fabric manager (i.e., head node) and IntelOPA-Basic* for all other nodes.

Check out the Intel_OP_Fabric_Software_10_10_3_2_RN_M25829_v1_0.pdf document for installation in version 7.9.

Update on 2023/08/14: According to Omni-Path in Wikipedia, "Intel announced that the Omni-Path network products and technology would be spun out into a new venture with Cornelis Networks." The drivers are now hosted at Cornelis. Sign in (my credential already created) and go to Release Library and look for Cornelis Omni-Path Express Basic Software - RHEL ?.? and Cornelis Omni-Path Express Suite (OPXS) Software - RHEL ?.?. The file names for RHEL 7.9 are CornelisOPX-Basic.RHEL78-x86_64.10.12.1.0.7.tgz and CornelisOPX-OPXS.RHEL78-x86_64.10.12.1.0.7.tgz, the latter containing the fabric manager.

The CornelisOPX website indicates that the newer versions of the OPA packages are supported on RockyLinux. The RockyLinux archives are found at https://dl.rockylinux.org/vault/rocky/. See the Rocky Wiki on Notes on: CRB and Notes on: EPEL.

The packages needed for building the Intel-provided drivers and software are:

$ yum install rdma-core rdma-core-devel
$ yum install opensm-libs opensm-devel
$ yum install libatomic
$ yum install atlas atlas-devel
$ yum install expect expect-devel
$ yum install openssl-devel

Unzip the appropriate files and execute ./INSTALL. Install software using option 1; then choose to install/upgrade up to 7. Press Enter to accept the default options.

After installation, enable appropriate modules on boot. For the fabric manager (i.e., head node), enable opafm service as well.

CentOS-shipped package option

The InfiniBand package chosen during the installation appears to set up all necessary drivers and software. The fabric manager still needs to be installed on the head node afterwards. Log into the head node, install the fabric manager, and enable & start the service

$ yum install opa-fm
$ systemctl start opafm.service
$ systemctl enable opafm.service

The following should not be necessary, but is recorded for bookkeeping. On the compute nodes,

$ yum install opa-basic-tools
$ yum install rdma-core
$ yum install libpsm2

On the head node,

$ yum install opa-basic-tools
$ yum install rdma-core
$ yum install libpsm2
$ yum install opa-fastfabric
$ yum install opa-address-resolution

Post-driver installation setups

Confirm that the hfi1 kernel module is loaded:

$ modinfo hfi1

Confirm Omni-Path HFI Adapter:

## node[0-8]
$ lspci -vvv -s 18:00.0
## node[9-10]
$ lspci -vvv -s 31:00.0

LnkCap and LnkSta under the Capabilities section indicate the communication speed.

Command opareports reports FI Fabric connections to Switches (may not be available for Intel software).

Commands opainfo and opaportinfo report the link status.

PortState must be Active and PhysicalState must be LinkUp.

To enable IPoIB, edit /etc/sysconfig/network-scripts/ifcfg-ib0 on all nodes; it should look something like this

CONNECTED_MODE=no
TYPE=InfiniBand
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ib0
DEVICE=ib0
ONBOOT=yes
IPADDR=192.168.2.[1-10]
PREFIX=24
MTU=65520

Be sure to change IPADDR field appropriately. (The head node should take 254.)

Restart the network service:

$ systemctl restart NetworkManager

Edit /etc/hosts on all nodes to include

192.168.2.254  node0
192.168.2.1    node1
192.168.2.2    node2
192.168.2.3    node3
192.168.2.4    node4
192.168.2.5    node5
192.168.2.6    node6
192.168.2.7    node7
192.168.2.8    node8
192.168.2.9    node9
192.168.2.10   node10

192.168.1.254  node0e
192.168.1.1    node1e
192.168.1.2    node2e
192.168.1.3    node3e
192.168.1.4    node4e
192.168.1.5    node5e
192.168.1.6    node6e
192.168.1.7    node7e
192.168.1.8    node8e
192.168.1.9    node9e
192.168.1.10   node10e

Or, we can broadcast the head node's /etc/hosts to all compute nodes:

$ for i in {1..10}; do scp /etc/hosts node$i:/etc/hosts; done
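The host table above is regular enough to generate rather than type by hand. A small sketch: it prints the same entries (IPoIB addresses as nodeN, Ethernet addresses as nodeNe); append its output to /etc/hosts on the head node before broadcasting.

```shell
#!/bin/sh
# Sketch: print the /etc/hosts entries listed above.
gen_hosts() {
  printf '%-15s%s\n' 192.168.2.254 node0
  for i in $(seq 1 10); do printf '%-15s%s\n' "192.168.2.$i" "node$i"; done
  echo
  printf '%-15s%s\n' 192.168.1.254 node0e
  for i in $(seq 1 10); do printf '%-15s%s\n' "192.168.1.$i" "node${i}e"; done
}
gen_hosts
```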

The qperf package can measure bandwidth and latency over RDMA (SDP, UDP, UD, and UC) or TCP/IP (including IPoIB). See qperf - Measure performance over RDMA or TCP/IP.

On one node, qperf must be run in server mode (the firewall needs to be disabled on this node):

$ qperf

On another node, run it in the client mode:

$ qperf SERVERNODE [OPTIONS] TESTS

SERVERNODE can be a hostname, or for IPoIB a TCP/IP address.

There are many tests. Some of the most useful are below.

  • To run a TCP bandwidth and latency test:

    $ qperf node0 tcp_lat tcp_bw tcp_bw tcp_bw tcp_bw tcp_bw
    ------------------------------------------------------------
    tcp_lat:
        latency  =  14.5 us
    tcp_bw:
        bw  =  850 MB/sec
    
  • To run a UDP latency test and then cause the server to terminate:

    $ qperf node0 udp_lat udp_bw udp_bw udp_bw udp_bw udp_bw
    ------------------------------------------------------------
    udp_lat:
        latency  =  12.8 us
    udp_bw:
        send_bw  =  1.66 GB/sec
        recv_bw  =  1.62 GB/sec
    
  • To measure the RDMA UD latency and bandwidth:

    $ qperf node0 ud_lat ud_bw ud_bw ud_bw ud_bw ud_bw
    ------------------------------------------------------------
    ud_lat:
        latency  =  13.6 us
    ud_bw:
        send_bw  =  1.14 GB/sec
        recv_bw  =  1.13 GB/sec
    
  • To measure RDMA UC bi-directional bandwidth:

    $ qperf node0 rc_lat rc_bi_bw rc_bi_bw rc_bi_bw rc_bi_bw rc_bi_bw
    ------------------------------------------------------------
    rc_lat:
        latency  =  8.33 us
    rc_bi_bw:
        bw  =  6.36 GB/sec
    
  • To get a range of TCP latencies with a message size from 1 to 64K:

    $ qperf node0 -oo msg_size:1:64K:*2 -vu tcp_lat
    ------------------------------------------------------------
    tcp_lat:
        latency   =  17.2 us
        msg_size  =     1 bytes
    tcp_lat:
        latency   =  15.1 us
        msg_size  =     2 bytes
        ... [continues]
    

Check out Diagnosing and benchmarking.

Firewall

The firewall setting is explained at https://www.digitalocean.com/community/tutorials/how-to-set-up-a-firewall-using-firewalld-on-centos-7.

The firewall can often be a great ordeal and source of frustration when setting up a local network and establishing connections among the nodes. To construct a local cluster network, an easy way -- without disabling it -- is to group the local network interfaces into a trusted zone, so that all network traffic to and from the cluster nodes is trusted.

Configure the firewall settings on all nodes so that the network interfaces connected to the local cluster switches are assigned the trusted zone.

Assign the local interfaces to the trusted zone (eno1 on node[0-8], ens259f0 on node[9-10], and ib0 on every node):

## node[0-8]
$ firewall-cmd --permanent --zone=trusted --change-interface=eno1
## node[9-10]
$ firewall-cmd --permanent --zone=trusted --change-interface=ens259f0
$ firewall-cmd --permanent --zone=trusted --change-interface=ib0
$ firewall-cmd --reload

Confirm the settings using:

$ firewall-cmd --list-all-zones

LVM Partition

See How to Install and Configure LVM on CentOS 7 and Beginner's Guide to LVM (Logical Volume Management).

This only applies to the storage node (in this case the head node).

The storage volume (RAID6 SATA HDD and RAID1 SATA HDD) can be partitioned and formatted during the OS installation.

Configure partitions like this:

  • /home - RAID1 SATA HDD 6TB
  • /scratch - RAID6 SATA HDD 16TB

For the setup below, we assume that 6TB RAID1 SATA HDD is /dev/sdc and 16TB RAID6 SATA HDD is /dev/sdb.

  1. Create physical volumes,

    # pvcreate /dev/sdb
    # pvcreate /dev/sdc
    

    If it returns an error like this: Device /dev/sdb excluded by a filter., then follow this instruction and execute

    # wipefs -a /dev/sdb
    

    Confirm the newly created PVs

    # pvdisplay
    
  2. Create volume groups

    # vgcreate raid6_16tb /dev/sdb
    # vgcreate raid1_6tb /dev/sdc
    

    Confirm the created volume groups

    # vgdisplay
    
  3. Create logical volumes

    # lvcreate -n home -l 100%FREE raid1_6tb
    # lvcreate -n scratch -l 100%FREE raid6_16tb
    

    Confirm the created logical volumes

    # lvdisplay
    

    Or using

    # lsblk
    

    To find UUID

    # blkid /dev/raid1_6tb/home
    
  4. Create a filesystem

    # mkfs.xfs /dev/raid1_6tb/home
    # mkfs.xfs /dev/raid6_16tb/scratch
    
  5. Update /etc/fstab to include

    /dev/mapper/raid1_6tb-home     /home                   xfs     defaults,uquota,gquota,pquota        0 0
    /dev/mapper/raid6_16tb-scratch /scratch                xfs     defaults,uquota,gquota,pquota        0 0
    

Mirrored Logical Volume

See Create Mirrored Logical Volume in Linux for instruction.

Pruning Scratch Directory

See CentOS / RHEL : Beginners guide to cron.

To keep the scratch directory spacious and clean, files and directories whose last change time (ctime) is older than a certain period will be purged.

  • For regular files and links, the grace period is 20 days.
  • For directories, the grace period is 10 days.

NOTE: The directory pruning only applies to empty folders. The reason for the different grace periods is that deleting files changes the ctime of the folders containing them.

Edit /scratch/prune_scratch.sh to include

#!/bin/bash

SUFFIX=$(date '+%Y-%m-%d')

# Prune files/links changed at least 20 days ago
#
find /scratch/* -mindepth 1 \( -type f -or -type l \) -ctime +20 -exec stat {} \; -delete >> /scratch/purged-files.${SUFFIX}.log

# Then empty directories changed at least 10 days ago
#
find /scratch/* -mindepth 1 -type d -empty -ctime +10 -exec stat {} \; -delete >> /scratch/purged-folders.${SUFFIX}.log

# Delete old logs
#
find /scratch -maxdepth 1 -type f -name "purged-*.log" \( -mtime +10 -or -empty \) -delete

Make it executable and accessible by root and create a symbolic link /etc/cron.daily/prune_scratch.sh to it

$ sudo chmod 0700 /scratch/prune_scratch.sh
$ sudo ln -s /scratch/prune_scratch.sh /etc/cron.daily/prune_scratch.sh
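Before enabling the cron job, the find predicates can be rehearsed on a throwaway tree, with -print in place of -delete. In the sketch below (paths are a temp directory, not /scratch), the freshly created file matches neither age filter, confirming that recent data would survive.

```shell
#!/bin/sh
# Sketch: rehearse the prune predicates from prune_scratch.sh on a
# temporary tree, printing matches instead of deleting them.
T=$(mktemp -d)
mkdir -p "$T/user1/proj"
touch "$T/user1/proj/data.txt"
# Same predicates as the cron script; fresh files print nothing.
find "$T"/* -mindepth 1 \( -type f -or -type l \) -ctime +20 -print
find "$T"/* -mindepth 1 -type d -empty -ctime +10 -print
rm -rf "$T"
```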

Disk Quota

See 3.3. XFS QUOTA MANAGEMENT, How to Setup Disk Quota on XFS File System in Linux Servers, and Using xfs project quotas to limit capacity within a subdirectory.

This only applies to the storage node (in this case the head node).

To enable User, Group, and Project quota on /home and /scratch, edit /etc/fstab to include the quota options:

...   /home    xfs    defaults,uquota,gquota,pquota   0 0
...   /scratch    xfs    defaults,uquota,gquota,pquota   0 0

Reboot and verify whether quota is enabled:

$ mount

To print disk quota,

$ xfs_quota -x -c "report -h" /home
$ xfs_quota -x -c "report -h" /scratch

User Quota

To set disk (block) limit on a user named kmin on home directory,

$ xfs_quota -x -c "limit -u bsoft=1000g bhard=1200g kmin" /home
$ xfs_quota -x -c "limit -u bsoft=5t bhard=5200g kmin" /scratch

In the above command, bsoft is the block soft limit and bhard is the block hard limit (the g and t suffixes denote gigabytes and terabytes); limit is the keyword that implements a disk or file limit on a file system for a specific user.

To set file (inode) limit,

$ xfs_quota -x -c "limit -u isoft=400 ihard=500 kmin" /home
$ xfs_quota -x -c "limit -u isoft=40000 ihard=50000 kmin" /scratch

In the above command, isoft is the inode (file-count) soft limit and ihard is the inode hard limit.

Both commands can be combined

$ xfs_quota -x -c "limit -u bsoft=1000g bhard=1200g isoft=400 ihard=500 kmin" /home
$ xfs_quota -x -c "limit -u bsoft=5t bhard=5200g isoft=40000 ihard=50000 kmin" /scratch

Verify block and inode limits:

$ xfs_quota -x -c "report -bih" /home
$ xfs_quota -x -c "report -bih" /scratch

To reset the limits, set the limits with zeros as its argument, e.g.,

$ xfs_quota -x -c "limit -u bsoft=0 bhard=0 kmin" /home
$ xfs_quota -x -c "limit -u bsoft=0 bhard=0 kmin" /scratch
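A small wrapper makes it easy to apply the per-user limits above to several accounts at once. This is a sketch printed as a dry run (pipe the output to sh to apply); the limits mirror the kmin example, and the alice/bob names are purely illustrative.

```shell
#!/bin/sh
# Sketch: emit the xfs_quota limit commands above for a list of users.
# Dry run -- review the output, then pipe it to sh to apply.
set_user_quota() {
  u="$1"
  echo "xfs_quota -x -c \"limit -u bsoft=1000g bhard=1200g isoft=400 ihard=500 $u\" /home"
  echo "xfs_quota -x -c \"limit -u bsoft=5t bhard=5200g isoft=40000 ihard=50000 $u\" /scratch"
}
for u in kmin alice bob; do set_user_quota "$u"; done
```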

Group Quota

To configure disk and file quota on engineering group,

$ xfs_quota -x -c "limit -g bsoft=6144m bhard=8192m isoft=1000 ihard=1200 engineering" /home

Verify the Quota details for group engineering

$ xfs_quota -x -c "report -gbih" /home

Project Quota

Project quota is good for enforcing directory-based quota. First, edit /etc/projects

2000:/home/kmin
102000:/scratch/kmin

And, edit /etc/projid to map the id to a name

kmin/home:2000
kmin/scratch:102000

Initialize project directories

$ xfs_quota -x -c "project -s kmin/home" /home
$ xfs_quota -x -c "project -s kmin/scratch" /scratch

Implement quotas

$ xfs_quota -x -c "limit -p bsoft=8g bhard=10g kmin/home" /home
$ xfs_quota -x -c "limit -p bsoft=800g bhard=1000g kmin/scratch" /scratch

Verify quota details

$ xfs_quota -x -c "report -bih" /home

NFS

The NFS server is configured based on 8.6. CONFIGURING THE NFS SERVER.

Particularly notable is 8.6.6. Enabling NFS over RDMA (NFSoRDMA) and 8.6.7. Configuring an NFSv4-only Server.

NOTE: NFSoRDMA doesn't seem to work if more than one client is connected to the NFSoRDMA server.


Have nfs-utils installed on all nodes:

$ yum install nfs-utils

NFS Server

Start and enable the nfs services

$ systemctl start nfs-server rpcbind
$ systemctl enable nfs-server rpcbind

Edit /etc/exports to include (exporting over IPoIB)

/opt 192.168.2.0/24(ro,sync,no_subtree_check)
/home 192.168.2.0/24(rw,sync,no_subtree_check,no_root_squash)
/scratch 192.168.2.0/24(rw,sync,no_subtree_check,no_root_squash)

NOTE: The /opt partition is read-only on purpose. See Mount only sub-directory in NFS export.

To enable NFSoRDMA:

  • On clients, the /etc/rdma/rdma.conf file contains a line that sets XPRTRDMA_LOAD=yes by default, which requests the rdma service to load the NFSoRDMA client module.
  • On the server, to enable automatic loading of NFSoRDMA server modules, add SVCRDMA_LOAD=yes on a new line in /etc/rdma/rdma.conf. And, edit /etc/sysconfig/nfs to set RPCNFSDARGS="--rdma=20049".

To configure an NFSv4-only server, edit /etc/sysconfig/nfs to include

RPCNFSDARGS="... -N 2 -N 3 -U"

Once all is done, restart the NFS server:

$ systemctl restart nfs

Check if the rdma port is listed:

$ cat /proc/fs/nfsd/portlist

NFS Client

Check the exported NFS shares first:

$ showmount -e node0

Mount the NFS shares (make sure directories exist):

$ mkdir -p /opt /home /scratch
$ mount -o vers=4 192.168.2.254:/opt /opt
$ mount -o vers=4 192.168.2.254:/home /home
$ mount -o vers=4 192.168.2.254:/scratch /scratch

Use the df -hT command to check the mounted NFS shares.

Edit /etc/fstab to enable automatic mount of NFS shares.

192.168.2.254:/opt    /opt    nfs     auto,noatime,nolock,bg,nfsvers=4,intr,actimeo=1800  0  0
192.168.2.254:/home   /home   nfs     auto,noatime,nolock,bg,nfsvers=4,intr,actimeo=1800  0  0
192.168.2.254:/scratch   /scratch   nfs     auto,noatime,nolock,bg,nfsvers=4,intr,actimeo=1800  0  0

To enable NFSoRDMA:

  1. On clients, the /etc/rdma/rdma.conf file contains a line that sets XPRTRDMA_LOAD=yes by default, which requests the rdma service to load the NFSoRDMA client module.
  2. Use mount -o ...,rdma,port=20049 ... option. Check the mount parameters: mount | grep [exported_share].
  3. Edit /etc/fstab to enable automatic mount of NFS shares.
    192.168.2.254:/opt    /opt    nfs     auto,noatime,nolock,bg,nfsvers=4,intr,actimeo=1800,rdma,port=20049  0  0
    192.168.2.254:/home   /home   nfs     auto,noatime,nolock,bg,nfsvers=4,intr,actimeo=1800,rdma,port=20049  0  0
    192.168.2.254:/scratch   /scratch   nfs     auto,noatime,nolock,bg,nfsvers=4,intr,actimeo=1800,rdma,port=20049  0  0
    

Slurm

Installing and configuring Slurm, the workload manager, properly has been the most challenging part.

Instructions at https://slurm.schedmd.com/quickstart_admin.html.

See https://slurm.schedmd.com/mpi_guide.html#pmix to enable pmix for MVAPICH2.


Delete failed installation of Slurm (if any)

$ yum remove mariadb-server mariadb-devel -y
$ yum remove slurm munge munge-libs munge-devel -y
$ userdel -r slurm
$ userdel -r munge

Create global users

Slurm and Munge require consistent UID and GID across every node in the cluster. For all the nodes, create users/groups before installing Slurm and Munge:

$ export MUNGEUSER=1001
$ groupadd -g $MUNGEUSER munge
$ useradd  -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge  -s /sbin/nologin munge
$ export SLURMUSER=1002
$ groupadd -g $SLURMUSER slurm
$ useradd  -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm  -s /bin/bash slurm

NOTE: UIDs and GIDs less than 1000 are reserved for the system. Later, user UIDs and GIDs will be assigned starting at 2000.

Install Munge

Install Munge on all nodes:

$ yum install munge munge-libs munge-devel

Create a secret key on the head node. First install rng-tools to properly create the key:

$ yum install rng-tools
$ rngd -r /dev/urandom
$ /usr/sbin/create-munge-key -r
$ dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
$ chown munge: /etc/munge/munge.key
$ chmod 400 /etc/munge/munge.key

Send this key to all of the compute nodes:

$ scp /etc/munge/munge.key root@node1:/etc/munge
$ scp /etc/munge/munge.key root@node2:/etc/munge
$ scp /etc/munge/munge.key root@node3:/etc/munge
$ scp /etc/munge/munge.key root@node4:/etc/munge
$ scp /etc/munge/munge.key root@node5:/etc/munge
$ scp /etc/munge/munge.key root@node6:/etc/munge
$ scp /etc/munge/munge.key root@node7:/etc/munge
$ scp /etc/munge/munge.key root@node8:/etc/munge
$ scp /etc/munge/munge.key root@node9:/etc/munge
$ scp /etc/munge/munge.key root@node10:/etc/munge
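The ten scp invocations above collapse into a single loop. Shown here as a dry-run sketch that echoes each command; drop the echo to actually distribute the key.

```shell
#!/bin/sh
# Dry-run sketch of the munge.key distribution above: echoes the scp
# command for each compute node instead of executing it.
for i in $(seq 1 10); do
  echo scp /etc/munge/munge.key root@node$i:/etc/munge
done
```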

Then ssh into every node and correct the permissions as well as start the Munge service:

$ chown -R munge: /etc/munge/ /var/log/munge/
$ chmod 0700 /etc/munge/ /var/log/munge/
$ systemctl enable munge
$ systemctl start munge
$ systemctl status munge

To test Munge, try to access another node with Munge from the head node:

$ munge -n
$ munge -n | munge
$ munge -n | ssh node1 unmunge
$ munge -n | ssh node2 unmunge
$ munge -n | ssh node3 unmunge
$ munge -n | ssh node4 unmunge
$ munge -n | ssh node5 unmunge
$ munge -n | ssh node6 unmunge
$ munge -n | ssh node7 unmunge
$ munge -n | ssh node8 unmunge
$ munge -n | ssh node9 unmunge
$ munge -n | ssh node10 unmunge
$ remunge

Install Slurm

Install a few dependencies:

$ yum install man2html libibmad libibumad python3 perl-ExtUtils-MakeMaker
$ yum install http-parser http-parser-devel
$ yum install openssl openssl-devel
$ yum install pam pam-devel
$ yum install numactl numactl-devel
$ yum install hwloc hwloc-libs hwloc-devel
$ yum install lua lua-devel
$ yum install readline readline-devel
$ yum install rrdtool rrdtool-devel
$ yum install ncurses ncurses-devel
$ yum install pmix pmix-devel
$ yum install json-c json-c-devel
$ yum install hdf5 hdf5-devel
$ yum install ucx ucx-devel
$ yum install mariadb mariadb-server mariadb-devel

NOTE: On compute nodes, the -devel packages can be omitted.

Download the latest version (https://www.schedmd.com/downloads.php) of Slurm:

$ wget https://download.schedmd.com/slurm/slurm-20.02.4.tar.bz2

RPM packaging:

$ rpmbuild -vv -ta --define '_with_slurmrestd 1' --define '_with_slurmsmwd 1' \
--define '_without_debug 1' --define '_with_hdf5 --with-hdf5=/usr' \
--define '_with_hwloc 1' --define '_with_lua --with-lua=/usr' \
--define '_with_mysql 1' --define '_with_numa 1' --define '_with_pam 1' \
--define '_without_x11 1' --define '_with_ucx --with-ucx=/usr' \
--define '_with_pmix --with-pmix=/usr' slurm-20.02.4.tar.bz2

Copy the built RPM packages over to the shared directory and install the appropriate packages on each node.

On all nodes,

$ yum localinstall slurm-20.02.4-1.el7.x86_64.rpm slurm-slurmd-20.02.4-1.el7.x86_64.rpm \
slurm-example-configs-20.02.4-1.el7.x86_64.rpm slurm-libpmi-20.02.4-1.el7.x86_64.rpm \
slurm-pam_slurm-20.02.4-1.el7.x86_64.rpm slurm-perlapi-20.02.4-1.el7.x86_64.rpm

NOTE: Since both Slurm and PMIx provide libpmi[2].so libraries, it is recommended to install the two packages in different locations. Otherwise, the libraries provided by Slurm and PMIx may both end up under a standard location such as /usr/lib64, and the package manager will error out and report the conflict. SchedMD plans to alleviate this by shipping these libraries in a separate libpmi-slurm package.

Additionally, on the head node,

$ yum localinstall slurm-devel-20.02.4-1.el7.x86_64.rpm \
slurm-slurmctld-20.02.4-1.el7.x86_64.rpm slurm-slurmdbd-20.02.4-1.el7.x86_64.rpm \
slurm-slurmrestd-20.02.4-1.el7.x86_64.rpm slurm-slurmsmwd-20.02.4-1.el7.x86_64.rpm

Default ports for various Slurm daemons are:

slurmctld default port [6817]
slurmd default port [6818]
slurmdbd default port [6819]
slurmrestd default port [6820]
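If firewalld is active on the cluster's internal interface, these ports must be reachable between nodes. A sketch (covering all four daemons with a single port range is an assumption; adjust the zone as needed):

```shell
# Open the slurmctld/slurmd/slurmdbd/slurmrestd ports and reload the rules.
firewall-cmd --permanent --add-port=6817-6820/tcp
firewall-cmd --reload
```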

Set up in advance the directories and permissions used by Slurm. On all nodes,

$ mkdir -p /var/spool/slurm/ctld /var/spool/slurm/d /var/log/slurm
$ chown -R slurm: /var/spool/slurm /var/log/slurm

Configure SlurmDB

This only applies to the database node (in this case the head node).

Check out https://slurm.schedmd.com/accounting.html and https://github.com/Artlands/Install-Slurm.

Edit /etc/slurm/slurmdbd.conf:

#
# Example slurmdbd.conf file.
#
# See the slurmdbd.conf man page for more information.
#
# Archive info
#ArchiveJobs=yes
#ArchiveDir="/tmp"
#ArchiveSteps=yes
#ArchiveScript=
#JobPurge=12
#StepPurge=1
PurgeEventAfter=1month
PurgeJobAfter=12month
PurgeResvAfter=1month
PurgeStepAfter=1month
PurgeSuspendAfter=1month
PurgeTXNAfter=12month
PurgeUsageAfter=24month
#
# Authentication info
AuthType=auth/munge
#AuthInfo=/var/run/munge/munge.socket.2
#
# slurmDBD info
DbdAddr=localhost
DbdHost=localhost
#DbdPort=6819
SlurmUser=slurm
#MessageTimeout=300
DebugLevel=verbose
DefaultQOS=normal
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
PluginDir=/usr/lib64/slurm
PrivateData=accounts,users,usage,jobs
#TrackWCKey=yes
#
# Database info
StorageType=accounting_storage/mysql
#StorageHost=localhost
#StoragePort=3306
StoragePass=1234
StorageUser=slurm
#StorageLoc=slurm_acct_db

NOTE: Be sure to change the password. It must not contain "#", which starts a comment.

Set up file permission:

$ chown slurm: /etc/slurm/slurmdbd.conf
$ chmod 600 /etc/slurm/slurmdbd.conf

Enable and start MariaDB:

$ systemctl enable mariadb
$ systemctl start mariadb
$ systemctl status mariadb

Create the Slurm database user:

$ mysql
...
MariaDB[(none)]> GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost' IDENTIFIED BY '1234' with grant option;
MariaDB[(none)]> SHOW VARIABLES LIKE 'have_innodb';
MariaDB[(none)]> FLUSH PRIVILEGES;
MariaDB[(none)]> CREATE DATABASE slurm_acct_db;
MariaDB[(none)]> quit;

NOTE: Be sure to set the password the same as StoragePass.

Verify the database grants for the slurm user:

$ mysql -p -u slurm
Password:
...
MariaDB[(none)]> show grants;
MariaDB[(none)]> quit;

Edit /etc/my.cnf.d/innodb.cnf to contain

[mysqld]
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900

To apply this change, shut down the database and move the old logfiles out of the way:

$ systemctl stop mariadb
$ mv /var/lib/mysql/ib_logfile? /tmp/
$ systemctl start mariadb

Check the new setting in MySQL:

$ mysql -p -u slurm
Password:
...
MariaDB[(none)]> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
MariaDB[(none)]> quit;

Run slurmdbd manually and check the log:

$ slurmdbd -D -vvv

Once it runs cleanly, enable and start the slurmdbd service.

$ systemctl enable slurmdbd.service
$ systemctl start slurmdbd.service
$ systemctl status slurmdbd.service

Configure Slurm

See Intel OmniPath network fabric. Especially, the Memory Limit section.

On head node

Edit /etc/slurm/cgroup.conf:

###
#
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
#--
CgroupAutomount=yes

TaskAffinity=no
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes

NOTE: See the comment under TaskPlugin in the slurm.conf man page.

If the Topology plugin is to be enabled (see slurm.conf below), edit /etc/slurm/topology.conf:

#
# topology.conf
# Slurm switch configuration
#
# Generated by opa2slurm <http://github.com/jtfrey/opa2slurm>
#

#
# Switch GUID 0x00117501020C645C
#
SwitchName=OmniPth00117501ff0c645c Nodes=node[0-10] LinkSpeed=16

NOTE: This can be generated for the Intel OPA-enabled system via the program at http://github.com/jtfrey/opa2slurm. (See Topology Guide.)

Edit /etc/slurm/slurm.conf:

#
# Example slurm.conf file. Please run configurator.html
# (in doc/html) to build a configuration file customized
# for your environment.
#
#
# slurm.conf file generated by configurator.html.
#
# See the slurm.conf man page for more information.
#
ClusterName=hpc.kyungguk.com
SlurmctldHost=node0
SlurmUser=slurm
#SlurmctldPort=6817
#SlurmdPort=6818
AuthType=auth/munge
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
StateSaveLocation=/var/spool/slurm/ctld
SlurmdSpoolDir=/var/spool/slurm/d
SwitchType=switch/none
MpiDefault=pmi2
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/cgroup
PluginDir=/usr/lib64/slurm
#FirstJobId=
ReturnToService=2
#MaxJobCount=10000
#PlugStackConfig=
#PropagatePrioProcess=
#PropagateResourceLimits=
PropagateResourceLimitsExcept=MEMLOCK
#Prolog=/etc/slurm/prolog.d/*
#Epilog=/etc/slurm/epilog.d/*
#SrunProlog=
#SrunEpilog=
#TaskProlog=
#TaskEpilog=
TaskPlugin=task/affinity,task/cgroup
TaskPluginParam=Cores
PrologFlags=contain
#TrackWCKey=no
#TreeWidth=50
#TmpFS=
UsePAM=1
TopologyPlugin=topology/tree
RebootProgram=/usr/sbin/reboot
#
CpuFreqDef=Performance
DisableRootJobs=yes
EnforcePartLimits=all
#
# TIMERS
#SlurmctldTimeout=120
SlurmdTimeout=30
InactiveLimit=10
#MinJobAge=300
#KillWait=30
WaitTime=60
#
# SCHEDULING
SchedulerType=sched/backfill
PreemptType=preempt/qos
PreemptMode=REQUEUE
SelectType=select/cons_res
SelectTypeParameters=CR_ONE_TASK_PER_CORE,CR_Core_Memory
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityUsageResetPeriod=MONTHLY
PriorityWeightFairshare=10000
PriorityWeightAssoc=1000
PriorityWeightQOS=10
PriorityWeightPartition=10
PriorityWeightTRES=cpu=2000,mem=1000
PriorityWeightAge=1000
PriorityWeightJobSize=1000
#PriorityFavorSmall=YES
#PriorityMaxAge=1-0
#
# LOGGING
SlurmctldParameters=enable_configless
SlurmctldDebug=verbose
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=verbose
SlurmdLogFile=/var/log/slurm/slurmd.log
JobCompType=jobcomp/none
JobCompLoc=/var/log/slurm/jobcomp.log
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
#JobAcctGatherFrequency=30
#
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=associations,limits,qos
AccountingStorageHost=localhost
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStorageUser=
#PrivateData=jobs,usage
#
# COMPUTE NODES
NodeName=node[0-0]  CPUs=64  Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=192098 Feature=2.9GHz CoreSpecCount=4
NodeName=node[1-8]  CPUs=112 Boards=1 SocketsPerBoard=2 CoresPerSocket=28 ThreadsPerCore=2 RealMemory=192091 Feature=2.2GHz
NodeName=node[9-10] CPUs=112 Boards=1 SocketsPerBoard=2 CoresPerSocket=28 ThreadsPerCore=2 RealMemory=257642 Feature=2.6GHz
#
# PARTITIONS
#PartitionName=compute  Nodes=node[1-8]  Default=NO MaxTime=UNLIMITED DefMemPerCPU=1714 MaxMemPerNode=192000 SelectTypeParameters=CR_Socket_Memory DenyQos=compile
PartitionName=compute1 Nodes=node[1-8]  Default=NO MaxTime=UNLIMITED DefMemPerCPU=1714 MaxMemPerNode=192000 SelectTypeParameters=CR_Core_Memory DenyQos=compile
PartitionName=compute2 Nodes=node[9-10] Default=NO MaxTime=UNLIMITED DefMemPerCPU=2285 MaxMemPerNode=256000 SelectTypeParameters=CR_Core_Memory DenyQos=compile
PartitionName=allnodes Nodes=node[1-10] Default=NO MaxTime=UNLIMITED DefMemPerCPU=1714 MaxMemPerNode=192000 SelectTypeParameters=CR_Core_Memory DenyQos=compile
PartitionName=frontend Nodes=node[0-0]  Default=NO MaxTime=UNLIMITED DefMemPerCPU=3001 MaxMemPerNode=168085 MaxCPUsPerNode=56 SelectTypeParameters=CR_Core_Memory AllowQos=compile,debug,normal

Modify the compute node specs based on the information reported by the slurmd -C command.

To lock out any remaining user login session, create the epilog script /etc/slurm/epilog.d/lock_user_login.sh containing

#!/bin/bash
# Kill any leftover login session of the job owner once the job ends.

if systemctl -t slice status "user-${SLURM_JOB_UID}.slice" > /dev/null 2>&1; then
    systemctl kill --signal=SIGKILL "user-${SLURM_JOB_UID}.slice"
else
    exit 0
fi

Make it executable.

Try slurmctld in foreground and check the log:

$ slurmctld -D -v

Once it runs cleanly, enable and start the slurmctld service.

$ systemctl enable slurmctld.service
$ systemctl start slurmctld.service
$ systemctl status slurmctld.service

On compute nodes

To enable configless slurmd, edit /etc/sysconfig/slurmd on all nodes to include

SLURMD_OPTIONS=--conf-server=node0
SLURM_CONF=

Try slurmd in foreground and check the log:

$ slurmd -D -vvv --conf-server=192.168.1.254

If the head node is also to serve as a compute node, edit /usr/lib/systemd/system/slurmd.service to include the dependency

After=... slurmctld.service

Enable and start slurmd.service:

$ systemctl enable slurmd.service
$ systemctl start slurmd.service
$ systemctl status slurmd.service

Logrotate

See How to Setup and Manage Log Rotation Using Logrotate in Linux.

To keep the log files under control, edit /etc/logrotate.d/slurm on every node

/var/log/slurm/*.log {
    missingok
    notifempty
    copytruncate
}

Do a dry-run to confirm the settings

$ logrotate -d /etc/logrotate.d/slurm
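The minimal stanza above truncates logs in place but never expires old copies; a fuller policy might look like the following (rotation frequency and retention count are illustrative):

```
/var/log/slurm/*.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    copytruncate
}
```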

Control user limits on and access to compute nodes

User limits on and access to compute nodes can be controlled with PAM modules. Several articles discuss their use, but the approach in the official guide is the one that seems to work.

The RPM package installs the PAM modules under /lib64/security.

Follow the instructions at https://slurm.schedmd.com/faq.html#pam to enable Slurm's use of PAM by setting UsePAM=1 in slurm.conf (head node only).

Then, establish PAM configuration file(s) for Slurm by adding to /etc/pam.d/slurm on all nodes (including the head node):

#%PAM-1.0
account         required        pam_unix.so
account         required        pam_slurm.so
auth            required        pam_localuser.so
session         required        pam_limits.so

Consult https://slurm.schedmd.com/faq.html#pam for how to impose the limits.

To prevent users from ssh-ing into compute nodes, follow the instructions at https://slurm.schedmd.com/pam_slurm_adopt.html. Specifically, on compute nodes add the following line to /etc/pam.d/sshd

-account    required      pam_slurm_adopt.so

Here, pam_slurm_adopt.so should be the last PAM module in the account stack. (The leading "-" tells PAM to skip the line if the module is not found.)

To disable the pam_systemd module, go to /etc/pam.d, copy password-auth-ac to slurm-password-auth, and edit slurm-password-auth to comment out the following

#account    sufficient    pam_localuser.so
#-session   optional      pam_systemd.so

Edit /etc/pam.d/sshd to replace the occurrences of password-auth with slurm-password-auth. This prevents the auto-generating scripts from rolling the settings back.

User accounting

See Accounting and Resource Limits and Resource Limits.

Rules are:

  1. IMPORTANT: All nodes in the cluster must share the same user entries in /etc/passwd and /etc/group. On the other hand, the compute nodes need not carry the users' passwords. (Direct access is blocked anyway.)

  2. The user accounting and job scheduling can be strictly enforced by Slurm: "Only the users to whom the computing resources are granted can schedule jobs. No users can directly run their jobs on the computing nodes."

  3. The head node is intended for users to prepare source code, compile it, submit job scripts, monitor running/queued job status, and manage running/queued jobs. To discourage any other usage, only limited resources are granted to users on the head node.

  4. As for work space, all users will be assigned a home directory under the /home partition and a scratch directory under the /scratch partition. The disk usage on these partitions is managed by xfs_quota.

  5. The soft and hard block limits (bsoft and bhard, respectively) on the /home partition are 1000GB and 1200GB, respectively. Its intended use is to keep the source codes and dependent libraries necessary for running jobs. Because of the tight disk quota, it is not intended to keep intermediate results of computation, for which scratch space should be used.

  6. The soft and hard block limits on the /scratch partition are 5TB and 5200GB, respectively. Unlike the /home partition, the system periodically scans files and directories in the /scratch partition, and those not touched for the past 30 days will be purged. Its intended use is to temporarily hold intermediate results of computation, which the user is expected to fetch in a timely manner after a job completes.

NOTE: The exact limits may be different. Refer to Admin Guide for details.


Environment Modules

Install environment modules on the head node:

$ yum install environment-modules

To show modulefiles by category, append the path to the top modulefiles directory for that category to /usr/share/Modules/init/.modulespath.

For example, open /usr/share/Modules/init/.modulespath using vim and execute :r !find /etc/modulefiles -mindepth 1 -maxdepth 1 -type d in command mode, which will append the directory paths under /etc/modulefiles.
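The same append can be done non-interactively from the shell; a sketch assuming the stock paths:

```shell
# Append each category directory under /etc/modulefiles to the Modules
# search path; run once, or de-duplicate the file afterwards.
find /etc/modulefiles -mindepth 1 -maxdepth 1 -type d \
  >> /usr/share/Modules/init/.modulespath
```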

The default modulefile can be specified by creating .version and populating

#%Module1.0#####################################################################
##
## version file for Perl
##
set ModulesVersion "native"

where native is a modulefile (or a sub-directory) in that directory. See https://modules.readthedocs.io/en/latest/FAQ.html.

Check out https://www.admin-magazine.com/HPC/Articles/Environment-Modules for building module files.


MPIs

See Intel OmniPath network fabric.

NOTE: It is advisable to build the MPIs against every compiler.

NOTE: See OmniPath fabric: srun error PSM2 can't open hfi unit: -1 (err=23) for a potential issue. Search for "Problem SOLVED" on that page.

NOTE: It appears that the PSM2 error also occurs when two or more jobs run across multiple nodes with the total task count exceeding the total number of cores. Setting SelectTypeParameters=CR_ONE_TASK_PER_CORE,... in slurm.conf seems to solve this problem.

IMPORTANT: CPU affinity is on by default. (TODO: Find out how to disable it at compile time.) However, it must be disabled at runtime by exporting an environment variable. Since CPU resource allocation is already controlled by the Slurm workload manager, leaving affinity enabled results in poor performance or, even worse, unexpected termination of jobs.

$ export MV2_ENABLE_AFFINITY=0

One can apply this setting globally by adding a file mvapich2_disable_affinity.sh to /etc/profile.d/ with the content

export MV2_ENABLE_AFFINITY=0

or by executing

$ echo "export MV2_ENABLE_AFFINITY=0" > /etc/profile.d/mvapich2_disable_affinity.sh

MVAPICH2

See MVAPICH2 2.0 User Guide.

The source tarball can be obtained here: http://mvapich.cse.ohio-state.edu/downloads/.

NOTE: Tested/working versions are 2.3.4, 2.3.5, 2.3.6, and 2.3.7. (Simply replace the appropriate version numbers below.) The latest version seems to have some issues with newer versions of g++ compiler.

With Intel Omni-Path PSM2

Version 2.3.4

On the head node,

$ wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.4.tar.gz
$ tar xzvf mvapich2-2.3.4.tar.gz
$ mkdir build && cd build
$ ../mvapich2-2.3.4/configure --prefix=/opt/mvapich2/2.3.4 \
--with-device=ch3:psm --with-pm=slurm --with-pmi=pmi2 \
--enable-fortran --enable-threads=runtime
$ make && make install

See config.log.mvapich2 for configuration options.

On the head node, edit /etc/modulefiles/mpi/mvapich2/2.3.4

#%Module1.0#####################################################################
##
## modules mvapich2-2.3.4
##
## modulefiles/mvapich2-2.3.4.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the MVAPICH2 environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the MVAPICH2 environment"

# for Tcl script use only
set     package         mvapich2
set     version         2.3.4
set     prefix          /opt/$package/$version


#setenv          MV2_ENABLE_AFFINITY 0
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    CPATH           $prefix/include
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LIBRARY_PATH    $prefix/lib

Update MODULEPATH as described in Environment Modules.

Version 2.3.5

On the head node,

$ wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.5.tar.gz
$ tar xzvf mvapich2-2.3.5.tar.gz
$ mkdir build && cd build
$ ../mvapich2-2.3.5/configure --prefix=/opt/mvapich2/2.3.5 \
--with-device=ch3:psm --with-pm=slurm --with-pmi=pmi2 \
--enable-fortran --enable-threads=runtime
$ make && make install

See config.log.mvapich2 for configuration options.

On the head node, edit /etc/modulefiles/mpi/mvapich2/2.3.5

#%Module1.0#####################################################################
##
## modules mvapich2-2.3.5
##
## modulefiles/mvapich2-2.3.5.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the MVAPICH2 environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the MVAPICH2 environment"

# for Tcl script use only
set     package         mvapich2
set     version         2.3.5
set     prefix          /opt/$package/$version


#setenv          MV2_ENABLE_AFFINITY 0
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    CPATH           $prefix/include
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LIBRARY_PATH    $prefix/lib

Update MODULEPATH as described in Environment Modules.

Version 2.3.6

On the head node,

$ wget http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.6.tar.gz
$ tar xzvf mvapich2-2.3.6.tar.gz
$ mkdir build && cd build
$ ../mvapich2-2.3.6/configure --prefix=/opt/mvapich2/2.3.6 \
--with-device=ch3:psm --with-pm=slurm --with-pmi=pmi2 \
--enable-fortran --enable-threads=runtime
$ make && make install

See config.log.mvapich2 for configuration options.

On the head node, edit /etc/modulefiles/mpi/mvapich2/2.3.6

#%Module1.0#####################################################################
##
## modules mvapich2-2.3.6
##
## modulefiles/mvapich2-2.3.6.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the MVAPICH2 environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the MVAPICH2 environment"

# for Tcl script use only
set     package         mvapich2
set     version         2.3.6
set     prefix          /opt/$package/$version


#setenv          MV2_ENABLE_AFFINITY 0
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    CPATH           $prefix/include
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LIBRARY_PATH    $prefix/lib

Update MODULEPATH as described in Environment Modules.

Version 2.3.7

On the head node,

$ curl -O https://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.7-1.tar.gz
$ tar xzvf mvapich2-2.3.7-1.tar.gz
$ mkdir build && cd build
$ ../mvapich2-2.3.7-1/configure --prefix=/opt/mvapich2/2.3.7 \
--with-device=ch3:psm --with-pm=slurm --with-pmi=pmi2 \
--enable-fortran --enable-threads=runtime
$ make && make install

See config.log.mvapich2 for configuration options.

On the head node, edit /etc/modulefiles/mpi/mvapich2/2.3.7

#%Module1.0#####################################################################
##
## modules mvapich2-2.3.7
##
## modulefiles/mvapich2-2.3.7.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the MVAPICH2 environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the MVAPICH2 environment"

# for Tcl script use only
set     package         mvapich2
set     version         2.3.7
set     prefix          /opt/$package/$version


#setenv          MV2_ENABLE_AFFINITY 0
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    CPATH           $prefix/include
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LIBRARY_PATH    $prefix/lib

Update MODULEPATH as described in Environment Modules.

Common

Test bandwidth using MPI Bandwidth Test Code.

  • NOTE: The --mpi=pmi2 option is needed for srun unless MpiDefault=pmi2 is set in slurm.conf.

With OFA-IB (Experimental)

$ ../mvapich2-2.3.4/configure --prefix=/opt/mvapich2/2.3.4-ofa --with-device=ch3:mrail \
--with-rdma=gen2 --with-pm=slurm --with-pmi=pmi2 --enable-fortran --enable-threads=runtime

NOTE: The compilation will fail with the error:

../mvapich2-2.3.4/src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c:27:23: fatal error: romioconf.h: No such file or directory

As a workaround, rerun make with the generated romio headers on the include path:

$ make CPATH=./src/mpi/romio/adio/include

MVAPICH

Version 3.0 (experimental)

NOTE: Using --with-pmi=pmi2 results in an undefined-type error.

On the head node,

$ curl -O http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich-3.0.tar.gz
$ tar xzvf mvapich-3.0.tar.gz
$ cd mvapich-3.0
$ ./configure --prefix=/opt/mvapich/3.0 \
--with-device=ch4:ofi --with-pm=slurm --with-pmi=pmi1 \
--enable-fortran --enable-threads=runtime
$ make && make install

On the head node, edit /etc/modulefiles/mpi/mvapich/3.0

#%Module1.0#####################################################################
##
## modules mvapich-3.0
##
## modulefiles/mvapich-3.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the MVAPICH environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the MVAPICH environment"

# for Tcl script use only
set     package         mvapich
set     version         3.0
set     prefix          /opt/$package/$version


#setenv          MV2_ENABLE_AFFINITY 0
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    CPATH           $prefix/include
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LIBRARY_PATH    $prefix/lib

Update MODULEPATH as described in Environment Modules.

OpenMPI

See Installation Guide with Slurm.

Source at https://www.open-mpi.org.

NOTE: Tested/working versions are 4.0.5, 4.1.1, and 4.1.2. (Simply substitute the appropriate version numbers below.) The latest version doesn't seem to work with the LLVM and GCC compilers and is not recommended.

With Intel Omni-Path PSM2

After some trial and error, the following seems to work on this system

Version 4.0.5
$ wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.5.tar.gz
$ tar xzvf openmpi-4.0.5.tar.gz
$ mkdir -p build && cd build
$ ../openmpi-4.0.5/configure --prefix=/opt/openmpi/4.0.5 --enable-static --enable-shared --with-pmi --with-ucx=no --with-verbs=no --with-slurm --with-psm2
$ make && make install

See config.log.openmpi for configuration options.

On the head node, edit /etc/modulefiles/mpi/openmpi/4.0.5

#%Module1.0#####################################################################
##
## modules openmpi-4.0.5
##
## modulefiles/openmpi-4.0.5.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the OpenMPI environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the OpenMPI environment"

# for Tcl script use only
set     package         openmpi
set     version         4.0.5
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    CPATH           $prefix/include
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LIBRARY_PATH    $prefix/lib

Update MODULEPATH as described in Environment Modules.

Version 4.1.1
$ wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.1.tar.gz
$ tar xzvf openmpi-4.1.1.tar.gz
$ mkdir -p build && cd build
$ ../openmpi-4.1.1/configure --prefix=/opt/openmpi/4.1.1 \
--enable-static --enable-shared --with-pmi --with-ucx=no \
--with-verbs=no --with-slurm --with-psm2
$ make && make install

See config.log.openmpi for configuration options.

On the head node, edit /etc/modulefiles/mpi/openmpi/4.1.1

#%Module1.0#####################################################################
##
## modules openmpi-4.1.1
##
## modulefiles/openmpi-4.1.1.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the OpenMPI environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the OpenMPI environment"

# for Tcl script use only
set     package         openmpi
set     version         4.1.1
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    CPATH           $prefix/include
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LIBRARY_PATH    $prefix/lib

Update MODULEPATH as described in Environment Modules.

Version 4.1.2
$ wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.2.tar.gz
$ tar xzvf openmpi-4.1.2.tar.gz
$ mkdir -p build && cd build
$ ../openmpi-4.1.2/configure --prefix=/opt/openmpi/4.1.2 \
--enable-static --enable-shared --with-pmi --with-ucx=no \
--with-verbs=no --with-slurm --with-psm2
$ make && make install

See config.log.openmpi for configuration options.

On the head node, edit /etc/modulefiles/mpi/openmpi/4.1.2

#%Module1.0#####################################################################
##
## modules openmpi-4.1.2
##
## modulefiles/openmpi-4.1.2.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the OpenMPI environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the OpenMPI environment"

# for Tcl script use only
set     package         openmpi
set     version         4.1.2
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    CPATH           $prefix/include
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LIBRARY_PATH    $prefix/lib

Update MODULEPATH as described in Environment Modules.


Test bandwidth using MPI Bandwidth Test Code.

  • NOTE: The --mpi=pmi2 option is needed for srun unless MpiDefault=pmi2 is set in slurm.conf.

With UCX

$ ../openmpi-4.0.5/configure --prefix=/opt/openmpi/4.0.5-ucx --enable-static --enable-shared --with-pmi --with-ucx=yes --with-verbs=no --with-slurm --with-psm2=no --with-ofi=no

MPICH

$ ../mpich-3.3.2/configure --prefix=/opt/mpich/3.3.2-ucx --enable-fast=all --enable-fortran --enable-threads=runtime --with-device=ch4:ucx --with-pmi=pmi2 --with-pm=no --with-slurm LIBS=-lpmi2

NOTE: This compiles, but fails at runtime.


Compilers

GCC

Guide at https://medium.com/@bipul.k.kuri/install-latest-gcc-on-centos-linux-release-7-6-a704a11d943d.

Official mirror page at http://gcc.gnu.org/mirrors.html.

Install dependencies (on the head node):

$ yum install gmp gmp-devel mpfr mpfr-devel libmpc libmpc-devel

Version 7.5.0

On the head node, build

$ GCC_VERSION=7.5.0
$ wget https://ftp.gnu.org/gnu/gcc/gcc-${GCC_VERSION}/gcc-${GCC_VERSION}.tar.gz
$ tar xzvf gcc-${GCC_VERSION}.tar.gz
$ mkdir build && cd build
$ ../gcc-${GCC_VERSION}/configure --prefix=/opt/gcc/${GCC_VERSION} --enable-lto --disable-multilib --enable-gold --enable-ld
$ make -j20 && make install

On the head node, edit /etc/modulefiles/compiler/gcc/7.5.0

#%Module1.0#####################################################################
##
## modules gcc-7.5.0
##
## modulefiles/gcc-7.5.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the GCC environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the GCC environment"

# for Tcl script use only
set     package         gcc
set     version         7.5.0
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/gcc
setenv          OMPI_CC         $prefix/bin/gcc
setenv          MPICH_CC        $prefix/bin/gcc
setenv          CXX             $prefix/bin/g++
setenv          OMPI_CXX        $prefix/bin/g++
setenv          MPICH_CXX       $prefix/bin/g++
setenv          FC              $prefix/bin/gfortran
setenv          OMPI_FC         $prefix/bin/gfortran
setenv          MPICH_FC        $prefix/bin/gfortran
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib64

Version 8.5.0

On the head node, build

$ wget http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-8.5.0/gcc-8.5.0.tar.gz
$ tar xzvf gcc-8.5.0.tar.gz
$ mkdir build && cd build
$ ../gcc-8.5.0/configure --prefix=/opt/gcc/8.5.0 --enable-lto --disable-multilib --enable-gold --enable-ld
$ make -j20 && make install

See config.log.gcc for configuration options.

On the head node, edit /etc/modulefiles/compiler/gcc/8.5.0

#%Module1.0#####################################################################
##
## modules gcc-8.5.0
##
## modulefiles/gcc-8.5.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the GCC environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the GCC environment"

# for Tcl script use only
set     package         gcc
set     version         8.5.0
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/gcc
setenv          OMPI_CC         $prefix/bin/gcc
setenv          MPICH_CC        $prefix/bin/gcc
setenv          CXX             $prefix/bin/g++
setenv          OMPI_CXX        $prefix/bin/g++
setenv          MPICH_CXX       $prefix/bin/g++
setenv          FC              $prefix/bin/gfortran
setenv          OMPI_FC         $prefix/bin/gfortran
setenv          MPICH_FC        $prefix/bin/gfortran
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib64

Version 9.5.0

On the head node, build

$ git clone git://gcc.gnu.org/git/gcc.git gcc-git
$ cd gcc-git
$ git checkout releases/gcc-9.5.0
$ cd .. && mkdir -p build && cd build
$ ../gcc-git/configure --prefix=/opt/gcc/9.5.0 --enable-lto --disable-multilib --enable-gold --enable-ld
$ make -j20 && make install

On the head node, edit /etc/modulefiles/compiler/gcc/9.5.0

#%Module1.0#####################################################################
##
## modules gcc-9.5.0
##
## modulefiles/gcc-9.5.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the GCC environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the GCC environment"

# for Tcl script use only
set     package         gcc
set     version         9.5.0
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/gcc
setenv          OMPI_CC         $prefix/bin/gcc
setenv          MPICH_CC        $prefix/bin/gcc
setenv          CXX             $prefix/bin/g++
setenv          OMPI_CXX        $prefix/bin/g++
setenv          MPICH_CXX       $prefix/bin/g++
setenv          FC              $prefix/bin/gfortran
setenv          OMPI_FC         $prefix/bin/gfortran
setenv          MPICH_FC        $prefix/bin/gfortran
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib64

Version 10.5.0

On the head node, build

$ git clone git://gcc.gnu.org/git/gcc.git gcc-git
$ cd gcc-git
$ git checkout releases/gcc-10.5.0
$ cd .. && mkdir -p build && cd build
$ ../gcc-git/configure --prefix=/opt/gcc/10.5.0 --enable-lto --disable-multilib --enable-gold --enable-ld
$ make -j20 && make install

On the head node, edit /etc/modulefiles/compiler/gcc/10.5.0

#%Module1.0#####################################################################
##
## modules gcc-10.5.0
##
## modulefiles/gcc-10.5.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the GCC environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the GCC environment"

# for Tcl script use only
set     package         gcc
set     version         10.5.0
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/gcc
setenv          OMPI_CC         $prefix/bin/gcc
setenv          MPICH_CC        $prefix/bin/gcc
setenv          CXX             $prefix/bin/g++
setenv          OMPI_CXX        $prefix/bin/g++
setenv          MPICH_CXX       $prefix/bin/g++
setenv          FC              $prefix/bin/gfortran
setenv          OMPI_FC         $prefix/bin/gfortran
setenv          MPICH_FC        $prefix/bin/gfortran
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib64

Version 11.5.0

On the head node, build

$ git clone git://gcc.gnu.org/git/gcc.git gcc-git
$ cd gcc-git
$ git checkout releases/gcc-11.5.0
$ cd .. && mkdir -p build && cd build
$ ../gcc-git/configure --prefix=/opt/gcc/11.5.0 --enable-lto --disable-multilib --enable-gold --enable-ld
$ make -j20 && make install

On the head node, edit /etc/modulefiles/compiler/gcc/11.5.0

#%Module1.0#####################################################################
##
## modules gcc-11.5.0
##
## modulefiles/gcc-11.5.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the GCC environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the GCC environment"

# for Tcl script use only
set     package         gcc
set     version         11.5.0
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/gcc
setenv          OMPI_CC         $prefix/bin/gcc
setenv          MPICH_CC        $prefix/bin/gcc
setenv          CXX             $prefix/bin/g++
setenv          OMPI_CXX        $prefix/bin/g++
setenv          MPICH_CXX       $prefix/bin/g++
setenv          FC              $prefix/bin/gfortran
setenv          OMPI_FC         $prefix/bin/gfortran
setenv          MPICH_FC        $prefix/bin/gfortran
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib64

Version 12.4.0

On the head node, build

$ git clone git://gcc.gnu.org/git/gcc.git gcc-git
$ cd gcc-git
$ git checkout releases/gcc-12.4.0
$ cd .. && mkdir -p build && cd build
$ ../gcc-git/configure --prefix=/opt/gcc/12.4.0 --enable-lto --disable-multilib --enable-gold --enable-ld
$ make -j20 && make install

On the head node, edit /etc/modulefiles/compiler/gcc/12.4.0

#%Module1.0#####################################################################
##
## modules gcc-12.4.0
##
## modulefiles/gcc-12.4.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the GCC environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the GCC environment"

# for Tcl script use only
set     package         gcc
set     version         12.4.0
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/gcc
setenv          OMPI_CC         $prefix/bin/gcc
setenv          MPICH_CC        $prefix/bin/gcc
setenv          CXX             $prefix/bin/g++
setenv          OMPI_CXX        $prefix/bin/g++
setenv          MPICH_CXX       $prefix/bin/g++
setenv          FC              $prefix/bin/gfortran
setenv          OMPI_FC         $prefix/bin/gfortran
setenv          MPICH_FC        $prefix/bin/gfortran
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib64

Update MODULEPATH as described in Environment Modules.

Build GCC again using the new compiler.
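One way to read that step, as a sketch: load the just-installed gcc/12.4.0 module so configure finds the new gcc/g++ first on PATH, then repeat the build from a clean build directory.

```shell
# Rebuild GCC 12.4.0 with the freshly installed compiler on PATH.
module load gcc/12.4.0
rm -rf build && mkdir build && cd build
../gcc-git/configure --prefix=/opt/gcc/12.4.0 \
    --enable-lto --disable-multilib --enable-gold --enable-ld
make -j20 && make install
```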

Version 13.3.0

On the head node, build

$ git clone git://gcc.gnu.org/git/gcc.git gcc-git
$ cd gcc-git
$ git checkout releases/gcc-13.3.0
$ cd .. && mkdir -p build && cd build
$ ../gcc-git/configure --prefix=/opt/gcc/13.3.0 --enable-lto --disable-multilib --enable-gold --enable-ld
$ make -j20 && make install

On the head node, edit /etc/modulefiles/compiler/gcc/13.3.0

#%Module1.0#####################################################################
##
## modules gcc-13.3.0
##
## modulefiles/gcc-13.3.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the GCC environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the GCC environment"

# for Tcl script use only
set     package         gcc
set     version         13.3.0
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/gcc
setenv          OMPI_CC         $prefix/bin/gcc
setenv          MPICH_CC        $prefix/bin/gcc
setenv          CXX             $prefix/bin/g++
setenv          OMPI_CXX        $prefix/bin/g++
setenv          MPICH_CXX       $prefix/bin/g++
setenv          FC              $prefix/bin/gfortran
setenv          OMPI_FC         $prefix/bin/gfortran
setenv          MPICH_FC        $prefix/bin/gfortran
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib64

Update MODULEPATH as described in Environment Modules.

Build GCC again using the new compiler.

Version 14.2.0

On the head node, build

$ git clone git://gcc.gnu.org/git/gcc.git gcc-git
$ cd gcc-git
$ git checkout releases/gcc-14.2.0
$ cd .. && mkdir -p build && cd build
$ ../gcc-git/configure --prefix=/opt/gcc/14.2.0 \
    --enable-lto --disable-multilib --enable-gold \
    --enable-ld
$ make -j20
$ make install

On the head node, edit /etc/modulefiles/compiler/gcc/14.2.0

#%Module1.0#####################################################################
##
## modules gcc-14.2.0
##
## modulefiles/gcc-14.2.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the GCC environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the GCC environment"

# for Tcl script use only
set     package         gcc
set     version         14.2.0
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/gcc
setenv          OMPI_CC         $prefix/bin/gcc
setenv          MPICH_CC        $prefix/bin/gcc
setenv          CXX             $prefix/bin/g++
setenv          OMPI_CXX        $prefix/bin/g++
setenv          MPICH_CXX       $prefix/bin/g++
setenv          FC              $prefix/bin/gfortran
setenv          OMPI_FC         $prefix/bin/gfortran
setenv          MPICH_FC        $prefix/bin/gfortran
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib64

Update MODULEPATH as described in Environment Modules.

Build GCC again using the new compiler.

Clang-LLVM

See Getting Started with the LLVM System.

NOTE: These Clang compilers require the --stdlib=libc++ option when compiling C++ code, and --stdlib=libc++ -fuse-ld=lld when linking (omit --stdlib=libc++ when linking pure C code).
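For example, compiling and linking a small C++ program with one of these installations (hello.cpp is a hypothetical file; clang++ is assumed to be the one on PATH after loading an llvm module):

```shell
# C++ compilation and linking with the flags noted above.
cat > hello.cpp <<'EOF'
#include <iostream>
int main() { std::cout << "hello\n"; }
EOF
clang++ --stdlib=libc++ -c hello.cpp -o hello.o
clang++ --stdlib=libc++ -fuse-ld=lld hello.o -o hello
./hello
```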

Install dependencies (on the head node):

$ yum install cmake3 libedit-devel libxml2-devel xz-devel

Install ninja (on the head node); the build commands below use the Ninja generator:

$ yum install ninja

Clone LLVM Project repository.

$ git clone https://github.com/llvm/llvm-project.git
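The same clone serves every version below; the available release tags can be inspected before checking one out:

```shell
# Show the newest release tags in the cloned repository.
git -C llvm-project tag --list 'llvmorg-*' | sort -V | tail
```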

Version 5.0.2

On the head node, edit /etc/modulefiles/compiler/llvm/5.0.2

#%Module1.0#####################################################################
##
## modules llvm-5.0.2
##
## modulefiles/llvm-5.0.2.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         5.0.2
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib

Build:

$ cd llvm-project
$ git checkout llvmorg-5.0.2
$ cd .. && rm -rf build
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/5.0.2 \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;compiler-rt;libcxx;libcxxabi;libunwind;lld;openmp;polly"
$ ninja -j24 -C build
$ ninja -j1 -C build install
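A quick smoke test of the installed toolchain, driven through the variables the modulefile sets (assuming the llvm/5.0.2 modulefile above is on MODULEPATH):

```shell
# The module sets CC/CXX plus CXXFLAGS/LDFLAGS, so use them directly.
module load llvm/5.0.2
printf 'int main(void) { return 0; }\n' > t.c
"$CC" $LDFLAGS t.c -o t && ./t && echo "C toolchain OK"
printf '#include <vector>\nint main() { return std::vector<int>{1}.size() == 1 ? 0 : 1; }\n' > t.cpp
"$CXX" $CXXFLAGS $LDFLAGS t.cpp -o t++ && ./t++ && echo "C++ toolchain OK"
```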

Version 9.0.1

On the head node, edit /etc/modulefiles/compiler/llvm/9.0.1

#%Module1.0#####################################################################
##
## modules llvm-9.0.1
##
## modulefiles/llvm-9.0.1.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         9.0.1
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib

Build:

$ cd llvm-project
$ git checkout llvmorg-9.0.1
$ cd .. && rm -rf build
$ module load gcc/7.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/9.0.1 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi;compiler-rt;lld"
$ ninja -j24 -C build
$ ninja -j1 -C build install

Version 10.0.1

On the head node, edit /etc/modulefiles/compiler/llvm/10.0.1

#%Module1.0#####################################################################
##
## modules llvm-10.0.1
##
## modulefiles/llvm-10.0.1.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         10.0.1
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib

Build:

$ cd llvm-project
$ git checkout llvmorg-10.0.1
$ cd .. && rm -rf build
$ module load gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/10.0.1 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;compiler-rt;libclc;libcxx;libcxxabi;libunwind;lld;openmp;parallel-libs;polly;pstl"
$ ninja -j24 -C build
$ ninja -j1 -C build install

Version 11.1.0

On the head node, edit /etc/modulefiles/compiler/llvm/11.1.0

#%Module1.0#####################################################################
##
## modules llvm-11.1.0
##
## modulefiles/llvm-11.1.0.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         11.1.0
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib

Build:

$ cd llvm-project
$ git checkout llvmorg-11.1.0
$ cd .. && rm -rf build
$ module load gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/11.1.0 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;libcxx;libcxxabi;compiler-rt;libunwind;openmp"
$ ninja -j24 -C build
$ ninja -j1 -C build install

Version 12.0.1

On the head node, edit /etc/modulefiles/compiler/llvm/12.0.1

#%Module1.0#####################################################################
##
## modules llvm-12.0.1
##
## modulefiles/llvm-12.0.1.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         12.0.1
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib

Build:

$ cd llvm-project
$ git checkout llvmorg-12.0.1
$ cd .. && rm -rf build
$ module load gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/12.0.1 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;libcxx;libcxxabi;compiler-rt;libunwind;openmp"
$ ninja -j24 -C build
$ ninja -j1 -C build install

Version 13.0.1

On the head node, edit /etc/modulefiles/compiler/llvm/13.0.1

#%Module1.0#####################################################################
##
## modules llvm-13.0.1
##
## modulefiles/llvm-13.0.1.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         13.0.1
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib

Build:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ git checkout llvmorg-13.0.1
$ cd .. && rm -rf build
$ module load gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/13.0.1 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;libcxx;libcxxabi;libunwind;openmp;compiler-rt"
$ ninja -j24 -C build
$ ninja -j1 -C build install

NOTE: The compilation fails because the optional headers declare constexpr destructors, which C++17 does not allow. Removing constexpr from those destructors and recompiling resolves the issue.

Version 14.0.6

On the head node, edit /etc/modulefiles/compiler/llvm/14.0.6

#%Module1.0#####################################################################
##
## modules llvm-14.0.6
##
## modulefiles/llvm-14.0.6.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         14.0.6
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LD_LIBRARY_PATH $prefix/lib/x86_64-unknown-linux-gnu

Build:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ git checkout llvmorg-14.0.6
$ cd .. && rm -rf build
$ module load gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/14.0.6 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;openmp;compiler-rt;polly;pstl" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"
$ ninja -j24 -C build
$ ninja -j1 -C build install

Version 15.0.7

On the head node, edit /etc/modulefiles/compiler/llvm/15.0.7

#%Module1.0#####################################################################
##
## modules llvm-15.0.7
##
## modulefiles/llvm-15.0.7.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         15.0.7
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LD_LIBRARY_PATH $prefix/lib/x86_64-unknown-linux-gnu

Build:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ git checkout llvmorg-15.0.7
$ cd .. && rm -rf build
$ module load gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/15.0.7 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;openmp;compiler-rt;polly;pstl" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"
$ ninja -j24 -C build
$ ninja -j1 -C build install

NOTE: If the build fails with error: modification of '<temporary>' is not a constant expression, open the file where the error occurred and replace constexpr with static const in the offending variable declaration.

Version 16.0.4

On the head node, edit /etc/modulefiles/compiler/llvm/16.0.4

#%Module1.0#####################################################################
##
## modules llvm-16.0.4
##
## modulefiles/llvm-16.0.4.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         16.0.4
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LD_LIBRARY_PATH $prefix/lib/x86_64-unknown-linux-gnu

Build (requires cmake version 3.20.0 or above):

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ git checkout llvmorg-16.0.4
$ cd .. && rm -rf build
$ module load cmake gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/16.0.4 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;openmp;polly;pstl" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"
$ ninja -j24 -C build
$ ninja -j1 -C build install

Version 17.0.6

On the head node, edit /etc/modulefiles/compiler/llvm/17.0.6

#%Module1.0#####################################################################
##
## modules llvm-17.0.6
##
## modulefiles/llvm-17.0.6.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         17.0.6
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LD_LIBRARY_PATH $prefix/lib/x86_64-unknown-linux-gnu

Build (requires cmake version 3.20.0 or above):

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ git checkout llvmorg-17.0.6
$ cd .. && rm -rf build
$ module load cmake gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/17.0.6 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;openmp;polly;pstl" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"
$ ninja -j24 -C build
$ ninja -j1 -C build install

Version 18.1.8

On the head node, edit /etc/modulefiles/compiler/llvm/18.1.8

#%Module1.0#####################################################################
##
## modules llvm-18.1.8
##
## modulefiles/llvm-18.1.8.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         18.1.8
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LD_LIBRARY_PATH $prefix/lib/x86_64-unknown-linux-gnu

Build (requires cmake version 3.20.0 or above):

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ git checkout llvmorg-18.1.8
$ cd .. && rm -rf build
$ module load cmake gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/18.1.8 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;openmp;polly;pstl" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"
$ ninja -j24 -C build
$ ninja -j1 -C build install

Version 19.1.7

On the head node, edit /etc/modulefiles/compiler/llvm/19.1.7

#%Module1.0#####################################################################
##
## modules llvm-19.1.7
##
## modulefiles/llvm-19.1.7.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         19.1.7
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LD_LIBRARY_PATH $prefix/lib/x86_64-unknown-linux-gnu

Build (requires cmake version 3.20.0 or above):

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ git checkout llvmorg-19.1.7
$ cd .. && rm -rf build
$ module load cmake gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/19.1.7 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;openmp;polly;pstl" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
-DLLVM_INCLUDE_TESTS=NO
$ ninja -j24 -C build
$ ninja -j1 -C build install

Version 20.1.3

On the head node, edit /etc/modulefiles/compiler/llvm/20.1.3

#%Module1.0#####################################################################
##
## modules llvm-20.1.3
##
## modulefiles/llvm-20.1.3.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the LLVM Compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the LLVM Compiler environment"

# for Tcl script use only
set     package         llvm
set     version         20.1.3
set     prefix          /opt/$package/$version


setenv          CC              $prefix/bin/clang
setenv          OMPI_CC         $prefix/bin/clang
setenv          MPICH_CC        $prefix/bin/clang
setenv          CXX             $prefix/bin/clang++
setenv          OMPI_CXX        $prefix/bin/clang++
setenv          MPICH_CXX       $prefix/bin/clang++
setenv          CXXFLAGS        --stdlib=libc++
setenv          OMPI_CXXFLAGS   --stdlib=libc++
setenv          MPICH_CXXFLAGS  --stdlib=libc++
setenv          LDFLAGS         -fuse-ld=lld
setenv          OMPI_LDFLAGS    -fuse-ld=lld
setenv          MPICH_LDFLAGS   -fuse-ld=lld
prepend-path    PATH            $prefix/bin
prepend-path    MANPATH         $prefix/share/man
prepend-path    LD_LIBRARY_PATH $prefix/lib
prepend-path    LD_LIBRARY_PATH $prefix/lib/x86_64-unknown-linux-gnu

Build (requires cmake version 3.20.0 or above):

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ git checkout llvmorg-20.1.3
$ cd .. && rm -rf build
$ module load cmake gcc/9.5.0
$ cmake -G Ninja -S llvm-project/llvm -B build \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/20.1.3 \
-DCMAKE_CXX_STANDARD=17 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON \
-DLLVM_TARGET_ARCH:STRING=host -DLLVM_TARGETS_TO_BUILD:STRING=X86 \
-DLLVM_ENABLE_LTO=Off -DLLVM_PARALLEL_COMPILE_JOBS:STRING=24 -DLLVM_PARALLEL_LINK_JOBS:STRING=24 \
-DLLVM_ENABLE_LIBCXX=On -DLLVM_STATIC_LINK_CXX_STDLIB=On -DBUILD_SHARED_LIBS=Off \
-DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld;compiler-rt;openmp;polly;pstl" \
-DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" \
-DLLVM_INCLUDE_TESTS=NO
$ ninja -j24 -C build
$ ninja -j1 -C build install

Note that LLVM_INCLUDE_TESTS=NO is added (relative to the llvm-18.1.8 build) to avoid an error about the minimum Python version requirement.

Go

See https://golang.org/doc/install.

Update MODULEPATH as described in Environment Modules. Edit /etc/modulefiles/compiler/go/1.18:

#%Module1.0#####################################################################
##
## modules go-1.18
##
## modulefiles/go-1.18.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - loads the GO compiler environment"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "loads the GO environment"

# for Tcl script use only
set     package         go
set     version         1.18
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/bin
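The prepend-path line above is equivalent to the following shell operation (the /opt/go/1.18 prefix matches the modulefile; this is just an illustration of what the module does on load):

```shell
# What `prepend-path PATH $prefix/bin` does, expressed in plain shell:
# the module's bin directory becomes the first PATH entry, so its
# binaries shadow any system-wide ones.
prefix=/opt/go/1.18
PATH="$prefix/bin:$PATH"
echo "$PATH" | cut -d: -f1
```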

See also installing Singularity.


Misc.

Boost

Version 1.82.0

Note that the compiled binaries will likely depend on the version of the toolset used.

$ mkdir boost && cd boost
$ wget https://boostorg.jfrog.io/artifactory/main/release/1.82.0/source/boost_1_82_0.tar.bz2
$ bzcat -v boost_1_82_0.tar.bz2 | tar -x
$ cd boost_1_82_0
$ module load mvapich2/2.3.7 llvm/16.0.4
$ ./bootstrap.sh --with-toolset=clang --prefix=$PWD/../local --with-libraries=
$ echo "using mpi : mpicxx ;" >> project-config.jam
$ ./b2 -j8 -d1 --build-dir=$PWD/../build --layout=system
$ ./b2 --build-dir=$PWD/../build --layout=system install

Version 1.83.0

Note that the compiled binaries will likely depend on the version of the toolset used.

$ mkdir boost && cd boost
$ wget https://boostorg.jfrog.io/artifactory/main/release/1.83.0/source/boost_1_83_0.tar.bz2
$ bzcat -v boost_1_83_0.tar.bz2 | tar -x
$ cd boost_1_83_0
$ module load mvapich2/2.3.7 llvm/17.0.6
$ ./bootstrap.sh --with-toolset=clang --prefix=$PWD/../local --with-libraries=
$ echo "using mpi : mpicxx ;" >> project-config.jam
$ ./b2 -j8 -d1 --build-dir=$PWD/../build --layout=system
$ ./b2 --build-dir=$PWD/../build --layout=system install
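After installation, the exact version can be confirmed from include/boost/version.hpp under the install prefix: the BOOST_VERSION macro encodes it as major*100000 + minor*100 + patch. Decoding 108300 (this release) as a sketch:

```shell
# Decode Boost's BOOST_VERSION integer (108300 for 1.83.0):
# major = v / 100000, minor = (v / 100) % 1000, patch = v % 100.
v=108300
echo "$((v / 100000)).$((v / 100 % 1000)).$((v % 100))"
```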

Version 1.84.0

TODO: Write a modulefile.

$ version="1.84.0"
$ git clone https://github.com/boostorg/boost.git
$ cd boost
$ git checkout "boost-${version}"
$ git submodule update --init --recursive
$ module load cmake ninja mvapich2/2.3.7 llvm/18.1.8
$ cmake -G "Ninja" -B build -S . \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX="${PWD}/local" \
        -DCMAKE_BUILD_WITH_INSTALL_RPATH=On \
        -DBOOST_RUNTIME_LINK=static \
        -DBUILD_SHARED_LIBS=Off \
        -DBOOST_ENABLE_MPI=On \
        -DBOOST_ENABLE_PYTHON=Off \
        -DBOOST_EXCLUDE_LIBRARIES="" \
        -DBOOST_INCLUDE_LIBRARIES=""
$ cmake --build build -- -j24
$ cmake --install build
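A modulefile sketch in the style of the others, pending the TODO above (it assumes the install is relocated to /opt/boost/1.84.0; the BOOST_ROOT variable is an addition here, used as a search hint by CMake's FindBoost):

```tcl
#%Module1.0#####################################################################
##
## modules boost-1.84.0
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - load environment for boost"
puts stderr "\n\tVersion $version\n"
}

module-whatis   "load environment for boost"

# for Tcl script use only
set     package         boost
set     version         1.84.0
set     prefix          /opt/$package/$version


setenv          BOOST_ROOT      $prefix
prepend-path    LD_LIBRARY_PATH $prefix/lib
```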

CMake

Version 3.21.4

On the head node (and using the default compiler shipped with the OS),

$ wget https://github.com/Kitware/CMake/releases/download/v3.21.4/cmake-3.21.4.tar.gz
$ tar xzvf cmake-3.21.4.tar.gz
$ cd cmake-3.21.4
$ ./bootstrap --prefix=/opt/cmake/3.21.4
$ make && make install

Update MODULEPATH as described in Environment Modules. Edit /etc/modulefiles/misc/cmake/3.21.4:

#%Module1.0#####################################################################
##
## modules cmake-3.21.4
##
## modulefiles/cmake-3.21.4.  Written by Kyungguk Min
##
proc ModulesHelp { } {
    global version package

        puts stderr "\t$package - load environment for cmake."
        puts stderr "\n\tVersion $version\n"
}

module-whatis   "load environment for cmake"

# for Tcl script use only
set     package         cmake
set     version         3.21.4
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/bin

Version 3.28.1

On the head node (and using the default compiler shipped with the OS),

$ version="3.28.1"
$ git clone https://github.com/Kitware/CMake.git
$ cd CMake
$ git checkout "v${version}"
$ ./bootstrap --parallel=24 --prefix="/opt/cmake/${version}"
$ make -j24
$ make install

Update MODULEPATH as described in Environment Modules. Edit /etc/modulefiles/misc/cmake/3.28.1:

#%Module1.0#####################################################################
##
## modules cmake-3.28.1
##
## modulefiles/cmake-3.28.1.  Written by Kyungguk Min
##
proc ModulesHelp { } {
    global version package

        puts stderr "\t$package - load environment for cmake."
        puts stderr "\n\tVersion $version\n"
}

module-whatis   "load environment for cmake"

# for Tcl script use only
set     package         cmake
set     version         3.28.1
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/bin

Version 3.30.5

On the head node (and using the default compiler shipped with the OS),

$ version="3.30.5"
$ git clone https://github.com/Kitware/CMake.git
$ cd CMake
$ git checkout "v${version}"
$ ./bootstrap --parallel=24 --prefix="/opt/cmake/${version}"
$ make -j24
$ make install

Update MODULEPATH as described in Environment Modules. Edit /etc/modulefiles/misc/cmake/3.30.5:

#%Module1.0#####################################################################
##
## modules cmake-3.30.5
##
## modulefiles/cmake-3.30.5.  Written by Kyungguk Min
##
proc ModulesHelp { } {
    global version package

        puts stderr "\t$package - load environment for cmake."
        puts stderr "\n\tVersion $version\n"
}

module-whatis   "load environment for cmake"

# for Tcl script use only
set     package         cmake
set     version         3.30.5
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/bin

Ninja

Version 1.11.1

On the head node (and using the default compiler shipped with the OS),

$ version="1.11.1"
$ git clone https://github.com/ninja-build/ninja.git
$ cd ninja
$ git checkout "v${version}"
$ cmake -B build -S . -G "Unix Makefiles" \
        -DCMAKE_INSTALL_PREFIX="/opt/ninja/${version}" \
        -DCMAKE_BUILD_TYPE=Release
$ cmake --build build -- -j24
$ cmake --install build

Update MODULEPATH as described in Environment Modules. Edit /etc/modulefiles/misc/ninja/1.11.1:

#%Module1.0#####################################################################
##
## modules ninja-1.11.1
##
## modulefiles/ninja-1.11.1.  Written by Kyungguk Min
##
proc ModulesHelp { } {
    global version package

        puts stderr "\t$package - load environment for ninja."
        puts stderr "\n\tVersion $version\n"
}

module-whatis   "load environment for ninja"

# for Tcl script use only
set     package         ninja
set     version         1.11.1
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/bin

Singularity

See https://sylabs.io/guides/3.0/user-guide/installation.html.

At this point, only libseccomp-devel and squashfs-tools are missing. On the head node,

$ yum install squashfs-tools libseccomp-devel

On the compute nodes, install only squashfs-tools; libseccomp-devel is not needed there.

On the head node

Install Go, set GOPATH and PATH.

Download the Singularity repo and check out the release version (e.g., v3.6.3, v3.7.2, or v3.9.6).

To configure, build, and install,

$ ./mconfig --prefix=/opt/singularity/3.9.6 --localstatedir=/var
$ make -C builddir
$ make -C builddir install

Update MODULEPATH as described in Environment Modules. Edit /etc/modulefiles/misc/singularity/3.9.6:

#%Module1.0#####################################################################
##
## modules singularity-3.9.6
##
## modulefiles/singularity-3.9.6.  Written by Kyungguk Min
##
proc ModulesHelp { } {
global version package

puts stderr "\t$package - load environment for singularity."
puts stderr "\n\tVersion $version\n"
}

module-whatis   "load environment for singularity"

# for Tcl script use only
set     package         singularity
set     version         3.9.6
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/bin

On the compute nodes

Copy the localstatedir directory structure (the first command is a dry run, via the -n flag, to preview the changes before the second command applies them):

$ for i in {1..10}; do rsync -Cavzn /var/singularity/ node$i:/var/singularity/; done
$ for i in {1..10}; do rsync -Cavz  /var/singularity/ node$i:/var/singularity/; done
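The brace expansion {1..10} drives the per-node loop; echoing instead of rsyncing is a cheap way to confirm which hosts a loop will touch before running it for real (brace expansion requires bash):

```shell
# Preview the hostnames a {1..10} loop expands to.
for i in {1..10}; do echo "node$i"; done
```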

Cluster Monitoring

Under construction

zsh

See CentOS 7.x Install ZSH Terminal.

On all nodes,

$ yum install zsh

Shell configuration should be done on a per-user basis.

Wolfram Mathematica

On the head node, run the downloaded Wolfram Mathematica Installer:

$ ./Mathematica_12.3.0_LINUX.sh

Follow the steps captured below:

-------------------------------------------------------------------------------------------------------------------------------------
                                                Wolfram Mathematica 12.3 Installer
-------------------------------------------------------------------------------------------------------------------------------------

Copyright (c) 1988-2021 Wolfram Research, Inc. All rights reserved.

WARNING: Wolfram Mathematica is protected by copyright law and international treaties. Unauthorized reproduction or distribution
may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under law.

Enter the installation directory, or press ENTER to select /usr/local/Wolfram/Mathematica/12.3:
> /opt/Wolfram/Mathematica/12.3

Create directory (y/n)?
> y

Now installing...

[**********************************************************************************************************************************]

Type the directory path in which the Wolfram Mathematica script(s) will be created, or press ENTER to select /usr/local/bin:
> /opt/Wolfram/Mathematica/12.3/bin

Create directory (y/n)?
> y


WolframScript allows Wolfram Language code to be run from the command line and from self-executing script files. It is always
available from /opt/Wolfram/Mathematica/12.3/Executables/wolframscript. WolframScript system integration makes the wolframscript
binary accessible from any terminal, and allows .wls script files to be executed by double-clicking them in the file manager.

Install WolframScript system integration? (y/n)
> y

VernierLink provides the ability to control sensors and instruments by Vernier Software & Technology using the Wolfram Language.

Users must have read and write permissions to the devices for this functionality to work. It is possible to configure this
computer so that all users have read and write permissions for Vernier devices. If you do not have any Vernier devices, or if
you wish to control device permissions yourself, it is safe to answer "no" to this question.

Configure computer so that Vernier devices are writable by all users? (y/n)
> n

The installer has detected that SELinux is enabled on this system. The security context of the included libraries may need to
be altered in order for Wolfram Mathematica to function properly.

Should the installer attempt to make this change (y/n)?
> y

WARNING: No Avahi Daemon was detected so some Kernel Discovery features will not be available. You can install Avahi Daemon
using your distribution's package management system.

For Red Hat based distributions, try running (as root):

yum install avahi

Installation complete.

The Avahi daemon will probably not be needed.

Update MODULEPATH as described in Environment Modules. Edit /etc/modulefiles/misc/mathematica/12.3:

#%Module1.0#####################################################################
##
## modules mathematica-12.3
##
## modulefiles/mathematica-12.3.  Written by Kyungguk Min
##
proc ModulesHelp { } {
    global version package

        puts stderr "\t$package - load environment for mathematica."
        puts stderr "\n\tVersion $version\n"
}

module-whatis   "load environment for mathematica"

# for Tcl script use only
set     package         Wolfram/Mathematica
set     version         12.3
set     prefix          /opt/$package/$version


prepend-path    PATH            $prefix/Executables

libuuid

On the head node,

yum install libuuid libuuid-devel

On the compute nodes,

for i in {1..10}; do ssh node$i "yum -y install libuuid"; done

Security

2FA - Google Authenticator PAM

To add one-time passcode authentication for the SSH login to the head node,

$ yum install google-authenticator

Edit /etc/pam.d/sshd to include the following as the first auth entry (the leading - tells PAM not to log an error if the module is missing):

-auth      required     pam_google_authenticator.so

Execute google-authenticator as each user and follow the instructions.

Limit on Super User Privilege

Limit on sudoers

For security reasons, it is recommended to hard-code the default editor for visudo. On all nodes, execute EDITOR=nano visudo and add

Defaults editor=/usr/bin/nano

Check that users in the wheel group are allowed to run all commands:

%wheel  ALL=(ALL)    ALL

Limit on su

To limit su to users in the wheel group, edit /etc/pam.d/su and uncomment

auth            required        pam_wheel.so use_uid
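With pam_wheel active, su succeeds only for members of wheel. A quick way to check whether the current account would pass that test (wheel is the standard group name on CentOS):

```shell
# Report whether the current user belongs to the wheel group.
if id -nG | grep -qw wheel; then
  echo "in wheel: su permitted"
else
  echo "not in wheel: su denied"
fi
```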

Disable crontab

On all nodes, create an empty file /etc/cron.allow so that no regular user can use crontab:

$ for i in {0..10}; do ssh node$i "touch /etc/cron.allow"; done
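The effect of the empty allow file, sketched as shell logic (simplified: the real crontab also consults /etc/cron.deny when cron.allow is absent, and root is typically exempt; the file path and user name here are stand-ins):

```shell
# Simplified model of crontab's access check when cron.allow exists:
# the user must be listed in the file, so an empty file denies everyone.
allow=./cron.allow
: > "$allow"        # create an empty allow file (stand-in for /etc/cron.allow)
user=alice          # hypothetical unprivileged user
if grep -qx "$user" "$allow"; then
  echo "allowed"
else
  echo "denied"
fi
```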