
SGI ICE XA (Centennial)
User Guide

Table of Contents

1. Introduction

1.1. Document Scope and Assumptions

This document provides an overview and introduction to the use of the SGI ICE XA (Centennial) located at the ARL DSRC, along with a description of the specific computing environment on Centennial. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:

  • Use of the UNIX operating system
  • Use of an editor (e.g., vi or emacs)
  • Remote usage of computer systems via network or modem access
  • A selected programming language and its related tools and libraries

1.2. Policies to Review

Users are expected to be aware of the following policies for working on Centennial.

1.2.1. Login Node Abuse Policy

Memory or CPU intensive programs running on the login nodes can significantly affect all users of the system. Therefore, only small applications requiring a minimal amount of runtime and memory are allowed on the login nodes. Any job running on the login nodes that affects their overall interactive performance may be unilaterally terminated.

1.2.2. Workspace Purge Policy

The /work1 directory is subject to a 21-day purge policy. A system "scrubber" monitors scratch space utilization, and if available space becomes low, files not accessed within 21 days are subject to removal, although files may remain longer if the space permits. There are no exceptions to this policy.

Note! If it is determined as part of the normal purge cycle that files in your $WORKDIR directory must be deleted, you WILL NOT be notified prior to deletion. You are responsible for monitoring your workspace to prevent data loss.

1.3. Obtaining an Account

The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account." If you do not yet have a pIE User Account, please visit HPC Centers: Obtaining An Account and follow the instructions there. Once you have an active pIE User Account, visit the ARL accounts page for instructions on how to request accounts on the ARL DSRC HPC systems. If you need assistance with any part of this process, please contact the HPC Help Desk at accounts@helpdesk.hpc.mil.

1.4. Requesting Assistance

The HPC Help Desk is available to help users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 8:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).

You can also contact the ARL DSRC directly for support services not provided by the HPC Help Desk. For more detailed contact information, please see our Contact Page.

2. System Configuration

2.1. System Summary

Centennial is an SGI ICE XA. The login and compute nodes are populated with two 20-core Intel Xeon E5-2698 v4 (Broadwell) processors. Centennial uses the Enhanced Data Rate (EDR) InfiniBand interconnect in a non-blocking fat-tree configuration as its high-speed network for MPI messages and IO traffic. Centennial uses Lustre to manage its parallel file system, which targets the disk RAID arrays. Centennial has 1,848 compute nodes; memory is shared among the cores on each node but is not shared across nodes. Each standard compute node has two 20-core processors (40 cores) running its own Red Hat Enterprise Linux OS and sharing 128 GBytes of memory, with no user-accessible swap space. Each Large-Memory compute node has two 20-core processors (40 cores) running its own Red Hat Enterprise Linux OS and sharing 512 GBytes of memory, with no user-accessible swap space. Each GPU compute node has two 20-core processors (40 cores) and one NVIDIA Tesla K40 GPU, running its own Red Hat Enterprise Linux OS and sharing 256 GBytes of memory and 2 TBytes of local solid state storage, with no user-accessible swap space. Centennial is rated at 2.6 peak PFLOPS and has 12 PBytes (formatted) of disk storage.

Centennial is intended to be used as a batch-scheduled HPC system. Its login nodes are not to be used for large computational (memory, IO, long executions) work. All executions that require large amounts of system resources must be sent to the compute nodes by batch job submission.

Node Configuration
                          Login Nodes       Compute Nodes
                                            Standard Memory   Large Memory      GPU Accelerated
Total Nodes               24                1,784             32                32
Operating System          RHEL              RHEL              RHEL              RHEL
Cores/Node                40                40                40                40 + 1 GPU (1 x 2,880 GPU cores)
Core Type                 Intel Xeon E5-2698v4 Broadwell (all nodes); GPU nodes add one NVIDIA Tesla K40
Core Speed                2.2 GHz           2.2 GHz           2.2 GHz           2.2 GHz
Memory/Node               256 GBytes        128 GBytes        512 GBytes        256 GBytes
Accessible Memory/Node    252 GBytes        124 GBytes        508 GBytes        252 GBytes
Memory Model              Shared on node; distributed across the cluster
Interconnect Type         Ethernet / InfiniBand
File Systems on Centennial
Path                   Capacity         Type
/p/home ($HOME)        678 TBytes       Lustre
/p/work1 ($WORKDIR)    10.859 PBytes    Lustre
/p/work2               136 TBytes       Lustre
/p/app ($CSI_HOME)     203 TBytes       Lustre

2.2. Processors

Centennial uses 2.2-GHz Intel Xeon E5-2698 v4 (Broadwell) processors on its login and compute nodes. There are 2 processors per node, each with 20 cores, for a total of 40 cores per node. Each processor has a 50-MByte last-level cache.

2.3. Memory

Centennial uses both shared and distributed memory models. Memory is shared among all the cores on a node, but is not shared among the nodes across the cluster.

Each login node contains 256 GBytes of main memory. All memory and cores on the node are shared among all users who are logged in. Therefore, users should not use excessive amounts of memory at any one time.

Each of the 1,784 standard compute nodes contains 124 GBytes of user-accessible shared memory. Each of the 32 GPU nodes contains 252 GBytes of user-accessible shared memory, and each of the 32 Large-Memory nodes contains 508 GBytes of user-accessible shared memory.

2.4. Operating System

The operating system on Centennial is Red Hat Enterprise Linux (RHEL). The operating system supports 64-bit software.

2.5. File Systems

Centennial has the following file systems available for user storage:

2.5.1. /p/home

This file system is locally mounted from Centennial's Lustre file system. It has a formatted capacity of 678 TBytes. All users have a home directory located on this file system which can be referenced by the environment variable $HOME.

2.5.2. /p/work1

This directory comprises Centennial's scratch file area and is a locally mounted Lustre file system. /p/work1 has a formatted capacity of 10.859 PBytes. All users have a work directory located on /p/work1 which can be referenced by the environment variable $WORKDIR.

2.5.3. /p/work2

This directory is a specialized file system for unique security requirements, and has a formatted capacity of 136 TBytes.

2.5.4. /p/app

All center-managed COTS packages are stored in /p/app. This file system is locally mounted from Centennial's Lustre file system. It has a formatted capacity of 203 TBytes and can be referenced by the environment variable $CSI_HOME. In addition, users may request space in this area under /p/app/unsupported to store user-managed software packages that they wish to make available to other owner-designated users. This area can be referenced by the environment variable $PROJECTS_HOME. To have space allocated in /p/app/unsupported, submit a request to the ARL DSRC Help Desk by e-mail to dsrchelp@arl.army.mil or by calling 1-800-ARL-1552 (1-800-275-1552) or (410) 278-1700.

2.5.5. /archive

This NFS-mounted file system is accessible from the login nodes on Centennial. Files in this file system are subject to migration to tape, and access may be slower due to the overhead of retrieving files from tape. The disk portion of the file system has a formatted capacity of 16 TBytes, is backed by a petascale archival tape storage system, and is automatically backed up. Users should migrate all large input and output files to this area for long-term storage. Users should also migrate all important smaller files from their home directory area in /p/home to this area for long-term storage. All users have a directory located on this file system which can be referenced by the environment variable $ARCHIVE_HOME.

2.5.6. /tmp or /var/tmp

Never use /tmp or /var/tmp for temporary storage! These directories are not intended for temporary storage of user data, and abuse of these directories could adversely affect the entire system.

2.5.7. /p/cwfs

This path is directed to the Center-Wide File System (CWFS), which is meant for short-term storage (no longer than 120 days). All users have a directory defined in this file system, referenced by the environment variable $CENTER. It is accessible from the login nodes of the unclassified HPC systems. The CWFS has a formatted capacity of 3,300 TBytes and is managed by IBM's Spectrum Scale (formerly GPFS).

2.6. Peak Performance

Centennial is rated at 2.6 peak PFLOPS.

3. Accessing the System

3.1. Kerberos

A Kerberos client kit must be installed on your desktop system to enable you to get a Kerberos ticket. Kerberos is a network authentication tool that provides secure communication by using secret cryptographic keys. Only users with a valid HPCMP Kerberos authentication can gain access to Centennial. More information about installing Kerberos clients on your desktop can be found at HPC Centers: Kerberos & Authentication.

3.2. Logging In

The system host name for the Centennial cluster is centennial.arl.hpc.mil, which will redirect the user to one of twenty-four login nodes. Hostnames and IP addresses of these nodes are available upon request from the HPC Help Desk.

The preferred way to login to Centennial is via ssh, as follows:

% ssh centennial.arl.hpc.mil

Kerberized rlogin is also allowed.

3.3. File Transfers

File transfers to ARL DSRC systems (except for those to the local archive server) must be performed using the following tools: scp, mpscp, ftp, and sftp.

Windows users may use a graphical file transfer protocol (ftp) client such as FileZilla.

4. User Environment

4.1. User Directories

4.1.1. Home Directory

When you log on to Centennial, you will be placed in your home directory, /p/home/username. The environment variable $HOME is automatically set for you and refers to this directory. $HOME is visible to both the login and compute nodes and may be used to store small user files. However, it has limited capacity and is not backed up daily, so it should not be used for long-term storage.

4.1.2. Work Directory

The path for your working directory on Centennial's scratch file system is /p/work1/username. The environment variable $WORKDIR is automatically set for you and refers to this directory. $WORKDIR is visible to both the login and compute nodes, and should be used for temporary storage of active data related to your batch jobs.

Note: Although the $WORKDIR environment variable is automatically set for you, the directory itself is not created. You can create your $WORKDIR directory as follows:

mkdir $WORKDIR

The scratch file system provides 10.859 PBytes of formatted disk space. This space is not backed up and is subject to the purge policy described above.

REMEMBER: This file system is considered volatile working space. You are responsible for archiving any data you wish to preserve. To prevent your data from being "scrubbed," copy files that you want to keep to your archive directory, $ARCHIVE_HOME (see below), for long-term storage.

4.1.3. Archive Directory

In addition to $HOME and $WORKDIR, each user is also given a directory on the /archive file system. This file system is visible to the login nodes (not the compute nodes) and is the preferred location for long-term file storage. All users have an area defined in /archive for their use. This area can be accessed using the $ARCHIVE_HOME environment variable. We recommend that you keep large computational files and more frequently accessed files in the $ARCHIVE_HOME directory. We also recommend that any important files located in $HOME should be copied into $ARCHIVE_HOME as well.

Because the compute nodes are unable to see $ARCHIVE_HOME, you will need to pre-stage your input files to your $WORKDIR from a login node before submitting jobs. After jobs complete, you will need to transfer output files from $WORKDIR to $ARCHIVE_HOME from a login node. This may be done manually or through the transfer queue, which executes serial jobs on login nodes.
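For example, since /archive is NFS-mounted on the login nodes, a simple copy from a login node (or from a transfer-queue job) is sufficient. The file names below are hypothetical:

% cp $ARCHIVE_HOME/input.dat $WORKDIR      # stage input before submitting the job
% cp $WORKDIR/output.dat $ARCHIVE_HOME     # save results after the job completes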

4.1.4. Center-Wide File System Directory

The Center-Wide File System (CWFS) provides file storage that is accessible from Centennial's login nodes and from the HPC Portal. The CWFS allows for file transfers and other file and directory operations from Centennial using standard Linux commands. Each user has their own directory in the CWFS. The name of your CWFS directory may vary between machines and between centers, but the environment variable $CENTER will always refer to this directory.

The example below shows how to copy a file from your work directory on Centennial to the CWFS ($CENTER).

While logged into Centennial, copy your file from your work directory to the CWFS.

% cp $WORKDIR/filename $CENTER

4.2. Shells

The following shells are available on Centennial: csh, bash, ksh, tcsh, zsh, and sh. To change your default shell, please email a request to require@hpc.mil. Your preferred shell will become your default shell on the Centennial cluster within 1-2 working days.

4.3. Environment Variables

A number of environment variables are provided by default on all HPCMP HPC systems. We encourage you to use these variables in your scripts where possible. Doing so will help to simplify your scripts and reduce portability issues if you ever need to run those scripts on other systems. The following environment variables are common to both the login and batch environments:

Common Environment Variables
Variable Description
$ARCHIVE_HOME Your directory on the archive server.
$ARCHIVE_HOST The host name of the archive server.
$BC_HOST The generic (not node specific) name of the system.
$CC The currently selected C compiler. This variable is automatically updated when a new compiler environment is loaded.
$CENTER Your directory on the Center-Wide File System (CWFS).
$COST_HOME This variable contains the path to the base directory of the default installation of the Common Open Source Tools (COST) installed on a particular compute platform. (See BC policy FY13-01 for COST details.)
$CSI_HOME The directory containing the following list of heavily used application packages: ABAQUS, Accelrys, ANSYS, CFD++, Cobalt, EnSight, Fluent, GASP, Gaussian, LS-DYNA, and MATLAB, formerly known as the Consolidated Software Initiative (CSI) list. Other application software may also be installed here by our staff.
$CXX The currently selected C++ compiler. This variable is automatically updated when a new compiler environment is loaded.
$DAAC_HOME The directory containing DAAC-supported visualization tools: ParaView, VisIt, and EnSight.
$F77 The currently selected Fortran 77 compiler. This variable is automatically updated when a new compiler environment is loaded.
$F90 The currently selected Fortran 90 compiler. This variable is automatically updated when a new compiler environment is loaded.
$HOME Your home directory on the system.
$JAVA_HOME The directory containing the default installation of JAVA.
$KRB5_HOME The directory containing the Kerberos utilities.
$PET_HOME The directory containing the tools formerly installed and maintained by the PET staff. This variable is deprecated and will be removed from the system in the future. Certain tools will be migrated to $COST_HOME, as appropriate.
$PROJECTS_HOME A common directory where group-owned and supported applications and codes may be maintained for use by members of a group. Any project may request a group directory under $PROJECTS_HOME.
$SAMPLES_HOME The Sample Code Repository. This is a collection of sample scripts and codes provided and maintained by our staff to help users learn to write their own scripts. There are a number of ready-to-use scripts for a variety of applications.
$WORKDIR Your work directory on the local temporary file system (i.e., local high-speed disk).

4.4. Modules

Software modules are a very convenient way to set needed environment variables and include necessary directories in your path so commands for particular applications can be found. We strongly encourage you to use modules. For more information on using modules, see the Modules User Guide.

4.5. Archive Usage

Archive storage is provided through the /archive NFS-mounted file system. All users are automatically provided a directory under this file system; however, it is only accessible from the login nodes. Since space in a user's login home area in /p/home is limited, all large data files requiring permanent storage should be placed in /archive. It is also recommended that all important smaller files in /p/home for which a user requires long-term access be copied to /archive as well. For more information on using the archive system, see the Archive System User Guide.

4.6. Login Files

When an account is created on Centennial, default .cshrc and/or .profile files are placed in your home directory. These files contain the default setup for modules, PBS, and other system settings. We suggest you put your customizations (any paths, aliases, or libraries you need to load) in a .cshrc.pers or .profile.pers file appropriate to your shell, and source that file at the end of your .cshrc and/or .profile, as necessary. For example:

if (-f $HOME/.cshrc.pers) then
source $HOME/.cshrc.pers
endif

If you need to connect to other Kerberized systems within the program, you should use /usr/brl/bin/ssh. If you use Kerberized ssh often, you may want to add an alias in your .cshrc.pers or .profile.pers files in $HOME, as follows:

alias ssh /usr/brl/bin/ssh # .cshrc.pers - csh/tcsh
alias ssh=/usr/brl/bin/ssh # .profile.pers - sh/ksh/bash

Note: the commands krcp, krlogin, and krsh are officially deprecated and will be removed at some point in the future. Users are strongly advised to stop using these three commands as soon as possible.

5. Program Development

5.1. Programming Models

Centennial supports two programming models: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). A hybrid MPI/OpenMP programming model is also supported. MPI is an example of a message- or data-passing model. OpenMP uses only the shared memory on a node, parallelizing by spawning threads. The hybrid model combines the two.

5.1.1. Message Passing Interface (MPI)

Centennial has two MPI-3.0 standard library suites: SGI MPT and IntelMPI. The modules for these MPI libraries are mpi/sgimpt/x.x.x and mpi/intelmpi/x.x.x.
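As a quick illustration, the following minimal C program (a sketch, not a Centennial-specific code) reports each MPI task's rank. It can be built with either MPI suite using the compiler commands shown in section 5.2 (for example, "icc -O3 hello_mpi.c -o hello_mpi.x -lmpi" for SGI MPT) and launched with the commands shown in section 6.7:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this task's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of MPI tasks */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}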

5.1.2. Open Multi-Processing (OpenMP)

OpenMP is available in Intel's Software Development suite for C, C++, and Fortran. Use the "-openmp" flag.
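For example, the following minimal C sketch (file name hypothetical) spawns a team of threads on a single node. Compile it with "icc -openmp hello_omp.c -o hello_omp.x" and control the thread count with the OMP_NUM_THREADS environment variable:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Each thread in the team prints its own ID. */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}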

5.1.3. Hybrid Processing (MPI/OpenMP)

In hybrid processing, all intranode parallelization is accomplished using OpenMP, while all internode parallelization is accomplished using MPI. Typically, there is one MPI task assigned per node, with the number of OpenMP threads assigned to each node set at the number of cores available on the node.
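For example, a common hybrid layout on Centennial's 40-core nodes is one MPI task per node with 40 OpenMP threads per task. A sketch of the relevant batch-script lines is shown below (the executable name is hypothetical; see the Hybrid example in $SAMPLES_HOME for a complete script):

#PBS -l select=4:ncpus=40:mpiprocs=1
setenv OMP_NUM_THREADS 40
mpiexec_mpt -n 4 ./hybrid_code.x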

5.2. Available Compilers

Centennial has three compiler suites:

  • Intel
  • PGI
  • GNU

All versions of MPI share a common base set of compilers that are available on both the login and compute nodes.

Common Compiler Commands
Compiler Intel PGI GNU Serial/Parallel
C icc pgcc gcc Serial/Parallel
C++ icc pgcc g++ Serial/Parallel
Fortran 77 ifort pgf77 gfortran Serial/Parallel
Fortran 90 ifort pgf90 gfortran Serial/Parallel

SGI MPT codes are built using the above compiler commands with the addition of the "-lmpi" option on the link line. The following additional compiler wrapper scripts are used for building IntelMPI codes:

Intel MPI Compiler Wrapper Scripts
Compiler Intel PGI GNU Serial/Parallel
MPI C mpiicc mpicc mpicc Parallel
MPI C++ mpiicc mpicc mpicc Parallel
MPI F77 mpiifort mpif77 mpif77 Parallel
MPI F90 mpiifort mpif90 mpif90 Parallel

To select one of these compilers for use, load its associated module. See Relevant Modules (below) for more details.

5.2.1. Intel C, C++, and Fortran Compiler

Intel's latest compiler suite improves performance for large-memory and Fortran 90 applications over the previous version of this product. Intel's latest Fortran compiler, ifort, includes the code-generation and optimization power of the Intel compiler, and the standard Intel Fortran compiler tools continue to be available as well. The latest Intel C++ compiler has full binary mix-and-match interoperability with gcc 5.3 and greater. The compiler also includes support for the GNU Standard Template Library (libstdc++) and allows precompiled headers for Linux compilation.

Several optimizations and tuning options are available for code developed with all Intel compilers. For more information see Code Profiling and Optimization. The table below shows some compiler options that may help with optimization.

Useful Intel Compiler Options
Option   Purpose
-O0 disable optimization
-g create symbols for tracing and debugging
-O1 optimize for speed with no loop unrolling and no increase in code size
-O2 or -default default optimization, optimize for speed with inline intrinsic and loop unrolling
-O3 level -O2 optimization plus memory optimization (allows compiler to alter code)
-ipo interprocedural optimization, inline functions in separate files, partial inlining, dead code elimination, etc.

The following tables contain examples of serial, MPI, and OpenMP compile commands for C, C++, and Fortran.

Example C Compile Commands
Programming Model   Compile Command
Serial              icc -O3 my_code.c -o my_code.x
SGI MPT             icc -O3 my_code.c -o my_code.x -lmpi
IntelMPI            mpiicc -O3 my_code.c -o my_code.x
OpenMP              icc -O3 my_code.c -o my_code.x -openmp

Example C++ Compile Commands
Programming Model   Compile Command
Serial              icc -O3 my_code.C -o my_code.x
SGI MPT             icc -O3 my_code.C -o my_code.x -lmpi
IntelMPI            mpiicxx -O3 my_code.C -o my_code.x
OpenMP              icc -O3 my_code.C -o my_code.x -openmp

Example Fortran Compile Commands
Programming Model   Compile Command
Serial              ifort -O3 my_code.f90 -o my_code.x
SGI MPT             ifort -O3 my_code.f90 -o my_code.x -lmpi
IntelMPI            mpiifort -O3 my_code.f90 -o my_code.x
OpenMP              ifort -O3 my_code.f90 -o my_code.x -openmp

For more information on the Intel compilers, please consult Intel's Software Documentation Library.

5.2.2. PGI C, C++, and Fortran Compiler

The latest version of the PGI compiler suite is also available to provide compatibility and portability of codes from other systems.

Several optimizations and tuning options are available for code developed with all PGI compilers. The table below shows some compiler options that may help with optimization.

Useful PGI Compiler Options
Option   Purpose
-O0 disable optimization
-g create symbols for tracing and debugging
-O1 optimize for speed with no loop unrolling and no increase in code size
-O2 or -default default optimization, optimize for speed with inline intrinsic and loop unrolling
-O3 level -O2 optimization plus memory optimization (allows compiler to alter code)
-Mipa Enable and specify options for Interprocedural Analysis (IPA)

The following tables contain examples of serial, MPI, and OpenMP compile commands for C, C++, and Fortran.

Example C Compile Commands
Programming Model   Compile Command
Serial              pgcc -O3 my_code.c -o my_code.x
SGI MPT             pgcc -O3 my_code.c -o my_code.x -lmpi
IntelMPI            mpicc -O3 my_code.c -o my_code.x
OpenMP              pgcc -O3 my_code.c -o my_code.x -mp

Example C++ Compile Commands
Programming Model   Compile Command
Serial              pgc++ -O3 my_code.C -o my_code.x
SGI MPT             pgc++ -O3 my_code.C -o my_code.x -lmpi
IntelMPI            mpicxx -O3 my_code.C -o my_code.x
OpenMP              pgc++ -O3 my_code.C -o my_code.x -mp

Example Fortran Compile Commands
Programming Model   Compile Command
Serial              pgf90 -O3 my_code.f90 -o my_code.x
SGI MPT             pgf90 -O3 my_code.f90 -o my_code.x -lmpi
IntelMPI            mpif90 -O3 my_code.f90 -o my_code.x
OpenMP              pgf90 -O3 my_code.f90 -o my_code.x -mp

5.2.3. GNU Compiler

The default GNU compilers are good for compiling utility programs but are probably not appropriate for computationally intensive applications. They are available without loading a separate module. The primary advantage of the GNU compilers is their portability across different architectures. They can be executed using the commands in the table above. For the GNU compilers, the "-O" flag is the basic optimization setting.
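For example, a basic serial build with the GNU compilers might look like the following (file names are hypothetical):

gcc -O my_code.c -o my_code.x
gfortran -O my_code.f90 -o my_code.x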

More GNU compiler information can be found in the GNU gcc 4.8.5 manual.

5.2.4. Centennial Default Compiler and MPI Suite Environment

By default, all users will have the default Intel compiler and SGI MPT modules loaded in their environment at login. Users who wish to have the PGI compiler or a non-default Intel compiler as the default upon login should add the following to their .cshrc.pers or .profile.pers files:

module unload compiler/intel/x.x.x mpi/sgimpt/x.x.x
module load compiler/new_compiler/new_version mpi/new_MPI_suite/new_version

5.3. Relevant Modules

If you compile your own codes or run codes that require compiler/MPI modules different from the defaults, you will need to select which compiler and MPI version you want to use. For example:

module load mpi/intelmpi/x.x.x
or
module unload compiler/intel/x.x.x mpi/sgimpt/x.x.x
module load compiler/pgi/x.x mpi/intelmpi/x.x.x

These same module commands should be executed in your batch script before executing your program.

Centennial provides individual modules for each compiler (except for gcc) and MPI version. To see the list of currently available modules use the "module avail" command. You can use any of the available MPI versions with each compiler by pairing them together when you load the modules.

The table below shows the naming convention used for various modules.

Module Naming Conventions
Module               Module Name
Intel Compilers      compiler/intel/#.#.#
PGI Compilers        compiler/pgi/#.#
SGI MPT Library      mpi/sgimpt/#.#
Intel MPI Library    mpi/intelmpi/#.#.#

For more information on using modules, see the Modules User Guide.

5.4. Libraries

5.4.1. BLAS

The Basic Linear Algebra Subprogram (BLAS) library is a set of high quality routines for performing basic vector and matrix operations. There are three levels of BLAS operations:

  • BLAS Level 1: vector-vector operations
  • BLAS Level 2: matrix-vector operations
  • BLAS Level 3: matrix-matrix operations

More information on the BLAS library can be found at http://www.netlib.org/blas.

5.4.2. Intel Math Kernel Library (Intel MKL)

Centennial provides the Intel Math Kernel Library, a set of numerical routines tuned specifically for Intel platform processors and optimized for math, scientific, and engineering applications. The routines, which are available via both FORTRAN and C interfaces, include:

  • LAPACK plus BLAS (Levels 1, 2, and 3)
  • ScaLAPACK plus PBLAS (Levels 1, 2, and 3)
  • Fast Fourier Transform (FFT) routines for single-precision, double-precision, single-precision complex, and double-precision complex data types
  • Discrete Fourier Transforms (DFTs)
  • Fast Math and Fast Vector Library
  • Vector Statistical Library Functions (VSL)
  • Vector Transcendental Math Functions (VML)

The MKL routines are part of the Intel Programming Environment as Intel's MKL is bundled with the Intel Compiler Suite.

Linking to the Intel Math Kernel Libraries can be complex and is beyond the scope of this introductory guide. Documentation explaining the full feature set along with instructions for linking can be found at the Intel Math Kernel Library documentation page.
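For many simple cases with the Intel compilers, however, the "-mkl" convenience flag is enough to compile and link against MKL. The sketch below (file name hypothetical) multiplies two small matrices with the BLAS Level 3 routine dgemm through MKL's C interface:

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    double A[4] = {1.0, 2.0, 3.0, 4.0};   /* 2x2 matrices, row-major */
    double B[4] = {5.0, 6.0, 7.0, 8.0};
    double C[4] = {0.0, 0.0, 0.0, 0.0};

    /* C = 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("C = [ %g %g ; %g %g ]\n", C[0], C[1], C[2], C[3]);
    return 0;
}

To build this example: icc -O2 dgemm_example.c -o dgemm_example.x -mkl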

Intel also makes a link advisor available to assist users with selecting proper linker and compiler options: http://software.intel.com/sites/products/mkl/.

5.4.3. Additional Math Libraries

There is also an extensive set of Math libraries available in the $PET_HOME/MATH directory on Centennial. Information about these libraries may be found on the Baseline Configuration Web site at BC policy FY13-01.

5.5. Debuggers

5.5.1. gdb

The GNU Project Debugger (gdb) is a debugger that works similarly to dbx and can be invoked either with a program for execution or with a running process id. To examine a core file produced by a program, use:

gdb a.out corefile

To attach to a process that is currently executing on the node, use:

gdb a.out pid

For more information, the GDB manual can be found at http://sourceware.org/gdb/current/onlinedocs/gdb.

5.5.2. idb

The Intel Debugger (idb) is a symbolic debugger that implements a stop-and-examine model to help locate run-time errors in code. It can also attach to running processes and can manage several processes at once as well as multi-threaded applications. To use idb, the code to be debugged should be compiled and linked with debugging symbols enabled (the "-g" option), ideally with optimization disabled ("-O0"). By default, idb begins in dbx mode, but it can be run in gdb mode by specifying the "-gdb" option. A graphical version of idb can be invoked using the "-gui" option. The Intel Debugger Manual can be found at http://software.intel.com/en-us/articles/intel-fortran-compiler-for-linux-9x-manuals/ .

Note: you must first load the Intel compiler module to access idb.

5.5.3. TotalView

TotalView is a debugger that supports threads, MPI, OpenMP, C/C++, Fortran, and mixed-language codes. It offers advanced features such as on-demand memory leak detection, other heap allocation debugging features, and the Standard Template Library Viewer (STLView). Unique features such as dive, a wide variety of breakpoints, the Message Queue Graph/Visualizer, powerful data analysis, and control at the thread level are also available.

To start TotalView on a PBS job, you need to run an interactive batch job.

To do this:

Check to see how many cores are free using "qview". Use "ssh -Y centennial.arl.hpc.mil" to enable X11 forwarding from your machine to Centennial. If you are using Windows, start an X server such as Xming or Cygwin/X, and enable X11 forwarding in PuTTY (Connection > SSH > X11).

Once on Centennial's login node, test the SSH tunnel with an "xclock".

To get your X DISPLAY sent from your batch job to your desktop, add the PBS option "-X" to the interactive job request as in the line below:

To get an interactive batch session with allocated compute nodes,

## The number of cores needs to be as many or fewer than the number of
## available TotalView TeamPlus licenses.
## Here, for example, if there are 16 or more available licenses, try:

> qsub -X -A myproject -l walltime=01:00:00 -q debug -l select=1:ncpus=40:mpiprocs=16 -I

NOTE: You no longer need to use qtunnel or get another Kerberos ticket.

Once the interactive batch session starts...

> cd /p/work1/---/where_work_is

Alternatively,

> cd $WORKDIR/.../where_work_is

Now test if your X11 display works from the PBS mom node:

> xclock

Load your TotalView module:

> module load totalview

Now you can run TotalView on your executable, in this example, mpi_test.x

> mpiexec_mpt -tv -np 16 ./mpi_test.x

5.5.4. DDT

DDT is a debugger that supports threads, MPI, OpenMP, C/C++, and Fortran, Coarray Fortran, UPC, and CUDA. Memory debugging and data visualization are supported for large-scale parallel applications. The Parallel Stack Viewer is a unique way to see the program state of all processes and threads at a glance.

To start DDT on a PBS job, you need to run an interactive batch job.

To do this:

Check to see how many cores are free using "qview". Use "ssh -Y centennial.arl.hpc.mil" to enable X11 forwarding from your machine to Centennial. If you are using Windows, start an X server such as Xming or Cygwin/X, and enable X11 forwarding in PuTTY (Connection > SSH > X11).

Once on Centennial's login node, test the SSH tunnel with an "xclock".

To get your X DISPLAY sent from your batch job to your desktop, add the PBS option "-X" to the interactive job request as in the line below:

To get an interactive batch session with allocated compute nodes,

## The number of cores needs to be as many or fewer than the number of
## available DDT licenses.
## Here, for example, if there are 16 or more available licenses, try:

> qsub -X -A myproject -l walltime=01:00:00 -q debug -l select=1:ncpus=40:mpiprocs=16 -I

NOTE: You no longer need to use qtunnel or get another Kerberos ticket.

Once the interactive batch session starts...

> cd /p/work1/---/where_work_is

Alternatively,

> cd $WORKDIR/.../where_work_is

Now test if your X11 display works from the PBS mom node:

> xclock

Load your module:

> module load ddt

Change the $TMPDIR environment variable from its default to /tmp:

> export TMPDIR=/tmp # Bourne/Korn/Bash shell syntax

or

> setenv TMPDIR /tmp # CSH and TCSH shell syntax

Now you can run ddt on your executable, in this example mpi_test.x:

> ddt -n 16 ./mpi_test.x

Wait for the DDT GUI to open. Two GUI windows should appear and remain - a large Allinea DDT GUI, and then a smaller job execution GUI titled "Run (on rNiMnO)", where rNiMnO is the compute node upon which the interactive job is running.

Check that the application in the top box of "Run" is the correct path and application you are debugging.

Check that the MPI implementation is correct. This information follows the string "Implementation:". If necessary, select either "SGI MPT" or "Intel MPI" exactly. Check that "mpirun" is indicated and that the correct path is given. For example, if your code is linked against SGI MPT 2.15, open the "Options" window (via the "Change" button next to MPI Implementation, or via the "File" pull-down menu in the upper left-hand corner of the main "Allinea DDT" window); the second box should have a checked check-box and should read "/opt/sgi/mpt/mpt-2.15/bin/mpirun" exactly, without quotes.

In the "Run" window, hit the "Run" button, lower right hand corner.

A new window should come up with five panels, along with a smaller status box showing DDT starting the debugging MPI job. When it completes, you should see the code itself loaded in the top middle panel; a mouse-driven directory tree of the code files in the top left-hand panel; three buttons labeled "Locals", "Current Line(s)", and "Current Stack" in the top right-hand panel, with the "Current Line(s)" button selected; and two bottom panels named "Stacks" on the left and "Evaluate" on the right.

Execution should be paused at the top line of the code following the include statements (the main entry line in C, or the first executable statement in Fortran).

The code panel is interactive and can be used to set breakpoints, etc., to aid in debugging your code. Choose the "Control" drop-down menu for stepping through the code, adding breakpoints, and so on. The buttons in the "Control" menu may also be displayed in a row underneath the drop-down menus.

Hit the green arrow button in the top left corner of the new Allinea DDT window to begin execution. Execution should then proceed to the first error encountered and stop there. If no errors are encountered, execution will proceed to the end.

5.6. Code Profiling and Optimization

Profiling is the process of analyzing the execution flow and characteristics of your program to identify sections of code that are likely candidates for optimization, which increases the performance of a program by modifying certain aspects for increased efficiency.

We provide two profiling tools, gprof and codecov, to assist you in the profiling process. A basic overview of optimization methods with information about how they may improve the performance of your code can be found in Performance Optimization Methods (below).

5.6.1. gprof

The GNU Project Profiler (gprof) is a profiler that shows how your program is spending its time and which functions calls are made. To profile code using gprof, use the "-pg" option during compilation. For more information, the gprof manual can be found at http://sourceware.org/binutils/docs/gprof/index.html.
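For example (file names are hypothetical):

gcc -O2 -pg my_code.c -o my_code.x      # compile and link with profiling enabled
./my_code.x                             # run normally; writes gmon.out in the current directory
gprof ./my_code.x gmon.out > profile.txt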

5.6.2. codecov

The Intel Code Coverage Tool (codecov) can be used in numerous ways to improve code efficiency and increase application performance. The tool leverages Profile-Guided optimization technology (discussed below). Coverage can be specified in the tool as file-level, function-level or block-level. Another benefit to this tool is the ability to compare the profiles of two application runs to find where the optimizations are making a difference. More detailed information on this tool can be found at http://www.intel.com/software/products/compilers.

5.6.3. Program Development Reminders

If an application is not programmed for distributed memory, then only the cores on a single node can be used. This is limited to 40 cores on Centennial.

Check the utilization of the nodes your application is running on to see if it is taking advantage of all the resources available to it. This can be done by finding the nodes assigned to your job by executing "qstat -f", logging into one of the nodes using the ssh command, and then executing the top command to see how many copies of your executable are being executed on the node.
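For example (the job ID and node name below are hypothetical):

> qstat -f 123456 | grep exec_host      # list the nodes assigned to job 123456
> ssh r1i2n3                            # log into one of the assigned nodes
> top                                   # observe the processes running on that node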

Keep the system architecture in mind during code development. For instance, if your program requires more memory than is available on a single node, then you will need to parallelize your code so that it can function across multiple nodes.

5.6.4. Performance Optimization Methods

Optimization generally increases compilation time and executable size, and may make debugging difficult. However, it usually produces code that runs significantly faster. The optimizations that you can use will vary depending on your code and the system on which you are running.

Note: Before considering optimization, you should always ensure that your code runs correctly and produces valid output.

In general, there are five main categories of optimization:

  • Global Optimization
  • Loop Optimization
  • Interprocedural Analysis and Optimization (IPA)
  • Function Inlining
  • Profile-Guided Optimizations
Global Optimization

A technique that looks at the program as a whole and may perform any of the following actions:

  • Performed on code over all its basic blocks
  • Performs control-flow and data-flow analysis for an entire program
  • Detects all loops, including those formed by IF and GOTO statements, and performs general optimization.
  • Constant propagation
  • Copy propagation
  • Dead store elimination
  • Global register allocation
  • Invariant code motion
  • Induction variable elimination
Loop Optimization

A technique that focuses on loops (for, while, etc.) in your code and looks for ways to reduce loop iterations or parallelize the loop operations. The following types of actions may be performed:

  • Vectorization - rewrites loops to improve memory access performance. With the Intel compilers, loops can be automatically converted to utilize the MMX/SSE/SSE2/SSE3 instructions and registers if they meet certain criteria.
  • Loop unrolling - (also known as "unwinding") replicates the body of loops to reduce loop branching overhead and provide better opportunities for local optimization.
  • Parallelization - divides loop operations over multiple processors where possible.
Interprocedural Analysis and Optimization (IPA)

A technique that allows the use of information across function call boundaries to perform optimizations that would otherwise be unavailable.

Function Inlining

A technique that seeks to reduce function call and return overhead.

  • Used with functions that are called numerous times from relatively few locations.
  • Allows a function call to be replaced by a copy of the body of that function.
  • May create opportunities for other types of optimization
  • May not be beneficial. Improper use may increase code size and actually result in less efficient code.
Profile-Guided Optimizations

Profile-Guided Optimizations allow the compiler to make data-driven decisions during compilation on branch prediction, increased parallelism, block ordering, register allocation, function ordering, and more. The build for this option takes three steps and uses a representative data set to guide the optimizations.

For example:

  • Step 1: Instrumentation, Compilation, and Linking

    ifort -prof-gen -prof-dir ${HOME}/profdata -O2 -c a1.f a2.f a3.f
    ifort -o a1 a1.o a2.o a3.o

  • Step 2: Instrumentation Execution

    a1

  • Step 3: Feedback Compilation

    ifort -prof-use -prof-dir ${HOME}/profdata -ipo a1.f a2.f a3.f

6. Batch Scheduling

6.1. Scheduler

The Portable Batch System (PBS) is currently running on Centennial. It schedules jobs and manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. PBS is able to manage both single-processor and multiprocessor jobs. The PBS module is automatically loaded by the Master module on Centennial at login.

6.2. Queue Information

The following table describes the PBS queues available on Centennial:

Queue Descriptions and Limits (priority decreases from top to bottom)
Priority   Queue Name      Job Class    Max Wall Clock Time   Max Cores Per Job   Comments
Highest    debug           Debug        1 Hour                N/A                 User diagnostic jobs
           transfer        N/A          48 Hours              1                   Data transfer for user jobs
           urgent          Urgent       96 Hours              N/A                 Designated urgent jobs by DoD HPCMP
           staff           N/A          368 Hours             N/A                 ARL DSRC staff testing only; system testing and user support
           high            High         96 Hours              N/A                 Designated high-priority jobs by DoD HPCMP
           frontier        Frontier     168 Hours             N/A                 Frontier projects only
           cots            Standard     96 Hours              N/A                 Abaqus, Fluent, and Cobalt jobs
           interactive     Standard     12 Hours              N/A                 Interactive jobs
           standard-long   Standard     200 Hours             N/A                 ARL DSRC permission required
           standard        Standard     168 Hours             N/A                 Normal user jobs
Lowest     background      Background   24 Hours              N/A                 User jobs that will not be charged against the project allocation

6.3. Interactive Logins

When you log in to Centennial, you will be running in an interactive shell on a login node. The login nodes provide login access for Centennial and support such activities as compiling, editing, and general interactive use by all users. Please note the Login Node Abuse policy. The preferred method to run resource intensive executions is to use an interactive batch session.

6.4. Interactive Batch Sessions

An interactive session on a compute node is possible using a proper PBS command line syntax from a login node. Once PBS has scheduled your request on the compute pool, you will be directly logged into a compute node, and this session can last as long as your requested wall time.

To submit an interactive batch job, use the following submission format:

qsub -I -X -l walltime=HH:MM:SS -l select=#_of_nodes:ncpus=40:mpiprocs=40 \
     -l place=scatter:excl -A proj_id -q interactive -V

Your batch shell request will be placed in the interactive queue and scheduled for execution. This may take a few minutes or a long time depending on the system load. Once your shell starts, you will be logged into the first compute node of the compute nodes that were assigned to your interactive batch job. At this point, you can run or debug applications interactively, execute job scripts, or start executions on the compute nodes you were assigned. The "-X" option enables X-Windows access, so it may be omitted if that functionality is not required for the interactive job.

6.5. Batch Request Submission

PBS batch jobs are submitted via the qsub command. The format of this command is:

qsub [ options ] batch_script_file

qsub options may be specified on the command line or embedded in the batch script file by lines beginning with "#PBS".

For a more thorough discussion of PBS Batch Submission, see the Centennial PBS Guide.

6.6. Batch Resource Directives

A listing of the most common batch Resource Directives is available in the Centennial PBS Guide.

6.7. Launch Commands

There are different commands for launching MPI executables from within a batch job depending on which MPI implementation your script uses.

To launch an SGI MPT executable, use the mpiexec_mpt command as follows:

mpiexec_mpt -n #_of_MPI_tasks ./mpijob.exe

To launch an IntelMPI executable, use the mpirun command as follows:

mpirun ./mpijob.exe

For OpenMP executables, no launch command is needed.
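For example, to use all 40 cores of a compute node, set the number of OpenMP threads and run the executable directly (the executable name is hypothetical):

setenv OMP_NUM_THREADS 40     # csh/tcsh
export OMP_NUM_THREADS=40     # sh/ksh/bash
./openmp_code.x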

6.8. Sample Script

The following script is a basic example. More thorough examples are available in the Centennial PBS Guide and in the Sample Code Repository ($SAMPLES_HOME) on Centennial.

#!/bin/csh
#  Specify job name.
#PBS -N myjob

#  Specify queue name.
#PBS -q standard

# select = # of nodes
# ncpus is ALWAYS set to 40!
# mpiprocs is the number of cores on each node to use
# This run will use (select) x (mpiprocs) cores = 4*40 = 160 cores
#PBS -l select=4:ncpus=40:mpiprocs=40

#  Specify how MPI processes should be distributed across nodes.
#PBS -l place=scatter:excl

#  Specify maximum wall clock time.
#PBS -l walltime=24:00:00

#  Specify Project ID to use. ID may have the form ARLAP96090RAY.
#PBS -A XXXXXXXXXXXXX

#  Specify that environment variables should be passed to master MPI process.
#PBS -V

set JOBID=`echo $PBS_JOBID | cut -f1 -d.`

#  Create a temporary working directory within $WORKDIR for this job run.
set TMPD=${WORKDIR}/${JOBID}
mkdir -p $TMPD

# Change directory to submit directory
# and copy executable and input file to scratch space
cd $PBS_O_WORKDIR
cp mpicode.x $TMPD
cp input.dat $TMPD

cd $TMPD

# The following line provides an example of running a code built
#  with the default Intel compiler and SGI MPT MPI Suite.
mpiexec_mpt -n 160 ./mpicode.x > out.dat

# The following three lines provide an example of setting up and running
#  an IntelMPI MPI parallel code built with the Intel compiler.
module unload mpi/sgimpt/2.15
module load compiler/intel/2017.1.132 mpi/intelmpi/2017.1.132
mpirun ./mpicode.x > out.dat

cp out.dat $PBS_O_WORKDIR
exit

6.9. PBS Commands

The following commands provide the basic functionality for using the PBS batch system:

qsub: Used to submit jobs for batch processing.
qsub [ options ] my_job_script

qstat: Used to check the status of submitted jobs.
qstat PBS_JOBID ## check one job
qstat -u my_user_name ## check all of user's jobs

qdel: Used to kill queued or running jobs.
qdel PBS_JOBID

A more complete list of PBS commands is available in the Centennial PBS Guide.

6.10. Determining Time Remaining in a Batch Job

In batch jobs, knowing the time remaining before the workload management system will kill the job enables the user to write restart files or even prepare input for the next job submission. However, adding such a capability to an existing source code requires knowing how to query the workload management system as well as how to parse the resulting output to determine the amount of time remaining.

The DoD HPCMP allocated systems now provide a library, WLM_TIME, as an easy way to supply the remaining time in the batch job to C, C++, and Fortran programs. The library can be accessed from your code using the following interfaces:

For C:

#include <wlm_time.h>
void wlm_time_left(long int *seconds_left)

For Fortran:

SUBROUTINE WLM_TIME_LEFT(seconds_left)
INTEGER seconds_left

For C++:

extern "C" {
#include <wlm_time.h>
}

For simplicity, wall-clock-time remaining is returned as an integer value of seconds.
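For example, a minimal C program using the interface above might look like the following sketch:

#include <stdio.h>
#include <wlm_time.h>

int main(void)
{
    long int seconds_left;

    /* Ask the workload manager how much wall-clock time remains. */
    wlm_time_left(&seconds_left);
    printf("This batch job has %ld seconds remaining.\n", seconds_left);
    return 0;
}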

To simplify usage, a module file defines the process environment, and a pkg-config metadata file defines the necessary compiler linker options:

For C:

module load wlm_time
$(CC) ctest.c `pkg-config --cflags --libs wlm_time`

For Fortran:

module load wlm_time
$(F90) test.f90 `pkg-config --cflags-only-I --libs wlm_time`

For C++:

module load wlm_time
$(CXX) Ctest.C `pkg-config --cflags --libs wlm_time`

WLM_TIME works currently with PBS. The developers expect that WLM_TIME will continue to provide a uniform interface encapsulating the underlying aspects of the workload management system.

6.11. Advance Reservations

A subset of Centennial's nodes has been set aside for use as part of the Advance Reservation Service (ARS). The ARS allows users to reserve a user-designated number of nodes for a specified number of hours starting at a specific date/time. This service enables users to execute interactive or other time-critical jobs within the batch system environment. The ARS is accessible via most modern web browsers at https://reservation.hpc.mil. Authenticated access is required. The ARS User Guide is available on HPC Centers.

7. Software Resources

7.1. Application Software

All Commercial Off The Shelf (COTS) software packages can be found in the $CSI_HOME (/p/app) directory. A complete listing of software on Centennial with installed versions can be found on our software page. The general rule for all COTS software packages is that the two latest versions will be maintained on our systems. For convenience, modules are also available for most COTS software packages.

7.2. Useful Utilities

The following utilities are available on Centennial:

Useful Utilities
Command Description Usage
archive Perform the basic file-handling function on the archive system. archive put output.tar
node_use Displays memory-use and load-average information for all login nodes of the system on which it is executed. node_use
qpeek Returns the standard output (STDOUT) and standard error (STDERR) messages for any submitted PBS job from the start of execution. qpeek PBS_JOB_ID
qview Lists the status and current usage of all PBS queues on Centennial. "qview -h" shows all the qview options available.
show_queues Lists the status and current usage of all PBS queues on Centennial. show_queues
show_storage Provides quota and usage information for the storage areas in which the user owns data on the current system. show_storage
show_usage Lists the project ID and total hours allocated / used in the current FY for each project you have on Centennial. show_usage
dos2unix Strip DOS end-of-record control characters from a text file. dos2unix myfile

7.3. Sample Code Repository

The Sample Code Repository is a directory that contains examples for COTS batch scripts, building and using serial and parallel programs, data management, and accessing and using serial and parallel math libraries. The $SAMPLES_HOME environment variable contains the path to this area, and is automatically defined in your login environment. Below is a listing of the examples provided in the Sample Code Repository on Centennial.

Sample Code Repository on Centennial

Applications
Application-specific examples; interactive job submit scripts; use of the application name resource; software license use.
  • abaqus - Basic batch script and input deck for an Abaqus application.
  • adf - Basic batch script and input deck for an ADF application.
  • ansys - Basic batch script and input deck for an ANSYS application.
  • castep - Basic batch script and input deck for a CASTEP application.
  • cfd++ - Basic batch script and input deck for a CFD++ application.
  • cfx - Basic batch script and input deck for an ANSYS CFX application.
  • comsol - Basic batch script and input deck for a COMSOL application.
  • cth - Basic batch script and input deck for a CTH application.
  • dmol3 - Basic batch script and input deck for a DMOL3 application.
  • fluent - Basic batch script and input deck for a FLUENT (now ACFD) application.
  • GAMESS - auto_submit script and input deck for a GAMESS application.
  • gaussian - Input deck for a GAUSSIAN application and automatic submission script for submitting a Gaussian job.
  • ls-dyna - Basic batch script and input deck for an LS-DYNA application.
  • lsopt - Basic batch script and input deck for an LS-OPT application.
  • mathematica - Basic batch script and input deck for a MATHEMATICA application.
  • matlab - Basic batch script and sample m-file for a MATLAB application.
  • mesodyn - Basic batch script and input deck for a MesoDyn application.
  • picalc - Basic PBS example batch script.
  • STARCCM+ - Basic batch script and input deck for a STAR-CCM+ application.
  • xpatch - Basic batch script and input deck for an Xpatch application.

Data_Management
Archiving and retrieving files; Lustre striping; file searching; $WORKDIR use.
  • MPSCP_Example - A README file giving examples of how to use the mpscp command to transfer files between Centennial and remote systems.
  • OST_Stripes - Description of how to use OST striping to improve disk I/O.
  • Postprocess_Example - Sample batch script showing how to submit a transfer queue job at the end of your computation job.
  • Transfer_Example - Sample batch script showing how to stage data out after a job executes using the transfer queue.
  • Transfer_Queue_with_Archive_Commands - Sample batch scripts demonstrating how to use the transfer queue to retrieve input data for a job, chain a job that uses that data to run a parallel computation, and then chain that job to another that uses the transfer queue to put the data back in the archive for long-term storage.

Documentation
User documentation.
  • User_Manual_SGI-MPI.pdf - Centennial User's Manual.

FlexLm
Sample license server software commands.
  • lmutil - Sample lmutil command.
  • rlmutil - Sample rlmutil command.

Parallel_Environment
MPI, OpenMP, and hybrid examples; jobs using large numbers of nodes; single-core jobs; large-memory jobs; running multiple applications within a single batch job.
  • Hybrid - Simple MPI/OpenMP hybrid example and batch script.
  • Large_Jobs - A sample PBS job script you can copy to execute large jobs, those requiring more than 11,000 cores or 305 nodes.
  • Large_Memory_Jobs - A sample large-memory job script.
  • MPI_PBS_Examples - Sample PBS job scripts for SGI MPT and IntelMPI codes built with the Intel and GNU compilers.
  • Multiple_Jobs_per_Node - Sample PBS job scripts for running multiple jobs on the same node.
  • OpenMP - A simple OpenMP example and batch script.

Programming
Basic code compilation; debugging; use of library files; static vs. dynamic linking; Makefiles; Endian conversion.
  • COMPILE_INFO - Common options for compiling and configure.
  • Core_Files - Examples of three core file viewers.
  • DDT_Example - Using DDT to debug a small example code in an interactive batch job.
  • Endian_Conversion - Instructions on how to manage data created on a machine with a different Endian format.
  • GPU_Examples - Several examples demonstrating use of system tools, compilation techniques, and PBS scripts to generate and execute code using the GPU accelerators on Centennial.
  • Intel_MPI_Example - Simple example of how to run a job built with IntelMPI.
  • ITAC_Example - Example of using the Intel Trace Analyzer and Collector.
  • Large_Memory_Example - Simple example of how to run a job using the Large-Memory nodes.
  • Memory_Usage - Sample build and script that show how to determine the amount of memory being used by a process.
  • MKL_BLACS_Example - Example of how to build and run codes using the Intel MKL BLACS libraries.
  • MKL_ScaLAPACK_Example - Example of how to build and run codes using the Intel MKL ScaLAPACK libraries.
  • MPI_Compilation - Examples of how to build SGI MPT, IntelMPI, and OpenMPI code.
  • Open_Files_Limits - Discussion of the maximum number of simultaneously open files an MPI process may have and how to adjust the appropriate settings in a PBS job.
  • SO_Compile - Simple example of creating a shared object (SO) library, compiling against it, and running against it on the compute nodes.
  • Timers_Fortran - Serial timers using Fortran intrinsics for f77 and f90/95.
  • Totalview_Example - Instructions on how to use the TotalView debugger to debug MPI code.
  • VTune - Example of using Intel VTune.

User_Environment
Use of modules; customizing the login environment.
  • Module_Swap_Example - Instructions for using the module swap command.

Workload_Management
Basic batch scripting; use of the transfer queue; job arrays; job dependencies; Secure Remote Desktop; job monitoring.
  • BatchScript_Example - Basic PBS batch script example.
  • Core_Info_Example - Sample code for generating the MPI process/core or OpenMP thread/core associativity in compute jobs.
  • Documentation - Microsoft Word version of the PBS User's Guide.
  • Hybrid_Example - Simple MPI/OpenMP hybrid example and batch script.
  • Interactive_Example - Instructions on how to submit an interactive PBS job.
  • Job_Array_Example - Instructions and example job script for using job arrays.
  • Job_Dependencies_Example - Example scripts on how to use PBS job dependencies.

8. Links to Vendor Documentation

SGI Home: http://www.sgi.com
SGI ICE XA: https://www.sgi.com/products/servers/ice

RedHat Home: http://www.redhat.com

GNU Home: http://www.gnu.org
GNU Compiler: http://gcc.gnu.org

Intel Home: http://www.intel.com
Intel Broadwell Processor:
http://ark.intel.com/products/codename/38530/Broadwell
Intel Software Documentation Library: http://software.intel.com/en-us/articles/intel-software-technical-documentation

PGI Home: http://www.pgroup.com
PGI Compiler Documentation: http://www.pgroup.com/resources/docs.php