ARL DSRC Introductory Guide

Table of Contents

 1. Introduction
 2. Accessing ARL DSRC Systems
 3. Obtaining an Account
 4. System Overviews
 5. Login Files
 6. File Systems
 7. Software
 8. Batch Processing
 9. Advance Reservation Service (ARS)
10. Contacting the HPC Help Desk

1. Introduction

This document provides a system overview and an introduction to the use of the ARL DSRC's unclassified HPCMP HPC systems. Two unclassified systems, Excalibur and Centennial, are available for user access. For information about restricted resources, see the Restricted Systems page.

2. Accessing ARL DSRC Systems

The ARL DSRC unclassified systems are accessible through the Defense Research and Engineering Network (DREN) to all active customers via standard Kerberos commands. Customers may access any of the interactive login nodes on Excalibur and Centennial with Kerberized versions of rlogin and ssh.
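
For example, a typical login session might look like the following sketch (the username and Kerberos realm shown are assumptions, and the exact kinit invocation depends on your authentication method):

kinit username@HPCMP.HPC.MIL
ssh username@centennial.arl.hpc.mil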

The login nodes are available for users to edit files, submit jobs, and review completed job output.

File transfers between local and remote systems can be accomplished via the scp or mpscp commands.
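
For instance (file names, paths, and the username are illustrative; mpscp uses scp-like syntax):

scp input.tar username@excalibur.arl.hpc.mil:/p/work1/username/
mpscp output.tar username@centennial.arl.hpc.mil:/p/cwfs/username/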

Kerberos binaries can be downloaded from HPC Centers: Kerberos & Authentication.

3. Obtaining an Account

The process of getting an account on the HPC systems at any of the DSRCs begins with getting an account on the HPCMP Portal to the Information Environment, commonly called a "pIE User Account." If you do not yet have a pIE User Account, please visit HPC Centers: Obtaining An Account and follow the instructions there. Once you have an active pIE User Account, visit the ARL accounts page for instructions on how to request accounts on the ARL DSRC HPC systems. If you need assistance with any part of this process, please contact the HPC Help Desk at accounts@helpdesk.hpc.mil.

4. System Overviews

4.1. Unclassified Systems

Centennial

centennial.arl.hpc.mil
SGI ICE XA - 2.6 PFLOPS

Total Nodes: 24 (login), 1,784 (standard memory), 32 (large memory), 32 (GPU accelerated)
Operating System: RHEL (all node types)
Cores/Node: 40; GPU-accelerated nodes also have 1 GPU (1 x 2,880 GPU cores)
Core Type: Intel Xeon E5-2698 v4 (Broadwell); GPU-accelerated nodes add an NVIDIA Tesla K40
Core Speed: 2.2 GHz
Memory/Node: 256 GBytes (login), 128 GBytes (standard memory), 512 GBytes (large memory), 256 GBytes (GPU accelerated)
Accessible Memory/Node: 252 GBytes (login), 124 GBytes (standard memory), 508 GBytes (large memory), 252 GBytes (GPU accelerated)
Memory Model: Shared on node (login); shared on node, distributed across the cluster (compute)
Interconnect Type: Ethernet / InfiniBand

Excalibur

excalibur.arl.hpc.mil
Cray XC40 - 3.77 PFLOPS

Total Nodes: 16 (login), 3,098 (standard memory), 32 (large memory), 32 (GPU accelerated)
Operating System: SLES (login), Cray Linux Environment (compute)
Cores/Node: 32; GPU-accelerated nodes also have 1 GPU (1 x 2,880 GPU cores)
Core Type: Intel Xeon E5-2698 v3; GPU-accelerated nodes add an NVIDIA Tesla K40
Core Speed: 2.3 GHz (login), 2.5 GHz (compute)
Memory/Node: 256 GBytes (login), 128 GBytes (standard memory), 512 GBytes (large memory), 256 GBytes + 12 GBytes (GPU accelerated)
Accessible Memory/Node: 2 GBytes (login), 126 GBytes (standard memory), 508 GBytes (large memory), 252 GBytes (GPU accelerated)
Memory Model: Shared on node (login); shared on node, distributed across the cluster (compute)
Interconnect Type: Ethernet / InfiniBand (login), Cray Aries / Dragonfly (standard and large memory), Ethernet / InfiniBand (GPU accelerated)

4.2. Restricted Systems

For information about restricted resources, see the Restricted Systems page.

5. Login Files

When an account is created at the ARL DSRC, a default .cshrc and/or .profile file is placed in your home directory. These files contain the default setup for modules, PBS, and other system settings. We suggest placing any personal customizations, such as paths, aliases, or libraries you need to load, in a .cshrc.pers or .profile.pers file, as appropriate for your shell. These personal files should be sourced at the end of your .cshrc and/or .profile file as necessary. For example:

if ( -f $HOME/.cshrc.pers ) then
    source $HOME/.cshrc.pers
endif
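
A corresponding sketch for sh/ksh/bash users, assuming the same .profile.pers convention, placed at the end of your .profile:

if [ -f $HOME/.profile.pers ]; then
    . $HOME/.profile.pers
fi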

If you need to connect to other Kerberized systems within the program, you should use krlogin or /usr/brl/bin/ssh. If you use Kerberized ssh often, you may want to add an alias in your .cshrc.pers or .profile.pers files in $HOME, as follows:

alias ssh /usr/brl/bin/ssh # .cshrc.pers - csh/tcsh
alias ssh=/usr/brl/bin/ssh # .profile.pers - sh/ksh/bash

6. File Systems

All users are given a home directory on the login and compute nodes, named /p/home/username. When you log in, you are automatically placed in your local /p/home home directory. In addition, all users are given accounts on the archive storage system, /archive/service/username, and space on the center-wide file system, /p/cwfs/username.

While the login nodes of the ARL DSRC systems have the same connectivity (over NFS) to the /home, /archive, and /p/cwfs file systems, the compute nodes do not. When your job script runs on a compute node, it will not be able to access your /home, /archive, or /p/cwfs directories. Therefore, you will need to pre-stage your input files from a login node to the scratch area, /work or /p/work1 ($WORKDIR), before submitting your jobs. Similarly, output files will need to be staged back to the archive system and the center-wide file system. This may be done manually or through the "transfer" PBS queue, which runs serial jobs on a login node. It is recommended that all important files in your /p/home area also be copied to /archive or /home.
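
As an illustration, a job submitted to the "transfer" queue to stage results back to the archive and center-wide file systems might look like the following sketch (the project ID, directory names, and file names are placeholders; see the system PBS guide for exact queue policies):

#!/bin/bash
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -l walltime=01:00:00
#PBS -A Project_ID
#PBS -j oe

# The transfer queue runs on a login node, which mounts /archive and /p/cwfs over NFS.
# Directory and file names below are illustrative.
mkdir -p /archive/service/username/my_case /p/cwfs/username/my_case
cd $WORKDIR/my_case
cp results.tar /archive/service/username/my_case/
cp results.tar /p/cwfs/username/my_case/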

The scratch file system, /p/work1 or /work, should be used for active temporary data storage and batch processing. A system "scrubber" monitors utilization of the scratch space; files not accessed within 21 days are subject to removal, although they may remain longer if space permits. There are no exceptions to this policy. Customers who wish to keep files for long-term storage should copy them back into their /home or /archive directories to avoid data loss by the "scrubber." Customers are responsible for archiving files from the scratch file systems. This file system is considered volatile working storage, and no automated backups are performed.

Please do not use /tmp or /var/tmp for temporary storage!

7. Software

For a complete list of the application, programming, system-tool, and scientific visualization software available at the ARL DSRC, see our Software List.

8. Batch Processing

Batch queuing systems are used to control access to the compute nodes of large-scale clusters, such as the systems deployed at the ARL DSRC. Without a queuing system, users could overload systems, resulting in tremendous performance degradation. It is the job of a queuing system to regulate processing on each system to maximize job throughput while not overloading the system. The queuing system will run your job as soon as it can while ensuring that it:

  • Meets your resource requests
  • Does not overload systems
  • Runs higher-priority jobs first

Batch jobs for all HPC systems at the ARL DSRC are submitted using the PBS Professional queuing system. The PBS module should be loaded automatically at startup/login, giving you access to the PBS commands.
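
As a minimal sketch, a PBS job script might look like the following (the queue name, project ID, node request, and application launcher are assumptions; consult the system user guide for the actual queues and node configurations):

#!/bin/bash
#PBS -N my_job
#PBS -A Project_ID
#PBS -q standard
#PBS -l select=2:ncpus=40:mpiprocs=40
#PBS -l walltime=04:00:00
#PBS -j oe

# Run from the scratch area; the directory and executable names are placeholders.
cd $WORKDIR/my_case
mpiexec ./my_app > my_app.out

Submit the script with "qsub my_job.pbs" and check its status with "qstat -u username".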

For information on using PBS, please see the appropriate system user guide or system PBS guide listed at the end of this page.

9. Advance Reservation Service (ARS)

A subset of all Allocated Systems' nodes has been set aside for use as part of the Advance Reservation Service (ARS). The ARS allows users to reserve a user-designated number of nodes for a specified number of hours starting at a specific date/time. This service enables users to execute interactive or other time-critical jobs within the batch system environment. The ARS is accessible via most modern web browsers at https://reservation.hpc.mil. Authenticated access is required. The ARS User Guide is available on HPC Centers.

10. Contacting the HPC Help Desk

Questions, comments, and suggestions are always welcome. If you have questions about this guide, or any of the ARL DSRC's assets, please contact the DoD HPCMP's HPC Help Desk in any of the following ways: