Using the Cluster

Running your jobs on the cluster is not, unfortunately, as simple as logging onto Mercury or Mercury2 and executing your program or script. If you do this, your job will simply run on the head node like any other application. This will slow down the system and impact other users, while simultaneously giving you a slower processor and a smaller share of available resources.

How Jobs Run on the Cluster

When a job is properly dispatched to the cluster, it will run on one of the many compute nodes, or blades. Each blade has two processors, both faster than those on the head node; each job dispatched to a blade gets one entire processor and half of the blade's memory, so the resources of each blade are split evenly between the two jobs that can run there.

Jobs are run on the cluster using PBS, a batch job system consisting primarily of policy-based queues, and Maui, a scheduler. Jobs are submitted to PBS, which sorts them into the proper execution queues. Maui periodically retrieves the list of jobs from PBS, decides when and where each job should run, and passes that information back to PBS.

PBS and Maui decide how jobs should run on the cluster based on a number of factors, but at its core the system is first-in, first-out. The policies and subtleties beyond that are not necessary for initial use of the cluster.
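
As a quick way to see which execution queues are available, the qstat command (covered in more detail below) can typically be run with its -q option and no queue name, which lists every queue along with its limits and current job counts:

mercury$ qstat -q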

Preparing and Submitting Jobs

Before a job is submitted to the cluster, it must be defined via a small script that sets up its environment and runs the program that does the actual processing. This script is then submitted to PBS. The script may be written in any of a number of languages, including bash, csh and Perl, but will contain a number of directives that tell PBS which resources are required.

For an example, consider the following bash script:

#!/bin/bash
#PBS -l walltime=1:00:00
#PBS -l mem=100mb
#PBS -l ncpus=1
#PBS -W Output_Path=/home/dleske/test/output/
#PBS -W Error_Path=/home/dleske/test/output/

hostname
date

This job definition script starts off with five PBS directives. These directives request a number of resources and specify where the results should be written. The most important is the first: the wall time, which specifies how long the job is expected to run. In your own jobs, be generous here--if the job takes longer than its wall time, it will be terminated. Most users take their best estimate and multiply it by three for the wall time.
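
For example, following that rule of thumb, a job expected to take roughly four hours would request twelve hours of wall time:

#PBS -l walltime=12:00:00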

Following the directives is, simply, the rest of the script, which does whatever processing the user requires. In this example the job merely prints the hostname and the current date and time.

Once the script is written and the job is thus prepared, it is submitted to PBS using the qsub command:

mercury$ qsub test.sh
5449.mercury2.uvic.ca

PBS returns a job ID that may then be used to query the status of the job or to delete it (generally, only the leading numerical portion is necessary). When the job finishes processing, its output and standard error are written to files. If output paths had not been specified, as they were in the example, the files would be written to the current working directory.
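
For example, if the job is no longer needed it can be removed from the queue (or terminated, if already running) with the qdel command and the numerical portion of the job ID returned above:

mercury$ qdel 5449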

For more information on qsub and its directives, please consult the qsub manual page, via man qsub at the Mercury2 prompt.
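
Many other directives are available. As one illustration (the job name and e-mail address below are placeholders), a job can be given a descriptive name with -N and can request notification e-mail when it begins, ends or aborts with -m and -M:

#PBS -N testjob
#PBS -m abe
#PBS -M userid@uvic.ca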

Checking Job Status

While a job is queued or running, its status may be checked with the qstat command, which displays the state of the queues. Specifying a user ID lists the jobs owned by that user:

mercury$ qstat -u dleske
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
5449.mercury2    test.sh          dleske                  0 Q test

The command can also display the number of jobs assigned to a particular queue when given the queue name with the -q option:

mercury$ qstat -q hep_
Queue            Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
hep_                 --       --       --   --  63 166 --  E R
                                                --- ---
                                                 63 166

For more information on this command, issue man qstat from the head node.
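
For example, the full set of attributes for a single job, including its resource requests and current state, can be displayed with qstat's -f option and the job ID:

mercury$ qstat -f 5449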

Another useful command is showq, a Maui command that shows the state (running, idle, or blocked) of all jobs in the system, in the order in which they are running or will be dispatched. For running jobs, it lists when they started running and how much of their wall time remains. For blocked or idle jobs, it lists their wall time and how long they have been queued. Its output looks like the following:

ACTIVE JOBS--------------------
JOBNAME    USERNAME   STATE   PROC  REMAINING            STARTTIME
5208       babarpro   Running    1   17:26:38  Tue Mar 1 08:19:26
5210       babarpro   Running    1   17:31:50  Tue Mar 1 08:24:38
5211       babarpro   Running    1   17:32:22  Tue Mar 1 08:25:10
...more jobs...

    63 Active Jobs    63 of 72 Processors Active (87.50%)
                      32 of 36 Nodes Active (88.89%)

IDLE JOBS----------------------
JOBNAME    USERNAME   STATE   PROC    WCLIMIT            QUEUETIME

0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME    USERNAME   STATE   PROC    WCLIMIT            QUEUETIME
5283       babarpro   Idle       1   23:59:00  Mon Feb 28 23:18:19
5284       babarpro   Idle       1   23:59:00  Mon Feb 28 23:18:20
...more jobs...

Total Jobs: 229   Active Jobs: 63   Idle Jobs: 0   Blocked Jobs: 166

Generally, when resources are fully utilised and jobs are queued, the queued jobs will be considered "blocked", waiting for resources to become available. "Idle" jobs are those waiting for specific resources, such as a particular node.
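
If a job stays blocked or idle for longer than expected, Maui's checkjob command, assuming the Maui client tools are available to users on the head node, reports the detailed state of a single job and can help explain why it has not yet started. For one of your own jobs, such as the earlier example:

mercury$ checkjob 5449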