Using Euler

Accessing the Cluster

The cluster can be accessed via ssh at euler.uvic.ca. To have an account created on Euler, send an email request to sysadmin@uvic.ca. Euler accounts use your UVic NetLink ID credentials, so a valid NetLink ID is required to gain access to Euler.
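
Once your account has been created, you can log in with your NetLink ID credentials (the username below is a placeholder for your own NetLink ID):

$ ssh mynetlinkid@euler.uvic.ca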

What Nodes Are Available?

To find out what nodes are available in the cluster, use the following command:

$ pbsnodes -a 

To list any OFFLINE or DOWN nodes:

$ pbsnodes -l 

Compiling Programs

Software builds can be performed on the head node.


The following packages are available on the Euler cluster (see the example after the list for adding a package's bin directory to your PATH):

  1. TexLive /usr/local/texlive/2009/bin/x86_64-linux
  2. python 2.5 /opt/sage/local/bin/python2.5
  3. openMPI-1.3.3, PATH=/opt/openmpi-1.3.3
  4. gcc-4.1.2
  5. gcc-4.4.0
  6. gcc-c++-4.1.2
  7. gcc4-gfortran-4.1.2-42.el5
  8. gcc-g77-3.4.6
  9. gcc
  10. gsl version 1.12
  11. matlab R2011a PATH=/opt/matlab
  12. R v2.7.0 - a language for data analysis and graphics
  13. R v2.11.1 /usr/local/R/bin/R
  14. Java openjdk version 1.7.0
  15. R packages, papply, snow, rjags, R2jags, and coda
  16. sage-3.4
  17. ncl Version 5.0.0   ( export  NCARG_ROOT="/" )
  18. netcdf-3.6.2
  19. Intel fortran compiler v. 11( PATH = /opt/intel/Compiler/11.0/074/bin/intel64/ifort)
  20. Intel C compiler v 11   ( PATH = /opt/intel/Compiler/11.0/074/bin/intel64/icc )
  21. emacs
  22. maple version 12
  23. ess-5.3.6 - in ~/.emacs file add the following line (load "/usr/local/share/emacs/site-lisp/ess-site")
  24. JAGS /opt/jags
  25. jasper 1.900.1 /opt/jasper
  26. openbugs 3.2.1 rev 781  /opt/openbugs
  27. Wine 1.0.1
  28. octave 3.0.5  /opt/octave
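
To use one of the packages installed outside the default search path, prepend its bin directory to your PATH. A minimal sketch for OpenMPI, assuming a bash login shell:

$ export PATH=/opt/openmpi-1.3.3/bin:$PATH
$ which mpicc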

MPI compiler wrappers:

    C         mpicc
    C++       mpiCC, mpicxx
    Fortran   mpif77, mpif90
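
For example, an MPI C program can be compiled directly on the head node with the C wrapper (mpi_pi_test.c here is simply the example source file used in the Makefile below):

$ mpicc -O2 -o mpi_pi_test mpi_pi_test.c

A sample Makefile that uses the mpicc wrapper follows.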

#
# "mpicc" adds the directories for the MPI include and lib files, so
# explicit -I and -L flags for MPI are not necessary.
#
CC = mpicc

#
# Set CFLAGS/LDFLAGS if you use your own include files and library files.
#

# name of the binary
PROGRAM = mpi_pi_test

# source file
SRCS = mpi_pi_test.c

# object file
OBJS = $(SRCS:.c=.o)

#
# Targets
#
default: all

all: $(PROGRAM)

$(PROGRAM): $(OBJS)
	$(CC) $(OBJS) -o $(PROGRAM) $(LDFLAGS)

clean:
	/bin/rm -f $(OBJS) $(PROGRAM)
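
With the Makefile and mpi_pi_test.c in the same directory, the program is built and cleaned up in the usual way:

$ make
$ make clean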

Submitting Jobs

Parallel Jobs (using Message Passing Interface)

All jobs are submitted to the cluster from euler.uvic.ca. Currently a single queue, called 'batch', exists on Euler for job submission. Note that the default walltime for each job is set to one hour; after one hour of run time your job will be terminated regardless of its state. To alter this behaviour, set the walltime as in the example below.

To submit jobs on Euler, use the qsub command. An example of submitting a job:

$ qsub test.sh

The contents of the test.sh command file are shown below. The Torque options listed at the top of the command file are declared with the #PBS directive. This command file will run an MPI job on 16 compute nodes using 2 CPUs per node, for a total of 32 processors, with a walltime of 12 hours. The -wdir option makes use of a scratch directory (/scr) on each compute node.

#!/bin/bash
#PBS -l nodes=16:ppn=2,walltime=12:00:00
#PBS -N MPI-testing
#PBS -j oe
# mpirun it
/opt/openmpi-1.3.3/bin/mpiexec -v -wdir /scr /opt/bin/mpitest

For more information about MPI, see:

http://www.open-mpi.org/

Alternatively, #PBS -l nodes=32 would better distribute the job over the available CPUs without a dependency on having exactly 16 nodes with 2 free processors each, as in the sketch below.
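
A minimal sketch of the corresponding resource request in test.sh (only the resource line changes; the rest of the script stays the same):

#PBS -l nodes=32,walltime=12:00:00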


Serial Jobs

Serial jobs can be submitted on Euler. Below is an example PBS file, titled serial.pbs, that shows how to submit a serial job. This job will run the UNIX command uname and then sleep for ten seconds on a single compute node. The output and error messages will be sent to the specified directory.

There is scratch space on each node that can be used to spool temporary data while running a job. To access this space, point to /scr (see the sketch after the example below).


#!/bin/bash
#PBS -l nodes=1:ppn=1,mem=200mb
#PBS -N test_job
#PBS -o /home1l/homedir/out/${PBS_JOBID%%.*}.out
#PBS -e /home1l/homedir/out/${PBS_JOBID%%.*}.err
# run it
/bin/uname -a
/bin/sleep 10
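
As noted above, /scr on each compute node can hold temporary data while a job runs. A minimal sketch of staging work through the node-local scratch space (mydata and my_program are placeholders, not files that exist on Euler):

#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=01:00:00
#PBS -N scratch_example
# copy input to node-local scratch, run there, then copy results home
mkdir -p /scr/$USER
cp -r $HOME/mydata /scr/$USER/
cd /scr/$USER/mydata
$HOME/bin/my_program > output.txt
cp output.txt $HOME/mydata/
rm -rf /scr/$USER/mydata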

Interactive Jobs

Interactive jobs can be submitted on Euler. Below is an example submission that allocates one node with 200 MB of memory for one minute; the walltime can be adjusted as required.

$ qsub -I -l nodes=1:ppn=1,mem=200mb,walltime=00:01:00 -N interactive_job -q batch

qsub examples

 

$ qsub -l nodes=12
    Request 12 nodes of any type.

$ qsub -l nodes=2:server+14
    Request 2 "server" nodes and 14 other nodes (a total of 16); this specifies two node_specs, "2:server" and "14".

$ qsub -l nodes=server:hippi+10:noserver+3:bigmem:hippi
    Request (a) 1 node that is a "server" and has a "hippi" interface, (b) 10 nodes that are not servers, and (c) 3 nodes that have a large amount of memory and a "hippi" interface.

$ qsub -l nodes=r01u03+r01u04+r01u17
    Request 3 specific nodes by hostname.

$ qsub -l nodes=4:ppn=2,walltime=12:00:00
    Request 2 processors on each of four nodes, with a 12-hour walltime.

$ qsub -l nodes=1:ppn=4,walltime=10:00:00
    Request 4 processors on one node, with a 10-hour walltime.

$ qsub -l nodes=r01u04,mem=200mb
    This job will wait until node r01u04 has 200 MB of memory free.

 

Please note that all jobs have a default walltime of 1 hour. To alter this behaviour you must specify the walltime, as in some of the examples above. Walltime is specified as hours:minutes:seconds, e.g. walltime=36:00:00 for a 36-hour job.


Monitoring Jobs

  • Using qstat

    $ qstat -u hbr



    euler.uvic.ca:

                                                                       Req'd  Req'd   Elap
    Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
    -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
    70.euler.uvic.ca     hbr      batch    test_job    17226     1  --     --  1:00 R    --

    $ qstat -q

    server: euler.cluster.uvic.ca

    Queue            Memory CPU Time Walltime Node Run Que Lm State
    ---------------- ------ -------- -------- ---- --- --- -- -----
    batch              --      --       --     --    0   0 --  E R
                                                    --- ---
                                                      1   0
      

  • Using showq (here the job has been submitted twice)

    $ showq

    ACTIVE JOBS--------------------
    JOBNAME      USERNAME      STATE  PROC   REMAINING            STARTTIME

    72           hbr         Running     1     1:00:00  Fri Nov 16 08:34:16
    73           hbr         Running     1     1:00:00  Fri Nov 16 08:34:16

         2 Active Jobs       2 of  88 Processors Active (3.41%)
                             1 of  11 Nodes Active      (9.09%)

    IDLE JOBS----------------------
    JOBNAME      USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

    0 Idle Jobs

    BLOCKED JOBS----------------
    JOBNAME      USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

    Total Jobs: 2   Active Jobs: 2   Idle Jobs: 0   Blocked Jobs: 0
  • Job output is returned via Torque results files. They appear in the directory from which you submitted your job, named job_name.o?? for STDOUT and job_name.e?? for STDERR. If you specify PBS -o and PBS -e in your script (as in the serial example above), you can change this behaviour:

    $ ls  73.*

    -rw------- 1 hbr hbr 3361 Nov 16 09:46 73.out

    -rw------- 1 hbr hbr 42 Nov 16 09:46 73.err


  • Using checkjob

    $ checkjob -v jobnumber

    checking job 75 (RM job '75.euler.uvic.ca')

    State: Running
    Creds:  user:hbr  group:hbr  class:batch  qos:DEFAULT
    WallTime: 00:00:00 of 1:00:00
    SubmitTime: Fri Nov 16 09:56:47
      (Time Queued  Total: 00:00:05  Eligible: 00:00:05)

    StartTime: Fri Nov 16 09:56:52
    Total Tasks: 1

    Req[0]  TaskCount: 1  Partition: DEFAULT
    Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
    Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
    Exec:  ''  ExecSize: 0  ImageSize: 0

    Dedicated Resources Per Task: PROCS: 1
    Utilized Resources Per Task: [NONE]
    Avg Util Resources Per Task: [NONE]
    Max Util Resources Per Task: [NONE]
    NodeAccess: SHARED
    NodeCount: 0

    Allocated Nodes:
    [r01u05:1]

    Task Distribution: r01u05

    IWD: [NONE]  Executable: [NONE]
    Bypass: 0  StartCount: 1
    PartitionMask: [ALL]
    Flags: HOSTLIST RESTARTABLE
    HostList:
    [r01u05:1]

    Reservation '75' (00:00:00 -> 1:00:00  Duration: 1:00:00)
    PE: 1.00  StartPriority: 1

Job Control

  • Canceling a job:

    $ qdel jobnumber
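
    For example, to cancel job 73 from the qstat/showq listings above:

    $ qdel 73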

Useful Links

 

Matlab

    Matlab has been set up on each node; there is a 32-processor license for the Distributed Computing Engine.

Matlab in Serial

    A sample PBS script for a 1-CPU Matlab job looks like this:

    #PBS -N matlab1
    #PBS -q batch
    #PBS -l nodes=1,walltime=24:00:00
    #PBS -M your-email-address
    #PBS -m abe
    #PBS -V
    #
    echo "I ran on:"
    cat $PBS_NODEFILE
    #
    #cd to your execution directory first
    cd /home/your-user-name/your-matlab-directory
    #
    matlab < file.m
    #
    ------------------ file file.m ---------------------------------

    % Plotting in Batch Sample

    y=1:100;
    plot(y,y)
    print -depsc yvsy
    %save the plot as yvsy.eps (postscript color)
    %run: 'help print' for options (-djpeg -dpng etc)
    exit
    %matlab must exit in all batch jobs when finished
    ---------------------------------------------------------------

    Note that your Matlab job will not have any access to a graphical display or terminal, so all of your input must be handled by one .m file (which can call other .m files) and all of your output must be written to either a file or the screen (standard out) by Matlab or saved with the print command.
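
    Since the job has no display, you may also want to start Matlab with its display and splash screen disabled (optional flags; the script above works without them):

    matlab -nodisplay -nosplash < file.m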


Resources for Matlab/DCT

 http://biowulf.nih.gov/apps/matlabdce.html

http://cac.engin.umich.edu/resources/software/matlabdct/


Tutorial by Andre Kerstens

rcf.uvic.ca/pdfs/tutorial_andre_kerstens_033005_2-1.pdf

Introduction to Parallel Programming and MPI

https://computing.llnl.gov/tutorials/parallel_comp/

https://computing.llnl.gov/tutorials/mpi/

 

Parallel Programming with R, rmpi and snow (papply)

http://cran.r-project.org/src/contrib/Descriptions/snow.html

http://cran.r-project.org/src/contrib/Descriptions/Rmpi.html

 http://cac.engin.umich.edu/resources/software/snow.html

 http://ace.acadiau.ca/math/ACMMaC/software/papply/

 

Running MAPLE

 

http://www.cae.tntech.edu/help/parallel/cluster-job-example-maple