
Compute Canada Notes

slurm general

A clarification of srun, sbatch, and salloc:
srun executes a single command directly.
sbatch hands a script to Slurm, which takes care of the standard output; suitable for long-running tasks.
salloc allocates nodes (CPU, GPU, etc.) for interactive work.
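
A minimal sketch contrasting the three (the account name is a placeholder):

    # run one command directly under Slurm
    srun --account=def-someuser --time=00:10:00 hostname

    # submit a batch script; Slurm writes stdout to slurm-<jobid>.out
    sbatch job_script.sh

    # allocate resources and drop into an interactive shell
    salloc --account=def-someuser --time=1:00:00 --mem=4G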

Official Machine Learning Courses

Deploy your virtual environment and submit a job

NB: module load does not mean a package is installed; in Jupyter, you still have to run pip3 install --no-index $package_name.

Job submission requires a bash script file.
An example script is shown below:
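
A minimal sketch, assuming a single-core Python job (account name, module version, and paths are placeholders):

    #!/bin/bash
    #SBATCH --account=def-someuser   # placeholder account
    #SBATCH --time=03:00:00
    #SBATCH --mem=4G
    #SBATCH --cpus-per-task=1
    module load python/3.10
    source ~/ENV/bin/activate        # placeholder virtualenv path
    python my_script.py              # placeholder script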

Then submit the script by:
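    sbatch job_script.sh   # assuming the script above is saved as job_script.sh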

python package and environment

On Cedar, required Python packages may need to be installed manually:
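
A typical pattern using the pre-built Compute Canada wheels (module version and package name are placeholders):

    module load python/3.10
    virtualenv --no-download ~/ENV
    source ~/ENV/bin/activate
    pip install --no-index --upgrade pip
    pip install --no-index some_package   # placeholder package name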

Graham can be the most responsive cluster, since it is hosted by the University of Waterloo.

Note the cluster distribution: Cedar is in BC, and Graham also supports Jupyter now.

Real-time python output to file

For code running on the server, real-time output to a file is recommended:
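
One way to get unbuffered, real-time output, assuming a Python script:

    # -u disables output buffering so the log updates in real time
    python3 -u my_script.py > output.log 2>&1 &
    tail -f output.log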

file storage

{Official Doc Link}
Scratch has a 20TB quota, but files older than 60 days will be purged. Project has 1TB and does not get purged. The best practice is to do intensive read-write on scratch and back up to project. (For search indexing purposes: expired, expiry, expiring.)
An email will be sent to the user before the purge.
To locate the files named in the purge warning:
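
One way to find candidate files, assuming the 60-day cutoff:

    # list scratch files unmodified for 60+ days (-atime may also apply, depending on the purge policy)
    find ~/scratch -type f -mtime +60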

file transfer

  1. To Dropbox: https://riptutorial.com/dropbox-api
  2. Nextcloud by Compute Canada: https://docs.alliancecan.ca/wiki/Nextcloud
    Go to the bottom; there are two command lines for this.
  3. Cloud to local: https://docs.alliancecan.ca/wiki/Transferring_data#From_the_World_Wide_Web
    basically using sftp

matlab

MATLAB on the Compute Canada cluster requires:

Given the RAM-greedy nature of MATLAB, salloc is usually used before running it.

check submitted slurm requests:
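
For example:

    squeue -u $USER             # list your submitted jobs
    scontrol show job <jobid>   # detailed view of one job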

It seems jobs can use a complete node (node mode) or a partial node (task mode).
{Ref-Official Doc}
Job arrays can also be used for sequential jobs, or parallel jobs with MPI.
Here is a PDF introducing the job submission and scheduling regulations:
{Link}

An example alloc request is below:
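
A sketch of such a request (account and resource numbers are placeholders):

    salloc --account=def-someuser --time=3:00:00 --cpus-per-task=4 --mem=16G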

tips on Graham

The official doc includes many handy customization functions:
https://wiki.math.uwaterloo.ca/fluidswiki/index.php?title=Graham_Tips#Virtual_Desktop

request GPU

The official doc {Link} gives examples of:

  1. a GPU on one node
  2. task-oriented multi-GPU
  3. MPI multi-threading

An example of a GPU request is shown below:
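
A single-node, single-GPU sketch (account, versions, and script name are placeholders):

    #!/bin/bash
    #SBATCH --account=def-someuser   # placeholder account
    #SBATCH --gres=gpu:1             # request one GPU
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=32G
    #SBATCH --time=3:00:00
    module load python/3.10
    source ~/ENV/bin/activate        # placeholder virtualenv path
    python train.py                  # placeholder script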

I am using single-node requests, but the task-oriented requests and multi-threading are alluring. I may try the task-oriented mode soon.

screen

{Doc}
NB: a window or region is the display region; a screen or bash is the running shell.

  • press ctrl+a to enter command mode
  • vertical split region: |
  • horizontal split region: h
  • cancel split region: Q
  • switch region: Tab
  • activate or switch bash: ctrl+a
  • change bash title: A
  • switch to a certain bash: its number
  • command mode: :
    • focus right: focus on the right part
    • resize: change size; can also use ctrl+- and ctrl++ for fast decrease and increase
    • save layout: layout dump .my_filename
    • reload layout: source .my_filename
    • set as default: echo source .my_filename >> ~/.screenrc

An example workflow:

  1. ctrl+a | or ctrl+a S to split regions
  2. create a new screen with ctrl+a c, or activate one with double ctrl+a
  3. change the title with ctrl+a A

job scheduling

Ref: {official doc}

check remaining quota (user limits)

Use sshare -A def-<account>_<cpu|gpu> -l -U <user> to check user limits. Replace the <> parts with your own values, and <cpu|gpu> means choosing either of them. This is tricky, as there are actually two separate accounts for CPU and GPU jobs.
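
For example, with a hypothetical group def-prof and user alice:

    sshare -A def-prof_cpu -l -U alice   # hypothetical group/user names
    sshare -A def-prof_gpu -l -U alice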

The EffectvUsage column shows the used proportion. A low EffectvUsage usually comes with a high LevelFS, indicating high priority.

The partition-stats command should return the load of each node; however, it is not working on Cedar.

to minimize wait time

The official doc {Link} suggests that allocation requests under 3 hours tend to get instant responses.

My experience on Graham is that setting --time=3:00:00 almost always gets queued immediately.

The full run-time levels are:

  • 3 hours or less
  • 12 hours or less
  • 24 hours (1 day) or less
  • 72 hours (3 days) or less
  • 7 days or less
  • 28 days or less

The official instructions:

  • Specify the job runtime only slightly (~10-20%) larger than the estimated value.
  • Only ask for the memory your code will actually need (with a bit of a cushion).
  • Minimize the number of node constraints.
  • Do not package what is essentially a bunch of serial jobs into a parallel (MPI/threaded) job; it is much faster to schedule many independent serial jobs than a single parallel job using the same number of CPU cores.

Some handy command-line combos

Match a pattern and print the next few lines:
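
For example, with grep:

    # print each match plus the 3 lines that follow it
    grep -A 3 'pattern' logfile.txt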

Remove echo to actually perform the rename:
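
A sketch of the pattern, assuming .txt files being renamed to .bak:

    # preview the renames first; drop 'echo' to actually rename
    for f in *.txt; do echo mv "$f" "${f%.txt}.bak"; done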

connect to allocated nodes

{Official Doc-Attach to a running job}
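
One way, per the official doc, is to launch an interactive step inside the job's allocation:

    # open a shell on the node of a running job
    srun --jobid=<jobid> --pty bash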

tmux is screen-like software for multi-window usage.
tmux cheat sheet: {Ref}

check job status

Check job progress with:
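
For example:

    squeue -u $USER                                 # current queue view
    sacct -j <jobid> --format=JobID,State,Elapsed   # accounting view of one job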

Jobs can have three statuses:

  1. CG completing
  2. PD pending, followed by a reason (Resources, Priority, ReqNodeNotAvail) {Ref}
  3. R running

Ref: SHARCNET official course series {Dashboard_Link} {ML_Intro}, {Scheduler}

tensorflow deployment

The main challenge is NumPy and TensorFlow compatibility.
The following scheme works for now (2023/03/28): link
Official doc: {Link}
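
A sketch of the scheme, relying on the pre-built Compute Canada wheels to keep numpy and tensorflow compatible (module version is a placeholder):

    module load python/3.10
    virtualenv --no-download ~/tf_env
    source ~/tf_env/bin/activate
    pip install --no-index tensorflow   # the bundled wheel pins a compatible numpy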

tensorboard: interactive, visualized probing

Motivation: quick visualization is needed for the growing workload of model profiling, especially when using transfer learning.
Official wiki: {Link}
Connecting to a running node before use is recommended; this operation is covered in the section above.
Start TensorBoard with the command below. The default port is 6006; use a different port to avoid interference. Setting --load_fast false seems to be needed on Compute Canada.
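
A sketch, assuming logs in ./logs and port 6007:

    # --host 0.0.0.0 lets the node accept the forwarded connection
    tensorboard --logdir=./logs --host 0.0.0.0 --port 6007 --load_fast false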

Then bind your local port to the remote port to visit it:
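
A sketch, run on your local machine, with node and user names as placeholders:

    # forward local port 6007 to the compute node running tensorboard
    ssh -N -f -L localhost:6007:<computenode>:6007 <user>@graham.computecanada.ca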

IPython

zip

{Ref}

ML 101

Reinforcement Learning

From the wiki:
It doesn't need paired labels; the practitioner only needs to score the results generated by the model.

Illustration

scheduler: run MNIST with GPU

Reading the man page of sbatch:

  1. --constraint can specify CPU/GPU features.

  2. if you write bash scripts in Windows Notepad++, convert all '\r\n' line endings to '\n' before running them on Linux

  3. sbatch submission only accepts the account def-bingqli so far

  4. certain packages, such as torchvision, torchtext, and torchaudio, need to be installed in your own environment ahead of time (see the sketch after this list)

  5. see this page for JupyterHub on clusters:
    https://docs.computecanada.ca/wiki/JupyterHub

  6. check available wheels here
    https://docs.computecanada.ca/wiki/Available_Python_wheels

  7. Check jobs in the queue (see the sketch after this list)
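
Sketches for items 4 and 7 (package list and flags are illustrative):

    # item 4: pre-install extra packages into your virtualenv
    pip install --no-index torchvision torchtext torchaudio

    # item 7: check queued jobs and their estimated start times
    squeue -u $USER --start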

parallel computing

  1. Only parallelize tasks longer than 1e-4 s
  2. Profile the flow with a timer instead of guessing

Windows Users

See the {Zhihu step-by-step beginner tutorial}.
It covers:

  • loading Python
  • building a virtual environment
  • batch-installing the required packages

There is also a second part, {Job Submission}, focused on automatic multi-task submission.

I have successfully connected to Graham so far.
Checking the official site, there seems to be a recommended directory structure:
Storage & File Management
The official tutorial explains most file operations:
{FAQ}

Visual Exploration of Data by SHARCNET: {Youtube}
Highlights:

  • df.groupby(...).plot(kind='hist')
  • seaborn PCA plotting, etc.
  • interactive matplotlib (hide data)
  • create your own Python web app with Bokeh

Come back when you need to visualize your data.

free WebDAV from NextCloud

Nextcloud is a cloud disk service hosted by Compute Canada as well. Each user/group has a 100GB quota.
I personally use it for Zotero. To set up WebDAV, simply log into Nextcloud, click Settings at the bottom left, copy the WebDAV URL, and paste it into the URL field required by other software. The username and password are the same as your Nextcloud ones.
