====== Monitoring ======
On the EEMCS-HPC Cluster, several modules are available to monitor jobs and computing hardware.\\
The scheduler-based utilities can be loaded with the following command:
  module load slurm/utils
The compute-node utilities can be loaded with the following command:
  module load monitor/node
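For example, both sets of utilities can be loaded in the same session; **module list** (a standard environment-modules command) shows what is currently loaded:
  # Load the Slurm and compute node monitoring utilities
  module load slurm/utils
  module load monitor/node
  # Verify which modules are loaded in the current session
  module list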
===== Scheduler/Jobs =====
==== cluster dashboard ====
The overall status of jobs (queued/running), racks, mapping, partitions, QOS and reservations can be viewed by accessing the **[[http://hpc-status.ewi.utwente.nl/slurm|EEMCS-HPC Slurm dashboard page]]**.
==== sinfo ====
This command is used to view partition and node information.
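A few common invocations are shown below (standard Slurm options; replace <partition> with an actual partition name on the cluster):
  # Summary of all partitions and their node states
  sinfo
  # Node-oriented, long-format listing
  sinfo -N -l
  # Show only the nodes of one partition
  sinfo -p <partition>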
==== squeue ====
This command is used to view job and job step information for jobs managed by the Slurm scheduler.
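For example, to see only your own jobs and their expected start times (standard Slurm options):
  # Show only your own jobs
  squeue -u $USER
  # Show the scheduler's estimated start time of your pending jobs
  squeue -u $USER --start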
==== scancel ====
This command is used to signal jobs or job steps that are under the control of the Slurm scheduler.\\
For example, to cancel one of your jobs, run the following command with that job's ID:
  scancel <jobid>
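**scancel** also accepts standard Slurm filters, for example:
  # Cancel all of your own jobs
  scancel -u $USER
  # Cancel only your pending (not yet running) jobs
  scancel -u $USER --state=PENDING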
==== scontrol ====
This command is used to view or modify the Slurm configuration and state.
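Some commonly used, standard Slurm invocations are shown below:
  # Show detailed information about a job or a compute node
  scontrol show job <jobid>
  scontrol show node <nodename>
  # Put a pending job on hold and release it again
  scontrol hold <jobid>
  scontrol release <jobid>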
==== seff ====
When a job has finished, its efficiency can be viewed with the **seff** Perl script:
  seff <jobid>
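If you no longer know the job ID, it can usually be looked up with the standard Slurm accounting command **sacct** (this assumes job accounting is enabled on the cluster):
  # List your recent jobs with their IDs and final state
  sacct -u $USER --format=JobID,JobName,State,Elapsed,MaxRSS
  # Then inspect the efficiency of a finished job
  seff <jobid>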
===== Compute nodes =====
==== top-node ====
With this shell script you can see your CPU processes on the specified node:
  top-node [+optional top arguments]
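A possible invocation is sketched below; it assumes the node name is passed as the first argument (check the script's help output for the exact syntax), combined with the standard **top** option -o %MEM to sort by memory usage:
  # Show your processes on <nodename>, sorted by memory usage
  # (node name as first argument is an assumption, not confirmed by this page)
  top-node <nodename> -o %MEM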
==== nvtop-node ====
With this shell script you can see your GPU processes on the specified node:
  nvtop-node [+optional nvtop arguments]
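Similarly, a sketch that assumes the node name is the first argument; the delay flag is a standard **nvtop** option (its unit is tenths of a second):
  # Show your GPU processes on <nodename>, refreshing every 2 seconds
  # (node name as first argument is an assumption, not confirmed by this page)
  nvtop-node <nodename> -d 20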
==== nvidia-smi-node ====
With this shell script you can request information about the GPUs on the specified node:
  nvidia-smi-node [+optional nvidia-smi arguments]
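A sketch that assumes the node name is the first argument; the query options are standard **nvidia-smi** options:
  # Query GPU utilisation and memory usage on <nodename>
  # (node name as first argument is an assumption, not confirmed by this page)
  nvidia-smi-node <nodename> --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv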