====== Monitoring ======
On the EEMCS-HPC Cluster, several modules are available to monitor jobs and computing hardware.\\
The scheduler-based utilities can be loaded with the following command:
  module load slurm/utils
The compute-node utilities can be loaded with the following command:
  module load monitor/node
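For example, both sets of utilities can be loaded in the same session; **module list** (a standard environment-modules command) shows what is currently loaded:
  # Load the Slurm and compute node monitoring utilities
  module load slurm/utils
  module load monitor/node
  # Verify which modules are loaded in the current session
  module list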
===== Scheduler/Jobs =====
==== cluster dashboard ====
The overall status of jobs (queued/running), racks, mapping, partitions, QOS and reservations can be viewed by accessing the **[[http://hpc-status.ewi.utwente.nl/slurm|EEMCS-HPC Slurm dashboard page]]**.
==== sinfo ====
This command is used to view partition and node information.
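A few common invocations are shown below (standard Slurm options; replace <partition> with an actual partition name on the cluster):
  # Summary of all partitions and their node states
  sinfo
  # Node-oriented, long-format listing
  sinfo -N -l
  # Show only the nodes of one partition
  sinfo -p <partition>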
==== squeue ====
This command is used to view job and job step information for jobs managed by the Slurm scheduler.
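For example, to see only your own jobs and their expected start times (standard Slurm options):
  # Show only your own jobs
  squeue -u $USER
  # Show the scheduler's estimated start time of your pending jobs
  squeue -u $USER --start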
==== scancel ====
This command is used to signal jobs or job steps that are under the control of the Slurm scheduler.\\
For example, to cancel one of your jobs, run the following command with that job's ID:
  scancel <jobid>
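**scancel** also accepts standard Slurm filters, for example:
  # Cancel all of your own jobs
  scancel -u $USER
  # Cancel only your pending (not yet running) jobs
  scancel -u $USER --state=PENDING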
==== scontrol ====
This command is used to view or modify the Slurm configuration and state.
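Some commonly used, standard Slurm invocations are shown below:
  # Show detailed information about a job or a compute node
  scontrol show job <jobid>
  scontrol show node <nodename>
  # Put a pending job on hold and release it again
  scontrol hold <jobid>
  scontrol release <jobid>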
==== seff ====
When a job has finished, its efficiency can be viewed with the **seff** Perl script:
  seff <jobid>
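If you no longer know the job ID, it can usually be looked up with the standard Slurm accounting command **sacct** (this assumes job accounting is enabled on the cluster):
  # List your recent jobs with their IDs and final state
  sacct -u $USER --format=JobID,JobName,State,Elapsed,MaxRSS
  # Then inspect the efficiency of a finished job
  seff <jobid>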
===== Compute nodes =====
==== top-node ====
With this shell script you can see your CPU processes on the specified node:
  top-node [+optional top arguments]
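A possible invocation is sketched below; it assumes the node name is passed as the first argument (check the script's help output for the exact syntax), combined with the standard **top** option -o %MEM to sort by memory usage:
  # Show your processes on <nodename>, sorted by memory usage
  # (node name as first argument is an assumption, not confirmed by this page)
  top-node <nodename> -o %MEM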
==== nvtop-node ====
With this shell script you can see your GPU processes on the specified node:
  nvtop-node [+optional nvtop arguments]
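Similarly, a sketch that assumes the node name is the first argument; the delay flag is a standard **nvtop** option (its unit is tenths of a second):
  # Show your GPU processes on <nodename>, refreshing every 2 seconds
  # (node name as first argument is an assumption, not confirmed by this page)
  nvtop-node <nodename> -d 20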
==== nvidia-smi-node ====
With this shell script you can request information about the GPUs on the specified node:
  nvidia-smi-node [+optional nvidia-smi arguments]
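A sketch that assumes the node name is the first argument; the query options are standard **nvidia-smi** options:
  # Query GPU utilisation and memory usage on <nodename>
  # (node name as first argument is an assumption, not confirmed by this page)
  nvidia-smi-node <nodename> --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv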