====== Monitoring ======

Several modules are available on the EEMCS-HPC cluster to monitor jobs and compute hardware.\\
The scheduler-based utilities can be loaded with the following command:

  module load slurm/utils

The compute node utilities can be loaded with the following command:

  module load monitor/node

===== Scheduler/Jobs =====

==== Cluster dashboard ====

The overall status of jobs (queued/running), racks, mapping, partitions, QOS and reservations can be viewed on the **[[http://hpc-status.ewi.utwente.nl/slurm|EEMCS-HPC Slurm dashboard page]]**.

==== sinfo ====

This command is used to view partition and node information.

==== squeue ====

This command is used to view job and job step information for jobs managed by the Slurm scheduler.

==== scancel ====

This command is used to signal or cancel jobs or job steps that are under the control of the Slurm scheduler.\\
For example, to cancel one of your jobs run the following command:

  scancel <jobid>

==== scontrol ====

This command is used to view or modify the Slurm configuration and state.

==== seff ====

When a job has finished, its efficiency (CPU and memory utilization) can be viewed with the **seff** Perl script:

  seff <jobid>

===== Compute nodes =====

==== top-node ====

With this shell script you can see your CPU processes on the specified node:

  top-node [+optional top arguments]

==== nvtop-node ====

With this shell script you can see your GPU processes on the specified node:

  nvtop-node [+optional nvtop arguments]

==== nvidia-smi-node ====

With this shell script you can request information about the GPUs on the specified node:

  nvidia-smi-node [+optional nvidia-smi arguments]
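
==== Example workflow ====

As an illustration, the snippet below combines several of the commands above into a typical monitoring workflow. It is only a sketch: the job ID 123456 and the node name node01 are placeholders for the values reported for your own job, and it is assumed here that top-node accepts the node name as an argument, as the section above suggests.

  # Load the scheduler and compute node monitoring utilities
  module load slurm/utils
  module load monitor/node
  
  # List your own queued and running jobs
  squeue -u $USER
  
  # Inspect the full state of a specific job (123456 is a placeholder job ID)
  scontrol show job 123456
  
  # Watch your CPU processes on the node the job runs on
  # (assumption: top-node takes the node name as its argument; node01 is a placeholder)
  top-node node01
  
  # After the job has finished, check how efficiently it used its allocation
  seff 123456

If a job turns out to be misbehaving, it can be cancelled at any point in this workflow with scancel and the same job ID.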