Partitions in Slurm can be thought of as a resource abstraction: a partition configuration defines job limits and access controls for a group of nodes. Slurm allocates resources to a job within the selected partition, taking into account the job's requested resources and the partition's available resources and restrictions.
Thirteen partitions are currently configured:
- MFCF CPU partitions
  - cpu_pr1 : For running jobs on the hpc-pr2 cluster
  - cpu_pr3 : For running jobs on the hpc-pr3 cluster
- MFCF GPU partitions
  - gpu_p100 : For running jobs on the Pascal P100 GPU server
  - gpu_a100 : For running jobs on the Ampere A100 GPU server
  - gpu_h100 : For running jobs on the Hopper H100 GPU server
  - gpu_l40s : For running jobs on the Ada Lovelace L40S GPU server
- hagrid cluster partitions
  - hagrid_batch : For running batch jobs on the hagrid cluster
  - hagrid_interactive : For Slurm interactive sessions on the hagrid cluster
- barrio1 partition
  - barrio1 : For running jobs on the barrio1 machine
- mosaic cluster partitions: The Mosaic cluster was purchased by two CFI projects and was originally operated by SHARCNET. The owners have contributed this cluster to MFCF for use by the Faculty of Mathematics (other than SCS). The owners and their collaborators retain higher priority for use of the cluster.
  - cpu_mosaic_owner : For CPU jobs by the owners and their collaborators
  - cpu_mosaic_guest : For CPU jobs by other users
  - gpu_k20_mosaic_owner : For GPU jobs by the owners and their collaborators
  - gpu_k20_mosaic_guest : For GPU jobs by other users
Details of each partition's resources are shown in the tables below. Resources may be adjusted according to observed usage patterns. Partitions' computational resources may also overlap; for example, cpu_mosaic_owner and cpu_mosaic_guest share the same computational resources but differ in job limits, access controls, and job priorities.
Useful commands
- For detailed information on all partitions on a cluster:
  `scontrol show partition`
- For information on a specific partition:
  `scontrol show partition <partition-name>`
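
For example, to check a partition's configured limits before submitting a job, the output of these commands can be filtered with standard tools; the partition name below is only an illustration, and the field names are the usual ones reported by scontrol:

```bash
# One-line summary of every partition (state, time limit, node list)
sinfo

# Full configuration of a single partition, e.g. cpu_pr1
scontrol show partition cpu_pr1

# Pick out just the per-job limits and totals from that output
scontrol show partition cpu_pr1 | grep -oE '(MaxTime|MaxNodes|TotalCPUs|TotalNodes)=\S+'
```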
Partition details
- cpu_pr1 partition: for running production jobs on the hpc-pr2 cluster (a submission sketch follows the tables below).
| Partition name | cpu_pr1 |
| --- | --- |
| Total available memory | 512 GB |
| Max Cores | 96 cores |
| Threads per core | 2 |
| Total GPU devices | 0 |
| GPU memory per device | 0 GB |
| Compute Nodes | hpc-pr2-[01-08] |

cpu_pr1 partition resources
| Max runtime | 180 hours |
| --- | --- |
| Max Nodes | 6 nodes |

cpu_pr1 partition per job resource limits
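
As a minimal sketch (not an official template), a batch script targeting this partition could look like the following; the job name, resource amounts, and program are placeholders and must stay within the per-job limits above:

```bash
#!/bin/bash
#SBATCH --partition=cpu_pr1      # submit to the cpu_pr1 partition
#SBATCH --job-name=example       # placeholder job name
#SBATCH --nodes=2                # within the 6-node per-job limit
#SBATCH --ntasks-per-node=12     # placeholder task count
#SBATCH --mem=32G                # placeholder memory per node
#SBATCH --time=48:00:00          # well under the 180-hour limit

srun ./my_program                # placeholder executable
```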
- cpu_pr3 partition: for running production jobs on the hpc-pr3 cluster.
| Partition name | cpu_pr3 |
| --- | --- |
| Total available memory | 1024 GB |
| Max Cores | 256 cores |
| Threads per core | 2 |
| Total GPU devices | 0 |
| GPU memory per device | 0 GB |
| Compute Nodes | hpc-pr3-[01-08] |

cpu_pr3 partition resources
| Max runtime | 180 hours |
| --- | --- |
| Max Nodes | 6 nodes |

cpu_pr3 partition per job resource limits
- gpu_p100 partition: for jobs using the P100 GPUs (a GPU request sketch follows the tables below).
| Partition name | gpu_p100 |
| --- | --- |
| Total available memory | 44 GB |
| Max Cores | 28 cores |
| Threads per core | 2 |
| Total GPU devices | 4 Tesla P100 |
| GPU memory per device | 16 GB |
| Compute Nodes | gpu-pr1-01 |

gpu_p100 partition resources
| Max runtime | 180 hours |
| --- | --- |
| Max Nodes | 1 node |

gpu_p100 partition per job resource limits
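
GPUs in this partition are requested with the generic Slurm GRES syntax; the sketch below is illustrative only (the device count and other amounts are placeholders, and no site-specific GRES type names are assumed):

```bash
#!/bin/bash
#SBATCH --partition=gpu_p100     # submit to the P100 GPU server
#SBATCH --gres=gpu:2             # request 2 of the 4 P100 devices
#SBATCH --cpus-per-task=8        # placeholder CPU count
#SBATCH --mem=32G                # placeholder memory request
#SBATCH --time=24:00:00          # well under the 180-hour limit

srun ./my_gpu_program            # placeholder executable
```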
- gpu_a100 partition: for jobs using the A100 GPUs
| Partition name | gpu_a100 |
| --- | --- |
| Total available memory | 1 TB |
| Max Cores | 64 cores |
| Threads per core | 1 |
| Total GPU devices | 8 Tesla A100 |
| GPU memory per device | 4 GPUs with 40 GB, 4 GPUs with 80 GB |
| Compute Nodes | gpu-pr1-02 |

gpu_a100 partition resources
| Max runtime | 180 hours |
| --- | --- |
| Max Nodes | 1 node |

gpu_a100 partition per job resource limits
- gpu_h100 partition: for jobs using the H100 GPUs
| Partition name | gpu_h100 |
| --- | --- |
| Total available memory | 1 TB |
| Max Cores | 112 cores |
| Threads per core | 1 |
| Total GPU devices | 4 H100 |
| GPU memory per device | 80 GB |
| Compute Nodes | gpu-pr1-03 |

gpu_h100 partition resources
| Max runtime | 180 hours |
| --- | --- |
| Max Nodes | 1 node |

gpu_h100 partition per job resource limits
- gpu_l40s partition: for jobs using the L40S GPUs
| Partition name | gpu_l40s |
| --- | --- |
| Total available memory | 768 GB |
| Max Cores | 84 cores |
| Threads per core | 1 |
| Total GPU devices | 3 L40S |
| GPU memory per device | 48 GB |
| Compute Nodes | gpu-pr1-04 |

gpu_l40s partition resources
| Max runtime | 180 hours |
| --- | --- |
| Max Nodes | 1 node |

gpu_l40s partition per job resource limits
- hagrid_batch partition: this partition is accessible only by hagrid cluster users. If the hagrid_batch partition is selected, you must specify the hagrid account using the --account=hagrid option (see the sketch after the tables below).
| Partition name | hagrid_batch |
| --- | --- |
| Total available memory | 1440 GB |
| Total available cores | 160 cores |
| Threads per core | 1 |
| Total GPU devices | 0 |
| Compute Nodes | hagrid[01-08] |

hagrid_batch partition resources
| Max runtime | 200 hours |
| --- | --- |
| Max Nodes | 6 nodes |

hagrid_batch partition per job resource limits
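
A minimal sketch of a hagrid_batch submission, showing the required --account=hagrid option; all other values are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=hagrid_batch  # hagrid batch partition
#SBATCH --account=hagrid          # required for this partition
#SBATCH --nodes=2                 # within the 6-node per-job limit
#SBATCH --ntasks-per-node=20      # placeholder task count
#SBATCH --mem=100G                # placeholder memory per node
#SBATCH --time=100:00:00          # within the 200-hour limit

srun ./my_program                 # placeholder executable
```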
- hagrid_interactive partition: this partition is for Slurm interactive sessions and is accessible only by hagrid cluster users. Interactive jobs, or sessions, are useful for work that requires direct user input, such as code development, compiling, and testing/debugging (see the sketch after the tables below).
| Partition name | hagrid_interactive |
| --- | --- |
| Total available memory | 180 GB |
| Total available cores | 20 cores |
| Threads per core | 2 |
| Total GPU devices | 0 |
| Compute Nodes | hagrid-storage |

hagrid_interactive partition resources
| Max runtime | 4 hours |
| --- | --- |
| Max Nodes | 1 node |

hagrid_interactive partition per job resource limits
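
A minimal sketch of starting an interactive session on this partition; the resource amounts are placeholders, and --account=hagrid is included on the assumption that the hagrid account also applies here, as it does for hagrid_batch:

```bash
# Allocate resources and get an interactive shell on the allocation
salloc --partition=hagrid_interactive --account=hagrid \
       --cpus-per-task=4 --mem=16G --time=02:00:00

# Alternatively, launch a shell directly in a pseudo-terminal
srun --partition=hagrid_interactive --account=hagrid \
     --cpus-per-task=4 --mem=16G --time=02:00:00 --pty bash
```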
- barrio1 partition: this partition is accessible only by barrio1 cluster users. It is a single-node partition. A Slurm interactive session is recommended for jobs that require direct user input, such as code development, compiling, and testing/debugging (see the sketch after the tables below).
| Partition name | barrio1 |
| --- | --- |
| AllowAccounts | barrio1 |
| Total available memory | 64 GB |
| Total available cores | 8 cores |
| Threads per core | 2 |
| Total GPU devices | 0 |
| Compute Nodes | barrio1.math |

barrio1 partition resources
| Max runtime | UNLIMITED |
| --- | --- |
| Max Nodes | 1 node |

barrio1 partition per job resource limits
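
A one-line interactive sketch for this partition, assuming access through the barrio1 account listed under AllowAccounts; the resource amounts are placeholders:

```bash
# Interactive shell on the single barrio1 node
srun --partition=barrio1 --account=barrio1 \
     --cpus-per-task=2 --mem=8G --time=04:00:00 --pty bash
```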
- mosaic cluster partitions
The Mosaic cluster consists of 20 dual-CPU machines, each with one GPU device, and 4 quad-CPU machines with large memory. The machines are connected by an InfiniBand network. All non-CS Math Faculty researchers and grad students may use the cluster, but the owners and their collaborators retain higher priority.
- cpu_mosaic_owner partition: this partition is for the owners and their collaborators (a sketch of the day-hh:mm:ss runtime format follows the tables below).
| Partition name | cpu_mosaic_owner |
| --- | --- |
| AllowAccounts | mosaic_owners |
| Total available memory | 2304 GB |
| Total available cores | 96 cores |
| Threads per core | 1 |
| Total GPU devices | 0 |
| Compute Nodes | cpu_mosaic_owner.math |

cpu_mosaic_owner partition resources
| Max runtime (day-hh:mm:ss) | 7-04:00:00 |
| --- | --- |
| Max Nodes | 3 nodes |

cpu_mosaic_owner partition per job resource limits
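
The day-hh:mm:ss notation above is the standard Slurm --time format. A hedged sketch of requesting the partition's maximum walltime follows; the node count and program are placeholders, and the account is taken from the AllowAccounts entry above:

```bash
#!/bin/bash
#SBATCH --partition=cpu_mosaic_owner  # owners' CPU partition
#SBATCH --account=mosaic_owners       # account listed under AllowAccounts
#SBATCH --nodes=2                     # within the 3-node per-job limit
#SBATCH --time=7-04:00:00             # 7 days 4 hours, the partition maximum

srun ./my_program                     # placeholder executable
```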
- cpu_mosaic_guest partition: this partition is for users other than the owners of the Mosaic cluster
| Partition name | cpu_mosaic_guest |
| --- | --- |
| AllowAccounts | cpu_mosaic_guest |
| Total available memory | 2304 GB |
| Total available cores | 96 cores |
| Threads per core | 1 |
| Total GPU devices | 0 |
| Compute Nodes | mosaic-[21-23] |

cpu_mosaic_guest partition resources
| Max runtime (day-hh:mm:ss) | 7-04:00:00 |
| --- | --- |
| Max Nodes | 3 nodes |

cpu_mosaic_guest partition per job resource limits
- gpu_k20_mosaic_owner partition: this partition is for the owners. Each node in this partition has one Tesla K20m GPU device. Use this partition for GPU jobs.
| Partition name | gpu_k20_mosaic_owner |
| --- | --- |
| AllowAccounts | gpu_k20_mosaic_owner |
| Total available memory | 4608 GB |
| Total available cores | 360 cores |
| Threads per core | 1 |
| Total GPU devices | 18 Tesla Kepler |
| Compute Nodes | mosaic-[01-19] |

gpu_k20_mosaic_owner partition resources
| Max runtime (day-hh:mm:ss) | 7-12:01:00 |
| --- | --- |
| Max Nodes | 9 nodes |

gpu_k20_mosaic_owner partition per job resource limits
- gpu_k20_mosaic_guest partition: anyone may use this partition for GPU jobs, but jobs launched here will have a lower priority than jobs launched via gpu_k20_mosaic_owner. Each node in this partition has one Tesla K20m GPU device. The gpu_k20_mosaic_guest and gpu_k20_mosaic_owner partitions share the same machines, and jobs launched via gpu_k20_mosaic_owner will preempt jobs in gpu_k20_mosaic_guest (see the requeue sketch after the tables below).
| Partition name | gpu_k20_mosaic_guest |
| --- | --- |
| AllowAccounts | ALL |
| Total available memory | 4608 GB |
| Total available cores | 360 cores |
| Threads per core | 1 |
| Total GPU devices | 18 Tesla Kepler |
| Compute Nodes | mosaic-[01-19].math |

gpu_k20_mosaic_guest partition resources
| Max runtime (day-hh:mm:ss) | 7-12:01:00 |
| --- | --- |
| Max Nodes | 5 nodes |

gpu_k20_mosaic_guest partition per job resource limits
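
Because jobs here can be preempted by gpu_k20_mosaic_owner jobs, it may help to mark guest jobs as requeueable so Slurm can restart them after preemption; whether preempted jobs are actually requeued or cancelled depends on the cluster's preemption configuration, so treat this as a sketch:

```bash
#!/bin/bash
#SBATCH --partition=gpu_k20_mosaic_guest  # lower-priority guest partition
#SBATCH --gres=gpu:1                      # one K20m device per node
#SBATCH --requeue                         # allow requeueing if the job is preempted
#SBATCH --nodes=1                         # within the 5-node per-job limit
#SBATCH --time=3-00:00:00                 # within the 7-12:01:00 limit

srun ./my_gpu_program                     # placeholder executable
```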