(Last update 2024-03-29)
This page lists various limit values in TSUBAME4.0. These values may change due to factors such as batch jobs becoming stuck longer than estimated or a tight electricity supply-demand balance. Regardless of the upper limits, please help ease congestion during crowded periods.
## Job Execution
| | Weekdays(*1) | Weekends(*2) |
| --- | --- | --- |
| Number of jobs running simultaneously per user | 30 jobs | 100 jobs |
| Number of slots running simultaneously per user (number of CPU cores) | 6144 slots | 12288 slots |
| Maximum parallelism per job | 64(*3) | 64 |
| Maximum execution time per job | 24 hours | 24 hours |
*1 : Weekdays: jobs that start between Sunday 9:00 and Friday 16:00 each week.
*2 : Weekends: jobs that start between Friday 16:00 and Sunday 9:00. Public holidays are not taken into account, to keep the rule simple.
*3 : Although 64-way parallelism is permitted in general, note that node_f jobs are limited to 32-node parallelism because of the 6144-slot limit (32 nodes × 192 cores = 6144 slots).
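As a concrete illustration, below is a minimal sketch of a batch script that sits exactly at the weekday per-job ceiling described above (32 node_f nodes × 192 cores = 6144 slots, 24-hour runtime). The `node_f`/`h_rt` resource options follow the qsub syntax used on TSUBAME; the job name, module name, and application binary are placeholders.

```bash
#!/bin/bash
#$ -cwd                  # run the job in the submission directory
#$ -N weekday_max        # job name (arbitrary)
#$ -l node_f=32          # 32 full nodes = 6144 slots, the weekday per-user slot cap
#$ -l h_rt=24:00:00      # 24 hours, the maximum execution time per job

module load openmpi      # placeholder module name; adjust to your environment
mpirun -n 6144 ./my_mpi_app   # placeholder application using all allocated cores
```

Submitted with, e.g., `qsub -g <TSUBAME group> weekday_max.sh`, a second job of this size would exceed the weekday 6144-slot per-user limit and wait in the queue until the first finishes.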
## Node Reservation
| | April-September | October-March (busy period) |
| --- | --- | --- |
| Number of nodes provided for reservation (total) | 70 nodes(*4) | 20 nodes(*4) |
| Maximum reservation time per reservation | 168 hours (7 days) | 96 hours (4 days) |
| Total node-hours one group can reserve at the same time | 3360 node-hours(*4) | 960 node-hours(*4) |
*4 : Unused subscription nodes (50 nodes) are added to the reservation pool, and the total number of reservable nodes increases accordingly.
*5 : The above "Job Execution" limitations do not apply to "Node Reservation".
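For example, a single reservation of 20 nodes for the full 168 hours in April-September consumes 20 × 168 = 3360 node-hours, exactly one group's simultaneous allowance; in October-March, reserving 10 nodes for 96 hours (10 × 96 = 960 node-hours) likewise exhausts the busy-period allowance.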
## Subscription
| | Max |
| --- | --- |
| Provided nodes (total) | 50 nodes |
| Number of nodes that one group / one budget manager can hold at the same time | 2 nodes |
## Login nodes
As the login nodes (login, login1, login2) are shared by many users, please do not run heavy workloads on them, and refrain from monopolizing their CPUs.
## Interactive job queue
- Assigned resources: 24 physical CPU cores, 96GB of memory, and one MIG GPU instance; up to 12 users share the same resources.
- If no suitable resources are left, job submission fails, as with a normal qrsh command.
- Memory contents may be swapped out to the SSD depending on congestion, so performance may be significantly reduced.
- Number of jobs that can be executed simultaneously per user: 1 job
- Maximum usage time: 24 hours
- The local scratch area (SSD) is reserved and shared, and the available capacity cannot be guaranteed.
- Execution by reservation and Docker container jobs are not available. Container jobs with Apptainer are available.
- This service is intended for programs with intermittent processor usage, such as debuggers, visualizers, and Jupyter Lab; do not use it for programs that occupy processors continuously.
- Depending on the program execution status, jobs that significantly hinder the execution of other users' programs may be deleted.
- These limitations may be revised without prior notice. Also, the service itself may be terminated or suspended without prior notice.
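For reference, starting an interactive session within these limits might look like the sketch below. The exact command and queue options for this service are installation-specific (the notes above only state that submission behaves like a normal qrsh command), and the group name is a placeholder.

```bash
# Hypothetical invocation; check the TSUBAME4.0 manual for the exact
# command and queue options of the interactive job queue.
qrsh -g <TSUBAME group> -l h_rt=24:00:00   # 24 hours is the maximum usage time
```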
## Group Disk
The capacity of group disks that can be purchased by one group is as follows.
| Storage | Sizes available for purchase |
| --- | --- |
| /gs/bs (Large-scale (Big) storage, HDD) | 100TB |
| /gs/fs (High-speed storage, SSD) | 3TB |