I’m running into a memory allocation error when submitting a job to be run on one of the GPU nodes, specifically:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.17 GiB total capacity; 237.94 MiB already allocated; 19.62 MiB free; 242.00 MiB reserved in total by PyTorch)
Is there a way to specify a GPU that is not in use, or set the amount of memory my job requires beforehand so that the scheduler doesn’t attempt to run my job on a GPU node that doesn’t have adequate resources?
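In the meantime, one common stopgap (not specific to this cluster) is to ask nvidia-smi which GPU currently has the most free memory and mask the others with CUDA_VISIBLE_DEVICES before CUDA initializes. A rough sketch, assuming nvidia-smi is on the PATH on the GPU nodes:

```python
import os
import subprocess

def pick_freest_gpu(smi_output=None):
    """Return the index of the GPU with the most free memory.

    Parses the output of
      nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits
    which is one integer (MiB free) per line, one line per GPU.
    `smi_output` can be passed in directly for testing.
    """
    if smi_output is None:
        smi_output = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.free",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    free_mib = [int(line) for line in smi_output.strip().splitlines()]
    # Index of the GPU with the largest free-memory value
    return max(range(len(free_mib)), key=free_mib.__getitem__)

# Must be set before the first CUDA call (e.g. before importing torch
# in some setups, and certainly before any tensor lands on the GPU):
# os.environ["CUDA_VISIBLE_DEVICES"] = str(pick_freest_gpu())
```

This only dodges contention at launch time; it doesn’t reserve anything, so another job can still grab the same GPU a moment later. A real fix needs the scheduler to track GPUs as resources.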
Not sure, I think @tiankang has also been having this problem. @arnsong might be able to help. I understand that there will be some new GPUs available soon accompanying a migration to the SLURM scheduler. This is also supposed to help with some of the wonky things we’ve all been experiencing with the GPUs on discovery.
Also, in case it’s helpful, here is the manual for the version of Torque Discovery uses. There’s a full list with descriptions of all resources that can be requested using the #PBS -l directive starting on pg. 74.
You can also view the list on Discovery with man pbs_resources.
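For what it’s worth, a typical Torque submission script using those -l directives looks something like the sketch below. The exact resource names (whether gpus=1 is honored, node feature names, queue names) depend on how Discovery’s Torque is configured, so treat this as a template to check against man pbs_resources rather than something guaranteed to work as-is. Note that -l mem requests host RAM, not GPU memory; as far as I know Torque has no way to reserve a specific amount of GPU memory.

```shell
#!/bin/bash
#PBS -N gpu-train            # job name (placeholder)
#PBS -l nodes=1:ppn=1:gpus=1 # one node, one CPU core, one GPU
#PBS -l mem=16gb             # host RAM for the job (not GPU memory)
#PBS -l walltime=04:00:00    # max run time

# Torque starts the job in $HOME; move to the submission directory
cd "$PBS_O_WORKDIR"

python train.py              # placeholder for the actual workload
```

Submitted with qsub script.sh; the scheduler should then only place the job on a node with an unallocated GPU, assuming GPUs are tracked as a consumable resource in the site config.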