GPU usage documentation
Selecting a GPU to carry out a task may look like a trivial chore, but it matters more than it seems: we shouldn't be using a GPU that a colleague is already using if there is a free GPU available.
In this documentation we are going to cover how to check which GPUs are being used, who is using each GPU, how to select a specific GPU, and the relationship between batch size and memory usage.
1. NVIDIA SMI tutorial
The NVIDIA System Management Interface (nvidia-smi) is a command line utility that allows us to query the current state of the GPUs and the processes running on them.
The SMI can be accessed by running the command nvidia-smi in a terminal.
Running it, we get the following output:
user@machinelearning2:~$ nvidia-smi
Thu Oct 7 15:56:00 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:18:00.0 Off | N/A |
| 51% 55C P2 110W / 350W | 2329MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:3B:00.0 Off | N/A |
| 0% 46C P8 27W / 260W | 158MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 3090 Off | 00000000:86:00.0 Off | N/A |
| 34% 44C P8 36W / 350W | 2MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 208... Off | 00000000:AF:00.0 Off | N/A |
| 0% 38C P8 20W / 260W | 3MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 27689 C python 2327MiB |
| 1 N/A N/A 7492 C python 155MiB |
+-----------------------------------------------------------------------------+
The first table contains information about the GPUs. In the first column we have, mainly, the GPU number, the model, the temperature and the current power draw. In the next column we find the amount of memory the GPU is currently using. Lastly, we find the volatile GPU utilization. You can check out more information about this in this StackOverflow post.
The second table contains information about the processes currently running on each GPU. If you want more information about a specific process, you can use ps aux. Check the next section to find out how.
Note: the command gpustat is installed on both gea_1 and gea_2 and works in a similar fashion to nvidia-smi. We will focus solely on nvidia-smi in this tutorial, but you can check out gpustat here.
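If you need the same information from a script, nvidia-smi also offers a machine-readable query mode (--query-gpu together with --format=csv are standard nvidia-smi options). The following is a minimal Python sketch under that assumption; the file and function names are ours, not part of any project:

# check_gpus.py - minimal sketch, not part of the template
import subprocess

def gpu_memory_summary():
    """Print index, name and memory usage (MiB) for every visible GPU."""
    query = [
        "nvidia-smi",
        "--query-gpu=index,name,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]
    output = subprocess.run(query, capture_output=True, text=True, check=True).stdout
    for line in output.strip().splitlines():
        index, name, used, total = [field.strip() for field in line.split(",")]
        print(f"GPU {index} ({name}): {used} MiB / {total} MiB in use")

if __name__ == "__main__":
    gpu_memory_summary()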
2. Checking out processes using ps aux
The ps aux command gives us information about the processes that are currently running. If we run the command ps aux | grep PID, we get the information for the process with that PID.
Let's see what happens when we use the command with process 7492, which we saw running in the previous output:
user@machinelearning2:~$ ps aux | grep 7492
elopez 7492 1081 8.4 30075588 11168984 pts/64 Dl+ oct06 18199:29 python train_2terms.py
jcribei+ 14683 0.0 0.0 14364 964 pts/32 S+ 15:57 0:00 grep --color=auto 7492
Here we can see the user running the process (elopez), when the process was started (oct06) and what exactly is being run (python train_2terms.py).
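This lookup can also be scripted. Below is a minimal Python sketch, assuming the PID comes from the nvidia-smi processes table and relying on the standard ps output columns user, lstart and cmd; the helper name is ours:

# whois_pid.py - minimal sketch, not part of the template
import subprocess

def describe_process(pid: int) -> str:
    """Return owner, start time and command line of a process (header line included)."""
    result = subprocess.run(
        ["ps", "-o", "user,lstart,cmd", "-p", str(pid)],
        capture_output=True, text=True, check=True,  # raises if the PID does not exist
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(describe_process(7492))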
3. Criteria for selecting a GPU for training
Now that we have established how to check GPU usage and individual processes, it's time to talk about which GPU to select for training and when.
Put plainly, you should avoid using a GPU that someone else is currently using. If that's not possible, select a GPU that you know has enough free memory to support all of the tasks running on it, or wait for one to free up. Why? When a GPU is already running a process, launching another one on top of it queues more work onto the GPU, slowing down all the processes the GPU is running. Besides that, running multiple experiments on the same GPU can cause other problems (for example, running out of GPU memory) that can end in a crash.
Up to ai-project-template v0.6.0, selecting a GPU was a manual task, but since that version there is a full suite of options for correctly selecting a GPU for your experiment:
- Legacy GPU manual selection: the classical way of selecting a GPU using its ID.
- Assisted manual selection: similar to the manual one, but before launching the experiment the user is presented with the available GPUs and the state of each one, so they can judge which one to use.
- Automatic selection: an algorithm determines which GPU to use based on the available memory and status of the GPUs in the server.
You are free to use any of these three ways to select a GPU; the next section explains how to use each of them.
4. How to select a GPU for training: manual, assisted and auto
Check out the experiment stage from ai-project's dvc.yaml file:
# dvc.yaml
# . . .
run_experiment_mlflow:
  cmd: export MLFLOW_TRACKING_URI="http://10.10.30.58:8999/" &&
    mlflow run . --experiment-name {{cookiecutter.__project_slug}} --no-conda
    -P dataset=configs/datasets/{{cookiecutter.dataset}}.py
    -P model=configs/models/resnet_18.py
    -P runtime=configs/runtimes/runtime.py
    -P scheduler=configs/schedulers/one_cycle_8_epochs.py
    -P gpu=2
  deps:
    - configs
    - results/data/transform/coco_to_mmclassification-{{cookiecutter.dataset}}
  metrics:
    - results/metrics.json:
        cache: false
  plots:
    - results/prc.json:
        cache: false
        x: recall
        y: precision
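Assuming the stage is launched through the standard DVC workflow (the exact launch command is not shown in the excerpt above, so take this as an assumption), it would typically be reproduced from the project root with:

user@machinelearning2:~$ dvc repro run_experiment_mlflow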
Here, the gpu argument indicates which GPU selection mode will be used (a small dispatch sketch follows the list below):
- If the gpu argument isn't present in the command, or if its value is gpu=-2, the assisted mode will be used. This is the default mode for GPU selection.
- If gpu=-1, the automatic mode will be used, so the user doesn't have to select anything.
- Otherwise, if gpu=n, where n is any non-negative integer (0 included), the legacy mode is activated and the GPU whose ID equals n will be used.
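To make the rules above explicit, here is a small Python sketch of the dispatch; this is our own illustration of the behaviour described in this section, not the template's actual implementation:

# gpu_mode.py - illustrative sketch, not the template's code
def resolve_gpu_mode(gpu=None):
    """Map the gpu argument to a (mode, gpu_id) pair."""
    if gpu is None or gpu == -2:   # argument missing or -2: default behaviour
        return ("assisted", None)
    if gpu == -1:                  # -1: let the algorithm pick a GPU
        return ("automatic", None)
    if gpu >= 0:                   # 0, 1, 2, ...: legacy mode, use that exact ID
        return ("legacy", gpu)
    raise ValueError(f"unsupported gpu value: {gpu}")

assert resolve_gpu_mode() == ("assisted", None)
assert resolve_gpu_mode(gpu=-1) == ("automatic", None)
assert resolve_gpu_mode(gpu=2) == ("legacy", 2)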
For example, take a look at the GPUs using the nvidia-smi command:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:18:00.0 Off | N/A |
| 51% 55C P2 110W / 350W | 2329MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:3B:00.0 Off | N/A |
| 0% 46C P8 27W / 260W | 158MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 3090 Off | 00000000:86:00.0 Off | N/A |
| 34% 44C P8 36W / 350W | 2MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 208... Off | 00000000:AF:00.0 Off | N/A |
| 0% 38C P8 20W / 260W | 3MiB / 11019MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
In legacy mode, gpu=2 would refer to the third GPU in the table (ID 2). Analogously, in assisted mode the user would be presented with this output:
__ __
___ ____ __ _____ / /____ _/ /_
/ _ `/ _ \/ // (___/ __/ _ `/ __/
\_, / .__/\_,_/___/\__/\_,_/\__/
/___/_/
machinelearning2 Tue Feb 1 12:46:49 2022 460.32.03
[0] GeForce RTX 3090 | 53'C, 0 % | 2329 / 24268 MB | userA(2329M)
[1] GeForce RTX 2080 Ti | 37'C, 0 % | 158 / 11019 MB | userB(158M)
[2] GeForce RTX 3090 | 60'C, 0 % | 2 / 24268 MB |
[3] GeForce RTX 2080 Ti | 38'C, 0 % | 3 / 11019 MB |
Select a GPU for operation: |
The IDs and memory figures shown correspond to the ones reported in the nvidia-smi output.
Automatic mode tries to find the GPU with the maximum available memory, available_memory = max_memory - used_memory, and breaks ties by selecting the GPU with the lowest used_memory. We strongly recommend using the assisted mode for fine-grained control over which GPU to use.
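As an illustration of that criterion, here is a minimal Python sketch; it relies on the standard nvidia-smi query options and is our own example, not the template's actual code:

# pick_gpu.py - illustrative sketch of the automatic criterion, not the template's code
import subprocess

def pick_gpu_automatically() -> int:
    """Return the index of the GPU with the most free memory, breaking ties on lowest used memory."""
    query = [
        "nvidia-smi",
        "--query-gpu=index,memory.total,memory.used",
        "--format=csv,noheader,nounits",
    ]
    output = subprocess.run(query, capture_output=True, text=True, check=True).stdout
    gpus = []
    for line in output.strip().splitlines():
        index, total, used = [int(field) for field in line.split(",")]
        gpus.append({"index": index, "free": total - used, "used": used})
    # Most available memory first; lowest used memory breaks ties.
    best = max(gpus, key=lambda gpu: (gpu["free"], -gpu["used"]))
    return best["index"]

if __name__ == "__main__":
    print(pick_gpu_automatically())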
5. Relation between batch size, GPU memory and training speed
Keep this in mind at all times:
The larger the batch size, the higher the memory consumption and GPU utilization, and the faster the training.
When setting up your training, remember to maximize your batch size in order to speed up training and, at the same time, reduce the number of jobs queuing up around the GPUs.
NOTE: using a very large batch size is not always good, as it can reduce the regularizing effect of training and hurt generalization. You can read more about it here.
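If you are unsure how large a batch your GPU can hold, one practical option is to probe it empirically. The following is a minimal PyTorch sketch, assuming a classification model and a hypothetical make_batch(n) helper that returns n input/target samples; it is our own example, not part of the template:

# probe_batch_size.py - illustrative sketch, not part of the template
import torch
import torch.nn.functional as F

def find_max_batch_size(model, make_batch, device="cuda", start=512):
    """Halve the batch size until one forward/backward pass fits in GPU memory."""
    model = model.to(device)
    batch_size = start
    while batch_size >= 1:
        try:
            inputs, targets = make_batch(batch_size)   # hypothetical helper
            loss = F.cross_entropy(model(inputs.to(device)), targets.to(device))
            loss.backward()
            model.zero_grad(set_to_none=True)
            return batch_size
        except RuntimeError as error:
            if "out of memory" not in str(error).lower():
                raise                                  # not an OOM error, re-raise it
            torch.cuda.empty_cache()                   # release the partial allocation
            batch_size //= 2
    raise RuntimeError("even a batch size of 1 does not fit on this GPU")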
6. Recap and conclusions
Summing up:
Before training, check for available GPUs using nvidia-smi. Decide which GPU selection mode to use: aim for the assisted mode for fine-grained control, or the automatic mode for fast and easy deployment. Finally, try to maximize the batch size to train efficiently and reduce queues around the GPUs.