Deploying a Deep Learning Environment on a Server with Docker
Step 0 Install Nvidia Driver on Server
- First, make sure the Nvidia GPU driver is installed on the server: the `nvidia-smi` command must work.
- Check the server's GPU model: `lspci | grep -i nvidia`
- Nvidia driver download page: click
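The driver check above can be sketched as a small script (`driver_ok` is just an illustrative variable name, not part of any tool):

```shell
# Probe for the Nvidia driver before continuing with the Docker setup.
if command -v nvidia-smi >/dev/null 2>&1; then
  driver_ok=yes
  # Print each GPU's model and driver version, one per line.
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
  driver_ok=no
  echo "nvidia-smi not found: install the Nvidia driver first." >&2
fi
echo "driver_ok=$driver_ok"
```

If `driver_ok=no`, install the driver from the download page above before going any further.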
Step 1 Setting up Docker
- Install Docker: `sudo apt install docker.io`
- Start Docker: `sudo systemctl start docker`
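A quick sanity check after installation might look like this (a sketch; `docker_state` is an illustrative name, and the script degrades gracefully if Docker or systemd is missing):

```shell
# Report whether Docker is installed and whether its service is running.
if command -v docker >/dev/null 2>&1; then
  docker --version
  docker_state=$(systemctl is-active docker 2>/dev/null) || docker_state=not-running
else
  docker_state=missing
fi
echo "docker service: $docker_state"
```

Expect `docker service: active` on a correctly set-up server.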
Step 2 Setting up NVIDIA Container Toolkit
- Run the following in a terminal (it can be copied and pasted as one block):

```shell
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
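The first line of that block only computes a distribution identifier (e.g. `ubuntu20.04`) from `/etc/os-release`; you can preview what it resolves to on your machine before adding the repository (assumes a Linux host where `/etc/os-release` exists):

```shell
# Resolve the distribution string used in the repository URL.
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
echo "distribution: $distribution"
echo "list URL: https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list"
```

If the printed URL returns 404, your distribution may not be directly supported and you may need to pick a compatible one by hand.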
- Update the package index (required after adding the new apt source) and install the Nvidia Container Toolkit: `sudo apt update && sudo apt install -y nvidia-container-toolkit`
- Configure the Docker daemon to recognize the Nvidia Container Runtime: `sudo nvidia-ctk runtime configure --runtime=docker`
- Restart Docker to complete the installation: `sudo systemctl restart docker`
- Finally, test whether CUDA is usable inside a container: `sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi`
- On success, it prints something like:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
- More details are available in the official Nvidia documentation.
Step 3 Pull Image
- Pull an image that ships with PyTorch, CUDA, and cuDNN: `docker pull hambaobao/dlb:pytorch2.0.0-cuda11.7-cudnn8-v1.0`
- Alternatively, pull a more basic image provided by PyTorch or Nvidia and build on it yourself.
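After the pull, you can confirm the image is present in the local store (a sketch; `status` is an illustrative name, and the check is skipped gracefully when Docker is absent):

```shell
# Check that the pulled image exists in the local image store.
IMAGE=hambaobao/dlb:pytorch2.0.0-cuda11.7-cudnn8-v1.0
if command -v docker >/dev/null 2>&1; then
  status=$(docker image inspect "$IMAGE" --format 'present: {{.Id}}' 2>/dev/null) \
    || status="image not found locally: $IMAGE"
else
  status="docker unavailable; skipping image check"
fi
echo "$status"
```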
Step 4 Create a container
- Example:

```shell
docker run -itd --name s87d --hostname s87d --gpus all \
  -v /data0:/data0 -v /data1:/data1 -v /data2:/data2 -v /home/:/home/ \
  -p 2094:2094 -p 4382:4382 \
  hambaobao/dlb:pytorch2.0.0-cuda11.7-cudnn8-v1.0
```

- `--name`: the container's name.
- `--hostname`: the hostname inside the container.
- `--gpus`: critical! Without this option the container cannot access the GPUs.
- `-v`: mount directories; mounting all `data` directories and the `home` directory is recommended.
- `-p`: port mappings, used for `ssh` and proxy control.
- The last argument is the image to use.
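The long `docker run` line is easier to adapt per machine when assembled from variables (the names below — `NAME`, `IMAGE`, `MOUNTS`, `PORTS` — are illustrative, not part of the original command):

```shell
# Build the container-creation command from reusable pieces.
NAME=s87d
IMAGE=hambaobao/dlb:pytorch2.0.0-cuda11.7-cudnn8-v1.0
MOUNTS="-v /data0:/data0 -v /data1:/data1 -v /data2:/data2 -v /home/:/home/"
PORTS="-p 2094:2094 -p 4382:4382"   # for ssh and proxy control
cmd="docker run -itd --name $NAME --hostname $NAME --gpus all $MOUNTS $PORTS $IMAGE"
echo "$cmd"    # inspect the command before running
# eval "$cmd"  # uncomment to actually create the container
```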