Problem accessing GPU memory
Joined: 10 Mar 25, Posts: 2
Hi all,

I just set up boinc-client 8.0.2-715 (amd64) on Debian 12 (command line only) with an NVIDIA GTX 950 GPU and an Intel CPU. CPU tasks run fine, but I recently received 18 new Asteroids@home GPU tasks and all of them failed with "state: compute error". I was wondering whether this is a problem with the tasks themselves or with my GPU drivers, but I am strongly leaning towards the drivers.

I checked the log in /var/lib/boinc/stderrgpudetect.txt and found this:

    OpenCL: libOpenCL.so.1: cannot open shared object file: No such file or directory
    NVIDIA library reports 1 GPU
    [coproc] cuMemGetInfo(0) returned 201
    ATI: libaticalrt.so: cannot open shared object file: No such file or directory

The line [coproc] cuMemGetInfo(0) returned 201 seems very relevant, but I am not sure how to solve it. Most results online are from people calling this function in their own code.

When running nvidia-smi, I also noticed that the following message is logged (unfortunately I cannot find which log file it comes from):

    kernel: __vm_enough_memory: pid: 1533, comm: nvidia-smi, no enough memory for the allocation

Here are a few more details:

    $ nvidia-smi
    Sun Mar 9 21:16:43 2025
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 535.216.01             Driver Version: 535.216.01   CUDA Version: 12.2     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA GeForce GTX 950         Off | 00000000:01:00.0 Off |                  N/A |
    | 32%   31C    P8              15W /  75W |      1MiB /  2048MiB |      0%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+

    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |  No running processes found                                                           |
    +---------------------------------------------------------------------------------------+

I tried installing the CUDA toolkit just in case:

    $ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Sep_21_10:33:58_PDT_2022
    Cuda compilation tools, release 11.8, V11.8.89
    Build cuda_11.8.r11.8/compiler.31833905_0

Then I reinstalled all NVIDIA packages and drivers, including CUDA, but it did not help. The kernel version is 6.1.0-31-amd64, and all packages should be up to date.

Is there any chance that installing an older GPU driver could fix the issue? If so, which one should I try?

I just started learning Linux (and it has been a rough ride), so please be as clear and detailed as possible in your response. Any help in solving this issue would be appreciated. Thank you!
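A minimal POSIX sh sketch for reading that detection log, assuming the Debian path /var/lib/boinc/stderrgpudetect.txt from the post. It lists the shared libraries BOINC could not open and translates the cuMemGetInfo return code using the CUresult values of the CUDA driver API (in cuda.h, 201 is CUDA_ERROR_INVALID_CONTEXT, not an out-of-memory code). The BOINC_GPU_LOG override, the function names and the `check` argument are hypothetical, and the Debian package name in the final message (ocl-icd-libopencl1) is an assumption worth verifying with apt.

```shell
#!/bin/sh
# Sketch: summarise BOINC's GPU-detection log.
# Assumes the Debian path from the post; BOINC_GPU_LOG is a hypothetical override.
LOG=${BOINC_GPU_LOG:-/var/lib/boinc/stderrgpudetect.txt}

# Translate a few CUresult codes from the CUDA driver API (cuda.h) into names.
decode_cuda_error() {
    case "$1" in
        0)   echo CUDA_SUCCESS ;;
        1)   echo CUDA_ERROR_INVALID_VALUE ;;
        2)   echo CUDA_ERROR_OUT_OF_MEMORY ;;
        3)   echo CUDA_ERROR_NOT_INITIALIZED ;;
        100) echo CUDA_ERROR_NO_DEVICE ;;
        101) echo CUDA_ERROR_INVALID_DEVICE ;;
        201) echo CUDA_ERROR_INVALID_CONTEXT ;;
        *)   echo "unknown ($1)" ;;
    esac
}

# Print each shared library that the log says could not be opened.
missing_libs() {
    sed -n 's/.*: \([^:]*\.so[^:]*\): cannot open shared object file.*/\1/p' "$1"
}

# Print every "cuMemGetInfo(N) returned X" code together with its name.
cuda_errors() {
    sed -n 's/.*cuMemGetInfo([0-9]*) returned \([0-9][0-9]*\).*/\1/p' "$1" |
        while read -r code; do
            echo "cuMemGetInfo -> $code $(decode_cuda_error "$code")"
        done
}

# Succeed if the dynamic linker cache knows about libOpenCL.so.1.
opencl_loader_present() {
    # ldconfig usually lives in /sbin, which a normal Debian user's PATH may not include.
    ldc=$(command -v ldconfig || echo /sbin/ldconfig)
    "$ldc" -p 2>/dev/null | grep -q 'libOpenCL\.so\.1'
}

main() {
    if [ ! -r "$LOG" ]; then
        echo "cannot read $LOG (try again with sudo)" >&2
        return 1
    fi
    echo "Libraries BOINC could not open:"
    missing_libs "$LOG"
    cuda_errors "$LOG"
    if opencl_loader_present; then
        echo "libOpenCL.so.1 is in the linker cache"
    else
        echo "libOpenCL.so.1 not found (on Debian, assumed to come from ocl-icd-libopencl1)"
    fi
}

# Do nothing unless asked, so the functions can be sourced and tested on their own.
if [ "${1:-}" = "check" ]; then
    main
fi
```

Saved under any name (say gpudetect-check.sh), it would be run as `sudo sh gpudetect-check.sh check`; without the `check` argument it only defines the functions.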
Joined: 5 Sep 22, Posts: 41
In reply to Pandemonyum's message of 10 Mar 2025:
> Hi all,

This is not a problem with Linux. You only have 2 GB of memory on your GPU, but the tasks you are trying to run need more than that.
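To check a claim like this on a given card, a sketch along these lines compares each GPU's total memory, as reported by nvidia-smi's `--query-gpu` interface, with a required amount in MiB passed on the command line. The required amount is a hypothetical input: the thread does not say how much memory the Asteroids@home tasks actually need, so that figure has to be looked up before the verdict means anything. The script and function names are likewise illustrative.

```shell
#!/bin/sh
# Sketch: compare each GPU's total memory with a required amount in MiB.
# The requirement is a hypothetical input; the thread does not state the real figure.

# Reads "total, free" lines (MiB), the format printed by
#   nvidia-smi --query-gpu=memory.total,memory.free --format=csv,noheader,nounits
check_vram() {
    required=$1
    gpu=0
    while IFS=', ' read -r total free; do
        if [ "$total" -ge "$required" ]; then
            verdict=ok
        else
            verdict=too-small
        fi
        echo "GPU $gpu: total=${total}MiB free=${free}MiB required=${required}MiB -> $verdict"
        gpu=$((gpu + 1))
    done
}

# Usage: sh vram-check.sh REQUIRED_MIB   (no argument: only define the function)
if [ -n "${1:-}" ]; then
    nvidia-smi --query-gpu=memory.total,memory.free --format=csv,noheader,nounits |
        check_vram "$1"
fi
```

For example, `sh vram-check.sh 3072` would flag the 2048 MiB GTX 950 as too small for a task assumed to need 3 GiB. Total memory is compared rather than free memory because the card is otherwise idle, as the nvidia-smi output in the first post shows.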
Joined: 10 Mar 25, Posts: 2
The __vm_enough_memory message shows up when running the nvidia-smi command, which I don't think requires much GPU memory, if any.

I noticed that persistence mode was off, which is strange because the NVIDIA persistence daemon (nvidia-persistenced) was active. I could not enable persistence mode manually because the command returned an error. It turns out that the manual command only works if the daemon is not running. These commands find and kill the daemon, then turn on persistence mode:

    ps aux | grep nvidia-persistenced
    sudo kill [PID]
    sudo nvidia-smi -pm 1

Here [PID] is the process ID of nvidia-persistenced shown by the first command (ignore the line for the grep command itself).

After running these commands, I received more GPU tasks, and they all completed successfully. I am still getting the [coproc] cuMemGetInfo(0) returned 201 error in /var/lib/boinc/stderrgpudetect.txt, but it seems to be unrelated to this issue.

I will try to get the daemon working, but if I cannot, a quick and easy fix would be to disable the daemon and run a script at boot that turns persistence mode on manually. I hope this helps someone else!
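The workaround in this post, together with the boot-script idea, can be sketched as one POSIX sh script meant to be run as root. It mirrors the steps described: stop nvidia-persistenced if it is running, then call `nvidia-smi -pm 1`; `pgrep -x` and `pkill -x` stand in for the `ps aux | grep` and manual `kill` steps. The script name, the function names and the `status`/`enable` arguments are hypothetical. If the daemon is started by a systemd unit, systemd may restart it after it is killed, so stopping or disabling it with systemctl (assuming the unit is named nvidia-persistenced on your system) is probably the cleaner route.

```shell
#!/bin/sh
# Sketch of the workaround from the post: stop nvidia-persistenced, then
# enable persistence mode with nvidia-smi. Run as root (e.g. from a boot script).
# Usage: sh persistence.sh status | enable   (no argument: only define the functions)

# Reads the output of
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader
# (one "Enabled"/"Disabled" line per GPU); succeeds only if every GPU is Enabled.
all_persistent() {
    seen=0
    while read -r mode; do
        seen=1
        [ "$mode" = "Enabled" ] || return 1
    done
    [ "$seen" -eq 1 ]
}

enable_persistence() {
    if pgrep -x nvidia-persistenced >/dev/null; then
        # The post found that "nvidia-smi -pm 1" only works once the daemon is gone.
        pkill -x nvidia-persistenced
        sleep 1   # arbitrary pause to let the daemon exit
    fi
    nvidia-smi -pm 1
}

case "${1:-}" in
    status)
        if nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistent; then
            echo "persistence mode: enabled on every GPU"
        else
            echo "persistence mode: not enabled on every GPU"
        fi
        ;;
    enable)
        enable_persistence
        ;;
esac
```

`sudo sh persistence.sh enable` applies the workaround, while `sh persistence.sh status` only reports whether every GPU shows persistence mode as Enabled, which makes it easy to confirm the boot script did its job.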
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.