Fedora 25 GPUs sometimes there but reported missing
Author | Message |
---|---|
Send message Joined: 14 Jun 11 Posts: 30 |
I am in the process of setting up a new PC running Fedora 25. When the system boots, it reports the Nvidia drivers there, but no usable GPUs. e.g.
14-Mar-2017 11:50:23 [---] log flags: file_xfer, sched_ops, task, coproc_debug
14-Mar-2017 11:50:23 [---] Libraries: libcurl/7.51.0 NSS/3.27 zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.11) libssh2/1.8.0 nghttp2/1.13.0
14-Mar-2017 11:50:23 [---] Running as a daemon
14-Mar-2017 11:50:23 [---] Data directory: /var/lib/boinc
14-Mar-2017 11:50:23 [---] [coproc] launching child process at /usr/bin/boinc_client
14-Mar-2017 11:50:23 [---] [coproc] relative to directory /var/lib/boinc
14-Mar-2017 11:50:23 [---] [coproc] with data directory /var/lib/boinc
14-Mar-2017 11:50:23 [---] [coproc] NVIDIA drivers present but no GPUs found
14-Mar-2017 11:50:23 [---] [coproc] ATI: libaticalrt.so: cannot open shared object file: No such file or directory
14-Mar-2017 11:50:23 [---] [coproc] clGetPlatformIDs() failed to return any OpenCL platforms
14-Mar-2017 11:50:23 [---] No usable GPUs found
14-Mar-2017 11:50:23 [---] app version refers to missing GPU type NVIDIA
14-Mar-2017 11:50:23 [Einstein@Home] Application uses missing NVIDIA GPU
14-Mar-2017 11:50:23 [Einstein@Home] Missing coprocessor for task LATeah0017L_644.0_0_0.0_941250_1
14-Mar-2017 11:50:23 [Einstein@Home] Missing coprocessor for task LATeah0017L_644.0_0_0.0_770570_1
14-Mar-2017 11:50:23 [Einstein@Home] Missing coprocessor for task LATeah0017L_644.0_0_0.0_962585_0
14-Mar-2017 11:50:23 [Einstein@Home] Missing coprocessor for task LATeah0017L_644.0_0_0.0_769315_1
14-Mar-2017 11:50:23 [---] Host name: modron
14-Mar-2017 11:50:23 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz [Family 6 Model 158 Stepping 9]
14-Mar-2017 11:50:23 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
14-Mar-2017 11:50:23 [---] OS: Linux: 4.9.13-201.fc25.x86_64
14-Mar-2017 11:50:23 [---] Memory: 15.63 GB physical, 14.90 GB virtual
14-Mar-2017 11:50:23 [---] Disk: 205.02 GB total, 151.14 GB free
14-Mar-2017 11:50:23 [---] Local time is UTC +0 hours
14-Mar-2017 11:50:23 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 12504473; resource share 100
14-Mar-2017 11:50:23 [---] General prefs: from http://setiathome.berkeley.edu/ (last modified 07-Nov-2012 15:55:30)
14-Mar-2017 11:50:23 [---] Host location: none
14-Mar-2017 11:50:23 [---] General prefs: using your defaults
14-Mar-2017 11:50:23 [---] Reading preferences override file
14-Mar-2017 11:50:23 [---] Preferences:
14-Mar-2017 11:50:23 [---] max memory usage when active: 8000.31MB
14-Mar-2017 11:50:23 [---] max memory usage when idle: 14400.56MB
14-Mar-2017 11:50:23 [---] max disk usage: 151.26GB
14-Mar-2017 11:50:23 [---] max CPUs used: 2
14-Mar-2017 11:50:23 [---] suspend work if non-BOINC CPU load exceeds 50%
14-Mar-2017 11:50:23 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
14-Mar-2017 11:52:24 [---] Received signal 15
14-Mar-2017 11:52:24 [---] Exiting
14-Mar-2017 11:52:30 [---] Starting BOINC client version 7.6.22 for x86_64-pc-linux-gnu
14-Mar-2017 11:52:30 [---] log flags: file_xfer, sched_ops, task, coproc_debug
14-Mar-2017 11:52:30 [---] Libraries: libcurl/7.51.0 NSS/3.27 zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.11) libssh2/1.8.0 nghttp2/1.13.0
14-Mar-2017 11:52:30 [---] Running as a daemon
14-Mar-2017 11:52:30 [---] Data directory: /var/lib/boinc
14-Mar-2017 11:52:30 [---] [coproc] launching child process at /usr/bin/boinc_client
14-Mar-2017 11:52:30 [---] [coproc] relative to directory /var/lib/boinc
14-Mar-2017 11:52:30 [---] [coproc] with data directory /var/lib/boinc
14-Mar-2017 11:52:30 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 378.13, device version OpenCL 1.2 CUDA, 1996MB, 1996MB available, 495 GFLOPS peak)
14-Mar-2017 11:52:30 [---] [coproc] NVIDIA drivers present but no GPUs found
14-Mar-2017 11:52:30 [---] [coproc] ATI: libaticalrt.so: cannot open shared object file: No such file or directory
14-Mar-2017 11:52:30 [---] Host name: modron
14-Mar-2017 11:52:30 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz [Family 6 Model 158 Stepping 9]
14-Mar-2017 11:52:30 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
14-Mar-2017 11:52:30 [---] OS: Linux: 4.9.13-201.fc25.x86_64
14-Mar-2017 11:52:30 [---] Memory: 15.63 GB physical, 14.90 GB virtual
14-Mar-2017 11:52:30 [---] Disk: 205.02 GB total, 151.14 GB free
14-Mar-2017 11:52:30 [---] Local time is UTC +0 hours
14-Mar-2017 11:52:30 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 12504473; resource share 100
14-Mar-2017 11:52:30 [---] General prefs: from http://setiathome.berkeley.edu/ (last modified 07-Nov-2012 15:55:30)
14-Mar-2017 11:52:30 [---] Host location: none
14-Mar-2017 11:52:30 [---] General prefs: using your defaults
14-Mar-2017 11:52:30 [---] Reading preferences override file
14-Mar-2017 11:52:30 [---] Preferences:
14-Mar-2017 11:52:30 [---] max memory usage when active: 8000.31MB
14-Mar-2017 11:52:30 [---] max memory usage when idle: 14400.56MB
14-Mar-2017 11:52:30 [---] max disk usage: 151.26GB
14-Mar-2017 11:52:30 [---] max CPUs used: 2
14-Mar-2017 11:52:30 [---] suspend work if non-BOINC CPU load exceeds 50%
14-Mar-2017 11:52:30 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
14-Mar-2017 11:52:30 [Einstein@Home] [coproc] Assigning NVIDIA instance 0 to LATeah0017L_644.0_0_0.0_962585_0
[mike@modron ~]$
|
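For logs like the two above, it can help to check quickly which detection paths succeeded. The sketch below is only an illustration: `gpu_status` is a hypothetical helper, and it assumes the client reports a detected card with lines beginning `CUDA: NVIDIA GPU` or `OpenCL: NVIDIA GPU`, as seen in the logs in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read a BOINC startup log on stdin and report,
# per API, whether an NVIDIA GPU detection line is present.
gpu_status() {
    log=$(cat)
    for api in CUDA OpenCL; do
        # Fixed-string match on the detection line format seen in this thread.
        if printf '%s\n' "$log" | grep -qF "$api: NVIDIA GPU"; then
            echo "$api: detected"
        else
            echo "$api: not detected"
        fi
    done
}

# Example (the log path is an assumption; adjust to your setup):
#   gpu_status < /var/log/boinc.log
```

Run against the second log above, this would report OpenCL as detected and CUDA as not detected.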
Send message Joined: 5 Oct 06 Posts: 5128 |
The operating system is starting BOINC too early in the initialisation process, before all components are fully available for use. There have been posts over the years with guidance on changing the boot sequence, but I'll leave it to the Linux specialists to pick out one appropriate to Fedora 25. |
Send message Joined: 30 May 15 Posts: 265 |
I'm not well versed in Fedora, but searching for Fedora in the advanced search turns up "GPU CUDA issues on Fedora 23 with NVIDIA GTX 970". If it is the same issue, it looks like you have not installed the CUDA libraries, or the BOINC client cannot find them, but you are crunching using the OpenCL libraries. There are likely to be several other issues (which would explain the failed detection at start); these may be addressed with permission settings and/or by adding a delay in the start script. Upgrading BOINC to something more recent (7.6.31 or later), which has better GPU detection, may also help. |
Send message Joined: 5 Oct 06 Posts: 5128 |
But look at the second log, after the BOINC restart - BOINC detects the GPU, and even starts running an Einstein task. It's purely a timing thing - the drivers haven't initialised before the first attempt, so BOINC gets no response when it queries them. But later, everything is ready, and works as planned. Edit - but it might be worth a try to install the newer CUDA drivers and a later version of BOINC to iron out some of those remaining warning messages. |
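If the failure is a startup race, as suggested here, one alternative to restarting by hand is to wait for the driver's device node before starting the client. A minimal sketch: `wait_for_path` is a hypothetical helper, and treating the appearance of `/dev/nvidiactl` as "driver ready" is an assumption, not a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: poll once per second until PATH exists.
# Usage: wait_for_path PATH [TIMEOUT_SECONDS]   (default timeout: 30)
# Returns 0 as soon as the path exists, 1 if the timeout expires first.
wait_for_path() {
    path=$1
    timeout=${2:-30}
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (run as root; the device node name is an assumption):
#   wait_for_path /dev/nvidiactl 60 && systemctl restart boinc-client
```

A fixed `sleep` in the start script would also work, but polling returns as soon as the node appears and fails visibly if it never does.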
Send message Joined: 30 May 15 Posts: 265 |
"But look at the second log, after the BOINC restart - BOINC detects the GPU, and even starts running an Einstein task."
OP asked why there are no CUDA tasks. I don't disagree that there is probably a timing issue here; however, the first log is likely to have been produced by a start from systemd, and the second later, from an X Window session with sudo privileges. To "detect the GPU", the BOINC client forks a copy of itself and makes several dlopen calls. The OpenCL and CUDA libraries are different and are often installed separately (depending on the distribution). Older versions of the BOINC client make outdated assumptions about where these libraries are found, and so these calls fail. hth |
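One way to see whether the libraries behind those dlopen calls are visible to the dynamic linker is to look them up in the `ldconfig -p` cache listing. This is a sketch under assumptions: `check_gpu_libs` is a hypothetical helper, and `libcuda.so` and `libOpenCL.so` are the usual NVIDIA driver and OpenCL loader names, which may not be exactly what a given BOINC version tries to open.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` style output on stdin and
# report whether the CUDA driver library and the OpenCL loader are listed.
check_gpu_libs() {
    cache=$(cat)
    for lib in libcuda.so libOpenCL.so; do
        # -F: match the library name literally (dots are not wildcards).
        if printf '%s\n' "$cache" | grep -qF "$lib"; then
            echo "$lib: found"
        else
            echo "$lib: missing"
        fi
    done
}

# Example:
#   ldconfig -p | check_gpu_libs
```

A library can be listed here and still fail to load for the daemon, for example because of permissions, so a "found" result only rules out the simplest case.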
Send message Joined: 14 Jun 11 Posts: 30 |
Thanks for the replies. CUDA is not part of the standard Red Hat Fedora repositories. Looking back at some old PCs, it appears that CUDA was installed from rpmfusion. I don't have much time to investigate further at the moment, so for now I'll simply restart the boinc-client service when I log on. |
Send message Joined: 14 Jun 11 Posts: 30 |
As a footnote to this thread, I worked out what was going on: SELinux was the problem.
To recap: Linux version 4.9.14-200.fc25.x86_64 (Fedora 25 distribution), Nvidia driver 375.39, and BOINC client version 7.6.22 for x86_64-pc-linux-gnu. Everything comes from the standard Fedora repositories except the Nvidia driver, which is the only code from elsewhere.
From what I can remember, I cleared the problems with boinc-client first with:
ausearch -c 'boinc_client' --raw | audit2allow -M my-boincclient
semodule -i my-boincclient.pp
I then had to reset the SELinux context on the device node with "restorecon -v /dev/nvidia-uvm" and restart boinc-client. That gave me a whole new set of abuse from SELinux about nvidia-modprobe, which I cleared in the standard way with:
ausearch -c 'nvidia-modprobe' --raw | audit2allow -M my-nvidiamodprobe
semodule -i my-nvidiamodprobe.pp
That cleared up all the SELinux problems, but I then discovered I had to add an extra line to the boinc-client.service file to load the nvidia-uvm module, i.e.
ExecStartPre=/sbin/modprobe nvidia-uvm
Now the system boots and starts running my GPU stuff without any (more) fiddling.
Mon 27 Mar 2017 10:26:35 BST | | Starting BOINC client version 7.6.22 for x86_64-pc-linux-gnu
Mon 27 Mar 2017 10:26:35 BST | | log flags: file_xfer, sched_ops, task
Mon 27 Mar 2017 10:26:35 BST | | Libraries: libcurl/7.51.0 NSS/3.27 zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.11) libssh2/1.8.0 nghttp2/1.13.0
Mon 27 Mar 2017 10:26:35 BST | | Running as a daemon
Mon 27 Mar 2017 10:26:35 BST | | Data directory: /var/lib/boinc
Mon 27 Mar 2017 10:26:36 BST | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 375.39, CUDA version 8.0, compute capability 3.0, 1996MB, 1970MB available, 1982 GFLOPS peak)
Mon 27 Mar 2017 10:26:36 BST | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 375.39, device version OpenCL 1.2 CUDA, 1996MB, 1970MB available, 1982 GFLOPS peak)
Mon 27 Mar 2017 10:26:36 BST | | Host name: modron
Mon 27 Mar 2017 10:26:36 BST | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz [Family 6 Model 158 Stepping 9]
Mon 27 Mar 2017 10:26:36 BST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
Mon 27 Mar 2017 10:26:36 BST | | OS: Linux: 4.9.14-200.fc25.x86_64
Mon 27 Mar 2017 10:26:36 BST | | Memory: 15.63 GB physical, 14.90 GB virtual
Mon 27 Mar 2017 10:26:36 BST | | Disk: 205.02 GB total, 150.26 GB free
I hope this information is of use to someone. |
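The extra `ExecStartPre` line described above was added to the boinc-client.service file itself. A variation, not what the poster did, is to put the same line in a systemd drop-in so that a package update does not overwrite it. The sketch below assumes the standard drop-in location for a unit named boinc-client.service; `write_boinc_dropin` and the file name `nvidia-uvm.conf` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that loads nvidia-uvm
# before the BOINC client starts.
# Usage: write_boinc_dropin [DIRECTORY]
write_boinc_dropin() {
    # Assumed default: the usual drop-in directory for boinc-client.service.
    dir=${1:-/etc/systemd/system/boinc-client.service.d}
    mkdir -p "$dir" || return 1
    # The ExecStartPre line is taken verbatim from the post above.
    cat > "$dir/nvidia-uvm.conf" <<'EOF'
[Service]
ExecStartPre=/sbin/modprobe nvidia-uvm
EOF
}

# Example (run as root):
#   write_boinc_dropin && systemctl daemon-reload && systemctl restart boinc-client
```

The SELinux steps from the post are still needed separately; the drop-in only covers loading the module.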
Send message Joined: 21 May 16 Posts: 37 |
Hi there, I am the BOINC co-maintainer for Fedora / RHEL / CentOS. I found your topic because I have just managed to fix GPU detection issues on AMD Radeon, and I wanted to test the fix on Nvidia too. You seem to have fixed your (CUDA) troubles on your own, in a different way from what I would do to enable OpenCL. I haven't pushed an update yet; I am still doing some polishing work, and I would need your help to get the best results (I do not have a machine with an Nvidia card installed). Could you please provide the output of:
# dnf list installed | grep nvidia
# dnf list installed | grep boinc
# lsmod | grep nvidia
Thank you |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.