"Phantom" GPU devices showing up in 7.16.3 and 441.66 again
Joined: 27 Jun 08, Posts: 642
I know how it happened and what can be done to fix it, but not why.

How: I had to replace the blower fan on one of the two boards in my office desktop (long story), but I ended up putting the two boards back in with the slots reversed. I installed 441 after Microsoft put in 3xx, as it seems reversing the PCIe slots confuses Windows. BOINC showed 2 CUDA and 4 OpenCL devices, with the pair of extra "phantom" GPUs attempting to crunch. Revo Uninstaller followed by a clean install of 441 did not solve the problem. Revo showed a mix of 339 and 441, but the clean install should have worked.

I looked at the coproc_info.xml file. Its header has the cuda0 and cuda1 sections, followed by four OpenCL sections:

| OpenCL section | device_num, device_index |
|---|---|
| OCLnv0 | 0, 0 |
| OCLnv1 | 0, 0 |
| OCLnv2 | 1, 1 |
| OCLnv3 | 1, 1 |

C:\Users\josep\Desktop\debug coproc>fc OCLnv0.txt OCLnv1.txt
Comparing files OCLnv0.txt and OCLnv1.TXT
FC: no differences encountered

C:\Users\josep\Desktop\debug coproc>fc OCLnv2.txt OCLnv3.txt
Comparing files OCLnv2.txt and OCLnv3.TXT
FC: no differences encountered

C:\Users\josep\Desktop\debug coproc>fc OCLnv1.txt OCLnv3.txt
Comparing files OCLnv1.txt and OCLnv3.TXT
***** OCLnv1.txt
<opencl_driver_version>441.66</opencl_driver_version>
<device_num>0</device_num>
<peak_flops>8186112000000.000000</peak_flops>
***** OCLnv3.TXT
<opencl_driver_version>441.66</opencl_driver_version>
<device_num>1</device_num>
<peak_flops>8186112000000.000000</peak_flops>
*****

***** OCLnv1.txt
<opencl_available_ram>3726508031.000000</opencl_available_ram>
<opencl_device_index>0</opencl_device_index>
<warn_bad_cuda>0</warn_bad_cuda>
***** OCLnv3.TXT
<opencl_available_ram>3726508031.000000</opencl_available_ram>
<opencl_device_index>1</opencl_device_index>
<warn_bad_cuda>0</warn_bad_cuda>
*****

The GPU detection program wrote out duplicate entries for the same GPU. My fix was to delete the OCLnv1 and OCLnv3 sections and set the coproc_info.xml file's attributes to read-only.

Suggestion: the program that writes out that file should check for duplicates.
Alternatively, the program that reads it in should do the check.

Other thoughts: the clean uninstall should have worked. Possibly I should have disconnected the Ethernet cable to prevent Windows from re-downloading the same 339 (?) driver. I was instructed to reboot several times to remove the 441 and 339 components. Since I was busy replacing the fan, I may not have responded in time to continue the uninstall.
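A minimal sketch of the duplicate check suggested above, in Python, assuming coproc_info.xml can be parsed as ordinary XML. The element names `coprocs`, `coproc_cuda` and `nvidia_opencl` are assumptions about the file's layout; only the child tags come from the fc output, and the sample is a made-up stand-in for the broken file, not a real one. Sections are compared on their child tags and text, so two sections count as duplicates only when every field matches, which is the OCLnv0/OCLnv1 and OCLnv2/OCLnv3 situation.

```python
import xml.etree.ElementTree as ET

def opencl_section(tag, num, index):
    """Build one OpenCL section using tag names seen in the fc output."""
    return (f"<{tag}><opencl_driver_version>441.66</opencl_driver_version>"
            f"<device_num>{num}</device_num>"
            f"<opencl_device_index>{index}</opencl_device_index></{tag}>")

# Hypothetical stand-in for the broken file: 2 CUDA devices and 4 OpenCL
# sections laid out as device_num,index = 0,0 / 0,0 / 1,1 / 1,1 (OCLnv0..OCLnv3).
SAMPLE = ("<coprocs><coproc_cuda/><coproc_cuda/>"
          + "".join(opencl_section("nvidia_opencl", n, i)
                    for n, i in [(0, 0), (0, 0), (1, 1), (1, 1)])
          + "</coprocs>")

def section_key(elem):
    """Canonical form of a section: (tag, stripped text) for every child element."""
    return tuple((child.tag, (child.text or "").strip()) for child in elem)

def find_duplicates(root, tag):
    """Positions (0-based, file order) of sections identical to an earlier one."""
    seen, dupes = set(), []
    for pos, elem in enumerate(root.findall(tag)):
        key = section_key(elem)
        if key in seen:
            dupes.append(pos)
        else:
            seen.add(key)
    return dupes

def drop_duplicates(root, tag):
    """Remove exact-duplicate sections in place, keeping the first copy of each."""
    seen = set()
    for elem in root.findall(tag):  # findall returns a list, so removal is safe
        key = section_key(elem)
        if key in seen:
            root.remove(elem)
        else:
            seen.add(key)

root = ET.fromstring(SAMPLE)
print(find_duplicates(root, "nvidia_opencl"))   # [1, 3], i.e. OCLnv1 and OCLnv3
drop_duplicates(root, "nvidia_opencl")
print(len(root.findall("nvidia_opencl")))       # 2
```

Keeping the first copy of each section reproduces the manual fix of deleting OCLnv1 and OCLnv3. It would not catch the ATI layout reported later in the thread, where the repeated sections differ in device_num.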
Joined: 27 Jun 08, Posts: 642
I went back to Feb 2019 and got the zipped coproc_info from my AMD RX 570 that I had provided earlier in the year, when the problem first arose. There is a difference: although both coproc_info files have an extra pair of GPUs, the arrangement is not the same as with NVIDIA. In this case I deleted the last two sections before making the file read-only.

| OpenCL section | device_num, device_index |
|---|---|
| OCLati0 | 0, 0 |
| OCLati1 | 1, 1 |
| OCLati2 | 2, 0 |
| OCLati3 | 3, 1 |

C:\Users\josep\Desktop\debug coproc>fc OCLat0.txt OCLat1.txt
Comparing files OCLat0.txt and OCLAT1.TXT
***** OCLat0.txt
<opencl_driver_version>2766.5</opencl_driver_version>
<device_num>0</device_num>
<peak_flops>5095424000000.000000</peak_flops>
***** OCLAT1.TXT
<opencl_driver_version>2766.5</opencl_driver_version>
<device_num>1</device_num>
<peak_flops>5095424000000.000000</peak_flops>
*****

***** OCLat0.txt
<opencl_available_ram>4294967296.000000</opencl_available_ram>
<opencl_device_index>0</opencl_device_index>
<warn_bad_cuda>0</warn_bad_cuda>
***** OCLAT1.TXT
<opencl_available_ram>4294967296.000000</opencl_available_ram>
<opencl_device_index>1</opencl_device_index>
<warn_bad_cuda>0</warn_bad_cuda>
*****

The NVIDIA coproc_info lists 2 CUDA devices, so more than 2 NVIDIA OpenCL devices is a clue that there is a problem. There is no count of the actual cards, and none of the ATI OpenCL sections are exact duplicates, so the ATI problem is harder to solve just by analyzing the file.
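A rough sketch of the two clues described above, again assuming the hypothetical element names `coproc_cuda`, `nvidia_opencl` and `ati_opencl`, with made-up sample files built from the numbers in the two tables. The NVIDIA check compares the number of OpenCL sections with the number of CUDA devices; the ATI check looks for an `opencl_device_index` that appears in more than one section. The ATI check is only a heuristic: it assumes each card should be listed once, which may not hold on every system.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def build(cuda_count, tag, pairs, version):
    """Hypothetical coproc_info-like XML: cuda_count CUDA stubs plus OpenCL sections."""
    sections = "".join(
        f"<{tag}><opencl_driver_version>{version}</opencl_driver_version>"
        f"<device_num>{n}</device_num>"
        f"<opencl_device_index>{i}</opencl_device_index></{tag}>"
        for n, i in pairs)
    return ET.fromstring("<coprocs>" + "<coproc_cuda/>" * cuda_count + sections + "</coprocs>")

def too_many_nvidia_opencl(root):
    """NVIDIA clue: more NVIDIA OpenCL sections than CUDA devices."""
    return len(root.findall("nvidia_opencl")) > len(root.findall("coproc_cuda"))

def repeated_indexes(root, tag):
    """ATI clue: opencl_device_index values that appear in more than one section."""
    counts = Counter(int(e.findtext("opencl_device_index")) for e in root.findall(tag))
    return sorted(idx for idx, n in counts.items() if n > 1)

def keep_first_per_index(root, tag):
    """Remove every section whose opencl_device_index was already seen."""
    seen = set()
    for elem in root.findall(tag):
        idx = int(elem.findtext("opencl_device_index"))
        if idx in seen:
            root.remove(elem)
        else:
            seen.add(idx)

nv = build(2, "nvidia_opencl", [(0, 0), (0, 0), (1, 1), (1, 1)], "441.66")
ati = build(0, "ati_opencl", [(0, 0), (1, 1), (2, 0), (3, 1)], "2766.5")
print(too_many_nvidia_opencl(nv))           # True: 4 OpenCL sections vs 2 CUDA devices
print(repeated_indexes(ati, "ati_opencl"))  # [0, 1]
keep_first_per_index(ati, "ati_opencl")
print([e.findtext("device_num") for e in ati.findall("ati_opencl")])  # ['0', '1']
```

Keeping the first section for each index gives the same result as deleting the last two ATI sections by hand.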
Joined: 25 May 09, Posts: 1325
Given that the nVidia driver version 441.66 is known to have some issues when performing calculations, you would do well not to use it, despite having had it forced on you by MS. How often do we have to say "Do NOT allow MS to update your drivers for you, but ALWAYS get the drivers from nVidia"? (The highest known "fault-free" version is 431.xx.)

That apart, what is the hardware configuration? Is it a single GTX 1070 Ti, or something else?

I suspect that, like many other similar routines, the one in question simply finds the file it wants and, instead of overwriting it, just appends the data to the appropriate sections.
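To illustrate the suspected mechanism (a hypothetical simulation, not BOINC's code), this sketch runs a fake detection pass twice against the same file, once with the file opened in write mode, which truncates it first, and once in append mode, which keeps the earlier contents. `run_detection`, `count_sections` and the file names are invented for the example.

```python
import os
import tempfile

SECTION = "<nvidia_opencl><device_num>0</device_num></nvidia_opencl>\n"

def run_detection(path, mode):
    """Simulated detection pass that writes one section ('w' truncates, 'a' appends)."""
    with open(path, mode) as f:
        f.write(SECTION)

def count_sections(path):
    """Count how many OpenCL sections ended up in the file."""
    with open(path) as f:
        return f.read().count("<nvidia_opencl>")

with tempfile.TemporaryDirectory() as tmp:
    for mode in ("w", "a"):
        path = os.path.join(tmp, f"coproc_{mode}.xml")
        run_detection(path, mode)   # first detection run
        run_detection(path, mode)   # second run, e.g. after a reboot
        print(mode, count_sections(path))  # prints "w 1", then "a 2"
```

Only the append-mode file ends up with two copies of the same section after two runs.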
Joined: 29 Aug 05, Posts: 15625
> The Revo showed a mix of 339 and 441 but the clean install should have worked.

If an uninstaller finds multiple drivers, that's probably why BOINC finds these GPUs as well. The trouble with all drivers is that their names change, and their folders may change as well, so much so that the uninstaller in the new installer doesn't necessarily know where to clean up all the old files of the previous installation when doing a clean install. This has happened quite a few times, to the point that people were sometimes forced to do a clean installation of Windows to get rid of previous driver remnants that wouldn't go away.
Joined: 27 Jun 08, Posts: 642
I ran some more tests after talking with Dell, and it turned out the fan was not the problem. The NVIDIA board is running the fan at 100%, which is ruining my hearing as well as the fan.

I just removed the read-only coproc file and started BOINC, and it wrote out a good coproc_info.xml file that matched the one I had edited. The board arrangement is the same. Maybe it needed another reboot for the "cleaner" to work.

It turned out the "basic" warranty (I have 40 days left) covers the video board, but they wanted proof, so I took a lot of pictures. GPU-Z was helpful: it showed 5000 rpm and "no load" on the bad board, and 1100 rpm on the good one, also at no load. It also shows the history, which is as good as a video.

I think an issue should be raised about that coproc_info file. GPU detection should never write out identical GPUs at the same address. If BOINC has no control over the program doing the writing (which I suspect), then the client, when it reads in the info file to see what is there, should certainly ignore duplicates at the same bus address. Unfortunately, the ATI behavior is different.

https://stateson.net/images/coproc_normal.png
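A minimal sketch of the reader-side filter suggested above, assuming each detected GPU can be tied to a PCI bus number. `GpuEntry`, `bus_id` and `dedupe_by_bus` are hypothetical names, not BOINC structures, and the bus numbers in the example are invented. Keying on hardware location rather than on section contents would, if that information were available, treat the NVIDIA and ATI layouts the same way.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuEntry:
    """Hypothetical record for one detected GPU (not a BOINC structure)."""
    vendor: str
    bus_id: int      # assumed PCI bus number of the physical card
    device_num: int

def dedupe_by_bus(entries):
    """Keep the first entry for each (vendor, bus_id) pair, preserving order."""
    seen = set()
    kept = []
    for entry in entries:
        key = (entry.vendor, entry.bus_id)
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept

# Invented bus numbers: two physical cards, each reported twice, as in the ATI file.
entries = [GpuEntry("ati", 1, 0), GpuEntry("ati", 2, 1),
           GpuEntry("ati", 1, 2), GpuEntry("ati", 2, 3)]
print([e.device_num for e in dedupe_by_bus(entries)])  # [0, 1]
```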
Joined: 9 Mar 20, Posts: 5
Hi, I had a similar problem, but I lost 2 of my 4 GPUs. In my case the key was to edit the coproc_info.xml file and double the GPU entry. Thanks!
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.