BOINC 6.10.43 - Runs two tasks on a single GPU
Joined: 1 Oct 09, Posts: 11

Hello, I have a problem with the new BOINC 6.10.43 on 32-bit Windows XP. I have six NVIDIA GPUs in this box, and BOINC sometimes runs two tasks on a single GPU. Some examples, with the GPUs numbered from 0 to 5: shortly after a reboot everything works great, with six tasks each running on its own GPU. After some time I see that it still runs six tasks, but GPU 1 has two of them and GPU 5 is unoccupied. Sometimes it is even worse: there are two tasks on GPU 0, two on GPU 1, and GPUs 4 and 5 are both free. The second scenario is rare, but the first is very frequent. Link to this host on SETI
Joined: 29 Aug 05, Posts: 15626

Please enable the following debug flags in cc_config.xml:

```xml
<cc_config>
  <log_flags>
    <coproc_debug>1</coproc_debug>
    <cpu_sched_debug>1</cpu_sched_debug>
  </log_flags>
</cc_config>
```

Run with those flags on until you hit the problem again, then post the log from just before the problem happened up to the point where it happened (approximately). Please do not post the whole log (all 1,000 lines). You can also find this information in the stdoutdae.txt file in BOINC's Data directory.
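Since only the part of stdoutdae.txt around the failure is wanted (and, as it turned out, this host logged over 800 lines per minute), a small filter can help. This is a minimal Python sketch, assuming the timestamp layout seen in the log excerpts quoted in this thread (`09-Apr-2010 17:30:11 [project] [flag] ...`) and an English/C locale for month names; `parse_ts`, `extract_window` and `write_excerpt` are hypothetical helper names, not part of BOINC.

```python
import gzip
from datetime import datetime
from typing import Iterable, Iterator, Optional, Sequence

# Assumed timestamp layout, taken from the excerpts in this thread:
# "09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] ..."
# %b parsing assumes English month abbreviations (C locale).
TS_FORMAT = "%d-%b-%Y %H:%M:%S"
TS_LEN = len("09-Apr-2010 17:30:11")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the timestamp at the start of a log line, or None if absent."""
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None


def extract_window(lines: Iterable[str], start: datetime, end: datetime,
                   tags: Optional[Sequence[str]] = None) -> Iterator[str]:
    """Yield timestamped lines within [start, end].

    If tags is given, keep only lines containing at least one of them,
    e.g. ("[coproc_debug]", "[cpu_sched_debug]").
    """
    for line in lines:
        ts = parse_ts(line)
        if ts is None or not start <= ts <= end:
            continue
        if tags is None or any(tag in line for tag in tags):
            yield line


def write_excerpt(src_path: str, dst_path: str,
                  start: datetime, end: datetime) -> int:
    """Gzip the excerpt once (not twice) and return the number of lines kept."""
    kept = 0
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            gzip.open(dst_path, "wt", encoding="utf-8") as dst:
        for line in extract_window(src, start, end):
            dst.write(line if line.endswith("\n") else line + "\n")
            kept += 1
    return kept
```

For example, `write_excerpt("stdoutdae.txt", "excerpt.txt.gz", datetime(2010, 4, 9, 17, 29), datetime(2010, 4, 9, 17, 31))` would keep about two minutes around a 17:30 event; the actual path of the Data directory depends on the installation. Untagged lines such as "Computation for task ... finished" are kept unless you pass `tags`.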
Joined: 1 Oct 09, Posts: 11

OK. I have set these options. Now waiting...
Joined: 1 Oct 09, Posts: 11

Hmm. I got one at about 17:30 CEST. But the file has over 800 lines from just that one minute, so maybe it would be better if I give you a link to download the file gzipped?
Joined: 1 Oct 09, Posts: 11

Log file: http://www.b0b3r.pl/stdoutdae.txt.gz
Screenshot from the Manager: http://www.b0b3r.pl/SETI_two_tasks_on_single_gpu.png
Joined: 1 Oct 09, Posts: 11

Additionally, when I hit "Suspend" on one of these tasks, it starts the next task on the proper GPU, number 5.
Joined: 29 Aug 05, Posts: 15626

I've forwarded the thread to the developers. They may need extra logs with other or additional flags; if so, I'll post those flags. And with thanks.
Joined: 1 Oct 09, Posts: 11

I have found that it happens at a specific moment. One of these GPUs is significantly slower than the others, about 6 times slower. When it finishes a computation, the estimated time for the queued tasks jumps from about 13 minutes to almost 2 hours. Then I see a message that BOINC thinks it can't finish the queued tasks in time, and then it does this "silliness".
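The trigger described here is consistent with a single, project-wide duration correction factor (DCF), which the "per device client DCF" remark in the reply below alludes to. The Python below is a toy model, not BOINC's code: it assumes an estimate is a base time multiplied by the DCF, that the DCF jumps straight up when a task runs long and otherwise only decays by 10% of the gap, and it uses illustrative numbers (a 13-minute base estimate and one card 6 times slower, hypothetically device 5). It won't reproduce the exact 2-hour figure, but it shows why one slow card inflates every queued estimate under a shared DCF and not under per-device factors.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DcfModel:
    """Toy duration correction factor (assumed update rule, not BOINC's source)."""
    dcf: float = 1.0

    def estimate(self, base_minutes: float) -> float:
        # Corrected runtime estimate for a task with this base estimate.
        return base_minutes * self.dcf

    def record(self, base_minutes: float, elapsed_minutes: float) -> None:
        ratio = elapsed_minutes / base_minutes
        if ratio > self.dcf:
            self.dcf = ratio                      # task ran long: jump up at once
        else:
            self.dcf += 0.1 * (ratio - self.dcf)  # otherwise decay slowly


def simulate(completions: Sequence[Tuple[int, float]], base: float = 13.0,
             devices: int = 6) -> Tuple[DcfModel, List[DcfModel]]:
    """Feed (device, elapsed_minutes) completions into one shared DCF and per-device DCFs."""
    shared = DcfModel()
    per_device = [DcfModel() for _ in range(devices)]
    for dev, elapsed in completions:
        shared.record(base, elapsed)
        per_device[dev].record(base, elapsed)
    return shared, per_device


# Devices 0-4 finish on time, then the (hypothetical) slow device 5 takes 6x as long.
events = [(d, 13.0) for d in range(5)] + [(5, 78.0)]
shared, per_device = simulate(events)
print(shared.estimate(13.0))         # 78.0: every queued task now looks 6x longer
print(per_device[0].estimate(13.0))  # 13.0: a fast card is unaffected
print(per_device[5].estimate(13.0))  # 78.0: only the slow card's estimate grows
```

Under the shared factor, a queue that fit comfortably before the slow card reported can suddenly appear to miss its deadlines, which plausibly lines up with the "projected to miss deadline" messages in the log excerpt that follows.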
Joined: 5 Oct 06, Posts: 5149

You double-compressed the log file, which confused me for a moment. Yes, it starts with

```
09-Apr-2010 17:30:11 [SETI@home] Computation for task 21fe07ae.3786.14387.8.10.4_0 finished
09-Apr-2010 17:30:11 [---] [cpu_sched_debug] Request CPU reschedule: handle_finished_apps
09-Apr-2010 17:30:11 [---] [cpu_sched_debug] schedule_cpus(): start
09-Apr-2010 17:30:11 [SETI@home] [cpu_sched_debug] Result 13dc06ae.20481.890.15.10.196_2 projected to miss deadline.
...
09-Apr-2010 17:30:11 [SETI@home] [cpu_sched_debug] Project has 313 projected NVIDIA GPU deadline misses
```

In theory, the allocation is right:

```
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 0 to 13dc06ae.20481.890.15.10.196_2
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 1 to 12ja07ae.7563.19295.13.10.254_0
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 2 to 12ja07ae.7563.19295.13.10.252_0
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 3 to 12ja07ae.7563.19295.13.10.249_1
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 4 to 12ja07ae.7563.19295.13.10.248_1
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 5 to 12ja07ae.7563.19295.13.10.247_0
```

but it may have been confused by

```
09-Apr-2010 17:30:12 [---] [cpu_sched_debug] coproc quit pending, deferring start
09-Apr-2010 17:30:12 [---] [cpu_sched_debug] Request enforce CPU schedule: coproc quit retry
```

Roll on per-device client DCF!
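The `Assigning CUDA instance N to <task>` lines make it possible to scan a log mechanically for two live tasks on one device. A minimal Python sketch, assuming the 6.10.43 message wording quoted in this reply: it keeps the most recent assignment per task, drops a task once a `Computation for task ... finished` line appears, and reports any instance held by more than one task. `current_assignments` and `shared_instances` are hypothetical names. It only sees what the client logged: as this reply notes, the logged allocation can look right while the actual placement goes wrong, and suspended or preempted tasks that never log a finish will linger in the map, so treat a hit as a lead rather than proof.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Message wording assumed from the 6.10.43 excerpts quoted in this thread.
ASSIGN_RE = re.compile(r"\[coproc_debug\] Assigning CUDA instance (\d+) to (\S+)")
FINISH_RE = re.compile(r"Computation for task (\S+) finished")


def current_assignments(lines: Iterable[str]) -> Dict[str, int]:
    """Map each unfinished task to the CUDA instance it was last assigned."""
    task_to_dev: Dict[str, int] = {}
    for line in lines:
        m = ASSIGN_RE.search(line)
        if m:
            task_to_dev[m.group(2)] = int(m.group(1))
            continue
        m = FINISH_RE.search(line)
        if m:
            task_to_dev.pop(m.group(1), None)  # a finished task frees its device
    return task_to_dev


def shared_instances(task_to_dev: Dict[str, int]) -> Dict[int, List[str]]:
    """Return {instance: [tasks]} for every instance claimed by two or more tasks."""
    by_dev: Dict[int, List[str]] = defaultdict(list)
    for task, dev in task_to_dev.items():
        by_dev[dev].append(task)
    return {dev: sorted(tasks) for dev, tasks in by_dev.items() if len(tasks) > 1}
```

Usage would be along the lines of opening stdoutdae.txt (or the filtered excerpt) and calling `shared_instances(current_assignments(f))`; an empty dict means the log itself never showed two unfinished tasks on the same instance.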
Joined: 23 Apr 07, Posts: 1112

Hiamps has been complaining about this issue on and off for months, but never bothered posting any logs, even when reminded about it.

Claggy
Joined: 29 Aug 05, Posts: 15626

Sebastian, the developers thank you for the logs; they think the bug is extremely serious. In response, they'll soon (within 24 hours) post a new private BOINC build for you to test with. It will have extra messages for the `<coproc_debug>` flag and a possible first fix. In the meantime, you can edit your cc_config.xml file to run with only the `<coproc_debug>` flag (change `<cpu_sched_debug>1</cpu_sched_debug>` to `<cpu_sched_debug>0</cpu_sched_debug>`, save the file and re-read the config file from the Advanced menu), or temporarily disable the debug flags until you have the new BOINC if you're worried about all the extra messages.
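For anyone who would rather script the flag change described above than edit the file by hand, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. `set_log_flag` and `update_file` are hypothetical helper names, and the sketch assumes cc_config.xml has `<cc_config>` as its root, as in the example earlier in this thread. ElementTree drops XML comments and rewrites the layout when it saves, and you still need to re-read the config file from the Advanced menu afterwards.

```python
import xml.etree.ElementTree as ET


def set_log_flag(xml_text: str, flag: str, enabled: bool) -> str:
    """Return cc_config XML text with <log_flags>/<flag> set to 1 or 0.

    Creates <log_flags> and the flag element if they are missing.
    """
    root = ET.fromstring(xml_text)
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    elem = log_flags.find(flag)
    if elem is None:
        elem = ET.SubElement(log_flags, flag)
    elem.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


def update_file(path: str, flag: str, enabled: bool) -> None:
    """Rewrite cc_config.xml in place (path to the Data directory is up to you)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_log_flag(text, flag, enabled))
```

For the change suggested here, that would be `update_file(path_to_cc_config, "cpu_sched_debug", False)`, leaving `coproc_debug` at 1.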
Joined: 27 Jun 08, Posts: 642

I was able to duplicate the problem on Vista 64 with 6.10.43. I had two 6.08 tasks running and "resumed" Collatz. Collatz immediately went to device 0. I brought up GPU-Z, and the load on both my GTS 250 and 9800 GTX+ is "0". After about 2 minutes one of the tasks switched to device 1. Currently, both tasks seem to be making progress, but GPU-Z and MSI Afterburner both show 0 GPU load.

[EDIT] I do not remember if Collatz was originally on device 0 when suspended. Perhaps it just took a minute or two before the SETI task was switched to 1.
Joined: 27 Jun 08, Posts: 642

Please ignore the "GPU load is 0" I posted above. There is a load and the two boards are working, but the tools report 0 GPU load, which is incorrect. I checked another system (XP and a single GTS 250), and GPU-Z and MSI Afterburner both show 0 GPU load there too, which I know is incorrect. From the test I ran, it would appear that it simply takes about 1-2 minutes for one of the tasks to switch from device 0 to device 1 after a task is resumed. During that time both Collatz and SETI were seemingly on device 0. This may not be the same problem as reported in this thread. HTH.

[EDIT] I have had two Collatz tasks supposedly running on device 0 for the last 10 minutes. Both GPUs are running warm, so I assume both are being used. By alternately suspending and resuming tasks I was able to get two tasks stuck on device 0. Since both are crunching and both GPUs are running warm, I suspect both are being used, although I do see "Device 0" for both. Again, HTH.
Joined: 29 Aug 05, Posts: 15626

With apologies for the late delivery; some other things were in the way. Sebastian, please check your private messages: you have a personal BOINC to play with. :-)
Joined: 1 Oct 09, Posts: 11

OK. I have it installed and running.
Joined: 1 Oct 09, Posts: 11

Currently everything looks OK.

Joined: 1 Oct 09, Posts: 11

Still no symptoms.

Joined: 29 Aug 05, Posts: 15626

It never happens when you want it to. :-)
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.