Thread 'BOINC 6.10.43 - Runs two task on single gpu'

Author	Message
Sebastian Bobrecki Send message Joined: 1 Oct 09 Posts: 11	Message 32054 - Posted: 9 Apr 2010, 10:34:36 UTC Hello, I have a problem with the new BOINC 6.10.43 on 32-bit Windows XP. I have six NVIDIA GPUs in this box. The problem is that BOINC sometimes runs two tasks on a single GPU. Some examples: there are six GPUs numbered from 0 to 5. Shortly after a reboot everything works great: six tasks, each running on its own GPU. After some time I see that it still runs six tasks, but two of them are on GPU number 1 and GPU number 5 is unoccupied. Sometimes it is even worse, because there are two tasks on GPU number 0, two of them on GPU number 1, and both number 4 and 5 are free. The second scenario is rare but the first is very frequent. Link to this host on SETI ID: 32054 ·

Jord Volunteer moderator Help desk expert Send message Joined: 29 Aug 05 Posts: 15984	Message 32056 - Posted: 9 Apr 2010, 10:59:35 UTC - in response to Message 32054. Please enable the following debug flags in cc_config.xml:

<cc_config>
  <log_flags>
    <coproc_debug>1</coproc_debug>
    <cpu_sched_debug>1</cpu_sched_debug>
  </log_flags>
</cc_config>

Run with those flags on until you hit the problem again, then post the log from just before it happened to the point where it happened (approximately). Please do not post the whole log (all 1,000 lines). You can also find this info in the stdoutdae.txt file in BOINC's Data directory. ID: 32056 ·
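Cutting "just before it happened to where it happened" out of a busy stdoutdae.txt can be done mechanically. The following is a minimal Python sketch, assuming every log line starts with a timestamp in the form seen in this thread's excerpts (09-Apr-2010 17:30:11). The helper names `parse_stamp` and `log_window` and the default two-minute/one-minute margins are illustrative choices, not part of BOINC, and month names are parsed in the C/English locale.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%d-%b-%Y %H:%M:%S"  # e.g. 09-Apr-2010 17:30:11
STAMP_LENGTH = 20                   # length of such a timestamp


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        return None


def log_window(lines, incident,
               before=timedelta(minutes=2), after=timedelta(minutes=1)):
    """Keep only the lines stamped within [incident - before, incident + after]."""
    start, end = incident - before, incident + after
    selected = []
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None and start <= stamp <= end:
            selected.append(line)
    return selected


# Hypothetical sample lines; a real run would read stdoutdae.txt instead.
sample = [
    "09-Apr-2010 17:27:00 [---] early line",
    "09-Apr-2010 17:29:30 [---] inside the window",
    "09-Apr-2010 17:30:11 [SETI@home] Computation for task finished",
    "09-Apr-2010 17:40:00 [---] late line",
    "line without a timestamp",
]
incident = datetime(2010, 4, 9, 17, 30, 11)
for line in log_window(sample, incident):
    print(line)
```

Lines without a timestamp are dropped in this sketch; keeping them attached to the preceding stamped line would be a reasonable variation.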

Sebastian Bobrecki Send message Joined: 1 Oct 09 Posts: 11	Message 32061 - Posted: 9 Apr 2010, 14:06:04 UTC Ok. I set these options. Now waiting... ID: 32061 ·

Sebastian Bobrecki Send message Joined: 1 Oct 09 Posts: 11	Message 32067 - Posted: 9 Apr 2010, 15:59:30 UTC Hmm. I got one at about 17:30 CEST. But the file has over 800 lines from just one minute, so maybe it would be better if I gave you a link to download the file gzipped? ID: 32067 ·

Sebastian Bobrecki Send message Joined: 1 Oct 09 Posts: 11	Message 32068 - Posted: 9 Apr 2010, 16:06:29 UTC Log file: http://www.b0b3r.pl/stdoutdae.txt.gz Screenshot from manager: http://www.b0b3r.pl/SETI_two_tasks_on_single_gpu.png ID: 32068 ·

Sebastian Bobrecki Send message Joined: 1 Oct 09 Posts: 11	Message 32069 - Posted: 9 Apr 2010, 16:10:41 UTC Additionally, when I hit "Suspend" on one of these tasks, it starts the next task on the proper GPU, number 5. ID: 32069 ·

Jord Volunteer moderator Help desk expert Send message Joined: 29 Aug 05 Posts: 15984	Message 32073 - Posted: 9 Apr 2010, 16:51:56 UTC I've forwarded the thread to the developers. It may be that they need some extra logs with other/extra flags. I'll give those if need be. And with thanks. ID: 32073 ·

Sebastian Bobrecki Send message Joined: 1 Oct 09 Posts: 11	Message 32077 - Posted: 9 Apr 2010, 17:08:55 UTC I have found that it happens at a specific moment. One of these GPUs is significantly slower than the others, by about 6 times. When it finishes a computation, the estimated time for queued tasks jumps from about 13 minutes to almost 2 hours. I then see a message that BOINC thinks it can't finish the queued tasks in time. And then it does this "silliness". ID: 32077 ·
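A jump like this is what one would expect if the client scaled every runtime estimate for a project by a single correction factor shared by all GPUs. The following Python sketch is a toy model under that assumption, not BOINC's actual code: the function names and the rule "raise the factor at once to the observed/estimated ratio" are illustrative, while the 13-minute estimate and the factor of about 6 come from the post above.

```python
def updated_factor(current_factor, estimated_min, actual_min):
    """Toy rule (assumption): the factor rises at once to the worst observed ratio."""
    return max(current_factor, actual_min / estimated_min)


def corrected_estimate(base_estimate_min, factor):
    """Runtime estimate for a queued task after applying the shared factor."""
    return base_estimate_min * factor


base = 13.0              # minutes, estimate seen on the fast GPUs (from the post)
slow_runtime = base * 6  # the slow GPU is about 6 times slower (from the post)

# One task finishing on the slow GPU inflates the estimate for every queued task.
factor = updated_factor(1.0, base, slow_runtime)
queued = corrected_estimate(base, factor)
print(f"factor {factor:.1f}, queued tasks now estimated at {queued:.0f} min")
```

With these numbers the toy model gives 78 minutes rather than the "almost 2 hours" reported, so the real estimate evidently involves more than this rule. The point is only that one slow device can inflate the estimate for work queued on all the others.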

Richard Haselgrove Volunteer tester Help desk expert Send message Joined: 5 Oct 06 Posts: 5173	Message 32078 - Posted: 9 Apr 2010, 17:29:07 UTC You double-compressed the log file, which confused me for a moment. Yes, it starts with

09-Apr-2010 17:30:11 [SETI@home] Computation for task 21fe07ae.3786.14387.8.10.4_0 finished
09-Apr-2010 17:30:11 [---] [cpu_sched_debug] Request CPU reschedule: handle_finished_apps
09-Apr-2010 17:30:11 [---] [cpu_sched_debug] schedule_cpus(): start
09-Apr-2010 17:30:11 [SETI@home] [cpu_sched_debug] Result 13dc06ae.20481.890.15.10.196_2 projected to miss deadline.
...
09-Apr-2010 17:30:11 [SETI@home] [cpu_sched_debug] Project has 313 projected NVIDIA GPU deadline misses

In theory, the allocation is right:

09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 0 to 13dc06ae.20481.890.15.10.196_2
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 1 to 12ja07ae.7563.19295.13.10.254_0
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 2 to 12ja07ae.7563.19295.13.10.252_0
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 3 to 12ja07ae.7563.19295.13.10.249_1
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 4 to 12ja07ae.7563.19295.13.10.248_1
09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 5 to 12ja07ae.7563.19295.13.10.247_0

but it may have been confused by

09-Apr-2010 17:30:12 [---] [cpu_sched_debug] coproc quit pending, deferring start
09-Apr-2010 17:30:12 [---] [cpu_sched_debug] Request enforce CPU schedule: coproc quit retry

Roll on per device client DCF! ID: 32078 ·
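The check that "the allocation is right" can be repeated mechanically on the "Assigning CUDA instance" lines quoted above. The following Python sketch assumes the line format matches those excerpts and, as a simplification, treats all assignments logged in the same second as one scheduling pass. The name `doubled_up_instances` is illustrative, not a BOINC function.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 0 to <task>
ASSIGN_RE = re.compile(
    r"^(?P<stamp>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) .*"
    r"\[coproc_debug\] Assigning CUDA instance (?P<gpu>\d+) to (?P<task>\S+)"
)


def doubled_up_instances(lines):
    """Return {(timestamp, gpu): [tasks]} for GPUs given more than one task at once."""
    assigned = defaultdict(list)
    for line in lines:
        match = ASSIGN_RE.search(line)
        if match:
            key = (match["stamp"], int(match["gpu"]))
            assigned[key].append(match["task"])
    return {key: tasks for key, tasks in assigned.items() if len(tasks) > 1}


# The six assignment lines from the log excerpt in this thread.
excerpt = [
    "09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 0 to 13dc06ae.20481.890.15.10.196_2",
    "09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 1 to 12ja07ae.7563.19295.13.10.254_0",
    "09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 2 to 12ja07ae.7563.19295.13.10.252_0",
    "09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 3 to 12ja07ae.7563.19295.13.10.249_1",
    "09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 4 to 12ja07ae.7563.19295.13.10.248_1",
    "09-Apr-2010 17:30:11 [SETI@home] [coproc_debug] Assigning CUDA instance 5 to 12ja07ae.7563.19295.13.10.247_0",
]
print(doubled_up_instances(excerpt))
```

On this excerpt the result is empty, which agrees with the reading that the logged assignments are one task per instance. It also means the doubling seen in the manager is not visible in these assignment lines alone.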

Claggy Send message Joined: 23 Apr 07 Posts: 1112	Message 32079 - Posted: 9 Apr 2010, 17:37:07 UTC - in response to Message 32078. Hiamps has been complaining about this issue on and off for months, but never bothered posting any logs, even when reminded about it. Claggy ID: 32079 ·

Jord Volunteer moderator Help desk expert Send message Joined: 29 Aug 05 Posts: 15984	Message 32085 - Posted: 9 Apr 2010, 21:36:39 UTC Sebastian, the developers thank you for the logs; they think the bug is extremely serious. In answer, they'll soon (within 24 hours) post a new private BOINC for you to test with. It'll have extra messages for the <coproc_debug> flag and a possible first fix. In the meantime, you can edit your cc_config.xml file to run only with the <coproc_debug> flag (change <cpu_sched_debug>1</cpu_sched_debug> to <cpu_sched_debug>0</cpu_sched_debug>, save the file and re-read the config file from the Advanced menu), or temporarily disable it until you have the new BOINC if you're worried about all the extra messages. ID: 32085 ·
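The edit described here, switching <cpu_sched_debug> from 1 to 0 while leaving <coproc_debug> on, can also be scripted. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree. It works on the file's text rather than on a path, because the location of BOINC's Data directory is not given in this thread, and the helper name `set_log_flag` is illustrative.

```python
import xml.etree.ElementTree as ET


def set_log_flag(xml_text, flag, enabled):
    """Return cc_config.xml text with <log_flags>/<flag> set to 1 or 0."""
    root = ET.fromstring(xml_text)
    log_flags = root.find("log_flags")
    if log_flags is None:
        # Create the section if the file does not have one yet.
        log_flags = ET.SubElement(root, "log_flags")
    element = log_flags.find(flag)
    if element is None:
        element = ET.SubElement(log_flags, flag)
    element.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


# The configuration suggested earlier in this thread.
original = """<cc_config>
  <log_flags>
    <coproc_debug>1</coproc_debug>
    <cpu_sched_debug>1</cpu_sched_debug>
  </log_flags>
</cc_config>"""

updated = set_log_flag(original, "cpu_sched_debug", False)
print(updated)
```

After the changed text is saved back to cc_config.xml, the client still has to re-read the config file from the Advanced menu, as described in the post.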

Joseph Stateson Volunteer tester Send message Joined: 27 Jun 08 Posts: 642	Message 32087 - Posted: 10 Apr 2010, 1:39:54 UTC Last modified: 10 Apr 2010, 1:59:46 UTC I was able to duplicate the problem on Vista 64, 6.10.43. I had two 6.08 tasks running and "resumed" collatz. Collatz immediately went to device 0. I brought up gpuz and the load on both my gts250 and 9800gtx+ is "0". After about 2 minutes one of the tasks switched to device 1. Currently, both tasks seem to be making progress, but gpuz and msi afterburner both show 0 gpu load. [EDIT] I do not remember if collatz was originally on device 0 when suspended. Perhaps it just took a minute or two before the seti task was switched to 1. ID: 32087 ·

Joseph Stateson Volunteer tester Send message Joined: 27 Jun 08 Posts: 642	Message 32089 - Posted: 10 Apr 2010, 3:03:33 UTC Last modified: 10 Apr 2010, 3:30:24 UTC Please ignore the "gpu load is 0" statement I posted above. There is a load and the two boards are working, but I am getting 0 for the gpu load, which is incorrect. I checked another system (XP and a single gts250) and gpuz and msi both show 0 for its gpu load, which I know is incorrect. From the test I ran, it would appear that it simply takes about 1-2 minutes for one of the GPUs to switch from device 0 to device 1 after one device is resumed. During that time both collatz and seti seemingly were on device 0. This may not be the same problem as reported in this thread. HTH. [EDIT] I have had two collatz tasks supposedly running on device 0 for the last 10 minutes. Both GPUs are running warm, so I assume both are being used. By alternately suspending and resuming tasks I was able to get two tasks stuck on device 0. Since both are crunching and both GPUs are running warm, I suspect both are being used, although I do see "Device 0" for both. Again, HTH. ID: 32089 ·

Jord Volunteer moderator Help desk expert Send message Joined: 29 Aug 05 Posts: 15984	Message 32137 - Posted: 12 Apr 2010, 17:29:53 UTC With apologies for the late delivery, some other things were in the way. Sebastian, please check your private messages, you have a personal BOINC to play with. :-) ID: 32137 ·

Sebastian Bobrecki Send message Joined: 1 Oct 09 Posts: 11	Message 32168 - Posted: 13 Apr 2010, 15:12:21 UTC - in response to Message 32137. "With apologies for the late delivery, some other things were in the way. Sebastian, please check your private messages, you have a personal BOINC to play with. :-)" Ok. I have it installed and running. ID: 32168 ·

Sebastian Bobrecki Send message Joined: 1 Oct 09 Posts: 11	Message 32175 - Posted: 13 Apr 2010, 19:37:54 UTC - in response to Message 32168. Currently everything looks ok. ID: 32175 ·

Sebastian Bobrecki Send message Joined: 1 Oct 09 Posts: 11	Message 32203 - Posted: 15 Apr 2010, 14:35:56 UTC - in response to Message 32175. Still no symptoms. ID: 32203 ·

Jord Volunteer moderator Help desk expert Send message Joined: 29 Aug 05 Posts: 15984	Message 32204 - Posted: 15 Apr 2010, 14:47:06 UTC - in response to Message 32203. It never happens when you want it to. :-) ID: 32204 ·

Copyright © 2026 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.