BOINC 6.10.43/6.10.44 no longer released for public
Author | Message |
---|---|
Joined: 29 Aug 05 Posts: 15626 |
Rom Walton wrote: Earlier today we pulled the last round of stable clients and rolled back to the stable clients that were available in early December. For Windows: 6.10.43 For Linux: 6.10.44 For Macintosh: 6.10.43 Available from http://boinc.berkeley.edu/download.php Some of the changes for this release are: |
Joined: 6 Apr 10 Posts: 12 |
"* New: Suspend computation of BOINC applications if CPU usage from non-BOINC applications exceeds a volunteer defined value (Defaults to 25%)" Hi, where can I change this setting? I just installed v6.10.44 on Ubuntu Server (no graphical interface) and am running Einstein@Home, but I cannot find the setting on the account pages. Thanks. |
Joined: 25 Nov 05 Posts: 1654 |
In the BOINC manager's menu: Advanced -> Preferences -> processor usage |
Joined: 5 Oct 06 Posts: 5149 |
"In the BOINC manager's menu: Advanced -> Preferences -> processor usage" Les, he said no GUI! It's not in boinccmd either. The only solution I can come up with is a global_prefs_override.xml file with: `<global_preferences> <cpu_usage_limit>100.000000</cpu_usage_limit> </global_preferences>` |
Joined: 25 Nov 05 Posts: 1654 |
Oops. Yes. Still, at least he now knows not to look on the project pages. I don't know how people cope without the GUI. :( Edit: I wonder whether the devs considered this difficulty for those without GUI access to these options, and whether anything was implemented or planned in place of the GUI. |
Joined: 6 Apr 10 Posts: 12 |
I think the 'global_prefs_override.xml' method will work, but I don't think 'cpu_usage_limit' is the correct parameter... :( I believe this parameter defines the maximum CPU percentage that BOINC may use? If the developers can give the correct parameter name, it will make my day ;) |
Joined: 5 Oct 06 Posts: 5149 |
Beg your pardon, I did copy it from a working installation, but too many of mine are at 100%... To put in a restriction: `<suspend_cpu_usage>97.000000</suspend_cpu_usage>` or, for no restriction at all: `<suspend_cpu_usage>0.000000</suspend_cpu_usage>` |
Joined: 6 Apr 10 Posts: 12 |
OK, seems to work :) The line "suspend work if non-BOINC CPU load exceeds 25 %" is no longer mentioned. Thanks, peeps! |
Joined: 28 Mar 10 Posts: 1 |
Thanks for telling me where I needed to make changes for the suspend computation, since that part seems to be awfully twitchy. I'm not doing anything (i.e., I'm asleep), and it flips between working and not working. And then when it IS working, I get Firefox doing something at 25% (100% of one of my cores), and it doesn't hiccup. So I just changed that value to 0. Works just like it used to now. :) |
Joined: 30 Dec 08 Posts: 24 |
Hi, "Thanks for telling me where I needed to make changes for the suspend computation, since that part seems to be awfully twitchy." Same here... At first, I thought it could be an interesting tweak, but on multi-core machines, when doing something "usual" with a browser and boring non-optimized Flash, just ONE core is heavily used; the others aren't. So there are plenty of other cores for BOINC to play with. "So I just changed that value to 0. Works just like it used to now." Same here! I don't get it anyway. Since BOINC is tied to idle CPU time, why the need for this additional parameter? Multi-core or not, if my system needs CPU, BOINC just backs off naturally and progressively, following the idle time available. So I would rather have had this parameter for the GPU. My kids are complaining about GPU work units interfering with their GPU-hungry games! But most of the time, the CPU cores aren't very busy. So, on modern multi-core machines, BOINC always has some resources available to crunch. Curious to see how all of this will evolve when everything is OpenCL compliant!!! I hope you will not turn us end users into gurus of parameter tweaking... |
Joined: 29 Aug 05 Posts: 15626 |
"I don't get it anyway." As I wrote here: Please, think outside your own box. This feature isn't for everyone who has been using BOINC for ages and is running it 24/7 without looking at it much. It's built in for completely new people, people who were complaining that, despite BOINC's applications running at the lowest possible priority, it was taking up CPU cycles that would slow down their computer. These people would complain about that, uninstall, leave and tell other potential crunchers negative things about BOINC. It's added to help those people, so that if they come back, they see that even they are listened to. You can easily disable the function by setting its value to zero. "So, I would rather had put this parameter for the GPU." All GPU applications still run on the CPU. There are no applications that run on the GPU only, as there is no operating system that knows how to do that. All science applications geared towards GPUs still need the CPU to execute the application, to translate whatever task is to be done on the GPU from binary data into kernels that the GPU understands, to transfer that data to the GPU's memory and, when the GPU is done with it, to transfer it back and store it on disk. None of that can be done by the GPU itself. You can use the `<exclusive_app>` and `<exclusive_gpu_app>` functions of BOINC for suspending BOINC when it detects any of the games entering Windows memory. See my GPU FAQ for more on that. |
Joined: 30 Dec 08 Posts: 24 |
Hi Ageless, as always, valuable and pertinent arguments. "Please, think outside your own box. It's built in for ... people who were complaining that, despite BOINC's applications running at the lowest possible priority, it was taking up CPU cycles that would slow down their computer." Indeed, this is what some people objected when I tried (years ago) to deploy CNET across our whole computer fleet! And now, outside of my box, I get the point. Now that I see the potential of this parameter, I still think that it needs refinement. I insist: when just ONE core is heavily used, the others aren't always used too. BOINC could stop cores progressively by monitoring whether there is still heavy CPU usage after a given time. When the CPU stays at 25% for more than 2 minutes on a multi-core machine, it could be 'nice' (arf) to stop one core for BOINC, but not ALL cores at the same time. We lose valuable cycles available on the other cores. This is particularly true under Unix kernels, where the load is finely distributed. "You can use the `<exclusive_app>` and `<exclusive_gpu_app>` functions of BOINC for suspending BOINC when it detects any of the games entering Windows memory." More interesting parameters that I didn't know about. But it has always been my aim to remain a "simple" user of BOINC! By not becoming a tuning "guru", I try to keep a "simple" view of what BOINC should do to run quietly without disturbing the user. Regards |
Joined: 29 Aug 05 Posts: 15626 |
"I insist: when just ONE core is heavily used, the others aren't always used too." At this moment BOINC doesn't know which processor/core is doing what. For instance, say you have a 4-core CPU running CPDN, Einstein, Seti and Leiden at the same time. A non-BOINC program is taking up (part of) one of the cores. You want only that core to suspend its work while work continues on the other cores? What if the non-BOINC program takes up more than one core? (Windows Update will do that!) Again, that's not what this preference is for. It's really not for people who have been using the program for a while already; it's for those who are new, who find that their system is slow when they run BOINC at the same time as other (heavy) programs, and who don't want to know about the `<exclusive_app>` tags any more than you do. Besides, those who have been running the program for some time now will then complain that their one core doing very important work is suspended. It's always very important work, isn't it? ;-) You know how to disable it. And seeing how something like a scanning AV program will break this function, it'll go through some changes in the future. :-) |
Joined: 19 Apr 10 Posts: 4 |
I am trying to understand the scheduling behaviour of the Linux 6.10.44 release with GPUs, because I am seeing some strange things I can't explain. My client, from time to time, sits with a full task queue, not running any task (this is using the Milkyway@home cuda application) for anything up to 45 minutes at a time. Under other circumstances, a task will sit for days in the task queue and never start - even to the point that when the project went down for maintenance for 24 hours and every other task was completed, a five day old task was still stuck in the queue and never started, nor ran. My machine looks like this (Ubuntu 9.14 with the 195.36.15 release drivers): Fri 16 Apr 2010 09:01:44 PM EEST Starting BOINC client version 6.10.44 for x86_64-pc-linux-gnu Fri 16 Apr 2010 09:01:44 PM EEST log flags: file_xfer, sched_ops, task Fri 16 Apr 2010 09:01:44 PM EEST Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 c-ares/1.5.1 Fri 16 Apr 2010 09:01:44 PM EEST Data directory: /home/david/BOINC Fri 16 Apr 2010 09:01:44 PM EEST Processor: 4 AuthenticAMD AMD Phenom(tm) II X4 945 Processor [Family 16 Model 4 Stepping 2] Fri 16 Apr 2010 09:01:44 PM EEST Processor: 512.00 KB cache Fri 16 Apr 2010 09:01:44 PM EEST Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni monitor cx16 lahf_lm cmp_legacy svm extapic cr8_ Fri 16 Apr 2010 09:01:44 PM EEST OS: Linux: 2.6.28-18-generic Fri 16 Apr 2010 09:01:44 PM EEST Memory: 7.70 GB physical, 7.45 GB virtual Fri 16 Apr 2010 09:01:44 PM EEST Disk: 891.22 GB total, 757.31 GB free Fri 16 Apr 2010 09:01:44 PM EEST Local time is UTC +3 hours Fri 16 Apr 2010 09:01:44 PM EEST NVIDIA GPU 0: GeForce GTX 275 (driver version unknown, CUDA version 3000, compute capability 1.3, 895MB, 701 GFLOPS peak) Fri 16 Apr 2010 09:01:44 PM EEST NVIDIA GPU 1: GeForce GTX 275 (driver version unknown, CUDA 
version 3000, compute capability 1.3, 896MB, 701 GFLOPS peak) Fri 16 Apr 2010 09:01:44 PM EEST Milkyway@home URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 167065; resource share 100 Fri 16 Apr 2010 09:01:44 PM EEST Milkyway@home General prefs: from Milkyway@home (last modified 08-Apr-2010 17:50:24) Fri 16 Apr 2010 09:01:44 PM EEST Milkyway@home Host location: none Fri 16 Apr 2010 09:01:44 PM EEST Milkyway@home General prefs: using your defaults Fri 16 Apr 2010 09:01:44 PM EEST Reading preferences override file Fri 16 Apr 2010 09:01:44 PM EEST Preferences: Fri 16 Apr 2010 09:01:44 PM EEST max memory usage when active: 3943.92MB Fri 16 Apr 2010 09:01:44 PM EEST max memory usage when idle: 7099.06MB Fri 16 Apr 2010 09:01:44 PM EEST max disk usage: 10.00GB Fri 16 Apr 2010 09:01:44 PM EEST max CPUs used: 1 Fri 16 Apr 2010 09:01:44 PM EEST (to change, visit the web site of an attached project, Fri 16 Apr 2010 09:01:44 PM EEST or click on Preferences) Fri 16 Apr 2010 09:01:44 PM EEST Not using a proxy with one GPU marked compute prohibited and the other marked compute exclusive. I have "Compute while computer is in use" and "Use GPU while computer is in use" selected in the manager, and most of the time, it works fine. A typical example of the problem looks something like this: Mon 19 Apr 2010 11:45:37 AM EEST Milkyway@home Computation for task de_new_test2_29033_1271663028_0 finished Mon 19 Apr 2010 11:45:39 AM EEST Milkyway@home Started upload of de_new_test2_29033_1271663028_0_0 Mon 19 Apr 2010 11:45:42 AM EEST Milkyway@home Finished upload of de_new_test2_29033_1271663028_0_0 Mon 19 Apr 2010 11:46:17 AM EEST Milkyway@home Sending scheduler request: To fetch work. 
Mon 19 Apr 2010 11:46:17 AM EEST Milkyway@home Reporting 1 completed tasks, requesting new tasks for GPU Mon 19 Apr 2010 11:46:22 AM EEST Milkyway@home Scheduler request completed: got 1 new tasks Mon 19 Apr 2010 11:46:24 AM EEST Milkyway@home Started download of de_new_test2_46499_1271666646_search_parameters Mon 19 Apr 2010 11:46:27 AM EEST Milkyway@home Finished download of de_new_test2_46499_1271666646_search_parameters Mon 19 Apr 2010 11:47:28 AM EEST Milkyway@home Sending scheduler request: To fetch work. Mon 19 Apr 2010 11:47:28 AM EEST Milkyway@home Requesting new tasks for GPU Mon 19 Apr 2010 11:47:33 AM EEST Milkyway@home Scheduler request completed: got 0 new tasks Mon 19 Apr 2010 11:47:33 AM EEST Milkyway@home Message from server: No work sent Mon 19 Apr 2010 11:47:33 AM EEST Milkyway@home Message from server: (reached limit of 6 tasks in progress) Mon 19 Apr 2010 11:48:38 AM EEST Milkyway@home Sending scheduler request: To fetch work. Mon 19 Apr 2010 11:49:48 AM EEST Milkyway@home Sending scheduler request: To fetch work. Mon 19 Apr 2010 11:49:48 AM EEST Milkyway@home Requesting new tasks for GPU Mon 19 Apr 2010 11:49:53 AM EEST Milkyway@home Scheduler request completed: got 0 new tasks Mon 19 Apr 2010 11:49:53 AM EEST Milkyway@home Message from server: No work sent Mon 19 Apr 2010 11:49:53 AM EEST Milkyway@home Message from server: (reached limit of 6 tasks in progress) Mon 19 Apr 2010 11:50:58 AM EEST Milkyway@home Sending scheduler request: To fetch work. Mon 19 Apr 2010 11:50:58 AM EEST Milkyway@home Requesting new tasks for GPU Mon 19 Apr 2010 11:51:03 AM EEST Milkyway@home Scheduler request completed: got 0 new tasks Mon 19 Apr 2010 11:51:03 AM EEST Milkyway@home Message from server: No work sent Mon 19 Apr 2010 11:51:03 AM EEST Milkyway@home Message from server: (reached limit of 6 tasks in progress) Mon 19 Apr 2010 11:52:08 AM EEST Milkyway@home Sending scheduler request: To fetch work. 
Mon 19 Apr 2010 11:52:08 AM EEST Milkyway@home Requesting new tasks for GPU Mon 19 Apr 2010 11:52:13 AM EEST Milkyway@home Scheduler request completed: got 0 new tasks Mon 19 Apr 2010 11:52:13 AM EEST Milkyway@home Message from server: No work sent Mon 19 Apr 2010 11:52:13 AM EEST Milkyway@home Message from server: (reached limit of 6 tasks in progress) Mon 19 Apr 2010 11:52:22 AM EEST Milkyway@home Starting de_new_test2_18284_1271660211_2 Mon 19 Apr 2010 11:52:22 AM EEST Milkyway@home Starting task de_new_test2_18284_1271660211_2 using milkyway version 24 Here a task finishes and reports, with 5 other tasks in the queue. A new task is requested and downloaded from the scheduler, so that the task queue is full at 6 tasks, then nothing happens. The machine sits idle for several minutes, periodically polling for new work (and getting nothing because it has a full task queue), but no task ever starts. Then finally something happens. This is a small example, but I have observed these "fallow" periods persist for 45 minutes in a couple of cases. The question is why? Does your project have a source repository somewhere I could browse? I have a suspicion about what might be happening [it might be the client is mishandling or misinterpreting the Linux driver compute settings], but looking at your CUDA interface code would certainly be helpful. Thanks in advance. |
Joined: 29 Aug 05 Posts: 15626 |
"Does your project have a source repository somewhere I could browse? I have a suspicion about what might be happening [it might be the client is mishandling or misinterpreting the Linux driver compute settings], but looking at your CUDA interface code would certainly be helpful." BOINC isn't a project, and why the Milkyway scheduler may or may not give you work is something you have to take up with them. It's their server that says that no work is sent, with the reason given (their maximum of 6 tasks per queue). But if you want to look at the BOINC source code, that's possible. Check http://boinc.berkeley.edu/trac/browser/branches/boinc_core_release_6_10 for the 6.10 code. |
Joined: 19 Apr 10 Posts: 4 |
I understand that, but my question is why, when the client has work, it doesn't run it? The task start/stop/report logic is in the client, not the project server, isn't it? I am working on the assumption that, as long as the client's own internal settings permit it, it will just start and run tasks until the task queue is empty. I am seeing long pauses between the client starting tasks, which I am assuming should not occur. "But if you want to look at the BOINC source code, that's possible. Check http://boinc.berkeley.edu/trac/browser/branches/boinc_core_release_6_10 for the 6.10 code." Thank you for the link. |
Joined: 29 Aug 05 Posts: 15626 |
"..but my question is why, when the client has work, it doesn't run it?" Set up a cc_config.xml file and add these flags to it: `<cc_config> <log_flags> <cpu_sched>1</cpu_sched> <cpu_sched_debug>1</cpu_sched_debug> <coproc_debug>1</coproc_debug> <rr_simulation>1</rr_simulation> <task_debug>1</task_debug> </log_flags> <options> <max_stdout_file_size>18388608</max_stdout_file_size> </options> </cc_config>` Using these flags will fill up your stdoutdae.txt log quite quickly, so it may be prudent to increase the size it can reach. I have therefore set it so that your stdoutdae.txt file may grow to 18MB. If you want to change it, the value must be in bytes, where 1MB = 1024 * 1024 bytes. Forgot something... :-) Since that log will be quite extensive, please do not post it in the forums. Or at least not in this thread... Please email it to me; I'll send you my email address in a private message. |
Joined: 19 Apr 10 Posts: 4 |
Indeed it is very verbose (your xml was a bit broken btw, but the schema is pretty straight forward). I have a little snippet which explains both problems I see. A task finishes and the machine is idle. The scheduler runs: 19-Apr-2010 14:40:36 [---] [rr_sim] rr_sim start: work_buf_total 30240.00 on_frac 0.961 active_frac 0.993 19-Apr-2010 14:41:26 [---] [cpu_sched_debug] enforce_schedule(): start 19-Apr-2010 14:41:26 [---] [cpu_sched_debug] preliminary job list: 19-Apr-2010 14:41:26 [---] [cpu_sched_debug] final job list: 19-Apr-2010 14:41:26 [---] [cpu_sched_debug] using 0.00 out of 1 CPUs 19-Apr-2010 14:41:26 [---] [cpu_sched_debug] enforce_schedule: end 19-Apr-2010 14:41:26 [---] [rr_sim] rr_sim start: work_buf_total 30240.00 on_frac 0.961 active_frac 0.993 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 0.00: starting de_new_test2_81566_1271673925_0 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 0.00: starting de_new_test2_44636_1271666330_2 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 0.00: de_new_test2_81566_1271673925_0 finishes after 720.80 (97353.46G/135.06G) 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 720.80: starting de_new_test2_76466_1271672823_1 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 720.80: de_new_test2_44636_1271666330_2 finishes after 0.00 (0.00G/135.06G) 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 720.80: starting de_new_test2_80758_1271673719_1 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 720.80: de_new_test2_76466_1271672823_1 finishes after 720.80 (97353.46G/135.06G) 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 1441.61: starting de_new_test2_59278_1271669297_1 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 1441.61: de_new_test2_80758_1271673719_1 finishes after 0.00 (0.00G/135.06G) 19-Apr-2010 14:41:26 [Milkyway@home] [rr_sim] 1441.61: de_new_test2_59278_1271669297_1 finishes after 720.80 (97353.46G/135.06G) and nothing happens. 
It reruns the same way at 30 second intervals for 6 minutes, with the machine idle and then: 19-Apr-2010 14:47:30 [---] [cpu_sched_debug] Request CPU reschedule: Idle state change 19-Apr-2010 14:47:30 [---] [cpu_sched_debug] schedule_cpus(): start 19-Apr-2010 14:47:30 [---] [rr_sim] rr_sim start: work_buf_total 30240.00 on_frac 0.961 active_frac 0.993 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 0.00: starting de_new_test2_81566_1271673925_0 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 0.00: starting de_new_test2_44636_1271666330_2 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 0.00: de_new_test2_81566_1271673925_0 finishes after 720.79 (97353.46G/135.06G) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 720.79: starting de_new_test2_76466_1271672823_1 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 720.79: de_new_test2_44636_1271666330_2 finishes after 0.00 (0.00G/135.06G) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 720.79: starting de_new_test2_80758_1271673719_1 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 720.79: de_new_test2_76466_1271672823_1 finishes after 720.79 (97353.46G/135.06G) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 1441.58: starting de_new_test2_59278_1271669297_1 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 1441.58: de_new_test2_80758_1271673719_1 finishes after 0.00 (0.00G/135.06G) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 1441.58: starting de_new_test2_73287_1271672226_2 (0.05 CPU + 1.00 NV) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 1441.58: de_new_test2_59278_1271669297_1 finishes after 720.79 (97353.46G/135.06G) 19-Apr-2010 14:47:30 [Milkyway@home] [rr_sim] 2162.37: de_new_test2_73287_1271672226_2 finishes after 0.00 (0.00G/135.06G) 19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] scheduling de_new_test2_81566_1271673925_0 (coprocessor job, FIFO) 19-Apr-2010 14:47:30 [---] [cpu_sched_debug] reserving 1.000000 of coproc CUDA 
19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] scheduling de_new_test2_44636_1271666330_2 (coprocessor job, FIFO) 19-Apr-2010 14:47:30 [---] [cpu_sched_debug] reserving 1.000000 of coproc CUDA 19-Apr-2010 14:47:30 [---] [cpu_sched_debug] Request enforce CPU schedule: schedule_cpus 19-Apr-2010 14:47:30 [---] [cpu_sched_debug] enforce_schedule(): start 19-Apr-2010 14:47:30 [---] [cpu_sched_debug] preliminary job list: 19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] 0: de_new_test2_81566_1271673925_0 (MD: no; UTS: no) 19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] 1: de_new_test2_44636_1271666330_2 (MD: no; UTS: no) 19-Apr-2010 14:47:30 [---] [cpu_sched_debug] final job list: 19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] 0: de_new_test2_81566_1271673925_0 (MD: no; UTS: no) 19-Apr-2010 14:47:30 [Milkyway@home] [cpu_sched_debug] 1: de_new_test2_44636_1271666330_2 (MD: no; UTS: no) 19-Apr-2010 14:47:30 [Milkyway@home] [coproc_debug] Assigning CUDA instance 0 to de_new_test2_81566_1271673925_0 19-Apr-2010 14:47:30 [Milkyway@home] [coproc_debug] Assigning CUDA instance 1 to de_new_test2_44636_1271666330_2 19-Apr-2010 14:47:30 [Milkyway@home] Can't get available GPU RAM: 999 19-Apr-2010 14:47:30 [---] [cpu_sched_debug] Request CPU reschedule: insufficient GPU RAM So it is something in the machine idle logic which is stopping the jobs from being launched, and then, as I thought, it is the compute mode settings which are problematic after that (I do a lot of CUDA development, and I can help you fix that if you want). The first GPU is marked as compute prohibited by the driver, but the boinc scheduler is trying to use it anyway. The job it tries to schedule on the compute prohibited device then gets stuck on the job queue, even though it was never started. We can continue this by email/pm if you like.. |
Joined: 29 Aug 05 Posts: 15626 |
"Indeed it is very verbose (your xml was a bit broken btw, but the schema is pretty straightforward)." I noticed. Fixed that for future use. I shouldn't be doing 3 things at the same time. :-) `19-Apr-2010 14:47:30 [Milkyway@home] Can't get available GPU RAM: 999` There is a fix for this already, but it won't arrive until the next BOINC version. It also involves checking available memory on the GPUs before they get work, rather than, as now, giving work first and then checking memory. Send me the full log anyway and I'll forward it to the developers, just in case there's something else going on as well. Do know that I am not a developer, just a volunteer like you. I just have close contacts with the developers. :-) |
Joined: 19 Apr 10 Posts: 4 |
"There is a fix for this already, but it won't arrive until the next BOINC version. It also involves checking available memory on the GPUs before they get work, rather than, as now, giving work first and then checking memory." That will fix the symptom, so that jobs won't wrongly get stuck in an infinite "check every 5 minutes for enough free memory" loop, but not the root cause of the problem, which is actually the act of checking the free memory itself. I will email you the log and some other information that the developers should probably look at. Thanks for your help. |
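
For anyone else on a headless machine, the two files suggested in this thread (global_prefs_override.xml and cc_config.xml) can be written by a short script instead of by hand. The following is only a sketch, assuming Python 3 with its standard library: the helper names `build_prefs_override` and `build_cc_config` are made up for illustration, the tag names are the ones quoted in the posts above, and where the files belong (the BOINC data directory) and how the client re-reads them are left to the reader. Note that 18 * 1024 * 1024 is 18874368, slightly more than the 18388608 quoted above.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

MIB = 1024 * 1024  # the thread's definition: 1MB = 1024 * 1024 bytes

# Log flags quoted in the thread for debugging the scheduler.
DEBUG_FLAGS = ["cpu_sched", "cpu_sched_debug", "coproc_debug",
               "rr_simulation", "task_debug"]


def build_prefs_override(suspend_cpu_usage: float) -> str:
    """Return global_prefs_override.xml content (hypothetical helper).

    Per the thread, a value of 0 switches the "suspend if non-BOINC
    CPU usage exceeds X%" rule off.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "suspend_cpu_usage").text = f"{suspend_cpu_usage:.6f}"
    return ET.tostring(root, encoding="unicode")


def build_cc_config(log_flags: Iterable[str], max_log_mb: int) -> str:
    """Return cc_config.xml content with each given log flag set to 1."""
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    for name in log_flags:
        ET.SubElement(flags, name).text = "1"
    options = ET.SubElement(root, "options")
    # The log size limit is given in bytes, so convert from MB first.
    ET.SubElement(options, "max_stdout_file_size").text = str(max_log_mb * MIB)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_prefs_override(0))
    print(build_cc_config(DEBUG_FLAGS, 18))
```

Building the files through an XML library rather than typing them has one practical benefit here: every opening tag gets its matching closing tag, which is the kind of slip mentioned earlier in the thread.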
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.