Message boards : BOINC Manager : Random WU's & Hosts Hanging After Upgrade to 5.4.9
Author | Message |
---|---|
Joined: 6 May 06 Posts: 287 |
I'm noticing a definite random trend (excuse the oxymoron) since upgrading to 5.4.9. I use BOINCView to manage my little farmlet, and since the upgrade I've started to notice, with some regularity, hosts highlighted with a CPU efficiency of 0, i.e. a hung WU. So far the WUs and hosts are random, but it's happening often enough for me to recognise a trend. Has anyone else noticed this? ClC1=CC=C(C2=N[C@@H](CC(OC(C)(C)C)=O)C3=NN=C(C)N3C4=C2C(C)=C(C)S4)C=C1 |
Joined: 29 Aug 05 Posts: 15561 |
If those hung results are SETI Enhanced results, then yes, that is a known issue, but you're better off discussing it on the SETI NC forum. Hanging results aren't really a BOINC issue. BOINC doesn't crunch; it's a managing program. The science applications running under BOINC do the crunching, so if anything hangs, it's more likely the science application's fault than BOINC's. |
Joined: 6 May 06 Posts: 287 |
Just checked BV and it's highlighted two WUs that are "running" with 0% CPU efficiency. Both WUs are from LHC and both hosts are running Linux. One is hung at 100% and the other at 79.84%. |
Joined: 29 Aug 05 Posts: 15561 |
Just exit BOINC completely and restart it. If that doesn't help, reboot your computer. |
Joined: 6 May 06 Posts: 287 |
That's what I do. I just had another WU "hang" on a different host, this time a FAAH WU from WCG, again on a Linux host. I had started to suspect Pirates as the cause; they are trying to work out why there are errors on Linux boxes when the code is compiled on FC4, although it works fine when compiled on FC3. I suspected Pirates because the two hosts that hung this morning had both just completed "bad" Pirates WUs, but this latest incident doesn't bear that out. I think the hang occurs at the work unit handover, not in the middle of processing a WU, i.e. it's a BOINC Manager problem, not an individual app problem. I'm also starting to suspect that it's a Linux problem only, but I can't be sure yet. |
Joined: 6 May 06 Posts: 287 |
Quote: "I'm also starting to suspect that it's a Linux problem only, but I can't be sure yet." It's not just a Linux problem. I just noticed a Malaria Control WU "hang" on a Win2k host. |
Joined: 6 May 06 Posts: 287 |
The problem seems to be at handover. In the Tasks pane of BOINC Manager the workunit shows as running, but it doesn't progress. In the Messages pane you can see that the previous workunit has stopped, but there is no corresponding message for the new workunit starting or resuming. In many cases, pausing and then resuming the workunit fixes the problem. Also, this is not an "exit with 0 result status" situation; it's something different. |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.