Projects Running Past Deadline
Joined: 4 Jul 08, Posts: 82
My BOINC manager got into a situation where two projects were running past their deadlines. Is this the way it's supposed to work? Is all that work for naught? If so, why did the manager start/resume work on those WUs? I have since suspended those WUs, and now I'm wondering whether I should abort them or let them finish.

Thanks,
Mark
Joined: 29 Aug 05, Posts: 15549
Normally BOINC will try to get all tasks in by the deadline. It won't be able to do this if you constantly tell it which projects to run, or if you allow projects to fetch work when BOINC didn't want any because it judged your workload to be too large already.

So, how many projects and which are you attached to? How many CPUs/cores do you have? Which tasks are running over their deadline? How much time is still estimated on them? Can you get them in before the 3rd person does? If you can, you can still get credit for them.

BOINC won't automatically kill tasks that go over the deadline, as there are projects out there that don't care much about the deadline. CPDN and QCN Alpha come to mind. You can keep on running those models till well after the deadline and, in the case of CPDN, still get credit. QCN doesn't do credits (yet).
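To make the "too much work to meet the deadlines" point concrete, here is a toy model. It is not BOINC's actual scheduler; it assumes the host runs 24/7, each task occupies one core until it finishes, and tasks are run earliest-deadline-first. The `Task` class and `predicted_misses` function are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_hours: float    # estimated CPU time still needed
    hours_to_deadline: float  # wall-clock time left until the report deadline


def predicted_misses(tasks, cores):
    """Run tasks earliest-deadline-first on `cores` cores (one task per core,
    no preemption, host always on) and return the names of tasks that would
    finish after their deadline."""
    core_free_at = [0.0] * cores  # hour at which each core becomes idle
    missed = []
    for task in sorted(tasks, key=lambda t: t.hours_to_deadline):
        # hand the task to whichever core frees up first
        core = min(range(cores), key=lambda c: core_free_at[c])
        finish = core_free_at[core] + task.remaining_hours
        core_free_at[core] = finish
        if finish > task.hours_to_deadline:
            missed.append(task.name)
    return missed


# Three hypothetical 15-hour tasks on a dual-core host:
queue = [Task("A", 15, 20), Task("B", 15, 20), Task("C", 15, 24)]
print(predicted_misses(queue, cores=2))  # ['C']
```

Work fetched beyond what the cores can finish in time shows up as misses; on a real host, tasks from the other attached projects also compete for the same cores.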
Joined: 4 Jul 08, Posts: 82
So, how many projects and which are you attached to?

On the host in question, I'm attached to 12 projects, although I don't have work for one of them.

How many CPUs/cores do you have?

The host has a single 2.0 GHz Core 2 Duo T7200 (two cores).

Which tasks are running over their deadline? How much time is still estimated on them?

Only tasks from Milkyway@home are over their deadline. They had been running for about 7 hours, with an estimated 8 hours to go, when I suspended them (which was about 6 hours past their deadline). There's a third WU that hasn't started.

Can you get them in before the 3rd person does?

I'm not sure what you mean by "the 3rd person".

BOINC won't automatically kill tasks that go over the deadline, as there are projects out there that don't care much about the deadline. CPDN and QCN Alpha come to mind. You can keep on running those models till well after the deadline and, in the case of CPDN, still get credit. QCN doesn't do credits (yet).

Since they're MW@H, do you have any experience indicating that I should abort at this point? This is the first time I've run up against deadlines. I run BOINC on 3 hosts, and recently they all got a batch of 6-10 MW@H WUs at the same time. Given the estimated run times and short deadlines, it was immediately apparent that all 3 would have to run virtually nothing but MW@H for 2-3 days to finish them. I did do a little run-time management on my own, but only because I didn't want to run MW@H exclusively. Other than that, I don't do much tweaking; I let the BOINC manager do its own thing. Also, I haven't changed resource allocations for the projects in quite a while.
Joined: 29 Aug 05, Posts: 15549
Only tasks from Milkyway@home are over their deadline. They had been running for about 7 hours, with an estimated 8 hours to go, when I suspended them (which was about 6 hours past their deadline). There's a third WU that hasn't started.

OK, Milkyway changed their tasks from the small ones to very big ones virtually overnight, without changing the deadline. These tasks run long on all computers. I had one, and while the old ones took a mere 10 minutes, the new one ran for 10 hours.

Can you get them in before the 3rd person does?

When tasks go over the deadline, they are sent out again to a third computer. This is to make sure the work still gets done and meets the quorum. If you manage to return your task in the meantime, before the third computer returns it, you still get credit for it. Check the host the task has been resent to and, if you can, see how long it generally takes that system to run these monsters. If you doubt you'll get them in on time, first set Milkyway to No New Tasks, then abort the tasks you have and report them. You won't get credit, of course. Then check their forums and wait until Travis is back and has changed the deadline before allowing work fetch on this project again.
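The rule of thumb in this reply (keep crunching only if you can report before the resent copy comes back) can be written down as a quick check. This is a hypothetical helper, assuming you can estimate the resend host's typical turnaround from its task list and that your task gets a steady share of one core; it says nothing about how the project actually awards credit.

```python
def worth_finishing(remaining_cpu_hours, cpu_share, resend_turnaround_hours):
    """Return True if the late task would likely be reported before the copy
    that was resent to another host comes back.

    cpu_share: fraction of one core the task actually gets
    (1.0 = runs nonstop; 0.5 = shares the core with other projects).
    """
    if not 0 < cpu_share <= 1:
        raise ValueError("cpu_share must be in (0, 1]")
    wall_hours = remaining_cpu_hours / cpu_share  # real time needed to finish
    return wall_hours < resend_turnaround_hours


# About 8 hours left, as in this thread; the turnaround figure is made up.
print(worth_finishing(8, 1.0, 12))  # True
print(worth_finishing(8, 0.5, 12))  # False
```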
Joined: 29 Aug 05, Posts: 147
Only tasks from Milkyway@home are over their deadline. They had been running for about 7 hours, with an estimated 8 hours to go, when I suspended them (which was about 6 hours past their deadline). There's a third WU that hasn't started.

Milkyway also did not change the estimate of how much work is needed. This is a problem the project has to correct. Cancel one of these two and let the other run to completion. Letting one run to completion will set the Duration Correction Factor for the project (or you can stop BOINC and hand-edit the client_state.xml file to make the <duration_correction_factor> for Milkyway about 100 times what it currently is).

EDIT: The real kicker causing the late returns is the unchanged fpops_est (the workunit's estimated floating-point operation count), not the unchanged deadline.
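For the hand-edit route, here is a minimal Python sketch. It assumes BOINC has been shut down first, that you work on a backup copy of client_state.xml, and that each project is a `<project>` element directly under the root with `<project_name>` and `<duration_correction_factor>` children; check your own file before relying on that layout. `scale_dcf` is a hypothetical helper, and ElementTree does not preserve the file's original whitespace.

```python
import xml.etree.ElementTree as ET


def scale_dcf(xml_text, project_name, factor):
    """Multiply <duration_correction_factor> by `factor` for the <project>
    whose <project_name> equals `project_name`; return the new XML text."""
    root = ET.fromstring(xml_text)
    found = False
    for project in root.findall("project"):  # direct children of the root only
        if project.findtext("project_name") == project_name:
            dcf = project.find("duration_correction_factor")
            if dcf is None or dcf.text is None:
                raise ValueError(f"{project_name} has no duration_correction_factor")
            dcf.text = f"{float(dcf.text) * factor:.6f}"
            found = True
    if not found:
        raise ValueError(f"project {project_name!r} not found")
    return ET.tostring(root, encoding="unicode")


# Hypothetical excerpt of a client_state.xml file
sample = """<client_state>
  <project>
    <project_name>Milkyway@home</project_name>
    <duration_correction_factor>0.910000</duration_correction_factor>
  </project>
</client_state>"""
print(scale_dcf(sample, "Milkyway@home", 100))
```

Write the result back over client_state.xml only while BOINC is stopped; a running client rewrites that file itself and would overwrite the change.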
Joined: 29 Aug 05, Posts: 15549
(or you can stop BOINC and hand-edit the client_state.xml file to make the <duration_correction_factor> for Milkyway about 100 times what it currently is).

Wow... just checked mine. I still had it running on one of my hosts, and the RDCF moved from around 0.91 to 51.4 after running just one of those big ones. ;-)
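Why one big task moves the factor so far: older BOINC clients estimated a task's run time roughly as the fpops estimate divided by the host's benchmark speed, times the project's DCF, and after each task they raised the DCF at once if the task ran long but lowered it only gradually. The sketch below models that behavior. The thresholds (1.1, 0.1), the weights and the [0.01, 100] clamp are from memory of older clients and should be treated as assumptions; both function names are hypothetical. It also shows why the unchanged fpops_est mattered: until the DCF caught up, each new task was still predicted to need about 10 minutes.

```python
def estimated_runtime(fpops_est, host_flops, dcf):
    """Rough run-time estimate in seconds: operations / speed, times DCF."""
    return fpops_est / host_flops * dcf


def update_dcf(dcf, elapsed, uncorrected_estimate):
    """Update the duration correction factor after a task finishes.

    uncorrected_estimate: the estimate *before* applying the DCF,
    in the same time unit as `elapsed`.
    """
    raw_ratio = elapsed / uncorrected_estimate          # actual vs. fpops-based estimate
    adj_ratio = elapsed / (uncorrected_estimate * dcf)  # actual vs. corrected estimate
    if adj_ratio > 1.1:
        new_dcf = raw_ratio                       # ran long: jump straight up
    elif adj_ratio < 0.1:
        new_dcf = 0.99 * dcf + 0.01 * raw_ratio   # far too short: barely trust it
    else:
        new_dcf = 0.9 * dcf + 0.1 * raw_ratio     # otherwise: decay slowly
    return min(max(new_dcf, 0.01), 100.0)


# Illustrative numbers from this thread: 10-minute tasks with DCF 0.91,
# then a 10-hour (600-minute) task carrying the same fpops estimate.
old_estimate = 10 / 0.91  # minutes, before applying the DCF
print(update_dcf(0.91, 600, old_estimate))  # about 54.6
```

The result lands in the same ballpark as the values reported in this thread, which is what you would expect if the client jumps straight to the observed ratio.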
Joined: 4 Jul 08, Posts: 82
Check the host the task has been resent to and, if you can, see how long it generally takes that system to run these monsters. If you doubt you'll get them in on time, first set Milkyway to No New Tasks, then abort the tasks you have and report them. You won't get credit, of course.

Hmm... Checking my tasks on the MW website, it looks like they've only been sent out to one other host. Am I just seeing the host the WU was re-issued to because mine was past deadline? Or is it possible that it's the only other host working on it, and I can "safely" continue to run mine?
Joined: 4 Jul 08, Posts: 82
Wow... just checked mine. I still had it running on one of my hosts, and the RDCF moved from around 0.91 to 51.4 after running just one of those big ones. ;-)

I don't know what it *was*, but mine is over 54 right now.
Joined: 29 Aug 05, Posts: 15549
Hmm... Checking my tasks on the MW website, it looks like they've only been sent out to one other host. Am I just seeing the host the WU was re-issued to because mine was past deadline?

No, sorry, my mistake for typing things without checking them first. The minimum quorum on Milkyway is 1, with initial replication also 1. So the second host you see is the one it's been resent to.
Joined: 4 Jul 08, Posts: 82
No, sorry, my mistake for typing things without checking them first. The minimum quorum on Milkyway is 1, with initial replication also 1. So the second host you see is the one it's been resent to.

Ah, okay, I see where you're looking for quorum and replication in the WU details. It looks like I got lucky on one of the three I have past due: it just finished and got what looks like full credit. It shows a little under 10 hours of CPU time, about 5 hours less than the expected total when it was 50% done. I guess they can run very fast near the end?

A few minutes later... Now my WU status at MW shows only two WUs: the one I finished a bit ago and another that I haven't done much work on. The one with over 8 hours invested, which is still running, is no longer listed. :-(
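On the "5 hours less than expected" puzzle: a remaining-time figure extrapolated linearly from progress assumes the rest of the task runs at the same rate as the part already done. The hypothetical helper below shows that arithmetic. It is only one possible reason the estimate was off; the client may also mix in its DCF-based estimate, and an application's reported progress need not be linear in the work actually done.

```python
def naive_remaining_hours(elapsed_hours, fraction_done):
    """Linear extrapolation: time left if the remaining work proceeds
    at the average rate observed so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours * (1 - fraction_done) / fraction_done


# Illustrative figures, roughly the situation described here: ~7.5 h in, 50% done.
elapsed = 7.5
remaining = naive_remaining_hours(elapsed, 0.5)
print(elapsed + remaining)  # 15.0
```

Against an actual total of a little under 10 hours, the linear guess overshoots by about 5 hours, matching the gap described in the post.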
Joined: 29 Aug 05, Posts: 15549
A few minutes later... Now my WU status at MW shows only two WUs: the one I finished a bit ago and another that I haven't done much work on. The one with over 8 hours invested, which is still running, is no longer listed. :-(

Yes, their database purge is measured in mere minutes, which is very annoying. Hopefully they will fix that as well.
Joined: 4 Jul 08, Posts: 82
Thanks for all the great info... learning a lot here. :-)
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.