Message boards : BOINC Manager : Boinc manager task deadline message
Author | Message |
---|---|
Send message Joined: 13 Aug 06 Posts: 778 |
CPDN is the only project, as far as I know, that does not enforce the task deadline, i.e. the server accepts results returned after it. CPDN workunits are also very long, so it really matters to both the project and its members that tasks are completed whenever possible. The BOINC Manager message warning that you will miss your model's deadline, or have already missed it, and should consider aborting the task is not relevant to CPDN, but CPDN members receive it anyway because the messages are generic, i.e. designed for all projects. This message causes a lot of worry, unhappiness, disgust and so on among CPDN members; I'm sure it leads many members to abort well-advanced models. Not everyone realises they should post on the forums for advice. Extending the deadlines even further isn't really an option, as in many cases the BOINC task scheduler would probably behave in such a way that the same problem would occur, only later. This would cause even greater worry, unhappiness and disgust. I am going to ask the CPDN programmers if they can include a scrolling message about this in the model graphics window. Would it also be possible or reasonable to ask for 'Does not apply to CPDN tasks' to be added to the BOINC Manager message? |
Send message Joined: 13 Aug 06 Posts: 778 |
My question here isn't about how the BOINC scheduler in its various incarnations handles one massive task alongside short tasks from other projects. That might be better discussed in a separate thread. The BOINC Manager message saying that the climate model is so many days overdue and that the member should consider aborting it can also be received by someone who crunches only CPDN but doesn't crunch 24/7 or has a slow computer. |
Send message Joined: 19 Jan 07 Posts: 1179 |
The solution seems to be to extend the scheduler protocol by adding a new flag saying that deadlines don't matter. Clients / GUIs can do what they want with it, like not showing the deadline warning at all. Note it's not a "Don't show deadline warning" flag, it's a "We don't mind deadlines" flag. It should give the client information, not tell it what to do. As with any protocol, it's good to give more information than the other end normally uses, because some day it may want to start using it. |
Send message Joined: 29 Aug 05 Posts: 304 |
The solution seems to extend the scheduler protocol adding a new flag, saying that deadlines don't matter. Clients / GUIs can do what they want with it, like not showing the deadline warning at all. Note it's not a "Don't show deadline warning" flag, it's a "We don't mind deadlines". It should give information, not tell it what to do. I like this idea, but I would like to extend it more. Add a tag to the task, <over_deadline> for example. This tag would tell the client to abort the task automatically if it is more than that many seconds over the deadline. If the tag is zero or missing the client drops everything and finishes that task (current behavior). If it is negative then ignore or decrease deadline pressure. It should be by task since some projects have different needs for different tasks/apps. I think decreasing the importance of the deadline is a better idea than ignoring it completely, because I expect the projects would rather have the task returned sooner even when they are not too concerned about it. However a task with a negative value should never block all other tasks, at least one other task should always be allowed. BOINC WIKI BOINCing since 2002/12/8 |
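The three cases described for the proposed <over_deadline> tag can be sketched roughly as follows. This is only an illustration of the rules as stated in the post, not client code: the names `DeadlinePolicy`, `classify` and `should_abort` are made up for the example, and how "decrease deadline pressure" would actually work in the scheduler is left open.

```python
from enum import Enum


class DeadlinePolicy(Enum):
    """Hypothetical names for the three cases of the proposed tag."""
    STRICT = "strict"            # tag zero or missing: current behaviour
    ABORT_AFTER = "abort_after"  # tag positive: abort once that many seconds late
    RELAXED = "relaxed"          # tag negative: ignore or reduce deadline pressure


def classify(over_deadline):
    """Map a per-task <over_deadline> value (or None if missing) to a policy."""
    if over_deadline is None or over_deadline == 0:
        return DeadlinePolicy.STRICT
    if over_deadline > 0:
        return DeadlinePolicy.ABORT_AFTER
    return DeadlinePolicy.RELAXED


def should_abort(over_deadline, now, deadline):
    """True only for a positive tag when the task is more than that many
    seconds past its deadline. `now` and `deadline` are in seconds."""
    if classify(over_deadline) is not DeadlinePolicy.ABORT_AFTER:
        return False
    return (now - deadline) > over_deadline
```

Because the value is per task, a project could send a positive value for short tasks and a negative one for long models.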
Send message Joined: 16 Apr 06 Posts: 386 |
They sound like reasonable suggestions to me. A large fraction of the 'overdue' messages we get at CPDN are due to the '1901' deadline issue (where a null system clock resets the deadline date in the workunit back to 1st Jan 1901). I gather this is a deliberate design choice on the part of Boinc? This means that the client will think the deadline has been passed by over 100 years (an awful lot of seconds). Off topic: Please use "reply to this post" instead of "reply to this thread" (you can still delete the quotes). A future feature needs it. Replying to the thread will be like replying to the first post (probably not what you want). Some sort of tree-threaded discussion? In that case, could 'reply to this post' not quote the post being replied to, since it'll be visible just above, and remove the 'reply to this thread' link? Alternatively, perhaps 'reply to this thread' should refer to either the last post in the thread or the first post in the thread based on whether the link is at the top or the bottom. Most commonly people *do* want to just reply to the last post in the thread, the behaviour of the system should reflect this. Incidentally, this is going to ruin the flow of existing threads, so perhaps the threading should use a new field instead of the existing field. It has long been the advice on the CPDN forums to always use the 'reply to this thread' link in order to avoid unnecessary quoting (a lot of quote-only posts from newcomers are caused by them clicking on the 'reply to this post' thread and then not knowing what to do with all the quoted text). |
Send message Joined: 19 Jan 07 Posts: 1179 |
Some sort of tree-threaded discussion? Yep, exactly that. Replies on a new thread to avoid going off-topic here. |
Send message Joined: 29 Aug 05 Posts: 147 |
How about:
<deadline_misses>
    <over_deadline>
        <time>time in seconds</time>
        <action>no_warn|warn|contact|abort_unstarted|abort</action>
        <contact_period>duration in seconds</contact_period> // only read if contact is specified.
    </over_deadline>
    <over_deadline>
        <time>time in seconds</time>
        <action>no_warn|warn|contact|abort</action>
    </over_deadline>
    ...
</deadline_misses>
no_warn would suppress the warning message.
warn would enable the warning message.
contact would contact (update) the server every X seconds to see if an abort should be done. This contact would not be forced if the host has the network suspended.
abort_unstarted would abort the task if it had not been started.
abort would just abort the task.
The action(s) executed are those in the block with the largest time value that is still less than the time elapsed since the task was due (now minus the due date/time). This means that if you have an abort at 60, there is no point in having a later action specified. BOINC WIKI |
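The selection rule in this post (take the block with the largest time that is still less than the time elapsed since the deadline) could be sketched like this. The function name `select_actions` and the representation of blocks as `(time, actions)` pairs are assumptions for the example; nothing here is taken from the BOINC client.

```python
def select_actions(blocks, seconds_overdue):
    """Pick the actions of the applicable <over_deadline> block.

    blocks: list of (time_in_seconds, list_of_action_names) pairs.
    seconds_overdue: now minus the task's due time, in seconds.
    Returns the action list of the block with the largest time that is
    strictly less than seconds_overdue, or None if no block applies yet.
    """
    eligible = [block for block in blocks if block[0] < seconds_overdue]
    if not eligible:
        return None
    return max(eligible, key=lambda block: block[0])[1]


# Example rule set (values are illustrative only).
rules = [
    (0, ["warn"]),
    (3600, ["contact"]),
    (86400, ["abort"]),
]
```

With these rules a task two hours late gets the contact action, and anything more than a day late is aborted, which is why a block with a larger time than the abort block would never take effect on a task that has already been aborted.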
Send message Joined: 19 Jan 07 Posts: 1179 |
Good idea, not so good XML layout (fits nicely with the rest of BOINC though :P). How about:
<deadline_misses>
    <over_deadline time="time in seconds"> <!-- time could be a sub-element instead of an attribute -->
        <warn>yes</warn> <!-- yes|no (default 'no' if not specified at all?); makes it clear warn and no_warn can't be BOTH specified -->
        <contact period="duration in seconds" /> <!-- period could be a sub-element instead of an attribute; I really think an attribute is better here -->
        <abort_unstarted/>
        <abort/>
    </over_deadline>
    ...
</deadline_misses> |
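For comparison, here is a rough sketch of how a general-purpose XML parser would read this attribute-based layout, using Python's standard `xml.etree.ElementTree`. The concrete numbers in `SAMPLE`, the function name `parse_deadline_misses` and the dictionary keys are invented for the example, and defaulting `warn` to 'no' follows the question raised in the comment above rather than any agreed behaviour.

```python
import xml.etree.ElementTree as ET

# Illustrative values only; the thread does not propose specific times.
SAMPLE = """
<deadline_misses>
  <over_deadline time="0">
    <warn>no</warn>
  </over_deadline>
  <over_deadline time="604800">
    <warn>yes</warn>
    <contact period="86400" />
  </over_deadline>
  <over_deadline time="2592000">
    <abort_unstarted/>
  </over_deadline>
</deadline_misses>
"""


def parse_deadline_misses(text):
    """Return one dict per <over_deadline> block, in document order."""
    root = ET.fromstring(text)
    rules = []
    for block in root.findall("over_deadline"):
        contact = block.find("contact")
        rules.append({
            "time": int(block.get("time")),
            # Assumed default: no warning when <warn> is absent.
            "warn": block.findtext("warn", default="no").strip() == "yes",
            "contact_period": (
                int(contact.get("period")) if contact is not None else None
            ),
            "abort_unstarted": block.find("abort_unstarted") is not None,
            "abort": block.find("abort") is not None,
        })
    return rules
```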
Send message Joined: 16 Apr 06 Posts: 386 |
What about the situation when the deadline is reset by BOINC to 1901? This happens a lot at CPDN, and presumably at other projects too (I think it has something to do with the PC's CMOS battery running out, which means that the system clock is null until Windows resynchronises it when connected to the internet). Is there a risk that BOINC might abort all workunits (in progress or ready-to-run) after each reboot on a PC with this problem? Could there be an invalid-deadline flag which would say what action should be taken in this particular case? (The client would think the workunit is 3 billion seconds overdue, with a possible signed-integer overflow.)
<over_deadline deadline_invalid="Y">
    <warn>yes</warn>
    <contact period="86400" />
    <reload_deadline_from_server/>
</over_deadline> |
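The "3 billion seconds" figure and the overflow worry can be checked with a little arithmetic. In this sketch the current date of 1 March 2007 is an assumption chosen only for illustration, and `wrap_int32` simply models what would happen if the difference were stored in a signed 32-bit integer; whether the client actually stores it that way is not something the thread establishes.

```python
from datetime import datetime

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def seconds_overdue(now, deadline):
    """Whole seconds between the deadline and now."""
    return int((now - deadline).total_seconds())


def wrap_int32(value):
    """Model two's-complement wrap-around of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


bogus_deadline = datetime(1901, 1, 1)  # deadline after the reset described above
now = datetime(2007, 3, 1)             # hypothetical current date

overdue = seconds_overdue(now, bogus_deadline)
wrapped = wrap_int32(overdue)

print(overdue)              # about 3.35 billion seconds
print(overdue > INT32_MAX)  # True: too large for a signed 32-bit integer
print(wrapped)              # negative once wrapped
```

So a rule keyed purely on "seconds over the deadline" could see either an enormous positive value or a negative one for such a task, which is a reason to treat an invalid deadline as its own case.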
Send message Joined: 19 Jan 07 Posts: 1179 |
I think if your computer date goes back to 1901, BOINC should be one of your least worries. |
Send message Joined: 16 Apr 06 Posts: 386 |
I don't think the actual computer date goes to 1901; most likely it goes to 1980 or 2000, depending on which BIOS they have. When they're using the PC they don't notice, because the date goes back to normal once the time has been resynced to an NTP server. BIOSes tend to fall back to a safe configuration these days, so from the user's viewpoint the system still boots and works OK (unlike the systems of a few years ago, which lost contact with the disk drives). I think the 1901 date comes from BOINC itself (i.e., my guess is that it sees an odd date on the system clock and, for some reason I don't know, decides the best thing to do is to reset the deadlines to 1901). |
Send message Joined: 2 Sep 05 Posts: 103 |
I think if your computer date goes back to 1901, BOINC should be one of your least worries. When the CMOS battery is dodgy on a Windows system the time normally goes back to Jan 1st 1980. The deadline being set to 1901 is caused by BOINC's handling of system clock changes of more than 1000 days, as raised by myself on the boinc-dev mailing list in Feb 2006 (see here). I've just checked the source and the adjustment that caused the problem (function RESULT::parse_server() in client_types.C) was removed in version 5.8.16. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Send message Joined: 29 Aug 05 Posts: 304 |
Good ideas JM7 and Nicholas. Will it be able to handle the case where we still want to allow more work to be downloaded from other projects? I think this is most important to participants who run into deadline problems with CPDN. BOINC WIKI BOINCing since 2002/12/8 |
Send message Joined: 16 Apr 06 Posts: 386 |
... the adjustment that caused the problem (function RESULT::parse_server() in client_types.C) was removed in version 5.8.16. That's excellent news. 1901 deadlines can be ignored in that case (since there will be very few clients with old WUs with invalid deadlines which have also been upgraded beyond 5.8.16). |
Send message Joined: 29 Aug 05 Posts: 147 |
Good ideas JM7 and Nicholas. Unfortunately, downloading work from other projects is guaranteed to make the task even later. Even CPDN's studies do eventually come to an end. BOINC WIKI |
Send message Joined: 29 Aug 05 Posts: 147 |
Good idea, not so good XML layout (fits nicely with the rest of BOINC though :P). How about: As far as I know, the BOINC XML parser cannot read attributes. BOINC WIKI |
Send message Joined: 19 Jan 07 Posts: 1179 |
As far as I know, the BOINC XML parser cannot read attributes. That's not an XML parser, that's a hack. It can only parse a subset of XML that has no attributes, no CDATA blocks, afaik no character entities, no namespaces, etc. ...and David was thinking of making the BOINC protocols into Internet standards... (see workshop slides). |
Send message Joined: 29 Aug 05 Posts: 147 |
As far as I know, the BOINC XML parser cannot read attributes. But until the BOINC XML parser (partial though it may be) is replaced (if ever), the XML design has to be parseable by the current parser. I understand the reasoning behind a roll-your-own approach. The current standard parsers are slow, bloated, and a pain to use. The architecture for the DOM XML parser (which is what most parsers are based on) as defined by the standards committee quite frankly stinks. Anything but a very small subset of commands is guaranteed to take at least n^2 time, and even the ones that are not guaranteed to take that long are not guaranteed not to take n^2 time (there are some sets of functions that should take linear time but don't do so in all implementations). The BOINC code is small, fast, and much easier to use, even if it is not complete. BOINC WIKI |
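To illustrate why an element-only layout is easy to handle without a full XML parser, here is a minimal sketch that reads the <time>/<action> layout proposed earlier using nothing but string searches. It is written in Python purely for illustration and is not BOINC's parser or a description of how it works; `tag_value` and `parse_blocks` are hypothetical helpers, and the sketch assumes every block contains a <time> element.

```python
def tag_value(text, tag):
    """Return the text between <tag> and </tag>, or None if not found."""
    open_tag, close_tag = f"<{tag}>", f"</{tag}>"
    start = text.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    end = text.find(close_tag, start)
    if end == -1:
        return None
    return text[start:end].strip()


def parse_blocks(text):
    """Collect (time, action) pairs from each <over_deadline> block.

    Assumes every block has a <time> element; attributes, comments,
    CDATA and entities are not handled at all.
    """
    open_tag, close_tag = "<over_deadline>", "</over_deadline>"
    blocks = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break
        end = text.find(close_tag, start)
        if end == -1:
            break
        body = text[start + len(open_tag):end]
        blocks.append((int(tag_value(body, "time")), tag_value(body, "action")))
        pos = end + len(close_tag)
    return blocks


# Illustrative input in the element-only layout.
SAMPLE = """
<deadline_misses>
    <over_deadline>
        <time>0</time>
        <action>warn</action>
    </over_deadline>
    <over_deadline>
        <time>86400</time>
        <action>abort_unstarted</action>
    </over_deadline>
</deadline_misses>
"""
```

The same scan would silently fail to match `<over_deadline time="...">`, which is the practical reason the attribute-based layout does not fit a parser of this kind.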
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.