Thread 'inaccurate estimated durations'

Message boards : BOINC Manager : inaccurate estimated durations

BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 2844 - Posted: 29 Jan 2006, 0:23:47 UTC
Last modified: 29 Jan 2006, 0:24:39 UTC

BOINC has been working here for a long time. Until recently. Every time I download a work unit from any project, the BOINC Manager Work tab shows 1 sec expected duration for the unit. Each unit. This seems to drive BOINC Manager to decide the computer is overcommitted, so it stops all further downloads (for a while) and uses "earliest-deadline-first scheduling". I've tried a reset on each individual project; for at least one project, units downloaded after reset had the same problem. My BOINC Manager is version 5.2.13 (current). Any ideas? Thanks in advance.
ID: 2844 · Report as offensive
Jim K
Joined: 8 Sep 05
Posts: 168
Message 2851 - Posted: 29 Jan 2006, 3:49:35 UTC

Lower your "connect to" number to something smaller, like 1.
BOINC Wiki
ID: 2851 · Report as offensive
BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 2852 - Posted: 29 Jan 2006, 6:14:32 UTC - in response to Message 2851.  

Lower your "connect to" number to something smaller, like 1.


Jim,

Thanks for the prompt reply.

I'm not fully confident I understand you. Maybe you refer to the "General Preferences" page and the setting there for "Connect to network about every". Mine has been set at .75 since before this behavior started.

If you mean something else, might you tell me more?

Regards . . .
ID: 2852 · Report as offensive
Jim K
Joined: 8 Sep 05
Posts: 168
Message 2853 - Posted: 29 Jan 2006, 6:36:16 UTC
Last modified: 29 Jan 2006, 6:46:00 UTC

As you may have noticed, the estimates of completion change with each new batch of WUs that you download. I have not bothered to keep track of this to see whether the devs have figured out a way to keep it from changing...



How many projects are you connected to, and what is the speed of your processor?

The following lists the basic rules for EDF and may shed some light on your situation.

Work Scheduler
EDF (earliest deadline first) is caused by:

1) A deadline within 24 hours.
2) A deadline within 2 * the connect time.
3) A failure of the Round Robin simulator to finish a result within 90% of its deadline.

A project not requesting work is caused by:
1) A host that is in NWF (no work fetch)
2) A project that has enough work on a host that has enough work.
3) A project that has a LTD that is negative enough.

NWF (no work fetch) is caused by:
1) A failure of the Round Robin simulator to get a result done within 90% of a deadline if the resource share of the next project to request work from is added to the Round Robin simulation.

Work will always be requested from somewhere, even if that somewhere has a very negative LTD and/or the host is in NWF (no work fetch) if there is a CPU that is idle and there is a network connection.
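
For anyone who wants to play with the three EDF rules above, here is a rough Python sketch. It is only an illustration, not the client's actual code: the Result class, the function names, and the very simplified single-CPU round-robin simulation are all assumptions made up for this example, and the 20 h / 11 h work unit sizes are invented numbers.

```python
from dataclasses import dataclass

DAY = 86400.0  # seconds


@dataclass
class Result:
    name: str
    project: str
    cpu_left: float     # estimated CPU seconds still needed
    to_deadline: float  # wall-clock seconds until the report deadline


def round_robin_misses(results, shares, on_frac=1.0, margin=0.9):
    """Simplified single-CPU round-robin simulation (hypothetical, not client code).

    Each project with work gets CPU in proportion to its resource share;
    a result "misses" if it finishes after margin * its deadline.
    """
    queues = {}
    for r in results:
        queues.setdefault(r.project, []).append([r, r.cpu_left])
    now, missed = 0.0, []
    while queues:
        total = sum(shares[p] for p in queues)
        # CPU seconds earned per wall-clock second by each project
        rate = {p: on_frac * shares[p] / total for p in queues}
        # advance to the moment the next result finishes
        step = min(q[0][1] / rate[p] for p, q in queues.items())
        now += step
        for p in list(queues):
            head = queues[p][0]
            head[1] -= step * rate[p]
            if head[1] <= 1e-6:
                if now > margin * head[0].to_deadline:
                    missed.append(head[0].name)
                queues[p].pop(0)
                if not queues[p]:
                    del queues[p]
    return missed


def edf_reasons(results, shares, connect_days, on_frac=1.0):
    """Return a list of reasons (possibly empty) for switching to EDF."""
    reasons = []
    for r in results:
        if r.to_deadline < DAY:
            reasons.append(f"{r.name}: deadline within 24 hours")
        if r.to_deadline < 2 * connect_days * DAY:
            reasons.append(f"{r.name}: deadline within 2 * connect time")
    for name in round_robin_misses(results, shares, on_frac):
        reasons.append(f"{name}: round robin misses 90% of deadline")
    return reasons


# Invented example: three results due in 14 days, host on about half the time.
work = [
    Result("einstein_1", "Einstein", 20 * 3600, 14 * DAY),
    Result("seti_1", "SETI", 11 * 3600, 14 * DAY),
    Result("seti_2", "SETI", 11 * 3600, 14 * DAY),
]
relaxed = edf_reasons(work, {"Einstein": 500, "SETI": 150}, connect_days=0.75, on_frac=0.527)

# A result that needs 100 CPU hours but is due in 50 hours trips rule 3 only.
tight = edf_reasons([Result("late", "SETI", 100 * 3600, 50 * 3600)], {"SETI": 150}, connect_days=0.75)
print(relaxed, tight)
```

With sane duration estimates the first case gives no reason for EDF at all, which is why a client that still reports "overcommitted" is probably working from bad estimates rather than from a real deadline problem.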



My comment about lowering the "connect to" setting is just a basic starting place I use, as many users have it set way too high...
BOINC Wiki
ID: 2853 · Report as offensive
Mr_Fusion

Joined: 29 Jan 06
Posts: 4
Hungary
Message 2858 - Posted: 29 Jan 2006, 11:19:19 UTC
Last modified: 29 Jan 2006, 11:25:36 UTC

I've been getting the same behavior lately (for about a week or so). Each time the manager downloads new work for various projects, it immediately switches to earliest-deadline-first mode, even though the closest deadline is usually 5-6 days away.
This is causing some of my projects not to get CPU time at all for extended periods, making them stall for days. I've not changed anything in my general or project-specific settings, and the scheduler was working correctly until now.

I'm using 5.2.10 (version history didn't seem to contain fundamental changes since then), Win platform, participating in ClimatePrediction, Rosetta, Predictor and SETI.
At this very moment, Rosetta is running in earliest mode, with a closest report deadline of Feb. 4th, 2006., 3 WUs pending with an average processing time of 3.5 hours/WU... All other projects are stalled. :/
ID: 2858 · Report as offensive
BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 2859 - Posted: 29 Jan 2006, 15:02:39 UTC - in response to Message 2853.  

If you have noticed the estimates of completion will change <<clip>>....

How many projects are you connected to and what is the speed of your processor...

The following lists the basic rules for EDF and may shed some light on your situation. <<clip>>



Mr. Fusion,

Good nickname. Thanks for the post. Misery loves company, sorry to say. (grin)




Jim,

As to changing estimates of duration: There's a related symptom of what's happening in that the estimates are low and unchanging. I'm used to BOINC Manager making a reasonable estimate of completion at download time and decrementing every five seconds or so as the unit executes. Current behavior is that as each work unit comes in, the Work tab and "To Completion" column shows a 1 second estimate (00:00:01). That estimate doesn't decrement. (Well . . . maybe at the end, just before completion; dunno. (grin))

This is happening on a Win2K Compaq at roughly 600 MHz. (Slow, so you can giggle. Not quite Stone Age, so don't laugh outright! (he he)) 384 MB memory. Plenty of hard disk.

Like Mr. Fusion's experience, this has been happening for roughly a week. Maybe it started last Tuesday or so, but I'm not sure of that.

I'm connected to Einstein, Predictor, SETI, and Rosetta with corresponding resource shares of 500, 350, 150, and 1. (I normally don't let Rosetta get new work because it doesn't handle leaving memory well. When they fix that, they'll get more time.) That resource share mix and my usual use of this machine typically allows all units to complete well within deadline. When this all started, my General Preferences, "Connect to network about every" was .75 days.

I have no reason to believe "earliest date first" is kicking in for either of the first two reasons you cite. During this time, no unit has been within a 1.5 day of deadline (double .75 days). Maybe that means the Round Robin Scheduler is related to the issue.

Here's what I've done this weekend. I marked all projects for "no new work". I let all units complete (all before deadline). I intended to change "Connect to network about every" to 0.02 (roughly 30 min), but it didn't take. I reset all four projects. I allowed new work for Einstein and SETI. Einstein downloaded one unit; SETI downloaded two. All are due Feb 12. I'll guess they'll take roughly 42 hours of CPU to complete; this computer has been on for roughly 70 hours a week recently and probably will be this week. BOINC Manager marked all three units right away with estimates to complete of 1 second. BOINC Manager reported right after downloads, "Suspending work fetch because computer is overcommitted." and "Using earliest-deadline-first scheduling because computer is overcommitted." The Einstein unit has a deadline a few seconds before the SETI units, so it is getting all the CPU (earliest first).
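
A quick back-of-the-envelope check of those numbers, as a small Python sketch. The exact deadline time on Feb 12 isn't stated, so the start of that day is assumed here, and the 42 hour and 70 hour figures are the rough guesses given above.

```python
from datetime import datetime

posted = datetime(2006, 1, 29, 15, 2, 39)  # time of this post (UTC)
deadline = datetime(2006, 2, 12)           # assumption: start of Feb 12
hours_on_per_week = 70.0                   # rough figure from the post
cpu_hours_needed = 42.0                    # rough guess from the post

weeks_left = (deadline - posted).total_seconds() / (7 * 24 * 3600)
hours_available = weeks_left * hours_on_per_week

print(f"hours available before deadline: {hours_available:.1f}")
print(f"hours needed: {cpu_hours_needed:.1f}")
print("overcommitted" if cpu_hours_needed > hours_available else "not overcommitted")
```

On these figures the machine has roughly three times the running hours it needs, so the "overcommitted" message doesn't follow from the real workload.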

I hope that helps!

Regards . . .
ID: 2859 · Report as offensive
Jim K
Joined: 8 Sep 05
Posts: 168
Message 2860 - Posted: 29 Jan 2006, 15:57:50 UTC
Last modified: 29 Jan 2006, 16:00:18 UTC

In my opinion there are a few items affecting the process: 1. a slow computer, 2. running four projects, and 3. (this one could be the key) you do not run the computer all the time.
BOINC looks at a lot of items to determine how it will let the applications process, one of which is the percentage of time a computer is running, and the percentage of that time the project has been given. You have not indicated that you have received any error message about percent of time, so I think your problem is the time estimate and running a slow computer.

CPDN recommends at least an 800 MHz CPU for the slab and 1000 MHz for the sulphur, so if you stop running CPDN I think that solves your problem...


Mr Fusion, please list your computer speed. Do you run 24/7?
BOINC Wiki
ID: 2860 · Report as offensive
Mr_Fusion

Joined: 29 Jan 06
Posts: 4
Hungary
Message 2863 - Posted: 29 Jan 2006, 16:59:02 UTC

Intel 2.4 GHz HT, both CPU threads available to BOINC. The system is usually up 6-8 hours a day, except on weekends when it's on for 12-14 hours, like it has been ever since I switched to BOINC from SETI Classic.
I've been using this setup since mid November or so, and thus far scheduling worked like a charm: CPDN was running constantly on one thread (50% priority), and the other 3 projects were cycled through the other thread, at an output rate of roughly 1 WU/day.
But a short while ago something went nuts, and now any new WU that's downloaded instantly forces BOINC to allocate both threads to that project in earliest-deadline-first scheduling mode. When all the WUs for that project are done, the manager switches to the next project, staying in earliest mode, and so on. What makes it odder is that this "deadline panic" doesn't seem to actually relate to the deadline of the current WUs; earlier today a couple of new SETI WUs were downloaded with a deadline of Feb. 12th, 2006, and the scheduler immediately assigned both threads to them, staying in earliest mode and suspending two Rosetta WUs with a deadline of Feb. 4th, 2006.

Right now I have to manually suspend/resume work on WUs if I want any work done on more than one project, especially on CPDN, since that has the longest deadline and thus has not been getting any CPU time lately.
ID: 2863 · Report as offensive
Keck_Komputers
Joined: 29 Aug 05
Posts: 304
United States
Message 2865 - Posted: 29 Jan 2006, 19:58:43 UTC

It sounds like Mr. Fusion has been bitten by the "long-duration workunit causes EDF on a multi-CPU system" bug. This one is being worked on and should have a fix in the next recommended client when released (5.4.x).

The reason it took a bit for it to show up is that when BOINC is first started it assumes that it will be able to work 24/7, then learns how often it actually gets to work over about a month. You can temporarily fix this by editing the time stats section of the client_state.xml file. Be sure to exit the client before doing this. Change the values there to 0.99999999. If you are uncomfortable doing that, you can leave the host on more.
BOINC WIKI

BOINCing since 2002/12/8
ID: 2865 · Report as offensive
Jim K
Joined: 8 Sep 05
Posts: 168
Message 2866 - Posted: 29 Jan 2006, 20:02:35 UTC

That sounds like a debt problem, so maybe something has corrupted the XML. I am not that familiar with the XML, so I will leave that for someone else, but you could try a repair or maybe a reinstall.
BOINC Wiki
ID: 2866 · Report as offensive
BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 2868 - Posted: 29 Jan 2006, 23:29:51 UTC - in response to Message 2865.  
Last modified: 29 Jan 2006, 23:40:38 UTC

It sounds like Mr. Fusion has been bitten by the "long-duration workunit causes EDF on a multi-CPU system" bug. This one is being worked on and should have a fix in the next recommended client when released (5.4.x).

The reason it took a bit for it to show up is that when BOINC is first started it assumes that it will be able to work 24/7, then learns how often it actually gets to work over about a month. You can temporarily fix this by editing the time stats section of the client_state.xml file. Be sure to exit the client before doing this. Change the values there to 0.99999999. If you are uncomfortable doing that, you can leave the host on more.


John and Jim,

John: Thanks for jumping in. It sounds like you have useful information for Mr. Fusion. I'm hoping you or Jim can offer me something, too.

I'm not sure I correctly understand the stream here. CPDN? If that's Climate Prediction, I don't run it. It is too much for my computer.

In effect, I'm running three projects (Einstein, Predictor (distinct from Climate Prediction), and SETI). My configuration is unchanged in every way I know from when the BOINC Manager was working right. (Well . . . I've reset "Connect to network about every" from .75 days to .02 days, but that hasn't helped or hurt so far as I can tell.) I've had this computer on for roughly 70 hours a week for years.

Does the recommendation to edit XML apply to me? If so, my <time_stats> section is
<time_stats>
<on_frac>0.527070</on_frac>
<connected_frac>0.992438</connected_frac>
<active_frac>0.999698</active_frac>
<cpu_efficiency>0.922841</cpu_efficiency>
<last_update>1138576623.969231</last_update>
</time_stats>
I said the computer is on roughly 70 hours a week; 52.7% would be more than 88 hours. That might be right. I should be connected and active all the time; those are close. I wouldn't think you'd want me to change the others. What am I missing?
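
To double-check that arithmetic, here is a small Python sketch that parses the <time_stats> block exactly as pasted above and converts on_frac into hours per week. Treating on_frac as the fraction of wall-clock time the client is running is my reading of the field name, so take that as an assumption.

```python
import xml.etree.ElementTree as ET

# The <time_stats> section exactly as posted above.
TIME_STATS = """<time_stats>
<on_frac>0.527070</on_frac>
<connected_frac>0.992438</connected_frac>
<active_frac>0.999698</active_frac>
<cpu_efficiency>0.922841</cpu_efficiency>
<last_update>1138576623.969231</last_update>
</time_stats>"""

# Map each child tag to its numeric value.
stats = {child.tag: float(child.text) for child in ET.fromstring(TIME_STATS)}

HOURS_PER_WEEK = 7 * 24
on_hours = stats["on_frac"] * HOURS_PER_WEEK
print(f"on_frac {stats['on_frac']:.4f} -> {on_hours:.1f} hours/week")
# prints: on_frac 0.5271 -> 88.5 hours/week
```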

Might debt be an issue? (Now that I'm looking in client_state.xml!)
Predictor has 0 short debt and 4000+ long debt.
Einstein has -7800 short and -7800 long debt. (Yup. Negatives.)
SETI has over 7500 both short and long.
Rosetta has 0 short and -4200 long debt.
(I guess I expected Rosetta would be different from the others, given that it is typically marked for "no new work". The negatives for Einstein surprise me.)

Thanks in advance.
ID: 2868 · Report as offensive
Jim K
Joined: 8 Sep 05
Posts: 168
Message 2869 - Posted: 30 Jan 2006, 1:18:30 UTC
Last modified: 30 Jan 2006, 1:31:05 UTC


BOINC Enthused....

Sorry about thinking you were running CPDN. I had a senior moment there, lol ;0)

At this point I am in over my head. All I can say is you can go into the XML and reset the debt, both long and short, to zero, or just let it run at the new "connect to" time and see if it clears up.
From what you posted it looks OK to me, but who knows... Keck knows more than I do, so his advice may be helpful...
BOINC Wiki
ID: 2869 · Report as offensive
Keck_Komputers
Joined: 29 Aug 05
Posts: 304
United States
Message 2875 - Posted: 30 Jan 2006, 11:47:51 UTC

Sorry I missed something in my last message. After rereading the entire thread let me try again.

I don't see anything obviously wrong in the settings or debts. Debts always add up to zero so there will always be some negative and some positive. Anything under 1 day in the connect setting should translate into under 3 days with 3 projects and therefore should be low enough to avoid the short deadlines at PP@H.

One thing that concerns me is the estimated time of one second. Try manually rerunning the benchmarks and see if that helps. If not post your duration correction factors and we will go from there.
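
To show why the benchmarks and correction factors are the first suspects, here is a rough Python sketch of how an estimate like this can collapse. The formula (estimated operations divided by benchmark speed, scaled by the duration correction factor) is a simplification from my understanding, not the client's code, and every number below is invented for illustration.

```python
def estimated_duration(fpops_est, benchmark_fpops, dcf=1.0):
    """Rough duration estimate in seconds (simplified, hypothetical formula).

    fpops_est       -- project's estimate of floating point operations in the work unit
    benchmark_fpops -- host's measured floating point operations per second
    dcf             -- duration correction factor for the project
    """
    if benchmark_fpops <= 0:
        raise ValueError("benchmark must be positive")
    return fpops_est / benchmark_fpops * dcf


WU_FPOPS = 2.8e13  # invented work unit size

# A plausible benchmark for a slow host gives an estimate of about a day.
sane = estimated_duration(WU_FPOPS, benchmark_fpops=3.0e8)

# A wildly inflated benchmark, or a correction factor near zero,
# pushes the same work unit's estimate below one second.
bad_benchmark = estimated_duration(WU_FPOPS, benchmark_fpops=1.0e14)
bad_dcf = estimated_duration(WU_FPOPS, benchmark_fpops=3.0e8, dcf=1.0e-6)

print(f"sane: {sane / 3600:.1f} h, bad benchmark: {bad_benchmark:.2f} s, bad dcf: {bad_dcf:.2f} s")
```

Either kind of bad value would hit every project at once, which fits the "1 sec for each unit from any project" symptom.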

BOINC WIKI

BOINCing since 2002/12/8
ID: 2875 · Report as offensive
BOINC Enthused

Joined: 29 Oct 05
Posts: 14
United States
Message 2877 - Posted: 30 Jan 2006, 12:04:41 UTC - in response to Message 2875.  

One thing that concerns me is the estimated time of one second. Try manually rerunning the benchmarks and see if that helps. If not post your duration correction factors and we will go from there.


John,

Ding ding ding! You found it. This event appears to be in the past. Many thanks.

I ran benchmarks. The most relevant lines on the Messages tab are:
01/30/2006 5:56:22 AM||Allowing work fetch again.
01/30/2006 5:56:22 AM||Resuming round-robin CPU scheduling.

On the Work tab, all the 00:00:01 estimates are gone, replaced by reasonable estimates of completion.

John and Jim: I appreciate the help a lot!
ID: 2877 · Report as offensive
Jim K
Joined: 8 Sep 05
Posts: 168
Message 2881 - Posted: 30 Jan 2006, 16:03:47 UTC

Duh! that one second thing just flew by me, never even registered, good catch Keck....
BOINC Wiki
ID: 2881 · Report as offensive


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.