DCF Integrator

Message boards : Questions and problems : DCF Integrator

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28118 - Posted: 20 Oct 2009, 4:28:10 UTC

I don't pretend to know the inner workings of Boinc; all I can do is describe the problem I saw tonight.

All week long my host 3378680 has been crunching CPU and GPU work with the cache level set at 4 days, downloading just enough work to keep the cache at that level.

Tonight I found this host downloading hundreds of work units, and I do mean many hundreds in one fell swoop. I jumped in and set No New Work. After all these downloads my cache level now stands at more than 9 days.

As I said, I don't know what happened or what changed. What I do know is that if there is an integrator on the <duration_correction_factor>, and if that is indeed what happened, it needs to be about 10 times larger. It's the only thing I know of that could have done such a thing.
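For readers following along, the "integrator" suspected here can be sketched as below. This is a minimal Python sketch, assuming the update rule I believe the 6.x client applies after each finished task: if the task ran more than 10% over its DCF-corrected estimate, the DCF jumps straight to the observed ratio; otherwise it moves only 10% of the way toward it (1% if the task finished in under a tenth of its estimate). The 1.1 and 0.1 thresholds and the [0.01, 100] clamp are assumptions from memory of the client source; the 0.9/0.1 weighting does reproduce Jord's sample [dcf] line further down (1.148090 -> 1.130739 with raw_ratio 0.974578). The function name update_dcf is hypothetical.

```python
def update_dcf(old_dcf: float, raw_ratio: float) -> float:
    """Return the new DCF after one task completes (sketch of 6.x behaviour).

    raw_ratio: actual runtime / uncorrected estimate (rsc_fpops_est / flops).
    adj_ratio: actual runtime / DCF-corrected estimate = raw_ratio / old_dcf.
    The 1.1 / 0.1 thresholds and the clamp limits are assumptions.
    """
    adj_ratio = raw_ratio / old_dcf
    if adj_ratio > 1.1:
        # Task overran its estimate by more than 10%: jump to the observed ratio.
        new_dcf = raw_ratio
    elif adj_ratio < 0.1:
        # Task finished absurdly early: give it only 1% weight.
        new_dcf = 0.99 * old_dcf + 0.01 * raw_ratio
    else:
        # Normal case: move 10% of the way toward the observed ratio
        # (an exponential moving average, i.e. a leaky integrator).
        new_dcf = 0.9 * old_dcf + 0.1 * raw_ratio
    # Keep the factor within an assumed sane range.
    return min(max(new_dcf, 0.01), 100.0)


# Example: a run of tasks that each finish in half their uncorrected estimate.
dcf = 1.0
for _ in range(10):
    dcf = update_dcf(dcf, 0.5)
print(f"DCF after 10 quick tasks: {dcf:.3f}")  # prints: DCF after 10 quick tasks: 0.674
```

The asymmetry is the point of the design: one overrunning task pushes the DCF up at once, while a series of quick tasks only pulls it down gradually.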

Also posted on the Boinc alpha testing list.

Thanks for listening.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15490
Netherlands
Message 28119 - Posted: 20 Oct 2009, 5:15:18 UTC - in response to Message 28118.  

Which project is your host running? Einstein, GPUGrid, Seti, Milkyway, Collatz, Aqua or some other project?

Which version of BOINC are you using? Or is that a secret?

What is your DCF for that host?

And while we're at it...
REMINDER TO ALL ALPHA TESTERS:
It's far easier for us to fix problems if you send message logs with the appropriate flags set. The main flags are:

<cpu_sched_debug>: problems involving the choice of applications to run.
<work_fetch_debug>: problems involving work fetch (which projects are asked for work, and how much).
<rr_simulation>: problems involving jobs being run in high-priority mode.

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28120 - Posted: 20 Oct 2009, 6:14:51 UTC


Jord

Of course it's on Seti. Boinc 6.10.15 was running at this time but I have seen this problem occur with every version of Boinc I have ever used.

Giving you the DCF is useless as it changes every time a work unit is completed.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15490
Netherlands
Message 28122 - Posted: 20 Oct 2009, 6:54:52 UTC - in response to Message 28120.  

Of course it's on Seti.

There's nothing "of course" about that. You posted the same thing to the alpha email list, and apart from the post itself you gave no information. The developers there ask people to test 6.10 on Collatz, so "of course" just doesn't cut it.

Boinc 6.10.15 was running at this time but I have seen this problem occur with every version of Boinc I have ever used.

Ah, then how come you didn't give those snippets of information? Are we so all-knowing that we just know what is happening on your system? Or do you think that BOINC, just like Windows, sends secret information about your use of it to a central database somewhere? (How's that for a conspiracy theory?)

Giving you the DCF is useless as it changes every time a work unit is completed.

That may be, but the DCF you have at this moment feeds into the estimate of how much work you ask for and get. So having a number would be nice, since you bring it up anyway.

If you can't provide it, post a log with all three cc_config.xml flags set, as you may well be in EDF (earliest deadline first) mode now that you got 9 days' worth of work on a 4-day request. For good measure, turn on <dcf_debug> as well, which will (weirdly enough) provide debug information on what is happening to the DCF.

The more information you give, the better help you can get. When you go to the doctor to tell him it hurts somewhere, you also tell him what kind of pain it is and where, so he can help you better.

So... turn on the debug flags in cc_config.xml, allow new tasks, see if you can reproduce what you saw when you posted to the list, then post that log to the list. Or here. Or both.

And while you're at it, open client_state.xml, find one or more of the affected Seti tasks and post their <rsc_fpops_est> and <rsc_fpops_bound> values.
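A minimal Python sketch for pulling those two values out of client_state.xml. It assumes each task's <workunit> element has <name>, <app_name>, <rsc_fpops_est> and <rsc_fpops_bound> as direct children; the function names, the path and the "setiathome" filter in the usage line are placeholders for illustration, not fixed BOINC names.

```python
import xml.etree.ElementTree as ET


def fpops_for_workunits(root: ET.Element, app_filter: str = ""):
    """Return (name, app_name, rsc_fpops_est, rsc_fpops_bound) per <workunit>.

    Assumes these are direct children of <workunit> in client_state.xml.
    app_filter keeps only workunits whose <app_name> contains that text.
    """
    rows = []
    for wu in root.iter("workunit"):
        app = wu.findtext("app_name", default="")
        if app_filter not in app:
            continue
        rows.append((
            wu.findtext("name", default="?"),
            app,
            float(wu.findtext("rsc_fpops_est", default="nan")),
            float(wu.findtext("rsc_fpops_bound", default="nan")),
        ))
    return rows


def print_fpops(path: str, app_filter: str = "") -> None:
    """Parse the client_state.xml at `path` and print the fpops values."""
    root = ET.parse(path).getroot()
    for name, app, est, bound in fpops_for_workunits(root, app_filter):
        print(f"{name} ({app}): rsc_fpops_est={est:.4g} rsc_fpops_bound={bound:.4g}")
```

Usage, with a placeholder path: print_fpops("/path/to/BOINC/client_state.xml", "setiathome"). Run it on a copy of the file rather than the live one if BOINC is writing to it.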

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28123 - Posted: 20 Oct 2009, 7:01:11 UTC
Last modified: 20 Oct 2009, 7:02:04 UTC

Well, I can see that this is a total waste of my time.

There is no way of knowing what happened to the DCF or what its value was at the time it made all the work requests. Perhaps this problem will resolve itself if it's ignored.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15490
Netherlands
Message 28124 - Posted: 20 Oct 2009, 7:30:54 UTC - in response to Message 28123.  

There is no way of knowing what happened to the DCF or what its value was at the time it made all the work requests.

So in future, add the correct debug flags. Then you'll have a log of what happened. It's really not that difficult.

Perhaps this problem will resolve itself if it's ignored.

Sure, stick your head in the sand. Don't run alpha software if you can't be bothered to give more information, run with debug flags and keep (massive) logs for longer periods of time. If you just want to run BOINC, stick with the recommended version.

I really don't get it. You willingly run the latest development software, you have this problem, you want it fixed, but you won't give more information or run with debug flags. How then do you think the developers (most of whom are in or en route to Spain for the Workshop) are going to fix it? By magic? By just taking your word for it? Not good enough.

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28125 - Posted: 20 Oct 2009, 7:34:38 UTC
Last modified: 20 Oct 2009, 7:35:36 UTC

Too late. As I type this, that client is busy downloading hundreds more.

I have put it on No new work.

In fact all my boxes are going on NNW.

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28126 - Posted: 20 Oct 2009, 7:44:53 UTC

Jord,

I have no programming knowledge or data to report to you. I am only reporting unusual behavior I have observed with Boinc.

As I have had a steady diet of VLAR work (assigned to the CPU) on this machine lately, I am now crunching VLARs randomly as they come up in the work list. It seems that every time a VLAR completes on the CPU, a load of work is suddenly requested for the GPU. ALL the incoming work from these 2 download bursts ended up assigned to the GPU.

Again, NO DATA, just observations. Is that ok to report?

Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 28127 - Posted: 20 Oct 2009, 8:17:43 UTC - in response to Message 28126.  
Last modified: 20 Oct 2009, 8:21:44 UTC

Are you using the rescheduler?

[edit]And since you are using optimised applications, did you set the <flops> directive in app_info.xml?[/edit]

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15490
Netherlands
Message 28128 - Posted: 20 Oct 2009, 8:36:46 UTC
Last modified: 20 Oct 2009, 8:37:07 UTC

One last try then. Add a cc_config.xml file to your BOINC Data directory.
Put these lines in it:

<cc_config>
<log_flags>
<cpu_sched_debug>1</cpu_sched_debug>
<work_fetch_debug>1</work_fetch_debug>
<rr_simulation>1</rr_simulation>
<dcf_debug>1</dcf_debug>
</log_flags>
</cc_config>


Save the file, and make sure it has the .xml extension, not something else (Notepad, for example, likes to add .txt).
Exit and restart BOINC.
Let it run like this for a minute.
Run with Allow New Tasks for a minute or two.
Set NNT.
Post the whole log.
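If your editor keeps saving the file under the wrong name, a small Python sketch like this one writes cc_config.xml with exactly the right name and the same four flags as above. The data-directory path is a placeholder you have to fill in, and the function name write_cc_config is hypothetical. Note that it overwrites any cc_config.xml already in that directory, so merge by hand if you have other options in there.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# The four log flags from the config above.
DEBUG_FLAGS = ["cpu_sched_debug", "work_fetch_debug", "rr_simulation", "dcf_debug"]


def write_cc_config(data_dir: str) -> Path:
    """Write <data_dir>/cc_config.xml with each debug log flag set to 1.

    Overwrites an existing cc_config.xml in that directory.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in DEBUG_FLAGS:
        ET.SubElement(log_flags, flag).text = "1"
    path = Path(data_dir) / "cc_config.xml"
    # Default encoding is us-ascii, so no <?xml ...?> declaration is written.
    ET.ElementTree(root).write(path)
    return path
```

Usage, with a placeholder path: write_cc_config("/path/to/BOINC"), then exit and restart BOINC as in the steps above.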

No programming knowledge is needed; it's not rocket science.

If Gundolf's post turns out to be on the right track, then it's app-related, not a BOINC problem. In that case you should post about it on the Seti forums or the Lunatics forums, and still give them more info than "Help, I have a problem".

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28136 - Posted: 20 Oct 2009, 13:43:36 UTC - in response to Message 28127.  

Are you using the rescheduler?

[edit]And since you are using optimised applications, did you set the <flops> directive in app_info.xml?[/edit]


Yes I reschedule only LAR work to the CPU where it is done more efficiently. Just over 2 hours.

Yes I have <flops> sections in the app_info file to force Boinc to display the actual work time required for each work unit.
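To see why <flops> matters here, a back-of-the-envelope Python sketch of how the client turns a queued task into a runtime estimate. It assumes the 6.10 behaviour of estimate = rsc_fpops_est / flops * DCF, with a single DCF per project shared by its CPU and GPU apps. The function names are hypothetical, and the task count and fpops/flops figures are invented for illustration only.

```python
def estimated_runtime_s(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """Runtime estimate in seconds: uncorrected (fpops / flops) times the DCF."""
    return rsc_fpops_est / flops * dcf


def queued_days(tasks, dcf: float, instances: int) -> float:
    """Days of work the client believes it has queued for one resource.

    tasks: list of (rsc_fpops_est, flops) pairs, one per queued task.
    instances: how many of these tasks run at the same time.
    Assumes one DCF shared by every application of the project.
    """
    total_s = sum(estimated_runtime_s(est, fl, dcf) for est, fl in tasks)
    return total_s / instances / 86400.0


# Hypothetical queue: 100 GPU tasks, 3e13 fpops each, app_info <flops> of 5e10.
gpu_queue = [(3e13, 5e10)] * 100
for dcf in (1.0, 0.5):
    print(f"DCF {dcf}: about {queued_days(gpu_queue, dcf, 1):.2f} days queued")
```

With those made-up numbers the same 100 tasks look like about 0.69 days of work at DCF 1.0 but only about 0.35 days at DCF 0.5, so a falling shared DCF makes the cache look emptier and the client asks for more.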

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28138 - Posted: 20 Oct 2009, 13:45:54 UTC - in response to Message 28128.  
Last modified: 20 Oct 2009, 13:50:28 UTC

Jord,

I can do as you asked but what will be the name of the resulting log file?

John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 28139 - Posted: 20 Oct 2009, 13:52:34 UTC - in response to Message 28138.  

Jord,

I can do as you asked but what will be the name of the resulting log file?

They appear in your messages tab.

BOINC WIKI

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28141 - Posted: 20 Oct 2009, 14:05:44 UTC
Last modified: 20 Oct 2009, 14:06:45 UTC

OK, running as Jord wanted with that file. I see something called time_stats_log.xml; I guess that is the file. Right now the CPUs are all crunching VLAR work, and the GPU is finishing work at roughly 3 to 12 minute intervals depending on its work.

I will manually log the DCF when each work unit ends for the next few hours and maybe you guys can find something.

VLAR takes just over 2 hours on the CPU, and it looks like they won't finish for another hour or so.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15490
Netherlands
Message 28142 - Posted: 20 Oct 2009, 14:11:58 UTC - in response to Message 28141.  

I see something called time_stats_log.xml I guess that is the file.

No.

BOINC Manager will show 1,000 lines of text in the Messages window (Simple view) or Messages tab (Advanced view). All normal messages are written to and stored in stdoutdae.txt in your data directory, which by default will grow to 2MB before it switches over to a new file and saves the old one as a *.old file.
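Since the debug output ends up in stdoutdae.txt and, after rotation, in the .old file, a short Python sketch can pull just the tagged lines out of both, oldest first. It assumes the rotated file is named stdoutdae.old and sits next to stdoutdae.txt in the data directory; the function name tagged_lines and the path in the usage line are placeholders.

```python
from pathlib import Path


def tagged_lines(data_dir: str, tag: str = "[dcf]"):
    """Yield log lines containing `tag`, oldest log file first.

    Assumes the rotated log is named stdoutdae.old and the current one
    stdoutdae.txt, both in the BOINC data directory; missing files are skipped.
    """
    for name in ("stdoutdae.old", "stdoutdae.txt"):
        path = Path(data_dir) / name
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            for line in f:
                if tag in line:
                    yield line.rstrip("\r\n")
```

Usage, with a placeholder path: for line in tagged_lines("/path/to/BOINC"): print(line)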

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28143 - Posted: 20 Oct 2009, 14:16:09 UTC - in response to Message 28142.  

OK... it's running and making logs, then. I will let it go. As I said, right now the 4 CPUs will complete their 4 VLAR tasks in about one hour. Meanwhile the GPU is crunching work units that take 3 to 11 minutes each. I will keep an eye on it. Will the logs show the DCF, or should I record it manually?

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15490
Netherlands
Message 28145 - Posted: 20 Oct 2009, 14:21:40 UTC - in response to Message 28143.  

With the <dcf_debug> flag on, it will record changes to the DCF after each task has finished.

It shows in lines like this: 20-Oct-09 15:07:54 Milkyway@home [dcf] DCF: 1.148090->1.130739, raw_ratio 0.974578, adj_ratio 0.848870
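To turn such lines into numbers for a spreadsheet or plot, a small regex sketch in Python. It assumes the exact layout of the sample line above (other client versions may word it differently), and the names DCF_RE and parse_dcf_line are hypothetical. The sample line also obeys adj_ratio = raw_ratio / old DCF, up to the six printed decimals, which is a handy sanity check.

```python
import re

# Matches e.g. "[dcf] DCF: 1.148090->1.130739, raw_ratio 0.974578, adj_ratio 0.848870"
DCF_RE = re.compile(
    r"\[dcf\] DCF: (?P<old>[\d.]+)->(?P<new>[\d.]+), "
    r"raw_ratio (?P<raw>[\d.]+), adj_ratio (?P<adj>[\d.]+)"
)


def parse_dcf_line(line: str):
    """Return (old_dcf, new_dcf, raw_ratio, adj_ratio), or None if no match."""
    m = DCF_RE.search(line)
    if m is None:
        return None
    return tuple(float(m.group(k)) for k in ("old", "new", "raw", "adj"))
```

Feed it each line of stdoutdae.txt and keep the non-None results; a sudden drop in new_dcf right before a large work request would be the thing to look for.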


Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28146 - Posted: 20 Oct 2009, 14:24:16 UTC - in response to Message 28145.  
Last modified: 20 Oct 2009, 14:28:53 UTC

OK, I will let it go as is and see what happens in 1 to 1.5 hours as these 4 VLAR work units complete.

By the way, it is not uploading completed work at the moment. Is that normal with these flags set?

Edit: never mind. I usually run with "return results immediately" on; with this config it's off.

Waiting............

Geek@Play
Joined: 20 Jan 09
Posts: 70
United States
Message 28150 - Posted: 20 Oct 2009, 15:26:15 UTC

Jord,

It may take days to capture this event in the logs. Is there any problem with running this configuration long term until it happens?

Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 28157 - Posted: 20 Oct 2009, 17:15:19 UTC - in response to Message 28150.  
Last modified: 20 Oct 2009, 17:29:52 UTC

The only problem will be that the log file(s) will grow dramatically.

Regards,
Gundolf
[edit]Since SETI is currently shut down for maintenance, I can't check, but there should be a thread (by Richard Haselgrove?) with a warning that your problem might occur when using the rescheduler.
Yes I reschedule only LAR work to the CPU where it is done more efficiently. Just over 2 hours.

Perhaps the value you have set for <flops> isn't right, since it should prevent the DCF from varying that much.
Yes I have <flops> sections in the app_info file to force Boinc to display the actual work time required for each work unit.
[/edit]

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.