Time to completion all over the place

Fred - efmer.com
Joined: 8 Aug 08
Posts: 570
Netherlands
Message 25106 - Posted: 29 May 2009, 9:11:14 UTC

Using BOINC 6.6.28. Windows XP 64.

I'm running AP on the CPU and Seti Enh on the GPU.
After an AP task completes, the estimated time to completion changes for both CPU and GPU tasks.

The GPU tasks take about 6-8 minutes, but after an AP completes the estimate jumps to 20 or even 40 minutes.
It then works its way down to the 6-8 minute range before switching back to a much larger number.

And the CPU AP estimate swings between 7 and 15 hours, where 15 is the right value.

Why do the CPU and GPU completion times depend on each other, and why do they keep moving up and down?
ID: 25106
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 25108 - Posted: 29 May 2009, 9:31:28 UTC - in response to Message 25106.  

Because they all interact with a single project-wide "Duration Correction Factor" (DCF). BOINC tries to compensate for the differing processing efficiencies of four different applications - in the SETI case, MB/CPU, MB/CUDA, AP, and AP_v5 - using a single variable. It just can't be done.
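
A toy model may make the interaction easier to see. The sketch below assumes the displayed estimate is a raw per-task estimate multiplied by one shared correction factor, and that the factor jumps up at once when a task overruns but drifts down only slowly when tasks finish early. The update rule, the function names, and all numbers are illustrative assumptions loosely based on the figures in this thread, not BOINC's actual code.

```python
# Toy model of one project-wide Duration Correction Factor (DCF).
# Assumed rule: rise immediately on an overrun, decay slowly otherwise.

def update_dcf(dcf, raw_estimate, actual):
    """Return the new shared DCF after one task finishes."""
    ratio = actual / raw_estimate        # the DCF this app alone would want
    if ratio > dcf:
        return ratio                     # underestimate: correct at once
    return 0.9 * dcf + 0.1 * ratio       # overestimate: move down cautiously

# Hypothetical raw (uncorrected) estimates and true run times, in minutes.
GPU = {"raw_estimate": 40.0, "actual": 7.0}      # alone, wants DCF = 0.175
AP = {"raw_estimate": 900.0, "actual": 900.0}    # alone, wants DCF = 1.0

def run(tasks, dcf=1.0):
    """Yield (shown GPU estimate, shown AP estimate) after each finished task."""
    for task in tasks:
        dcf = update_dcf(dcf, task["raw_estimate"], task["actual"])
        yield GPU["raw_estimate"] * dcf, AP["raw_estimate"] * dcf

if __name__ == "__main__":
    # One AP result, ten short GPU results, then another AP result.
    for gpu_min, ap_min in run([AP] + [GPU] * 10 + [AP]):
        print(f"GPU shown: {gpu_min:5.1f} min   AP shown: {ap_min / 60:5.1f} h")
```

In this model the GPU estimate starts at 40 minutes, creeps down toward its true 7 minutes as GPU results come in, and snaps back to 40 the moment an AP result finishes, while the AP estimate sags well below its true 15 hours in between.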

In the long term, the answer for BOINC is to switch to using DCF variables for each separate application version. This has been requested, mulled over, and at one stage semi-promised for v6.8: but with yesterday's announcement that v6.8 has been hijacked by GridRepublic, I guess this has been punted into the long grass again.
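
As a rough picture of what per-application-version factors would look like, here is a minimal sketch. The class, its method names, and the numbers are hypothetical, and the "up fast, down slowly" update rule is an assumption made for illustration only.

```python
# Sketch: one correction factor per application version instead of one per project.
from collections import defaultdict

class PerVersionDcf:
    """Hypothetical bookkeeping; the version names are plain dictionary keys."""

    def __init__(self):
        self.dcf = defaultdict(lambda: 1.0)   # every version starts uncorrected

    def task_finished(self, version, raw_estimate, actual):
        """Update only the factor belonging to the version that just ran."""
        ratio = actual / raw_estimate
        old = self.dcf[version]
        # Assumed rule: rise at once on an overrun, decay slowly otherwise.
        self.dcf[version] = ratio if ratio > old else 0.9 * old + 0.1 * ratio

    def estimate(self, version, raw_estimate):
        return raw_estimate * self.dcf[version]

if __name__ == "__main__":
    tracker = PerVersionDcf()
    for _ in range(50):                        # many short GPU results
        tracker.task_finished("MB/CUDA", 40.0, 7.0)
    tracker.task_finished("AP", 900.0, 900.0)  # one long CPU result
    print(tracker.estimate("MB/CUDA", 40.0), tracker.estimate("AP", 900.0))
```

Because each version only ever touches its own entry, a finished AP result leaves the MB/CUDA estimate where it was.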

In the short term, there are well-tested techniques for coping with the SETI manifestation of the issue. Go back to the SETI boards, and read e.g. message 894765. The flops multipliers were originally calibrated on my Q9300, and should ideally be re-worked for your own hardware - sounds like you have plenty of raw data - but they work as a quick'n'dirty approximation.
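
For re-working the values on your own hardware, the arithmetic might look like the sketch below. It assumes the client estimates a task's duration as its rsc_fpops_est divided by the app version's flops value, multiplied by the DCF; the function names and the sample figures are made up for illustration and are not taken from the SETI message referred to above.

```python
# Sketch: derive a per-app-version flops value from finished tasks so that
# the uncorrected estimate matches the observed run time (DCF stays near 1).
from statistics import median

def calibrated_flops(samples):
    """samples: (rsc_fpops_est, actual_seconds) pairs from finished tasks."""
    if not samples or any(secs <= 0 for _, secs in samples):
        raise ValueError("need at least one sample with a positive run time")
    # The median keeps one odd task (a VLAR, say) from skewing the result.
    return median(est / secs for est, secs in samples)

def estimate_seconds(rsc_fpops_est, flops, dcf=1.0):
    """Assumed estimate formula: work estimate / speed, scaled by the DCF."""
    return rsc_fpops_est / flops * dcf

if __name__ == "__main__":
    # Made-up GPU results: same work estimate, run times around 7 minutes.
    gpu_samples = [(2.8e13, 400.0), (2.8e13, 420.0), (2.8e13, 450.0)]
    flops = calibrated_flops(gpu_samples)
    print(f"flops value: {flops:.3e}")
    print(f"estimate: {estimate_seconds(2.8e13, flops) / 60:.1f} min")
```

If every app version gets a value calibrated this way, they all pull the shared DCF toward the same number, which is why the approach can damp the swings even though only one DCF exists.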
ID: 25108
Fred - efmer.com
Joined: 8 Aug 08
Posts: 570
Netherlands
Message 25113 - Posted: 29 May 2009, 14:36:08 UTC - in response to Message 25108.  

Thanks Richard, but that only solves the startup; after that you get the same problem again very quickly...
This can only be solved by treating CPU and GPU differently, or by a way to lock the time, because the time is rather constant.

And the GPU scheduler has some rather odd behavior. I have about 10 Cuda tasks stopped at 3 seconds before the end, go figure. That's a real waste of time.
A task that takes about 6 minutes is hardly worth stopping at all.
ID: 25113
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 25114 - Posted: 29 May 2009, 15:14:37 UTC - in response to Message 25113.  

Thanks Richard, but that only solves the startup; after that you get the same problem again very quickly...

No, if you're using an app_info it's a permanent fix. Well, permanent insofar as I haven't seen these wild fluctuations since installing the flop correction values in early March, though of course you do have to do something to keep the VLARs at bay. Nothing can compensate for bad estimates within a single app-version.

This can only be solved by treating CPU and GPU differently, or by a way to lock the time, because the time is rather constant.

Sure, a true 'solution' needs better programming, but flop compensation is a highly effective surrogate.

And the GPU scheduler has some rather odd behavior. I have about 10 Cuda tasks stopped at 3 seconds before the end, go figure. That's a real waste of time. A task that takes about 6 minutes is hardly worth stopping at all.

Be careful with that. All recent BOINC versions have problems with pre-empted tasks:

a) Up to v6.6.20 - preempted tasks stay in memory. If you have too many tasks stacked up in graphics memory, the next one to start can't squeeze in, and fails with a malloc error. Once you reach that stage, you need to reboot the computer.

b) v6.6.23 onwards - preempted tasks are supposed to be given time to vacate memory and clean up behind themselves - but if that takes more than a second (as it can), the new task isn't initialised properly and errors.

A fix for bug (b) was coded yesterday, and Rom has just told us the fix will be released on Monday (hopefully there'll be time for some testing in between...). If that timetable sticks, it'll be worth upgrading to the new version next week.
ID: 25114
Fred - efmer.com
Joined: 8 Aug 08
Posts: 570
Netherlands
Message 25116 - Posted: 29 May 2009, 16:29:23 UTC - in response to Message 25114.  

OK, I saw CUDA tasks still at 20 minutes, but there were a lot of old ones, stopped a couple of seconds from the start.
I now have a flop value that is reasonably correct.
I'll take a look at the next version when it comes.
I haven't had any errors yet; the card has plenty of memory.
I use the VLAR kill...
ID: 25116
Fred - efmer.com
Joined: 8 Aug 08
Posts: 570
Netherlands
Message 25134 - Posted: 31 May 2009, 9:09:43 UTC - in response to Message 25116.  

I thought it was fixed.
But now the AP time is back to 50 hours and CUDA 28 Minutes.
Probably a couple of short Seti Enh tasks that took longer than expected.
But it works its way down in a couple of hours. Until it......
ID: 25134

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.