Thread '6.4.1 scheduler miscalc completion when gpugrid present'

Message boards : BOINC Manager : 6.4.1 scheduler miscalc completion when gpugrid present

Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 21486 - Posted: 24 Nov 2008, 13:11:03 UTC
Last modified: 24 Nov 2008, 13:27:11 UTC

I noticed this shortly after I added an NVIDIA GPU to a Q6700 quad-core CPU and joined GPUGRID: ABC@home (for example, 35 tasks in backlog) shows 5:14:33 hours to complete any of the 35 tasks, but each is taking only about 2:15 hours. High priority then kicks in, and I have 4 ABC tasks on my Q6700 CPU, all taking just over 2 hours to complete. In the meantime, no tasks other than the CUDA one can run.

I set the CPU utilization to 75% and the completion time was much more accurate, but the CUDA task did not run any faster even though it now had a full CPU core to work with (the other 25%). That is when I started getting about 50 or more ABC work units. Since it was a complete waste of CPU time to assign a full CPU core to support CUDA, I reset the utilization back to 100%, and that is when the high priority kicked in. When I rebooted, the completion time went from under 3 hours (I don't remember exactly) up to 5:14:33, which is not accurate: they still take just over 2 hours.

Anyway, the manager needs to be upgraded to handle CUDA tasks better. The "(0.90 CPUs, 1 CUDA)" does not seem to be indicative of the GPU power. I am probably using less than 1% of a CPU core for CUDA support, i.e., a task with a 5-hour completion estimate takes only 2 minutes of CPU time, not 5 hours, because the 9800GTX+ handles all the processing.

Is the "0.90 CPUs" phrase a percentage, i.e., 0.9 percent? That would more closely match what I am seeing. If it is 9/10 of a CPU core, I am not seeing anything like that. For the CUDA task, the estimated time (the 5 hours) is more like wall clock time. I am getting on average 3 CUDA tasks completed in 24 hours with a total CPU time of under 8 minutes.
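
As a rough check of the figures above, the CPU share consumed by the CUDA tasks can be computed directly from the quoted numbers. This is only a minimal sketch: the function name `cpu_fraction` is hypothetical, and the inputs are the approximate values from this post (about 8 minutes of CPU time over 24 hours of wall clock time).

```python
def cpu_fraction(cpu_seconds: float, wall_seconds: float) -> float:
    """Return the share of one CPU core used over a wall clock interval."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


# Approximate figures from the post: about 8 minutes of CPU time per 24 hours.
share = cpu_fraction(cpu_seconds=8 * 60, wall_seconds=24 * 3600)
print(f"{share:.2%} of one core")  # prints "0.56% of one core"
```

Under these assumptions the CUDA tasks use well under 1% of a core, which is nowhere near nine tenths of a core.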

This affects all other tasks, not just ABC, but since ABC hit the high priority first, the other tasks are not running.

FOLLOWUP: Shortly after posting this, one of the ABC tasks completed a WU and all the remaining ABC tasks had their completion time recalculated downward from 5:14:33 to 4:56:19. It appears the scheduler will eventually get down to a more accurate estimate of 2 hours per ABC WU. In the meantime, no other tasks can get a share because of the high priority ABC is stuck in.
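
The small downward step is what one would expect if the estimate is smoothed toward each observed runtime. The sketch below assumes the estimate moves 10% of the way toward the actual runtime after each completed task; that weight is an assumption picked because it roughly reproduces the numbers in this post, not a confirmed description of the BOINC scheduler, and all function names are hypothetical.

```python
def hms_to_seconds(text: str) -> int:
    """Convert an 'H:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(seconds: float) -> str:
    """Format a duration in seconds as 'H:MM:SS', rounded to the nearest second."""
    total = round(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


def smoothed_estimate(old_estimate: float, actual: float, weight: float = 0.1) -> float:
    """Move the estimate a fraction `weight` of the way toward the observed runtime."""
    return old_estimate + weight * (actual - old_estimate)


estimate = hms_to_seconds("5:14:33")  # estimate shown by the manager
actual = hms_to_seconds("2:15:00")    # approximate real runtime from the post

first_update = smoothed_estimate(estimate, actual)
print(seconds_to_hms(first_update))  # prints "4:56:36"

# Count completed tasks until the estimate is within 10% of the real runtime.
steps = 0
while estimate - actual > 0.1 * actual:
    estimate = smoothed_estimate(estimate, actual)
    steps += 1
print(steps, seconds_to_hms(estimate))
```

With these assumed inputs the first update lands close to the 4:56:19 reported above, and the loop shows why many completed tasks are needed before the estimate approaches the real runtime.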
Joseph "Beemer Biker" Stateson
ID: 21486
Stefan Ledwina
Joined: 25 Nov 05
Posts: 55
Austria
Message 21492 - Posted: 24 Nov 2008, 14:52:57 UTC - in response to Message 21486.  

I set the CPU utilization to 75% and the completion time was much more accurate, but the CUDA task did not run any faster even though it now had a full CPU core to work with (the other 25%)


How did you determine whether the GPUGRID task was faster or not with CPUs set to 75%?
Did you only compare the CPU time? To see whether the task was faster, you would have to look at the stderr.out of the task itself. There you can see the speed of the GPU in ms/step.

If you are using Windows, the GPUGRID task should run faster if it gets a full core (it depends on the graphics card, but it should be noticeably faster).

On Linux it doesn't matter if the GPUGRID task gets a full core or not, because it needs much less CPU, and the app is also a little bit different...
ID: 21492
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 21496 - Posted: 24 Nov 2008, 16:20:10 UTC - in response to Message 21492.  
Last modified: 24 Nov 2008, 16:30:00 UTC

I am getting 3 work units a day on a Vista 64-bit system. If I allocate 75%, the work units show that 24,627 CPU seconds (avg) were used for each task (with a standard deviation of 2800 seconds). If I allocate 100%, only about 1771 seconds are "used" by the CPU. However, I still get 3 work units a day whether 1771 seconds or 24,627 seconds are used. If I allocate 100%, more of the CPU-bound tasks are "running" and I get more credit.

This was just a guesstimate from looking at the "cpu time" two weeks ago, when I was running 75% (or 3 + 1) and changed to 100% (4 + 1).

However, I will look at the ms/step and review my stats. BTW, I am working on comparing GPU statistics using ms/step but am not there yet. I do have this statistical printout from my program:
http://swri.info/cpustats/5_stats.txt
I did a getHTTP of boincstats about 2 weeks ago and put the results in a MySQL database at swri.info/cpustats, but have since realized that GPUGRID makes statistics for various CPUs meaningless, as the GPU easily adds 10k or more credits every 24 hours to any CPU that has a PCIe slot for a GPU.

Look at the bottom, where there are 27 workunits at 1771 CPU seconds. The StdDev is 5000 seconds because after about 3-4 units at 27,627 I switched to 100% (4 + 1 CPUs) and the time dropped to 1771 CPU seconds (avg). Not shown is the wall clock time, but I am estimating the same 3 WUs per 24 hours.

The program I am working on does a getHTTP of the statistics pages of the various projects I am working on and compares them. I ran into a problem when I got to GPUGRID (and Milkyway also, but for a different reason), as GPUGRID shows CPU time and that does not seem applicable. You are correct that the ms/step is better, but I also need some type of wall clock time to know how long a WU takes, as the CPU time is not indicative of work being done.
Joseph "Beemer Biker" Stateson
ID: 21496
Stefan Ledwina
Joined: 25 Nov 05
Posts: 55
Austria
Message 21498 - Posted: 24 Nov 2008, 16:51:00 UTC - in response to Message 21496.  

...
I ran into a problem when I got to GPUGRID (and Milkyway also, but for a different reason), as GPUGRID shows CPU time and that does not seem applicable. You are correct that the ms/step is better, but I also need some type of wall clock time to know how long a WU takes, as the CPU time is not indicative of work being done.


Looks like an interesting project you are working on...

Well, the GPUGRID application prints not only the ms/step in the stderr.out of the task, but also the approximate elapsed wall clock time, the GPU model, the number of cores and the shader speed. Maybe this will be of some help to you?
ID: 21498
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 21514 - Posted: 25 Nov 2008, 7:20:06 UTC - in response to Message 21498.  

I finally got statistical info on my GPU results, so there is no more guessing. I was mistaken about the 3 WUs per day; it is slightly over 2 per day (about 11 hours each). However, it makes very little difference for GPUGRID credits whether I allocate all the CPUs or reserve one for the GPU. It still takes about 11 hours of wall time to get a WU done, and the credit is the same regardless. For my 9800GTX+ I am better off using all 4 CPUs and not setting one aside for it.

Updated results are here:
http://swri.info/cpustats/5_stats.txt

I have not run this on anyone else's host IDs except mine so far.

ID: 21514

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.