Changes between Initial Version and Version 1 of SchedMatch


Ignore:
Timestamp:
Jan 25, 2008, 1:50:44 PM (16 years ago)
Author:
davea
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • SchedMatch

    v1 v1  
     1= Scheduler job matchmaking =
     2
     3Currently the scheduler's work-sending algorithm is:
     4 * if host is reliable, scan entire job array, looking for retries
     5 * if using HR, scan entire job array, looking for jobs committed to this HR class
     6 * if still need work, scan entire job array with no restrictions
     7
     8This is bad for several reasons:
     9 * inefficient: repeated array scans
     10 * inflexible policy: why should reliable be more important than HR?
     11 * hard to add new policies, like sending "hard" jobs to fasts hosts
     12
     13To solve these problems we're going to change this part
     14of the scheduler.
     15Basic idea: given a job J and a host H,
     16there is a function V(J, H) that represents the value of sending J to H.
     17V(J, H) might reflect various factors:
     18
     19 * the computational "hardness" of J
     20  * CPU/deadline ratio
     21  * RAM or disk requirements
     22 * H already has files required by J(or jobs already planned for sending have such files)
     23 * J is a retry and H is a fast/reliable host
     24 * J has already been committed to H's HR class
     25
     26BOINC includes a default value function.
     27Projects can tweak its weights, or define their own value function.
     28
     29== Details ==
     30
     31functions:
     32        fast_filter():: feasibility checks that can be done with no DB access
     33        slow_filter():: feasibility checks that need DB access
     34
     35When we scan a slot, the possibilities are:
     36 * slot is empty
     37 * slot is locked by another scheduler
     38 * job fails fast filter
     39 * job fails slow filter
     40 * none of the above
     41
     42Parameters:
     43        N:: scan at least this many slots
     44                (if scan N slots and have enough jobs to send, stop)
     45        M:: scan at most this many slots
     46                (even if don't have enough jobs to send yet)
     47        L:: if scan this many locked slots, print warning msg
     48                (should increase shmem size)
     49
     50logic:
     51{{{
     52acquire semaphore
     53i = random index in shmem
     54x = ordered list of jobs to send (empty)
     55slots_scanned = 0
     56slots_locked = 0
     57loop
     58        i = i+1 % array_size
     59        slots_scanned++
     60        if slots_scanned > M
     61                break
     62        if shmem[i] is empty
     63                continue
     64        if shmem[i] is locked
     65                slots_locked++
     66                continue
     67        j = shmem[i]
     68        if !fast_filter(j) continue;
     69        v = v(h, j)
     70        if v > lowest score in x
     71                lock j
     72                release semaphore
     73                if !slow_filter(j)
     74                        acquire semaphore
     75                        unlock j
     76                        continue
     77                acquire semaphore
     78                if x satisfies client's work request
     79                        unlock and remove lowest-score element of x
     80                add j to x
     81        if x satisfies client's work request and slots_scanned >= N
     82                break;
     83for each job j in x
     84        mark slot j as empty
     85release semaphore
     86if slots_locked > L
     87        print "need bigger array" message
     88}}}