Changes between Initial Version and Version 1 of AdaptiveReplication


Timestamp: Jun 3, 2008, 10:12:09 AM
Author: davea

= Adaptive replication =

BOINC's current replication policy replicates a job even if
one of the hosts is known to be highly reliable.
The overhead of replication is high - at least 50% of total CPU time
is spent checking validity.

'''Adaptive replication''' is an optional policy that avoids replicating a job
if it has been sent to a highly reliable host.
The goal of this policy is to provide a target level of confidence
with minimal overhead - perhaps only 5% or 10% of total CPU time.

== Policy ==

BOINC maintains an estimate E(H) of host H's recent error rate.
This is maintained as follows:

 * It is initialized to 0.1.
 * It is multiplied by 0.95 when H reports a correct (replicated) result.
 * It is incremented by 0.05 when H reports an incorrect (replicated) result.

Thus, it takes a long time to earn a good reputation
and a short time to lose it.
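
The update rule above can be sketched in a few lines of Python. This is an illustration only, not BOINC's code: the function names are hypothetical, and the target of 0.05 in the usage note is an arbitrary value chosen to show the asymmetry.

```python
INITIAL_ERROR_RATE = 0.1  # E(H) for a new host
CORRECT_FACTOR = 0.95     # multiplier applied on a correct replicated result
INCORRECT_STEP = 0.05     # increment applied on an incorrect replicated result


def update_error_rate(error_rate: float, result_correct: bool) -> float:
    """Return the new E(H) after one validated, replicated result."""
    if result_correct:
        return error_rate * CORRECT_FACTOR
    return error_rate + INCORRECT_STEP


def correct_results_needed(start: float, target: float) -> int:
    """Count consecutive correct results until E(H) drops below target."""
    if target <= 0:
        raise ValueError("target must be positive")
    count = 0
    rate = start
    while rate >= target:
        rate = update_error_rate(rate, True)
        count += 1
    return count
```

With these constants, `correct_results_needed(INITIAL_ERROR_RATE, 0.05)` returns 14: a new host needs 14 consecutive correct results to bring E(H) from 0.1 below 0.05, while a single incorrect result adds 0.05 back at once.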

The adaptive replication policy is as follows:

 * Each job is initially marked as unreplicated.
 * On each request, the scheduler decides whether to trust the host as follows (A is a threshold error rate, a parameter of the policy):
  * If E(H) > A, don't trust the host.
  * Otherwise, trust the host with probability 1 - E(H)/A.
 * If the host is trusted, preferentially send it unreplicated jobs.
 * Otherwise, preferentially send it replicated jobs. If it must be sent an unreplicated job, mark the job as replicated and create new instances accordingly.
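
The trust decision can be sketched as follows. This is a minimal illustration, not the scheduler's actual code: the function name `trust_host` and its parameters are hypothetical, and the threshold A is passed in because the policy above does not fix its value.

```python
import random


def trust_host(error_rate: float, threshold: float, rng: random.Random) -> bool:
    """Decide whether to trust a host, given E(H) and the threshold A."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if error_rate > threshold:
        return False
    # Trust with probability 1 - E(H)/A: certain at E(H) = 0, never at E(H) = A.
    return rng.random() < 1.0 - error_rate / threshold
```

Because the probability falls linearly from 1 at E(H) = 0 to 0 at E(H) = A, even a host just under the threshold still has most of its jobs replicated, so its error rate keeps being measured.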

== Implementation ==

Database:
 * Add "target_nresults" field to app table.  Default is zero (app doesn't use adaptive replication).

Scheduler:
 * Decide whether to trust host as described above.
 * If we send an unreplicated job (i.e., wu.target_nresults=1 and app.target_nresults>1) to an untrusted host, set wu.target_nresults = app.target_nresults and flag the WU for transitioning.
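
The scheduler step above can be sketched as below. The `App` and `Workunit` classes are simplified stand-ins for the database records, and `needs_transition` is a hypothetical field representing "flag the WU for transitioning"; neither is BOINC's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class App:
    target_nresults: int = 0  # 0: app doesn't use adaptive replication


@dataclass
class Workunit:
    target_nresults: int = 1
    needs_transition: bool = False  # hypothetical transition flag


def is_unreplicated(wu: Workunit, app: App) -> bool:
    """True if the app uses adaptive replication and the job is still single-instance."""
    return wu.target_nresults == 1 and app.target_nresults > 1


def on_send(wu: Workunit, app: App, host_trusted: bool) -> None:
    """Escalate an unreplicated job to full replication when sent to an untrusted host."""
    if is_unreplicated(wu, app) and not host_trusted:
        wu.target_nresults = app.target_nresults
        wu.needs_transition = True
```

Sending the same job to a trusted host leaves the record unchanged, so no extra instances are created.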

Validator:
 * Don't update host.error_rate for unreplicated results (i.e., wu.target_nresults=1 and app.target_nresults>1).
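
A sketch of the validator rule, assuming the update constants from the Policy section. The function `validator_update` and its parameter names are hypothetical; it implements only the skip condition stated above.

```python
def validator_update(
    host_error_rate: float,
    wu_target_nresults: int,
    app_target_nresults: int,
    result_correct: bool,
) -> float:
    """Return the host's new error rate after validating one result."""
    unreplicated = wu_target_nresults == 1 and app_target_nresults > 1
    if unreplicated:
        # Nothing to compare the result against, so leave the estimate alone.
        return host_error_rate
    if result_correct:
        return host_error_rate * 0.95
    return host_error_rate + 0.05
```

Skipping unreplicated results means a host's reputation changes only when its output has been checked against another instance.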