java.lang.Object
  extended by org.apache.manifoldcf.crawler.interfaces.QueueTracker
public class QueueTracker
This class attempts to assign document priorities that achieve as much balance as possible between documents belonging to different bins. A document's priority is assigned when the document is added to the queue, and is recalculated when a job is aborted or when the crawler daemon is started. Document priorities are strictly obeyed when documents are chosen from the queue and handed to worker threads: higher-priority documents always take precedence, except where the job priority specifies a deliberate adjustment. The priority values themselves are logarithmic: 0.0 is the highest priority, and the larger the number, the lower the priority.

The calculation of each document priority handed out by this module is based on:

- the number of documents having a given bin (tracked)
- the performance of a connection (gathered through statistics)
- the throttling that applies to each document bin

The queuing prioritization model hooks into the document lifecycle in the following places:

1. When a document is added to the queue (and thus when its priority is handed out).
2. When documents that were *supposed* to be added to the queue turn out to already be there with an established priority. In this case the priority that was handed out is returned to the pool for reuse.
3. When a document is pulled from the database queue, which sets the current highest priority level that should not be exceeded in step (1).

The assignment prioritization model is largely independent of the queuing prioritization model. It is used to select among documents that have been marked "active" as they are handed to worker threads. These events cause information to be logged:

1. When a document is handed to a worker thread.
2. When the worker thread completes the document.
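
To make the balancing idea concrete, here is a minimal Java sketch under a deliberately simplified assumption that is not taken from ManifoldCF: a document's priority is ln(1 + n), where n is the number of documents already assigned a priority in the most crowded of its bins. The class BinBalanceSketch and its assign method are hypothetical, and connection performance and throttling are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical illustration of bin-balanced, logarithmic priorities.
 * Not ManifoldCF's actual formula: it only shows why a document whose bins
 * already hold many queued documents receives a larger (worse) value.
 */
class BinBalanceSketch {
  // Documents already handed a priority, counted per bin name.
  private final Map<String, Integer> binCounts = new HashMap<>();

  /** Assign a priority (0.0 is best), then count the document in each of its bins. */
  double assign(String[] binNames) {
    int busiest = 0;
    for (String bin : binNames) {
      busiest = Math.max(busiest, binCounts.getOrDefault(bin, 0));
    }
    for (String bin : binNames) {
      binCounts.merge(bin, 1, Integer::sum);
    }
    // Logarithmic scale: the first document in an empty bin gets ln(1) = 0.0.
    return Math.log(1.0 + busiest);
  }

  public static void main(String[] args) {
    BinBalanceSketch tracker = new BinBalanceSketch();
    List<String> assigned = new ArrayList<>();
    // Three documents from bin "a" are queued before one document from bin "b".
    String[][] docs = {{"a"}, {"a"}, {"a"}, {"b"}};
    for (String[] bins : docs) {
      assigned.add(bins[0] + "=" + String.format(Locale.ROOT, "%.3f", tracker.assign(bins)));
    }
    System.out.println(assigned); // [a=0.000, a=0.693, a=1.099, b=0.000]
  }
}
```

Sorted in ascending order, these values interleave the bins: the single "b" document ties with the first "a" document and goes ahead of the second and third ones, even though it was queued last.
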
| Nested Class Summary | |
|---|---|
| protected static class | QueueTracker.BinCount: This is the class which allows a mutable integer count value to be saved in the bincount table. |
| protected static class | QueueTracker.DoubleBinCount: This is the class which allows a mutable integer count value to be saved in the bincount table. |
| protected static class | QueueTracker.PriorityKey: This is the key class for the availablePriorities table. |
| protected static class | QueueTracker.ThrottleLimits: This class represents the throttle limits out of the connection specification. |
| protected static class | QueueTracker.ThrottleLimitSpec: This is a class which describes an individual throttle limit, in fetches per millisecond. |

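
The BinCount description refers to a mutable count stored in a hash table. A common Java form of that pattern is sketched below; BinCountTable and its Count class are hypothetical stand-ins, and the real nested classes may expose different members.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical table of mutable per-bin counters, in the spirit of QueueTracker.BinCount. */
class BinCountTable {
  /** A mutable count, so updates happen in place instead of re-putting a boxed Integer. */
  static final class Count {
    private int value;
    void increment() { value++; }
    int decrement() { return --value; }
    int get() { return value; }
  }

  private final Map<String, Count> counts = new HashMap<>();

  void add(String bin) {
    // A single lookup finds (or creates) the counter, which is then bumped in place.
    counts.computeIfAbsent(bin, k -> new Count()).increment();
  }

  void remove(String bin) {
    Count c = counts.get(bin);
    // Drop the entry once it reaches zero so idle bins do not accumulate.
    if (c != null && c.decrement() <= 0) {
      counts.remove(bin);
    }
  }

  int get(String bin) {
    Count c = counts.get(bin);
    return c == null ? 0 : c.get();
  }

  int size() { return counts.size(); }

  public static void main(String[] args) {
    BinCountTable table = new BinCountTable();
    table.add("hostA");
    table.add("hostA");
    table.add("hostB");
    table.remove("hostB");
    System.out.println(table.get("hostA") + " " + table.get("hostB") + " " + table.size()); // 2 0 1
  }
}
```

Holding a mutable object as the map value means an update costs one lookup and allocates no new boxed Integer, which matters when counts change for every queued document.
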
| Field Summary | |
|---|---|
| static java.lang.String | _rcsid |
| protected java.util.HashMap | activeBinCounts: These are the bin counts for active threads. |
| protected java.util.HashMap | availablePriorities: This hash table is keyed by PriorityKey objects, and contains ArrayList objects containing Doubles, in sorted order. |
| protected java.util.HashMap | binCounts: These are the bin counts for a prioritization pass. |
| protected java.util.HashMap | binDependencies: This hash table is keyed by a String (the bin name), and contains a HashMap of the PriorityKey objects that include that String as a bin. |
| protected static double | binReductionFactor: Factor by which bins are reduced. |
| protected double | currentMinimumDepth: The "minimum depth", which is the smallest bin count of the last document queued. |
| protected PerformanceStatistics | performanceStatistics: These are the accumulated performance averages for all connections etc. |
| protected java.util.HashMap | queuedBinCounts: These are the bin counts for tracking the documents that are on the active queue but are not being processed yet. |
| protected boolean | resetInProgress: This flag, when set, indicates that a reset is in progress, so queue tracker bin count updates are ignored. |

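
The availablePriorities field holds, per PriorityKey, a sorted list of Doubles, and the class overview says an unused priority is returned to the pool for reuse. The sketch below shows one way such a pool could work, assuming the smallest (best) released value is reused first. PriorityPoolSketch, release, and take are hypothetical, and a sorted List<String> stands in for PriorityKey.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pool of released priorities, keyed by bin set and kept in ascending order. */
class PriorityPoolSketch {
  private final Map<List<String>, List<Double>> available = new HashMap<>();

  /** Stand-in for PriorityKey: a sorted bin list, so {"a","b"} and {"b","a"} share a key. */
  static List<String> keyFor(String[] binNames) {
    String[] sorted = binNames.clone();
    Arrays.sort(sorted);
    return Arrays.asList(sorted);
  }

  /** Return an unused priority to the pool, inserting it at its sorted position. */
  void release(String[] binNames, double priority) {
    List<Double> list = available.computeIfAbsent(keyFor(binNames), k -> new ArrayList<>());
    int pos = Collections.binarySearch(list, priority);
    list.add(pos >= 0 ? pos : -pos - 1, priority);
  }

  /** Take the best (smallest) released priority for this bin set, or null if none is pooled. */
  Double take(String[] binNames) {
    List<String> key = keyFor(binNames);
    List<Double> list = available.get(key);
    if (list == null || list.isEmpty()) {
      return null;
    }
    Double best = list.remove(0);
    if (list.isEmpty()) {
      available.remove(key);
    }
    return best;
  }

  public static void main(String[] args) {
    PriorityPoolSketch pool = new PriorityPoolSketch();
    pool.release(new String[] {"a"}, 0.9);
    pool.release(new String[] {"a"}, 0.3);
    System.out.println(pool.take(new String[] {"a"}) + " " + pool.take(new String[] {"a"})
        + " " + pool.take(new String[] {"a"})); // 0.3 0.9 null
  }
}
```
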
| Constructor Summary | |
|---|---|
| QueueTracker() | Constructor. |

| Method Summary | |
|---|---|
| void | addRecord(java.lang.String[] binNames): Add an access record to the queue tracker. |
| void | assessMinimumDepth(java.lang.Double[] binNamesSet): Assess the current minimum depth. |
| void | beginProcessing(java.lang.String[] binNames): Note that we are beginning processing for a document with a particular set of bins. |
| void | beginReset(): Begin a reset of the queue tracker. |
| double | calculateAssignmentRating(java.lang.String[] binNames, IRepositoryConnection connection): Calculate an assignment rating for a set of bins based on what's currently in use. |
| protected double[] | calculateMaxFetchRates(java.lang.String[] binNames, IRepositoryConnection connection): Calculate the maximum fetch rate for a given set of bins for a given connection. |
| double | calculatePriority(java.lang.String[] binNames, IRepositoryConnection connection): Calculate a document priority value. |
| void | endProcessing(java.lang.String[] binNames): Note that we have completed processing of a document with a given set of bins. |
| void | endReset(): Finish the reset operation. |
| PerformanceStatistics | getCurrentStatistics(): Obtain the current performance statistics object. |
| void | noteConnectionPerformance(int docCount, java.lang.String connectionName, long elapsedTime): Note the time required to successfully complete a set of documents. |
| void | notePriorityNotUsed(java.lang.String[] binNames, IRepositoryConnection connection, double priority): Note that a priority which was previously allocated was not used, and needs to be released. |

| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String _rcsid
protected static final double binReductionFactor
protected PerformanceStatistics performanceStatistics
protected java.util.HashMap binCounts
protected java.util.HashMap queuedBinCounts
protected java.util.HashMap activeBinCounts
protected double currentMinimumDepth
protected boolean resetInProgress
protected java.util.HashMap availablePriorities
protected java.util.HashMap binDependencies
| Constructor Detail |
|---|
public QueueTracker()
| Method Detail |
|---|
public void beginReset()
public void endReset()
public void addRecord(java.lang.String[] binNames)
binNames - the set of bins, as returned by the connector in question, for the document being queued. These bins are considered global in nature.
public void notePriorityNotUsed(java.lang.String[] binNames,
IRepositoryConnection connection,
double priority)
public void noteConnectionPerformance(int docCount,
java.lang.String connectionName,
long elapsedTime)
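
noteConnectionPerformance reports how long a batch of documents took on a named connection. A minimal sketch of turning such reports into a per-connection rate follows; it assumes simple cumulative totals, whereas the real PerformanceStatistics class may average differently. ConnectionStatsSketch and its rate method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-connection throughput totals; not the real PerformanceStatistics API. */
class ConnectionStatsSketch {
  private static final class Totals {
    long docs;
    long millis;
  }

  private final Map<String, Totals> byConnection = new HashMap<>();

  /** Record that docCount documents completed in elapsedTime ms (synchronized: callers may be concurrent). */
  synchronized void noteConnectionPerformance(int docCount, String connectionName, long elapsedTime) {
    Totals t = byConnection.computeIfAbsent(connectionName, k -> new Totals());
    t.docs += docCount;
    t.millis += elapsedTime;
  }

  /** Observed rate in documents per millisecond, or NaN if nothing has been recorded. */
  synchronized double rate(String connectionName) {
    Totals t = byConnection.get(connectionName);
    if (t == null || t.millis == 0) {
      return Double.NaN;
    }
    return (double) t.docs / t.millis;
  }

  public static void main(String[] args) {
    ConnectionStatsSketch stats = new ConnectionStatsSketch();
    stats.noteConnectionPerformance(10, "web", 2000L);
    stats.noteConnectionPerformance(30, "web", 6000L);
    System.out.println(stats.rate("web")); // 40 docs / 8000 ms = 0.005
  }
}
```

Expressing the rate per millisecond keeps it in the same unit as ThrottleLimitSpec, so observed and permitted rates can be compared directly.
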
public PerformanceStatistics getCurrentStatistics()
public void beginProcessing(java.lang.String[] binNames)
public void assessMinimumDepth(java.lang.Double[] binNamesSet)
binNamesSet - the current set of priorities seen during the queuing operation.
public void endProcessing(java.lang.String[] binNames)
public double calculateAssignmentRating(java.lang.String[] binNames,
IRepositoryConnection connection)
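
beginProcessing and endProcessing bracket the time a document spends with a worker thread, and calculateAssignmentRating rates a set of bins against what is currently in use. The sketch below tracks active documents per bin and, as an invented rule, rates a bin set by 1 / (1 + c), where c is the active count of its busiest bin, so a higher rating means less contention. The class is hypothetical, and the IRepositoryConnection argument of the real method is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical active-bin tracking and assignment rating; not ManifoldCF's formula. */
class AssignmentRatingSketch {
  // Documents currently held by worker threads, counted per bin.
  private final Map<String, Integer> active = new HashMap<>();

  void beginProcessing(String[] binNames) {
    for (String bin : binNames) {
      active.merge(bin, 1, Integer::sum);
    }
  }

  void endProcessing(String[] binNames) {
    for (String bin : binNames) {
      // Returning null from the remapping function removes the entry when it reaches zero.
      active.computeIfPresent(bin, (k, v) -> v > 1 ? v - 1 : null);
    }
  }

  /** Rating in (0, 1]: 1.0 when no bin is busy, falling as the busiest bin fills up. */
  double calculateAssignmentRating(String[] binNames) {
    int busiest = 0;
    for (String bin : binNames) {
      busiest = Math.max(busiest, active.getOrDefault(bin, 0));
    }
    return 1.0 / (1.0 + busiest);
  }

  public static void main(String[] args) {
    AssignmentRatingSketch tracker = new AssignmentRatingSketch();
    tracker.beginProcessing(new String[] {"a"});
    tracker.beginProcessing(new String[] {"a"});
    double busy = tracker.calculateAssignmentRating(new String[] {"a"});
    double idle = tracker.calculateAssignmentRating(new String[] {"b"});
    tracker.endProcessing(new String[] {"a"});
    double after = tracker.calculateAssignmentRating(new String[] {"a"});
    System.out.println(busy + " " + idle + " " + after);
  }
}
```
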
public double calculatePriority(java.lang.String[] binNames,
IRepositoryConnection connection)
binNames - the global bins to which the document belongs.
connection - the connection from which the throttles may be obtained. More heavily throttled connections are given less favorable priority.
protected double[] calculateMaxFetchRates(java.lang.String[] binNames,
IRepositoryConnection connection)
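
calculateMaxFetchRates produces one maximum fetch rate per bin. Assuming a connection's throttles can be represented as regular expressions over bin names, each with a limit in fetches per millisecond (the unit ThrottleLimitSpec uses), a plausible rule is to take the smallest limit among the matching patterns. MaxFetchRateSketch and its Map<Pattern, Double> input are hypothetical stand-ins for the connection's throttle data, and matching with find() rather than matches() is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical per-bin maximum fetch rate calculation; not the real calculateMaxFetchRates. */
class MaxFetchRateSketch {
  /**
   * For each bin, return the tightest (smallest) limit, in fetches per millisecond,
   * among the throttle patterns that match it; POSITIVE_INFINITY means unthrottled.
   */
  static double[] maxFetchRates(String[] binNames, Map<Pattern, Double> throttles) {
    double[] rates = new double[binNames.length];
    for (int i = 0; i < binNames.length; i++) {
      double limit = Double.POSITIVE_INFINITY;
      for (Map.Entry<Pattern, Double> e : throttles.entrySet()) {
        if (e.getKey().matcher(binNames[i]).find()) {
          limit = Math.min(limit, e.getValue());
        }
      }
      rates[i] = limit;
    }
    return rates;
  }

  public static void main(String[] args) {
    Map<Pattern, Double> throttles = new LinkedHashMap<>();
    throttles.put(Pattern.compile("\\.example\\.com$"), 0.002); // 2 fetches per second
    throttles.put(Pattern.compile("^slow\\."), 0.0005);         // 1 fetch every 2 seconds
    double[] r = maxFetchRates(
        new String[] {"slow.example.com", "www.example.com", "other.org"}, throttles);
    System.out.println(Arrays.toString(r)); // [5.0E-4, 0.002, Infinity]
  }
}
```

Here "slow.example.com" matches both patterns and gets the tighter 0.0005 fetches/ms, "www.example.com" gets 0.002, and "other.org" is unthrottled.
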