org.apache.manifoldcf.crawler.system
Class WorkerThread

java.lang.Object
  extended by java.lang.Thread
      extended by org.apache.manifoldcf.crawler.system.WorkerThread
All Implemented Interfaces:
java.lang.Runnable

public class WorkerThread
extends java.lang.Thread

This class represents a worker thread. Hundreds of these threads are instantiated in order to perform crawling and extraction.
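
The pattern described above, many Thread subclasses draining shared work, can be sketched as follows. This is only an illustration and not the ManifoldCF implementation: SketchWorker, the BlockingQueue standing in for DocumentQueue, and the STOP marker are all hypothetical names introduced for this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; none of these names come from ManifoldCF.
class WorkerPoolSketch {
    // Hypothetical marker telling a worker to exit its loop.
    static final String STOP = "__stop__";

    // A Thread subclass that pulls document identifiers from a shared queue.
    static class SketchWorker extends Thread {
        private final BlockingQueue<String> queue;
        private final AtomicInteger processed;

        SketchWorker(String id, BlockingQueue<String> queue, AtomicInteger processed) {
            super(id); // the id doubles as the thread name
            this.queue = queue;
            this.processed = processed;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String doc = queue.take(); // blocks until work is available
                    if (STOP.equals(doc)) {
                        return;
                    }
                    processed.incrementAndGet(); // stand-in for crawl and extraction
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Starts workerCount workers, feeds them documentCount items, and
    // returns how many items were processed once every worker has exited.
    static int crawl(int workerCount, int documentCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        SketchWorker[] workers = new SketchWorker[workerCount];
        try {
            for (int i = 0; i < workerCount; i++) {
                workers[i] = new SketchWorker("worker-" + i, queue, processed);
                workers[i].start();
            }
            for (int i = 0; i < documentCount; i++) {
                queue.put("doc-" + i);
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(STOP); // one marker per worker
            }
            for (SketchWorker worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while crawling", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(crawl(4, 100)); // prints 100
    }
}
```

The queue is first-in, first-out, so every document is taken before any stop marker, and each worker consumes exactly one marker before exiting.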


Nested Class Summary
protected static class WorkerThread.DocumentBin
          DocumentBin class
protected static class WorkerThread.DocumentReference
          Class describing a document reference.
protected static class WorkerThread.DocumentToProcess
          Class that represents a decision to process a document.
protected static class WorkerThread.OutputActivity
          The ingest logger class
protected static class WorkerThread.ProcessActivity
          Process activity class wraps access to the ingester and job queue.
protected static class WorkerThread.VersionActivity
          Version activity class wraps access to activity history.
 
Nested classes/interfaces inherited from class java.lang.Thread
java.lang.Thread.State, java.lang.Thread.UncaughtExceptionHandler
 
Field Summary
static java.lang.String _rcsid
           
protected  DocumentQueue documentQueue
           
protected  java.lang.String id
           
protected static int MAX_ADDS_IN_TRANSACTION
          The maximum number of adds that happen in a single transaction
protected  QueueTracker queueTracker
          Queue tracker
protected  WorkerResetManager resetManager
          Worker thread pool reset manager
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Constructor Summary
WorkerThread(java.lang.String id, DocumentQueue documentQueue, WorkerResetManager resetManager, QueueTracker queueTracker)
          Constructor.
 
Method Summary
protected static boolean compareArrays(java.lang.String[] array1, java.lang.String[] array2)
          Compare two sorted lists of collection names.
protected static void processDeleteLists(java.lang.String outputName, IRepositoryConnector connector, IRepositoryConnection connection, IJobManager jobManager, java.util.ArrayList jobmanagerDeleteList, IIncrementalIngester ingester, java.util.ArrayList ingesterDeleteList, java.util.ArrayList ingesterDeleteListUnhashed, java.lang.Long jobID, java.lang.String[] legalLinkTypes, WorkerThread.OutputActivity ingestLogger, int hopcountMethod, QueueTracker queueTracker, long currentTime)
          Clear specified documents out of the job queue and from the appliance.
protected static void requeueDocuments(IJobManager jobManager, java.util.ArrayList requeueList, long retryTime, long failTime, int failCount)
          Requeue documents after a service interruption was detected.
 void run()
           
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

_rcsid

public static final java.lang.String _rcsid
See Also:
Constant Field Values

id

protected java.lang.String id

documentQueue

protected DocumentQueue documentQueue

resetManager

protected WorkerResetManager resetManager
Worker thread pool reset manager


queueTracker

protected QueueTracker queueTracker
Queue tracker


MAX_ADDS_IN_TRANSACTION

protected static final int MAX_ADDS_IN_TRANSACTION
The maximum number of adds that happen in a single transaction

See Also:
Constant Field Values
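
A cap on adds per transaction is typically applied by splitting a large list of pending adds into bounded batches and handling each batch in its own transaction. The sketch below shows only that splitting step. It is an assumption about how such a limit might be used, not the code in WorkerThread; the class name, the partition helper, and the cap of 3 are hypothetical, and this page does not state the real constant's value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the ManifoldCF implementation.
class BatchedAddsSketch {
    // Hypothetical cap chosen for the example.
    static final int MAX_ADDS_PER_BATCH = 3;

    // Splits items into consecutive batches holding at most maxPerBatch entries.
    static <T> List<List<T>> partition(List<T> items, int maxPerBatch) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerBatch) {
            int end = Math.min(start + maxPerBatch, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> adds = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // Each inner list would be written in its own transaction.
        System.out.println(partition(adds, MAX_ADDS_PER_BATCH));
        // prints [[a, b, c], [d, e, f], [g]]
    }
}
```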

Constructor Detail

WorkerThread

public WorkerThread(java.lang.String id,
                    DocumentQueue documentQueue,
                    WorkerResetManager resetManager,
                    QueueTracker queueTracker)
             throws ManifoldCFException
Constructor.

Parameters:
id - is the worker thread id.
Throws:
ManifoldCFException

Method Detail

run

public void run()
Specified by:
run in interface java.lang.Runnable
Overrides:
run in class java.lang.Thread

compareArrays

protected static boolean compareArrays(java.lang.String[] array1,
                                       java.lang.String[] array2)
Compare two sorted lists of collection names.
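
Because both inputs are sorted, equality can be checked position by position without any searching. The sketch below assumes that a true result means "same contents" and that neither array is null; this page states neither point, and the class name is hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; not the ManifoldCF implementation.
class SortedArrayCompareSketch {
    // Returns true when both arrays hold the same names in the same order.
    // Assumes both inputs are non-null and already sorted.
    static boolean compareArrays(String[] array1, String[] array2) {
        if (array1.length != array2.length) {
            return false;
        }
        for (int i = 0; i < array1.length; i++) {
            if (!array1[i].equals(array2[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] a = {"docs", "wiki"};
        String[] b = {"wiki", "docs"};
        Arrays.sort(b); // callers must sort before comparing
        System.out.println(compareArrays(a, b));                     // prints true
        System.out.println(compareArrays(a, new String[] {"docs"})); // prints false
    }
}
```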


processDeleteLists

protected static void processDeleteLists(java.lang.String outputName,
                                         IRepositoryConnector connector,
                                         IRepositoryConnection connection,
                                         IJobManager jobManager,
                                         java.util.ArrayList jobmanagerDeleteList,
                                         IIncrementalIngester ingester,
                                         java.util.ArrayList ingesterDeleteList,
                                         java.util.ArrayList ingesterDeleteListUnhashed,
                                         java.lang.Long jobID,
                                         java.lang.String[] legalLinkTypes,
                                         WorkerThread.OutputActivity ingestLogger,
                                         int hopcountMethod,
                                         QueueTracker queueTracker,
                                         long currentTime)
                                  throws ManifoldCFException
Clear specified documents out of the job queue and from the appliance.

Parameters:
outputName - is the output connection name.
jobManager - is the job manager.
jobmanagerDeleteList - is a list of QueuedDocument objects to clean out.
ingester - is the handle to the incremental ingestion API control object.
ingesterDeleteList - is a list of document ids to delete.
Throws:
ManifoldCFException

requeueDocuments

protected static void requeueDocuments(IJobManager jobManager,
                                       java.util.ArrayList requeueList,
                                       long retryTime,
                                       long failTime,
                                       int failCount)
                                throws ManifoldCFException
Requeue documents after a service interruption was detected.

Parameters:
jobManager - is the job manager object.
requeueList - is a list of QueuedDocument objects describing what needs to be requeued.
retryTime - is the time that the first retry ought to be scheduled for.
failTime - is the time beyond which retries lead to hard failure.
failCount - is the number of retries allowed until hard failure.
Throws:
ManifoldCFException
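
The three retry parameters above suggest a policy in which a document is retried until either the deadline passes or the retry budget runs out. The sketch below is one possible reading of that policy, not the logic in requeueDocuments. The Decision enum, the decide helper, and the rule that either limit alone triggers a hard failure are assumptions made for illustration.

```java
// Illustrative sketch only; not the ManifoldCF implementation.
class RequeueDecisionSketch {
    enum Decision { RETRY, HARD_FAIL }

    // Hypothetical helper: decides what happens when a retry attempt comes due.
    //   attemptTime - time at which the retry would run
    //   failTime    - time beyond which retries lead to hard failure
    //   retriesUsed - retries already consumed for this document
    //   failCount   - number of retries allowed until hard failure
    static Decision decide(long attemptTime, long failTime, int retriesUsed, int failCount) {
        if (attemptTime > failTime) {
            return Decision.HARD_FAIL; // deadline has passed
        }
        if (retriesUsed >= failCount) {
            return Decision.HARD_FAIL; // retry budget exhausted
        }
        return Decision.RETRY;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long retryTime = now + 5 * 60 * 1000L; // first retry in five minutes
        long failTime = now + 60 * 60 * 1000L; // give up after one hour
        int failCount = 3;

        System.out.println(decide(retryTime, failTime, 0, failCount));    // prints RETRY
        System.out.println(decide(failTime + 1, failTime, 1, failCount)); // prints HARD_FAIL
        System.out.println(decide(retryTime, failTime, 3, failCount));    // prints HARD_FAIL
    }
}
```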