org.apache.manifoldcf.crawler.jobs
Class JobQueue

java.lang.Object
  extended by org.apache.manifoldcf.core.database.BaseTable
      extended by org.apache.manifoldcf.crawler.jobs.JobQueue

public class JobQueue
extends BaseTable

This is the job queue manager class. It is responsible for managing the jobqueue database table.


Nested Class Summary
protected static class JobQueue.DuplicateFinder
           
 
Field Summary
static java.lang.String _rcsid
           
static int ACTION_REMOVE
           
static int ACTION_RESCAN
           
protected static java.util.Map actionMap
           
static java.lang.String checkActionField
           
static java.lang.String checkTimeField
           
static java.lang.String docHashField
           
static java.lang.String docIDField
           
static java.lang.String docPriorityField
           
static java.lang.String failCountField
           
static java.lang.String failTimeField
           
static java.lang.String idField
           
static java.lang.String isSeedField
           
static java.lang.String jobIDField
           
protected  PrereqEventManager prereqEventManager
          Prerequisite event manager
static java.lang.String prioritySetField
           
static int SEEDSTATUS_NEWSEED
           
static int SEEDSTATUS_NOTSEED
           
static int SEEDSTATUS_SEED
           
protected static java.util.Map seedstatusMap
           
static int STATUS_ACTIVE
           
static int STATUS_ACTIVENEEDRESCAN
           
static int STATUS_ACTIVENEEDRESCANPURGATORY
           
static int STATUS_ACTIVEPURGATORY
           
static int STATUS_BEINGCLEANED
           
static int STATUS_BEINGDELETED
           
static int STATUS_COMPLETE
           
static int STATUS_ELIGIBLEFORDELETE
           
static int STATUS_PENDING
           
static int STATUS_PENDINGPURGATORY
           
static int STATUS_PURGATORY
           
static java.lang.String statusField
           
protected static java.util.Map statusMap
           
protected  IThreadContext threadContext
          Thread context
 
Fields inherited from class org.apache.manifoldcf.core.database.BaseTable
dbInterface, tableName
 
Constructor Summary
JobQueue(IThreadContext tc, IDBInterface database)
          Constructor.
 
Method Summary
static java.lang.String actionToString(int action)
          Convert integer to action string
 void addRemainingDocumentsInitial(java.lang.Long jobID, java.lang.String[] docIDHashes)
          Note the remaining documents that do NOT need to be queued.
 boolean checkJobBusy(java.lang.Long jobID)
          Check if there are any outstanding active documents for a job
 void clearFailTimes(java.lang.Long jobID)
          Clear the failtimes for all documents associated with a job.
 void deinstall()
          Uninstall.
 void deleteAllJobRecords(java.lang.Long jobID)
          For a job deletion: Delete all records for a job.
 void deleteIngestedDocumentIdentifiers(DocumentDescription[] identifiers)
          Delete ingested document identifiers (as part of deleting the owning job).
 void deleteRecord(java.lang.Long id)
          Remove a record entirely.
 void deleteRecordMultiple(java.lang.Long[] ids)
          Remove multiple records entirely.
protected  void doDeletes(java.util.ArrayList list, java.lang.String queryPart)
          Do a batch of deletes.
 void doneDocumentsInitial(java.lang.Long jobID, boolean isPartial)
          Complete the initial set of documents.
 java.lang.String[] getAllSeeds(java.lang.Long jobID)
          Get all the current seeds.
static java.lang.String getHashCode(java.lang.String documentIdentifier)
          Get a hash value from a document id string.
 void insertNewRecord(java.lang.Long jobID, java.lang.String docIDHash, java.lang.String docID, double desiredDocPriority, long desiredExecuteTime, long currentTime, java.lang.String[] prereqEvents)
          Insert a new record into the jobqueue table (as part of adding a child reference).
 void insertNewRecordInitial(java.lang.Long jobID, java.lang.String docHash, java.lang.String docID, double desiredDocPriority, long desiredExecuteTime, long currentTime, java.lang.String[] prereqEvents)
          Insert a new record into the jobqueue table (as part of adding an initial reference).
 void install(java.lang.String jobsTable, java.lang.String jobsColumn)
          Install or upgrade.
 void prepareDeleteScan(java.lang.Long jobID)
          Prepare for a job delete pass.
 void prepareFullScan(java.lang.Long jobID)
          Prepare for a "full scan" job.
 void prepareIncrementalScan(java.lang.Long jobID)
          Prepare for an "incremental" job.
protected  void processRemainingDocuments(java.util.Map idMap, java.lang.String query, java.util.ArrayList list, java.util.Map inSet)
          Process the specified set of documents.
 void resetDocCleanupWorkerStatus()
          Reset doc cleaning worker status.
 void resetDocDeleteWorkerStatus()
          Reset doc delete worker status.
 void resetDocumentWorkerStatus()
          Reset as part of restoring document worker threads.
 void restart()
          Restart.
static java.lang.String seedstatusToString(int status)
          Convert seedstatus value to a string.
 void setCleaningStatus(java.lang.Long id)
          Set the status of a document to "being cleaned".
 void setDeletingStatus(java.lang.Long id)
          Set the status of a document to "being deleted".
 void setStatus(java.lang.Long id, int status, java.lang.Long checkTime, int action, long failTime, int failCount)
          Set the status on a record, including check time and priority.
 void setUncleaningStatus(java.lang.Long id, long checkTime)
          Set the status of a document to be "no longer cleaning"
 void setUndeletingStatus(java.lang.Long id, long checkTime)
          Set the status of a document to be "no longer deleting"
static java.lang.String statusToString(int status)
          Convert status to string.
static int stringToAction(java.lang.String value)
          Convert action field value to integer.
static int stringToSeedstatus(java.lang.String x)
          Convert seedstatus field value to an integer.
static int stringToStatus(java.lang.String value)
          Convert status field value to integer.
 void unconditionallyAnalyzeTables()
          Analyze job tables due to major event
 void updateActiveRecord(java.lang.Long id, int currentStatus)
          Set the status to active on a record, leaving priority and check time unchanged.
 void updateCompletedRecord(java.lang.Long recID, int currentStatus)
          Set the "completed" status for a record.
 boolean updateExistingRecord(java.lang.Long recordID, int currentStatus, java.lang.Long checkTimeValue, long desiredExecuteTime, long currentTime, boolean otherChangesSeen, double desiredPriority, java.lang.String[] prereqEvents)
          Update an existing record (as the result of a reference add).
 boolean updateExistingRecordInitial(java.lang.Long recordID, int currentStatus, java.lang.Long checkTimeValue, long desiredExecuteTime, long currentTime, double desiredPriority, java.lang.String[] prereqEvents)
          Update an existing record (as the result of an initial add).
protected  void updateRemainingDocuments(java.lang.String query, java.util.ArrayList list)
          Update the specified set of documents to be "NEWSEED"
 void writeDocPriority(long currentTime, java.lang.Long rowID, double priority)
          Write out a document priority
 
Methods inherited from class org.apache.manifoldcf.core.database.BaseTable
addTableIndex, analyzeTable, beginTransaction, constructDistinctOnClause, constructOffsetLimitClause, constructRegexpClause, constructSubstringClause, endTransaction, getDatabaseCacheKey, getDBInterface, getMaxInClause, getMaxOrClause, getTableIndexes, getTableName, getTableSchema, getTransactionID, makeTableKey, noteModifications, performAddIndex, performAlter, performCreate, performDelete, performDrop, performInsert, performLock, performModification, performQuery, performQuery, performRemoveIndex, performUpdate, prepareRowForSave, readRow, reindexTable, signalRollback
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_rcsid

public static final java.lang.String _rcsid
See Also:
Constant Field Values

SEEDSTATUS_NOTSEED

public static final int SEEDSTATUS_NOTSEED
See Also:
Constant Field Values

SEEDSTATUS_SEED

public static final int SEEDSTATUS_SEED
See Also:
Constant Field Values

SEEDSTATUS_NEWSEED

public static final int SEEDSTATUS_NEWSEED
See Also:
Constant Field Values

STATUS_PENDING

public static final int STATUS_PENDING
See Also:
Constant Field Values

STATUS_ACTIVE

public static final int STATUS_ACTIVE
See Also:
Constant Field Values

STATUS_COMPLETE

public static final int STATUS_COMPLETE
See Also:
Constant Field Values

STATUS_PENDINGPURGATORY

public static final int STATUS_PENDINGPURGATORY
See Also:
Constant Field Values

STATUS_ACTIVEPURGATORY

public static final int STATUS_ACTIVEPURGATORY
See Also:
Constant Field Values

STATUS_PURGATORY

public static final int STATUS_PURGATORY
See Also:
Constant Field Values

STATUS_BEINGDELETED

public static final int STATUS_BEINGDELETED
See Also:
Constant Field Values

STATUS_ACTIVENEEDRESCAN

public static final int STATUS_ACTIVENEEDRESCAN
See Also:
Constant Field Values

STATUS_ACTIVENEEDRESCANPURGATORY

public static final int STATUS_ACTIVENEEDRESCANPURGATORY
See Also:
Constant Field Values

STATUS_BEINGCLEANED

public static final int STATUS_BEINGCLEANED
See Also:
Constant Field Values

STATUS_ELIGIBLEFORDELETE

public static final int STATUS_ELIGIBLEFORDELETE
See Also:
Constant Field Values

ACTION_RESCAN

public static final int ACTION_RESCAN
See Also:
Constant Field Values

ACTION_REMOVE

public static final int ACTION_REMOVE
See Also:
Constant Field Values

idField

public static final java.lang.String idField
See Also:
Constant Field Values

jobIDField

public static final java.lang.String jobIDField
See Also:
Constant Field Values

docHashField

public static final java.lang.String docHashField
See Also:
Constant Field Values

docIDField

public static final java.lang.String docIDField
See Also:
Constant Field Values

checkTimeField

public static final java.lang.String checkTimeField
See Also:
Constant Field Values

statusField

public static final java.lang.String statusField
See Also:
Constant Field Values

failTimeField

public static final java.lang.String failTimeField
See Also:
Constant Field Values

failCountField

public static final java.lang.String failCountField
See Also:
Constant Field Values

isSeedField

public static final java.lang.String isSeedField
See Also:
Constant Field Values

docPriorityField

public static final java.lang.String docPriorityField
See Also:
Constant Field Values

prioritySetField

public static final java.lang.String prioritySetField
See Also:
Constant Field Values

checkActionField

public static final java.lang.String checkActionField
See Also:
Constant Field Values

statusMap

protected static java.util.Map statusMap

seedstatusMap

protected static java.util.Map seedstatusMap

actionMap

protected static java.util.Map actionMap

prereqEventManager

protected PrereqEventManager prereqEventManager
Prerequisite event manager


threadContext

protected IThreadContext threadContext
Thread context

Constructor Detail

JobQueue

public JobQueue(IThreadContext tc,
                IDBInterface database)
         throws ManifoldCFException
Constructor.

Parameters:
tc - is the thread context.
database - is the database handle.
Throws:
ManifoldCFException
Method Detail

install

public void install(java.lang.String jobsTable,
                    java.lang.String jobsColumn)
             throws ManifoldCFException
Install or upgrade.

Throws:
ManifoldCFException

unconditionallyAnalyzeTables

public void unconditionallyAnalyzeTables()
                                  throws ManifoldCFException
Analyze job tables due to major event

Throws:
ManifoldCFException

deinstall

public void deinstall()
               throws ManifoldCFException
Uninstall.

Throws:
ManifoldCFException

restart

public void restart()
             throws ManifoldCFException
Restart. This method should be called at initial startup time. It resets the status of all documents to something reasonable, so the jobs can be restarted and work properly to completion.

Throws:
ManifoldCFException

clearFailTimes

public void clearFailTimes(java.lang.Long jobID)
                    throws ManifoldCFException
Clear the failtimes for all documents associated with a job. This method is called when the system detects that a significant delaying event has occurred, and therefore the "failure clock" needs to be reset.

Parameters:
jobID - is the job identifier.
Throws:
ManifoldCFException

resetDocumentWorkerStatus

public void resetDocumentWorkerStatus()
                               throws ManifoldCFException
Reset as part of restoring document worker threads. This is called when something has gone wrong that could have corrupted the status of a worker thread. All the worker threads end, and this method resets any active documents back to the right state (waiting for stuffing).

Throws:
ManifoldCFException

resetDocDeleteWorkerStatus

public void resetDocDeleteWorkerStatus()
                                throws ManifoldCFException
Reset doc delete worker status.

Throws:
ManifoldCFException

resetDocCleanupWorkerStatus

public void resetDocCleanupWorkerStatus()
                                 throws ManifoldCFException
Reset doc cleaning worker status.

Throws:
ManifoldCFException

prepareDeleteScan

public void prepareDeleteScan(java.lang.Long jobID)
                       throws ManifoldCFException
Prepare for a job delete pass. This will not be called unless the job is in an INACTIVE state. It does the following: (1) deletes PENDING entries; (2) maps PENDINGPURGATORY, PURGATORY, and COMPLETE entries to ELIGIBLEFORDELETE.

Parameters:
jobID - is the job identifier.
Throws:
ManifoldCFException

prepareFullScan

public void prepareFullScan(java.lang.Long jobID)
                     throws ManifoldCFException
Prepare for a "full scan" job. This will not be called unless the job is in the "INACTIVE" state. It does the following: (1) gets rid of all PENDING entries; (2) maps PENDINGPURGATORY entries to PURGATORY; (3) maps COMPLETE entries to PURGATORY.

Parameters:
jobID - is the job identifier.
Throws:
ManifoldCFException
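The status transitions described for prepareFullScan can be pictured with a small in-memory model. This is only a sketch: the Status enum and the prepare helper are hypothetical stand-ins for the STATUS_* constants and for the database updates the real method performs, and the handling of statuses not named in the description is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: models the prepareFullScan status transitions in memory. */
final class FullScanSketch {
    // Hypothetical stand-in for the status values named in the description.
    enum Status { PENDING, PENDINGPURGATORY, COMPLETE, PURGATORY }

    /** Returns the statuses that remain after preparation, in input order. */
    static List<Status> prepare(List<Status> entries) {
        List<Status> result = new ArrayList<>();
        for (Status s : entries) {
            switch (s) {
                case PENDING:
                    // PENDING entries are removed outright.
                    break;
                case PENDINGPURGATORY:
                case COMPLETE:
                    // Both of these are mapped to PURGATORY.
                    result.add(Status.PURGATORY);
                    break;
                default:
                    // Assumption: any other status is left untouched.
                    result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Status> before =
            Arrays.asList(Status.PENDING, Status.COMPLETE, Status.PENDINGPURGATORY);
        System.out.println(prepare(before)); // prints [PURGATORY, PURGATORY]
    }
}
```

In the real table these transitions would presumably be issued as updates and deletes restricted to the given job; the list-based form here only makes the mapping easy to read.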

prepareIncrementalScan

public void prepareIncrementalScan(java.lang.Long jobID)
                            throws ManifoldCFException
Prepare for an "incremental" job. This is called ONLY when the job is inactive; that is, there should be no ACTIVE or ACTIVEPURGATORY entries at all. The preparation for starting an incremental job is to requeue all documents that are currently in the system that are marked "COMPLETE". These get marked as "PENDINGPURGATORY", since the idea is to queue them in such a way that we know they were ingested before.

Parameters:
jobID - is the job identifier.
Throws:
ManifoldCFException

deleteIngestedDocumentIdentifiers

public void deleteIngestedDocumentIdentifiers(DocumentDescription[] identifiers)
                                       throws ManifoldCFException
Delete ingested document identifiers (as part of deleting the owning job). The number of identifiers specified is guaranteed to be less than the maxInClauseCount for the database.

Parameters:
identifiers - is the set of document identifiers.
Throws:
ManifoldCFException

checkJobBusy

public boolean checkJobBusy(java.lang.Long jobID)
                     throws ManifoldCFException
Check if there are any outstanding active documents for a job

Throws:
ManifoldCFException

deleteAllJobRecords

public void deleteAllJobRecords(java.lang.Long jobID)
                         throws ManifoldCFException
For a job deletion: Delete all records for a job.

Parameters:
jobID - is the job identifier.
Throws:
ManifoldCFException

writeDocPriority

public void writeDocPriority(long currentTime,
                             java.lang.Long rowID,
                             double priority)
                      throws ManifoldCFException
Write out a document priority

Throws:
ManifoldCFException

updateCompletedRecord

public void updateCompletedRecord(java.lang.Long recID,
                                  int currentStatus)
                           throws ManifoldCFException
Set the "completed" status for a record.

Throws:
ManifoldCFException

updateActiveRecord

public void updateActiveRecord(java.lang.Long id,
                               int currentStatus)
                        throws ManifoldCFException
Set the status to active on a record, leaving priority and check time unchanged.

Parameters:
id - is the job queue id.
currentStatus - is the current status
Throws:
ManifoldCFException

setStatus

public void setStatus(java.lang.Long id,
                      int status,
                      java.lang.Long checkTime,
                      int action,
                      long failTime,
                      int failCount)
               throws ManifoldCFException
Set the status on a record, including check time and priority.

Parameters:
id - is the job queue id.
status - is the desired status
checkTime - is the check time.
Throws:
ManifoldCFException

setDeletingStatus

public void setDeletingStatus(java.lang.Long id)
                       throws ManifoldCFException
Set the status of a document to "being deleted".

Throws:
ManifoldCFException

setUndeletingStatus

public void setUndeletingStatus(java.lang.Long id,
                                long checkTime)
                         throws ManifoldCFException
Set the status of a document to be "no longer deleting"

Throws:
ManifoldCFException

setCleaningStatus

public void setCleaningStatus(java.lang.Long id)
                       throws ManifoldCFException
Set the status of a document to "being cleaned".

Throws:
ManifoldCFException

setUncleaningStatus

public void setUncleaningStatus(java.lang.Long id,
                                long checkTime)
                         throws ManifoldCFException
Set the status of a document to be "no longer cleaning"

Throws:
ManifoldCFException

deleteRecordMultiple

public void deleteRecordMultiple(java.lang.Long[] ids)
                          throws ManifoldCFException
Remove multiple records entirely.

Parameters:
ids - is the set of job queue ids.
Throws:
ManifoldCFException
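Removing many records at once generally means splitting the ids into batches small enough for one IN clause each (BaseTable exposes getMaxInClause for that limit). The sketch below shows one way such batching could look; the class, the helper names, and the generated clause text are hypothetical and are not taken from the JobQueue source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: splits ids into batches that each fit one IN clause. */
final class BatchDeleteSketch {
    /** Splits ids into consecutive batches of at most maxInClause elements. */
    static List<List<Long>> batches(Long[] ids, int maxInClause) {
        if (maxInClause <= 0) {
            throw new IllegalArgumentException("maxInClause must be positive");
        }
        List<List<Long>> out = new ArrayList<>();
        for (int start = 0; start < ids.length; start += maxInClause) {
            int end = Math.min(start + maxInClause, ids.length);
            out.add(new ArrayList<>(Arrays.asList(ids).subList(start, end)));
        }
        return out;
    }

    /** Builds a parameterized clause such as "IN (?,?,?)" for count values. */
    static String inClause(int count) {
        StringBuilder sb = new StringBuilder("IN (");
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('?');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Long[] ids = {1L, 2L, 3L, 4L, 5L};
        for (List<Long> batch : batches(ids, 2)) {
            // Each batch would back one parameterized delete statement.
            System.out.println(inClause(batch.size()) + " <- " + batch);
        }
    }
}
```

Using placeholders rather than inlined values keeps each batch statement parameterized; the batch size would come from the database layer rather than a hard-coded number.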

doDeletes

protected void doDeletes(java.util.ArrayList list,
                         java.lang.String queryPart)
                  throws ManifoldCFException
Do a batch of deletes.

Throws:
ManifoldCFException

deleteRecord

public void deleteRecord(java.lang.Long id)
                  throws ManifoldCFException
Remove a record entirely.

Parameters:
id - is the job queue id.
Throws:
ManifoldCFException

updateExistingRecordInitial

public boolean updateExistingRecordInitial(java.lang.Long recordID,
                                           int currentStatus,
                                           java.lang.Long checkTimeValue,
                                           long desiredExecuteTime,
                                           long currentTime,
                                           double desiredPriority,
                                           java.lang.String[] prereqEvents)
                                    throws ManifoldCFException
Update an existing record (as the result of an initial add). The record is presumed to exist and have been locked, via "FOR UPDATE".

Throws:
ManifoldCFException

insertNewRecordInitial

public void insertNewRecordInitial(java.lang.Long jobID,
                                   java.lang.String docHash,
                                   java.lang.String docID,
                                   double desiredDocPriority,
                                   long desiredExecuteTime,
                                   long currentTime,
                                   java.lang.String[] prereqEvents)
                            throws ManifoldCFException
Insert a new record into the jobqueue table (as part of adding an initial reference).

Parameters:
jobID - is the job identifier.
docHash - is the hash of the local document identifier.
docID - is the local document identifier.
Throws:
ManifoldCFException

addRemainingDocumentsInitial

public void addRemainingDocumentsInitial(java.lang.Long jobID,
                                         java.lang.String[] docIDHashes)
                                  throws ManifoldCFException
Note the remaining documents that do NOT need to be queued. These are noted so that the doneDocumentsInitial() method does not wrongly clean up seeds from previous runs.

Throws:
ManifoldCFException

processRemainingDocuments

protected void processRemainingDocuments(java.util.Map idMap,
                                         java.lang.String query,
                                         java.util.ArrayList list,
                                         java.util.Map inSet)
                                  throws ManifoldCFException
Process the specified set of documents.

Throws:
ManifoldCFException

updateRemainingDocuments

protected void updateRemainingDocuments(java.lang.String query,
                                        java.util.ArrayList list)
                                 throws ManifoldCFException
Update the specified set of documents to be "NEWSEED"

Throws:
ManifoldCFException

doneDocumentsInitial

public void doneDocumentsInitial(java.lang.Long jobID,
                                 boolean isPartial)
                          throws ManifoldCFException
Complete the initial set of documents. This method converts the seeding statuses for the job to their steady-state values. SEEDSTATUS_SEED becomes SEEDSTATUS_NOTSEED, and SEEDSTATUS_NEWSEED becomes SEEDSTATUS_SEED. If the seeding was partial, then all previous seeds are preserved as such.

Parameters:
jobID - is the job identifier.
isPartial - is true if the passed list of seeds is not complete.
Throws:
ManifoldCFException
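The seed-status conversion described for doneDocumentsInitial amounts to a small transition function. The following is a sketch under stated assumptions: the SeedStatus enum is a hypothetical stand-in for the int SEEDSTATUS_* constants, and the treatment of partial seeding follows the sentence "all previous seeds are preserved as such".

```java
/** Illustrative only: the seed-status transition applied when seeding completes. */
final class SeedStatusSketch {
    // Hypothetical stand-in for SEEDSTATUS_NOTSEED, SEEDSTATUS_SEED, SEEDSTATUS_NEWSEED.
    enum SeedStatus { NOTSEED, SEED, NEWSEED }

    static SeedStatus afterSeeding(SeedStatus current, boolean isPartial) {
        switch (current) {
            case NEWSEED:
                // Documents seeded in this pass become steady-state seeds.
                return SeedStatus.SEED;
            case SEED:
                // A seed from a previous run that was not re-seeded is dropped,
                // unless seeding was partial, in which case it is preserved.
                return isPartial ? SeedStatus.SEED : SeedStatus.NOTSEED;
            default:
                return SeedStatus.NOTSEED;
        }
    }

    public static void main(String[] args) {
        System.out.println(afterSeeding(SeedStatus.NEWSEED, false)); // prints SEED
        System.out.println(afterSeeding(SeedStatus.SEED, false));    // prints NOTSEED
        System.out.println(afterSeeding(SeedStatus.SEED, true));     // prints SEED
    }
}
```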

getAllSeeds

public java.lang.String[] getAllSeeds(java.lang.Long jobID)
                               throws ManifoldCFException
Get all the current seeds. Returns the seed document identifier hashes for a job.

Parameters:
jobID - is the job identifier.
Returns:
the document identifier hashes that are currently considered to be seeds.
Throws:
ManifoldCFException

updateExistingRecord

public boolean updateExistingRecord(java.lang.Long recordID,
                                    int currentStatus,
                                    java.lang.Long checkTimeValue,
                                    long desiredExecuteTime,
                                    long currentTime,
                                    boolean otherChangesSeen,
                                    double desiredPriority,
                                    java.lang.String[] prereqEvents)
                             throws ManifoldCFException
Update an existing record (as the result of a reference add). The record is presumed to exist and have been locked, via "FOR UPDATE".

Throws:
ManifoldCFException

insertNewRecord

public void insertNewRecord(java.lang.Long jobID,
                            java.lang.String docIDHash,
                            java.lang.String docID,
                            double desiredDocPriority,
                            long desiredExecuteTime,
                            long currentTime,
                            java.lang.String[] prereqEvents)
                     throws ManifoldCFException
Insert a new record into the jobqueue table (as part of adding a child reference).

Throws:
ManifoldCFException

seedstatusToString

public static java.lang.String seedstatusToString(int status)
                                           throws ManifoldCFException
Convert seedstatus value to a string.

Throws:
ManifoldCFException

stringToSeedstatus

public static int stringToSeedstatus(java.lang.String x)
                              throws ManifoldCFException
Convert seedstatus field value to an integer.

Throws:
ManifoldCFException

stringToAction

public static int stringToAction(java.lang.String value)
                          throws ManifoldCFException
Convert action field value to integer.

Throws:
ManifoldCFException

actionToString

public static java.lang.String actionToString(int action)
                                       throws ManifoldCFException
Convert integer to action string

Throws:
ManifoldCFException

stringToStatus

public static int stringToStatus(java.lang.String value)
                          throws ManifoldCFException
Convert status field value to integer.

Parameters:
value - is the string.
Returns:
the integer.
Throws:
ManifoldCFException

statusToString

public static java.lang.String statusToString(int status)
                                       throws ManifoldCFException
Convert status to string.

Parameters:
status - is the status value.
Returns:
the database string.
Throws:
ManifoldCFException
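A pair of converters like stringToStatus and statusToString is typically backed by lookup maps, as the statusMap field suggests. The sketch below illustrates the idea only: the integer values, the single-letter database strings, and the use of IllegalArgumentException in place of ManifoldCFException are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: two-way conversion between status codes and stored strings. */
final class StatusCodecSketch {
    // Hypothetical values; the real constants are documented under Constant Field Values.
    static final int STATUS_PENDING = 0;
    static final int STATUS_ACTIVE = 1;
    static final int STATUS_COMPLETE = 2;

    private static final Map<String, Integer> STRING_TO_STATUS = new HashMap<>();
    private static final Map<Integer, String> STATUS_TO_STRING = new HashMap<>();

    private static void register(String dbValue, int status) {
        STRING_TO_STATUS.put(dbValue, status);
        STATUS_TO_STRING.put(status, dbValue);
    }

    static {
        // Hypothetical database strings, chosen only for the example.
        register("P", STATUS_PENDING);
        register("A", STATUS_ACTIVE);
        register("C", STATUS_COMPLETE);
    }

    static int stringToStatus(String value) {
        Integer status = STRING_TO_STATUS.get(value);
        if (status == null) {
            throw new IllegalArgumentException("Bad status value: " + value);
        }
        return status;
    }

    static String statusToString(int status) {
        String value = STATUS_TO_STRING.get(status);
        if (value == null) {
            throw new IllegalArgumentException("Bad status code: " + status);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(statusToString(stringToStatus("A"))); // prints A
    }
}
```

Registering both directions in one place keeps the two maps from drifting apart when a status is added.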

getHashCode

public static java.lang.String getHashCode(java.lang.String documentIdentifier)
                                    throws ManifoldCFException
Get a hash value from a document id string. This converts the string into something that fits in 20 characters. (Someday this will be an MD5 hash, but for now it just uses Java hashing.)

Parameters:
documentIdentifier - is the input document id string.
Returns:
the hash code.
Throws:
ManifoldCFException
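To illustrate how a string of any length can be reduced to at most 20 characters with plain Java arithmetic, here is a minimal sketch. It is not the actual getHashCode implementation: the 64-bit polynomial hash and the decimal rendering are assumptions chosen because a signed long always prints in 20 characters or fewer.

```java
/** Illustrative only: reduces a document identifier to at most 20 characters. */
final class DocHashSketch {
    static String hashOf(String documentIdentifier) {
        long h = 0L;
        for (int i = 0; i < documentIdentifier.length(); i++) {
            // Same recurrence as String.hashCode, but accumulated in 64 bits.
            h = 31L * h + documentIdentifier.charAt(i);
        }
        // A long rendered in decimal is at most 20 characters, sign included.
        return Long.toString(h);
    }

    public static void main(String[] args) {
        System.out.println(hashOf("a")); // prints 97
        System.out.println(hashOf("http://example.com/some/long/path").length() <= 20); // prints true
    }
}
```

A non-cryptographic hash like this can collide, which is presumably why the jobqueue table keeps the full document identifier alongside the hash.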