org.apache.manifoldcf.crawler.jobs
Class JobManager

java.lang.Object
  extended by org.apache.manifoldcf.crawler.jobs.JobManager
All Implemented Interfaces:
IJobManager

public class JobManager
extends java.lang.Object
implements IJobManager

This is the main job manager. It provides methods that support both job definition and the threads that execute jobs.


Nested Class Summary
protected static class JobManager.JobqueueRecord
          Class for tracking existing jobqueue row data
protected static class JobManager.MutableInteger
          Mutable integer class.
protected static class JobManager.QueueHashItem
          This class contains information per job on how many queue items have so far been accumulated.
protected static class JobManager.ThrottleJobItem
          This class represents the information stored PER JOB in the throttling structure.
protected static class JobManager.ThrottleLimit
          This class provides the throttling limits for the job queueing query.
protected static class JobManager.ThrottleLimitSpec
          This is a class which describes an individual throttle limit, in fetches.
 
Field Summary
static java.lang.String _rcsid
           
protected  Carrydown carryDown
           
protected  IRepositoryConnectionManager connectionMgr
           
protected  IDBInterface database
           
protected  EventManager eventManager
           
protected  HopCount hopCount
           
protected static java.lang.String hopLock
           
protected  JobQueue jobQueue
           
protected  Jobs jobs
           
protected  ILockManager lockManager
           
protected  IOutputConnectionManager outputMgr
           
protected static java.util.Random random
           
protected  IThreadContext threadContext
           
 
Fields inherited from interface org.apache.manifoldcf.crawler.interfaces.IJobManager
ACTION_REMOVE, ACTION_RESCAN, DOCSTATE_NEVERPROCESSED, DOCSTATE_PREVIOUSLYPROCESSED, DOCSTATUS_DELETING, DOCSTATUS_EXPIRING, DOCSTATUS_INACTIVE, DOCSTATUS_PROCESSING, DOCSTATUS_READYFOREXPIRATION, DOCSTATUS_READYFORPROCESSING, DOCSTATUS_WAITINGFOREVER, DOCSTATUS_WAITINGFOREXPIRATION, DOCSTATUS_WAITINGFORPROCESSING
 
Constructor Summary
JobManager(IThreadContext threadContext, IDBInterface database)
          Constructor.
 
Method Summary
protected  void addBucketExtract(java.lang.StringBuffer sb, java.util.ArrayList list, java.lang.String columnPrefix, java.lang.String columnName, BucketDescription bucketDesc)
          Turn a bucket description into a return column.
protected  boolean addCriteria(java.lang.StringBuffer sb, java.util.ArrayList list, java.lang.String fieldPrefix, java.lang.String connectionName, StatusFilterCriteria criteria, boolean whereEmitted)
          Add criteria clauses to query.
 boolean addDocument(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String docIDHash, java.lang.String docID, java.lang.String parentIdentifierHash, java.lang.String relationshipType, int hopcountMethod, java.lang.String[] dataNames, java.lang.Object[][] dataValues, long currentTime, double priority, java.lang.String[] prereqEventNames)
          Add a document to the queue.
protected  void addDocumentCriteria(java.lang.StringBuffer sb, java.util.ArrayList list, java.lang.Long currentTimeValue, java.lang.Long currentPriorityValue)
           
 boolean[] addDocuments(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] docIDHashes, java.lang.String[] docIDs, java.lang.String parentIdentifierHash, java.lang.String relationshipType, int hopcountMethod, java.lang.String[][] dataNames, java.lang.Object[][][] dataValues, long currentTime, double[] documentPriorities, java.lang.String[][] prereqEventNames)
          Add documents to the queue in bulk.
 boolean[] addDocumentsInitial(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] docIDHashes, java.lang.String[] docIDs, boolean overrideSchedule, int hopcountMethod, long currentTime, double[] documentPriorities, java.lang.String[][] prereqEventNames)
          Add an initial set of documents to the queue.
protected  void addLimits(java.lang.StringBuffer sb, int startRow, int maxRowCount)
          Add limit and offset.
protected  void addOrdering(java.lang.StringBuffer sb, java.lang.String[] completeFieldList, SortOrder sort)
          Add ordering.
 void addRemainingDocumentsInitial(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] docIDHashes, int hopcountMethod)
          Add an initial set of remaining documents to the queue.
 boolean beginEventSequence(java.lang.String eventName)
          Begin an event sequence.
protected static java.util.HashMap buildReorderMap(java.lang.String[] originalIDHashes, java.lang.String[] reorderedIDHashes)
          Build a reorder map, describing how to convert an original index into a reordered index.
protected  DocumentDescription[] calculateAffectedDeleteCarrydownChildren(java.lang.Long jobID, java.lang.String[] docIDHashes)
          Helper method: Find the document descriptions that will be affected due to carrydown row deletions.
protected  DocumentDescription[] calculateAffectedRestoreCarrydownChildren(java.lang.Long jobID, java.lang.String[] parentIDHashes)
          Helper method: Calculate the unique set of affected carrydown children resulting from a "restoreRecords" operation.
 boolean carrydownChangeDocument(DocumentDescription documentDescription, long currentTime, double docPriority)
          Requeue a document because of carrydown changes.
 boolean[] carrydownChangeDocumentMultiple(DocumentDescription[] documentDescriptions, long currentTime, double[] docPriorities)
          Requeue a document set because of carrydown changes.
 boolean checkIfOutputReference(java.lang.String connectionName)
          See if there's a reference to an output connection name.
 boolean checkIfReference(java.lang.String connectionName)
          See if there's a reference to a connection name.
 boolean checkJobActive(java.lang.Long jobID)
          Verify that a specific job is indeed still active.
 boolean checkJobBusy(java.lang.Long jobID)
          Verify whether a job is still processing documents or no longer has any outstanding active documents.
protected static java.lang.Long checkTimeMatch(long startTime, long currentTimestamp, EnumeratedValues daysOfWeek, EnumeratedValues daysOfMonth, EnumeratedValues months, EnumeratedValues years, EnumeratedValues hours, EnumeratedValues minutes, java.lang.String timezone, java.lang.Long duration)
          Check if the specified job parameters have a 'hit' within the specified interval.
 void completeEventSequence(java.lang.String eventName)
          Complete an event sequence.
 IJobDescription createJob()
          Create a new job.
 void deinstall()
          Uninstall.
 void deleteIngestedDocumentIdentifiers(DocumentDescription[] identifiers)
          Delete ingested document identifiers (as part of deleting the owning job).
 void deleteJob(java.lang.Long id)
          Delete a job.
 void deleteJobsReadyForDelete()
          Delete jobs in need of being deleted (which are marked "ready for delete").
 void doneDocumentsInitial(java.lang.Long jobID, java.lang.String[] legalLinkTypes, boolean isPartial, int hopcountMethod)
          Signal that a seeding pass has been done.
protected static java.lang.String[] eliminateDuplicates(java.lang.String[] docIDHashes)
          Eliminate duplicates, and sort
protected  boolean emitClauseStart(java.lang.StringBuffer sb, boolean whereEmitted)
          Emit a WHERE or an AND, depending...
 boolean errorAbort(java.lang.Long jobID, java.lang.String errorText)
          Abort a running job due to a fatal error condition.
 void exportConfiguration(java.io.OutputStream os)
          Export configuration
protected  void fetchAndProcessDocuments(java.util.ArrayList answers, java.lang.Long currentTimeValue, java.lang.Long currentPriorityValue, JobManager.ThrottleLimit vList, IRepositoryConnection[] connections)
          Fetch and process documents matching the passed-in criteria
 boolean[] findHopCounts(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] docIDHashes, java.lang.String linkType, int limit, int hopcountMethod)
          Get the specified hop counts, with the limit as described.
 IJobDescription[] findJobsForConnection(java.lang.String connectionName)
          Get the job IDs associated with a given connection name.
 DocumentDescription[] finishDocuments(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] parentIdentifierHashes, int hopcountMethod)
          Complete adding child documents to the queue, for a set of documents.
 void finishJobAborts(long timestamp, java.util.ArrayList abortJobs)
          Complete the sequence that aborts jobs and makes them runnable again.
 void finishJobs()
          Put all eligible jobs in the "shutting down" state.
 IResultSet genDocumentStatus(java.lang.String connectionName, StatusFilterCriteria filterCriteria, SortOrder sortOrder, int startRow, int rowCount)
          Run a 'document status' report.
 IResultSet genQueueStatus(java.lang.String connectionName, StatusFilterCriteria filterCriteria, SortOrder sortOrder, BucketDescription idBucketDescription, int startRow, int rowCount)
          Run a 'queue status' report.
 IJobDescription[] getAllJobs()
          Load a sorted list of job descriptions.
 java.lang.String[] getAllSeeds(java.lang.Long jobID)
          Get all the current seeds.
 JobStatus[] getAllStatus()
          Get a list of all jobs, and their status information.
 DocumentSetAndFlags getExpiredDocuments(int n, long currentTime)
          Get up to the next n documents to be expired.
 JobStatus[] getFinishedJobs()
          Get a list of completed jobs, and their statistics.
protected  java.lang.String getHopLockName(java.lang.Long jobID)
          Get the hoplock for a given job ID
 JobStartRecord[] getJobsReadyForDelete()
          Get the list of jobs that are ready for deletion.
 JobStartRecord[] getJobsReadyForInactivity()
          Find the list of jobs that need to have their connectors notified of job completion.
 JobStartRecord[] getJobsReadyForSeeding(long currentTime)
          Get the list of jobs that are ready for seeding.
 JobStartRecord[] getJobsReadyForStartup()
          Get the list of jobs that are ready for startup.
 DocumentDescription[] getNextAlreadyProcessedReprioritizationDocuments(long currentTime, int n)
          Get a list of already-processed documents to reprioritize.
 DocumentSetAndFlags getNextCleanableDocuments(int maxCount, long currentTime)
          Get list of cleanable document descriptions.
 DocumentDescription[] getNextDeletableDocuments(int maxCount, long currentTime)
          Get list of deletable document descriptions.
 DocumentDescription[] getNextDocuments(int n, long currentTime, long interval, BlockingDocuments blockingDocuments, PerformanceStatistics statistics, DepthStatistics scanRecord)
          Get up to the next n document(s) to be fetched and processed.
 DocumentDescription[] getNextNotYetProcessedReprioritizationDocuments(long currentTime, int n)
          Get a list of not-yet-processed documents to reprioritize.
protected  long getRandomAmount()
          Sleep a random amount of time after a transaction abort.
 JobStatus[] getRunningJobs()
          Get a list of running jobs.
 JobStatus getStatus(java.lang.Long jobID)
          Get the status of a job.
protected  java.lang.String[] getUnindexableDocumentIdentifiers(DocumentDescription[] documentIdentifiers, java.lang.String connectionName, java.lang.String outputConnectionName)
          Get a list of document identifiers that should actually be deleted from the index, from a list that might contain identifiers that are shared with other jobs, which are targeted to the same output connection.
 void importConfiguration(java.io.InputStream is)
          Import configuration
 void inactivateJob(java.lang.Long jobID)
          Inactivate a job, from the notification state.
 void install()
          Install.
 IJobDescription load(java.lang.Long id)
          Load a job for editing.
 IJobDescription load(java.lang.Long id, boolean readOnly)
          Load a job.
protected static java.lang.String makeCompositeID(java.lang.String docIDHash, java.lang.String connectionName)
          Create a composite document hash key.
protected  JobStatus[] makeJobStatus(java.lang.String whereClause, java.util.ArrayList whereParams)
          Make a job status array from a query result.
 void manualAbort(java.lang.Long jobID)
          Manually abort a running job.
 void manualAbortRestart(java.lang.Long jobID)
          Manually restart a running job.
 void manualStart(java.lang.Long jobID)
          Manually start a job.
 void markDocumentCompleted(DocumentDescription documentDescription)
          Note completion of document processing by a job thread of a document.
 void markDocumentCompletedMultiple(DocumentDescription[] documentDescriptions)
          Note completion of document processing by a job thread of a set of documents.
 DocumentDescription[] markDocumentDeleted(java.lang.Long jobID, java.lang.String[] legalLinkTypes, DocumentDescription documentDescription, int hopcountMethod)
          Note deletion as result of document processing by a job thread of a document.
 DocumentDescription[] markDocumentDeletedMultiple(java.lang.Long jobID, java.lang.String[] legalLinkTypes, DocumentDescription[] documentDescriptions, int hopcountMethod)
          Note deletion as a result of document processing by a job thread of a set of documents.
 void noteConnectionChange(java.lang.String connectionName)
          Note a change in connection configuration.
protected  void noteConnectionDeregistration(java.lang.String query, java.util.ArrayList list)
          Note deregistration for a batch of connection names.
protected  void noteConnectionRegistration(java.lang.String query, java.util.ArrayList list)
          Note registration for a batch of connection names.
 void noteConnectorDeregistration(java.lang.String[] connectionNames)
          Note the deregistration of a connector used by the specified connections.
 void noteConnectorRegistration(java.lang.String[] connectionNames)
          Note the registration of a connector used by the specified connections.
 void noteJobDeleteStarted(java.lang.Long jobID, long startTime)
          Note job delete started.
 void noteJobSeeded(java.lang.Long jobID, long seedTime)
          Note job seeded.
 void noteJobStarted(java.lang.Long jobID, long startTime)
          Note job started.
 void noteOutputConnectionChange(java.lang.String connectionName)
          Note a change in output connection configuration.
protected  void noteOutputConnectionDeregistration(java.lang.String query, java.util.ArrayList list)
          Note deregistration for a batch of output connection names.
protected  void noteOutputConnectionRegistration(java.lang.String query, java.util.ArrayList list)
          Note registration for a batch of output connection names.
 void noteOutputConnectorDeregistration(java.lang.String[] connectionNames)
          Note the deregistration of an output connector used by the specified connections.
 void noteOutputConnectorRegistration(java.lang.String[] connectionNames)
          Note the registration of an output connector used by the specified connections.
 void pauseJob(java.lang.Long jobID)
          Pause a job.
 void prepareDeleteScan(java.lang.Long jobID)
          Prepare for a delete scan.
 void prepareForStart()
          Reset the job queue immediately after starting up.
 void prepareFullScan(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod)
          Prepare for a full scan.
 void prepareIncrementalScan(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod)
          Prepare for an incremental scan.
protected  void processDeleteHashSet(java.lang.Long jobID, java.util.HashMap resultHash, java.lang.String queryPart, java.util.ArrayList list)
          Helper method: look up rows affected by a deleteRecords operation.
protected  void processParentHashSet(java.lang.Long jobID, java.util.HashMap resultHash, java.lang.String queryPart, java.util.ArrayList list)
          Helper method: look up rows affected by a restoreRecords operation.
protected  EnumeratedValues readEnumeratedValues(java.io.InputStream is)
           
 void requeueDocument(DocumentDescription documentDescription, java.lang.Long executeTime, int action)
          Requeue a document for further processing in the future.
 void requeueDocumentMultiple(DocumentDescription[] documentDescriptions, java.lang.Long[] executeTimes, int[] actions)
          Requeue a set of documents for further processing in the future.
 void resetCleaningDocument(DocumentDescription documentDescription, long checkTime)
          Reset a cleaning document back to its former state.
 void resetCleaningDocumentMultiple(DocumentDescription[] documentDescriptions, long checkTime)
          Reset a set of cleaning documents for further processing in the future.
 void resetDeleteStartupWorkerStatus()
          Reset as part of restoring delete startup threads.
 void resetDeletingDocument(DocumentDescription documentDescription, long checkTime)
          Reset a deleting document back to its former state.
 void resetDeletingDocumentMultiple(DocumentDescription[] documentDescriptions, long checkTime)
          Reset a set of deleting documents for further processing in the future.
 void resetDocCleanupWorkerStatus()
          Reset as part of restoring doc cleanup threads.
 void resetDocDeleteWorkerStatus()
          Reset as part of restoring doc delete threads.
 void resetDocument(DocumentDescription documentDescription, long executeTime, int action, long failTime, int failCount)
          Reset an active document back to its former state.
 void resetDocumentMultiple(DocumentDescription[] documentDescriptions, long executeTime, int action, long failTime, int failCount)
          Reset a set of documents for further processing in the future.
 void resetDocumentWorkerStatus()
          Reset as part of restoring document worker threads.
 void resetJobs(long currentTime, java.util.ArrayList resetJobs)
          Reset eligible jobs either back to the "inactive" state, or make them active again.
 void resetJobSchedule(java.lang.Long jobID)
          Reset job schedule.
 void resetNotificationWorkerStatus()
          Reset as part of restoring notification threads.
 void resetNotifyJob(java.lang.Long jobID)
          Reset a job that is notifying back to "ready for notify" state.
 void resetSeedingWorkerStatus()
          Reset as part of restoring seeding threads.
 void resetSeedJob(java.lang.Long jobID)
          Reset a seeding job back to "active" state.
 void resetStartDeleteJob(java.lang.Long jobID)
          Reset a job starting for delete back to "ready for delete" state.
 void resetStartupJob(java.lang.Long jobID)
          Reset a starting job back to "ready for startup" state.
 void resetStartupWorkerStatus()
          Reset as part of restoring startup threads.
 void restartJob(java.lang.Long jobID)
          Restart a paused job.
 java.lang.String[] retrieveParentData(java.lang.Long jobID, java.lang.String docIDHash, java.lang.String dataName)
          Retrieve specific parent data for a given document.
 CharacterInput[] retrieveParentDataAsFiles(java.lang.Long jobID, java.lang.String docIDHash, java.lang.String dataName)
          Retrieve specific parent data for a given document.
 void save(IJobDescription jobDescription)
          Save a job.
protected  void sleepFor(long amt)
           
 void startJobs(long currentTime, java.util.ArrayList unwaitList)
          Start all jobs in need of starting.
 void waitJobs(long currentTime, java.util.ArrayList waitList)
          Put active or paused jobs in wait state, if they've exceeded their window.
 void writeDocumentPriorities(long currentTime, DocumentDescription[] documentDescriptions, double[] priorities)
          Save a set of document priorities.
protected static void writeEnumeratedValues(java.io.OutputStream os, EnumeratedValues ev)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_rcsid

public static final java.lang.String _rcsid
See Also:
Constant Field Values

hopLock

protected static final java.lang.String hopLock
See Also:
Constant Field Values

database

protected IDBInterface database

outputMgr

protected IOutputConnectionManager outputMgr

connectionMgr

protected IRepositoryConnectionManager connectionMgr

lockManager

protected ILockManager lockManager

threadContext

protected IThreadContext threadContext

jobQueue

protected JobQueue jobQueue

jobs

protected Jobs jobs

hopCount

protected HopCount hopCount

carryDown

protected Carrydown carryDown

eventManager

protected EventManager eventManager

random

protected static java.util.Random random
Constructor Detail

JobManager

public JobManager(IThreadContext threadContext,
                  IDBInterface database)
           throws ManifoldCFException
Constructor.

Parameters:
threadContext - is the thread context.
database - is the database.
Throws:
ManifoldCFException
Method Detail

install

public void install()
             throws ManifoldCFException
Install.

Specified by:
install in interface IJobManager
Throws:
ManifoldCFException

deinstall

public void deinstall()
               throws ManifoldCFException
Uninstall.

Specified by:
deinstall in interface IJobManager
Throws:
ManifoldCFException

exportConfiguration

public void exportConfiguration(java.io.OutputStream os)
                         throws java.io.IOException,
                                ManifoldCFException
Export configuration

Specified by:
exportConfiguration in interface IJobManager
Throws:
java.io.IOException
ManifoldCFException

writeEnumeratedValues

protected static void writeEnumeratedValues(java.io.OutputStream os,
                                            EnumeratedValues ev)
                                     throws java.io.IOException
Throws:
java.io.IOException

importConfiguration

public void importConfiguration(java.io.InputStream is)
                         throws java.io.IOException,
                                ManifoldCFException
Import configuration

Specified by:
importConfiguration in interface IJobManager
Throws:
java.io.IOException
ManifoldCFException

readEnumeratedValues

protected EnumeratedValues readEnumeratedValues(java.io.InputStream is)
                                         throws java.io.IOException
Throws:
java.io.IOException
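
The writeEnumeratedValues and readEnumeratedValues pair is undocumented on this page, so the following is only a minimal sketch of the stream round trip such a pair implies. It assumes the values can be modelled as a set of integers and uses a hypothetical count-prefixed encoding; neither the class name EnumeratedValuesRoundTripSketch nor the byte layout comes from ManifoldCF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

final class EnumeratedValuesRoundTripSketch {
    /** Writes a count followed by each value (an assumed layout, not the real export format). */
    static void write(OutputStream os, SortedSet<Integer> values) throws IOException {
        DataOutputStream out = new DataOutputStream(os);
        out.writeInt(values.size());
        for (int v : values) {
            out.writeInt(v);
        }
        out.flush();
    }

    /** Reads back what write() produced. */
    static SortedSet<Integer> read(InputStream is) throws IOException {
        DataInputStream in = new DataInputStream(is);
        int count = in.readInt();
        SortedSet<Integer> values = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            values.add(in.readInt());
        }
        return values;
    }

    /** Convenience wrapper: write to memory, read back, surface I/O errors unchecked. */
    static SortedSet<Integer> roundTrip(SortedSet<Integer> values) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(buffer, values);
            return read(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SortedSet<Integer> hours = new TreeSet<>(Arrays.asList(1, 2, 23));
        System.out.println(roundTrip(hours).equals(hours));
    }
}
```

Writing the count first lets the reader know how many values to consume without relying on end-of-stream, which matters when several such sets are written to the same stream one after another.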

noteConnectorDeregistration

public void noteConnectorDeregistration(java.lang.String[] connectionNames)
                                 throws ManifoldCFException
Note the deregistration of a connector used by the specified connections. This method will be called when the connector is deregistered. Jobs that use these connections must therefore enter appropriate states.

Specified by:
noteConnectorDeregistration in interface IJobManager
Parameters:
connectionNames - is the set of connection names.
Throws:
ManifoldCFException

noteConnectionDeregistration

protected void noteConnectionDeregistration(java.lang.String query,
                                            java.util.ArrayList list)
                                     throws ManifoldCFException
Note deregistration for a batch of connection names.

Throws:
ManifoldCFException

noteConnectorRegistration

public void noteConnectorRegistration(java.lang.String[] connectionNames)
                               throws ManifoldCFException
Note the registration of a connector used by the specified connections. This method will be called when a connector is registered, on which the specified connections depend.

Specified by:
noteConnectorRegistration in interface IJobManager
Parameters:
connectionNames - is the set of connection names.
Throws:
ManifoldCFException

noteConnectionRegistration

protected void noteConnectionRegistration(java.lang.String query,
                                          java.util.ArrayList list)
                                   throws ManifoldCFException
Note registration for a batch of connection names.

Throws:
ManifoldCFException

noteConnectionChange

public void noteConnectionChange(java.lang.String connectionName)
                          throws ManifoldCFException
Note a change in connection configuration. This method will be called whenever a connection's configuration is modified, or when an external repository change is signalled.

Specified by:
noteConnectionChange in interface IJobManager
Throws:
ManifoldCFException

noteOutputConnectorDeregistration

public void noteOutputConnectorDeregistration(java.lang.String[] connectionNames)
                                       throws ManifoldCFException
Note the deregistration of an output connector used by the specified connections. This method will be called when the connector is deregistered. Jobs that use these connections must therefore enter appropriate states.

Specified by:
noteOutputConnectorDeregistration in interface IJobManager
Parameters:
connectionNames - is the set of connection names.
Throws:
ManifoldCFException

noteOutputConnectionDeregistration

protected void noteOutputConnectionDeregistration(java.lang.String query,
                                                  java.util.ArrayList list)
                                           throws ManifoldCFException
Note deregistration for a batch of output connection names.

Throws:
ManifoldCFException

noteOutputConnectorRegistration

public void noteOutputConnectorRegistration(java.lang.String[] connectionNames)
                                     throws ManifoldCFException
Note the registration of an output connector used by the specified connections. This method will be called when a connector is registered, on which the specified connections depend.

Specified by:
noteOutputConnectorRegistration in interface IJobManager
Parameters:
connectionNames - is the set of connection names.
Throws:
ManifoldCFException

noteOutputConnectionRegistration

protected void noteOutputConnectionRegistration(java.lang.String query,
                                                java.util.ArrayList list)
                                         throws ManifoldCFException
Note registration for a batch of output connection names.

Throws:
ManifoldCFException

noteOutputConnectionChange

public void noteOutputConnectionChange(java.lang.String connectionName)
                                throws ManifoldCFException
Note a change in output connection configuration. This method will be called whenever a connection's configuration is modified, or when an external target configuration change is signalled.

Specified by:
noteOutputConnectionChange in interface IJobManager
Throws:
ManifoldCFException

getAllJobs

public IJobDescription[] getAllJobs()
                             throws ManifoldCFException
Load a sorted list of job descriptions.

Specified by:
getAllJobs in interface IJobManager
Returns:
the list, sorted by description.
Throws:
ManifoldCFException
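
As an illustration of the "sorted by description" ordering, here is a minimal sketch. JobStub is a hypothetical stand-in for IJobDescription with only an ID and a description, and plain String ordering is an assumption; the comparison the real implementation uses is not stated on this page.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortedJobsSketch {
    /** Hypothetical stand-in for IJobDescription: only an ID and a description. */
    static final class JobStub {
        final Long id;
        final String description;

        JobStub(Long id, String description) {
            this.id = id;
            this.description = description;
        }
    }

    /** Returns a copy of the input ordered by description, leaving the input untouched. */
    static JobStub[] sortedByDescription(JobStub[] jobs) {
        JobStub[] copy = Arrays.copyOf(jobs, jobs.length);
        Arrays.sort(copy, Comparator.comparing((JobStub j) -> j.description));
        return copy;
    }

    public static void main(String[] args) {
        JobStub[] jobs = {
            new JobStub(3L, "Web crawl"),
            new JobStub(1L, "File share"),
            new JobStub(2L, "Intranet wiki")
        };
        for (JobStub j : sortedByDescription(jobs)) {
            System.out.println(j.description + " (job " + j.id + ")");
        }
    }
}
```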

createJob

public IJobDescription createJob()
                          throws ManifoldCFException
Create a new job.

Specified by:
createJob in interface IJobManager
Returns:
the new job.
Throws:
ManifoldCFException

getHopLockName

protected java.lang.String getHopLockName(java.lang.Long jobID)
Get the hoplock for a given job ID
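
One plausible way to derive a per-job lock name is to append the job ID to a shared prefix such as the hopLock constant. The sketch below assumes exactly that; the prefix value and the concatenation scheme are hypothetical, since this page shows neither the value of hopLock nor the method body.

```java
final class HopLockNameSketch {
    // Hypothetical prefix; the real value of JobManager.hopLock is not shown on this page.
    static final String HOP_LOCK_PREFIX = "_HOPLOCK_";

    /** Builds a lock name that is unique per job ID (assumed scheme: prefix + ID). */
    static String hopLockName(Long jobID) {
        return HOP_LOCK_PREFIX + jobID;
    }

    public static void main(String[] args) {
        System.out.println(hopLockName(42L));
    }
}
```

A per-job name means hop count work for one job need not contend for the same lock as hop count work for another.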


deleteJob

public void deleteJob(java.lang.Long id)
               throws ManifoldCFException
Delete a job.

Specified by:
deleteJob in interface IJobManager
Parameters:
id - is the job's identifier. This method will purge all the records belonging to the job from the database, as well as remove all documents indexed by the job from the index.
Throws:
ManifoldCFException

load

public IJobDescription load(java.lang.Long id)
                     throws ManifoldCFException
Load a job for editing.

Specified by:
load in interface IJobManager
Parameters:
id - is the job's identifier.
Returns:
null if the job doesn't exist.
Throws:
ManifoldCFException

load

public IJobDescription load(java.lang.Long id,
                            boolean readOnly)
                     throws ManifoldCFException
Load a job.

Specified by:
load in interface IJobManager
Parameters:
id - is the job's identifier.
readOnly - is true if a read-only object is desired.
Returns:
null if the job doesn't exist.
Throws:
ManifoldCFException
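
Because both load variants return null for a missing job, callers need an explicit check before using the result. The following sketch shows that caller-side pattern against a hypothetical in-memory stand-in (LoadJobSketch, which stores a description string per job ID); it does not use the real JobManager or IJobDescription types.

```java
import java.util.HashMap;
import java.util.Map;

final class LoadJobSketch {
    /** Hypothetical in-memory stand-in for the job table: job ID to description text. */
    private final Map<Long, String> jobs = new HashMap<>();

    void save(Long id, String description) {
        jobs.put(id, description);
    }

    /** Mirrors the documented contract: returns null when the job does not exist. */
    String load(Long id) {
        return jobs.get(id);
    }

    /** Caller-side handling: check for null before using the result. */
    static String describe(LoadJobSketch manager, Long id) {
        String job = manager.load(id);
        if (job == null) {
            return "job " + id + " not found";
        }
        return "job " + id + ": " + job;
    }

    public static void main(String[] args) {
        LoadJobSketch manager = new LoadJobSketch();
        manager.save(1L, "File share");
        System.out.println(describe(manager, 1L));
        System.out.println(describe(manager, 2L));
    }
}
```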

save

public void save(IJobDescription jobDescription)
          throws ManifoldCFException
Save a job.

Specified by:
save in interface IJobManager
Parameters:
jobDescription - is the job description.
Throws:
ManifoldCFException

checkIfReference

public boolean checkIfReference(java.lang.String connectionName)
                         throws ManifoldCFException
See if there's a reference to a connection name.

Specified by:
checkIfReference in interface IJobManager
Parameters:
connectionName - is the name of the connection.
Returns:
true if there is a reference, false otherwise.
Throws:
ManifoldCFException
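
A typical use of a reference check like this is to refuse removal of a connection that some job still uses. The sketch below illustrates the idea with a hypothetical JobStub type and an in-memory list of jobs; the real method consults the database, and how it does so is not described here.

```java
import java.util.Arrays;
import java.util.List;

final class ReferenceCheckSketch {
    /** Hypothetical stand-in for a job: only the repository connection name it uses. */
    static final class JobStub {
        final String connectionName;

        JobStub(String connectionName) {
            this.connectionName = connectionName;
        }
    }

    /** True if any job uses the named connection, false otherwise. */
    static boolean checkIfReference(List<JobStub> jobs, String connectionName) {
        for (JobStub job : jobs) {
            if (job.connectionName.equals(connectionName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<JobStub> jobs = Arrays.asList(new JobStub("wiki"), new JobStub("fileshare"));
        for (String name : new String[] {"wiki", "archive"}) {
            String verdict = checkIfReference(jobs, name)
                ? "still in use, keep it"
                : "unreferenced, safe to remove";
            System.out.println(name + ": " + verdict);
        }
    }
}
```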

checkIfOutputReference

public boolean checkIfOutputReference(java.lang.String connectionName)
                               throws ManifoldCFException
See if there's a reference to an output connection name.

Specified by:
checkIfOutputReference in interface IJobManager
Parameters:
connectionName - is the name of the connection.
Returns:
true if there is a reference, false otherwise.
Throws:
ManifoldCFException

findJobsForConnection

public IJobDescription[] findJobsForConnection(java.lang.String connectionName)
                                        throws ManifoldCFException
Get the job IDs associated with a given connection name.

Specified by:
findJobsForConnection in interface IJobManager
Parameters:
connectionName - is the name of the connection.
Returns:
the set of job IDs associated with that connection.
Throws:
ManifoldCFException

prepareForStart

public void prepareForStart()
                     throws ManifoldCFException
Reset the job queue immediately after starting up. If the system was shut down in the middle of a job, sufficient information should remain in the database to allow it to restart. However, before any job threads are spun up, a pass over the queue is needed to bring things back to a "normal" state. Also, if a job's status indicates that it was being processed by a thread (which is now dead), that status must be set back to its previous value.

Specified by:
prepareForStart in interface IJobManager
Throws:
ManifoldCFException
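
The "set that status back" step can be pictured as a lookup from each in-flight state to the state that preceded it. The sketch below uses a hypothetical State enum; its two rollback pairs follow the resetSeedJob and resetStartupJob descriptions in the method summary, but the real set of job states and their stored form are not given on this page.

```java
import java.util.EnumMap;
import java.util.Map;

final class StartupResetSketch {
    /** Hypothetical states; the real job and document states are not listed on this page. */
    enum State { READY_FOR_STARTUP, STARTING_UP, ACTIVE, SEEDING, INACTIVE }

    /** Maps each "a thread was working on this" state to the state that preceded it. */
    private static final Map<State, State> ROLLBACK = new EnumMap<>(State.class);

    static {
        ROLLBACK.put(State.STARTING_UP, State.READY_FOR_STARTUP);
        ROLLBACK.put(State.SEEDING, State.ACTIVE);
    }

    /** Returns the state a persisted record should have before worker threads start. */
    static State resetForStart(State persisted) {
        return ROLLBACK.getOrDefault(persisted, persisted);
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> " + resetForStart(s));
        }
    }
}
```

States that no thread was working on map to themselves, so running the pass twice gives the same result as running it once.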

resetDocumentWorkerStatus

public void resetDocumentWorkerStatus()
                               throws ManifoldCFException
Reset as part of restoring document worker threads.

Specified by:
resetDocumentWorkerStatus in interface IJobManager
Throws:
ManifoldCFException

resetSeedingWorkerStatus

public void resetSeedingWorkerStatus()
                              throws ManifoldCFException
Reset as part of restoring seeding threads.

Specified by:
resetSeedingWorkerStatus in interface IJobManager
Throws:
ManifoldCFException

resetDocDeleteWorkerStatus

public void resetDocDeleteWorkerStatus()
                                throws ManifoldCFException
Reset as part of restoring doc delete threads.

Specified by:
resetDocDeleteWorkerStatus in interface IJobManager
Throws:
ManifoldCFException

resetDocCleanupWorkerStatus

public void resetDocCleanupWorkerStatus()
                                 throws ManifoldCFException
Reset as part of restoring doc cleanup threads.

Specified by:
resetDocCleanupWorkerStatus in interface IJobManager
Throws:
ManifoldCFException

resetDeleteStartupWorkerStatus

public void resetDeleteStartupWorkerStatus()
                                    throws ManifoldCFException
Reset as part of restoring delete startup threads.

Specified by:
resetDeleteStartupWorkerStatus in interface IJobManager
Throws:
ManifoldCFException

resetNotificationWorkerStatus

public void resetNotificationWorkerStatus()
                                   throws ManifoldCFException
Reset as part of restoring notification threads.

Specified by:
resetNotificationWorkerStatus in interface IJobManager
Throws:
ManifoldCFException

resetStartupWorkerStatus

public void resetStartupWorkerStatus()
                              throws ManifoldCFException
Reset as part of restoring startup threads.

Specified by:
resetStartupWorkerStatus in interface IJobManager
Throws:
ManifoldCFException

deleteIngestedDocumentIdentifiers

public void deleteIngestedDocumentIdentifiers(DocumentDescription[] identifiers)
                                       throws ManifoldCFException
Delete ingested document identifiers (as part of deleting the owning job). The number of identifiers specified is guaranteed to be less than the maxInClauseCount for the database.

Specified by:
deleteIngestedDocumentIdentifiers in interface IJobManager
Parameters:
identifiers - is the set of document identifiers.
Throws:
ManifoldCFException
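
A caller that holds more identifiers than the database permits in one IN clause has to split them up before calling this method. The sketch below shows one way to do that, assuming the caller already knows the limit; the toBatches helper and the class name are hypothetical, and plain strings stand in for DocumentDescription objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InClauseBatchSketch {
  /**
   * Split items into consecutive batches whose size is strictly less than
   * maxInClauseCount, matching the guarantee stated for the method above.
   */
  static <T> List<T[]> toBatches(T[] items, int maxInClauseCount) {
    if (maxInClauseCount < 2) {
      throw new IllegalArgumentException("maxInClauseCount must be at least 2");
    }
    int batchSize = maxInClauseCount - 1;
    List<T[]> batches = new ArrayList<>();
    for (int start = 0; start < items.length; start += batchSize) {
      int end = Math.min(items.length, start + batchSize);
      batches.add(Arrays.copyOfRange(items, start, end));
    }
    return batches;
  }

  public static void main(String[] args) {
    String[] ids = {"a", "b", "c", "d", "e"};
    for (String[] batch : toBatches(ids, 3)) {
      // Each batch would be passed to one deleteIngestedDocumentIdentifiers call.
      System.out.println(Arrays.toString(batch));
    }
  }
}
```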

getNextCleanableDocuments

public DocumentSetAndFlags getNextCleanableDocuments(int maxCount,
                                                     long currentTime)
                                              throws ManifoldCFException
Get list of cleanable document descriptions. This list will take into account multiple jobs that may own the same document. All documents for which a description is returned will be transitioned to the "beingcleaned" state. Documents which are not in transition and are eligible, but are owned by other jobs, will have their jobqueue entries deleted by this method.

Specified by:
getNextCleanableDocuments in interface IJobManager
Parameters:
maxCount - is the maximum number of documents to return.
currentTime - is the current time; some fetches do not occur until a specific time.
Returns:
the document descriptions for these documents.
Throws:
ManifoldCFException

makeCompositeID

protected static java.lang.String makeCompositeID(java.lang.String docIDHash,
                                                  java.lang.String connectionName)
Create a composite document hash key. This consists of the document id hash plus the connection name.
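
The reason for a composite key is that the same document id hash can occur under more than one connection. A minimal sketch of the idea follows; the ":" separator is an assumption made for illustration, since the actual key format is not documented here.

```java
final class CompositeIdSketch {
  // Assumed separator; the real implementation may join the two parts differently.
  private static final String SEPARATOR = ":";

  /** Combine the document id hash and the connection name into one key. */
  static String makeCompositeID(String docIDHash, String connectionName) {
    return docIDHash + SEPARATOR + connectionName;
  }

  public static void main(String[] args) {
    // The same hash under two connections produces two distinct keys.
    System.out.println(makeCompositeID("abc123", "connA"));
    System.out.println(makeCompositeID("abc123", "connB"));
  }
}
```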


getNextDeletableDocuments

public DocumentDescription[] getNextDeletableDocuments(int maxCount,
                                                       long currentTime)
                                                throws ManifoldCFException
Get list of deletable document descriptions. This list will take into account multiple jobs that may own the same document. All documents for which a description is returned will be transitioned to the "beingdeleted" state. Documents which are not in transition and are eligible, but are owned by other jobs, will have their jobqueue entries deleted by this method.

Specified by:
getNextDeletableDocuments in interface IJobManager
Parameters:
maxCount - is the maximum number of documents to return.
currentTime - is the current time; some fetches do not occur until a specific time.
Returns:
the document descriptions for these documents.
Throws:
ManifoldCFException

getUnindexableDocumentIdentifiers

protected java.lang.String[] getUnindexableDocumentIdentifiers(DocumentDescription[] documentIdentifiers,
                                                               java.lang.String connectionName,
                                                               java.lang.String outputConnectionName)
                                                        throws ManifoldCFException
Get a list of document identifiers that should actually be deleted from the index, from a list that might contain identifiers that are shared with other jobs, which are targeted to the same output connection. The input list is guaranteed to be smaller in size than maxInClauseCount for the database.

Parameters:
documentIdentifiers - is the set of document identifiers to consider.
connectionName - is the connection name for ALL the document identifiers.
outputConnectionName - is the output connection name for ALL the document identifiers.
Returns:
the set of documents which should be removed from the index.
Throws:
ManifoldCFException

getNextAlreadyProcessedReprioritizationDocuments

public DocumentDescription[] getNextAlreadyProcessedReprioritizationDocuments(long currentTime,
                                                                              int n)
                                                                       throws ManifoldCFException
Get a list of already-processed documents to reprioritize. Documents in all jobs will be returned by this method. Up to n document descriptions will be returned.

Specified by:
getNextAlreadyProcessedReprioritizationDocuments in interface IJobManager
Parameters:
currentTime - is the current time stamp for this prioritization pass. Avoid picking up any documents that are labeled with this timestamp or after.
n - is the maximum number of document descriptions desired.
Returns:
the document descriptions.
Throws:
ManifoldCFException

getNextNotYetProcessedReprioritizationDocuments

public DocumentDescription[] getNextNotYetProcessedReprioritizationDocuments(long currentTime,
                                                                             int n)
                                                                      throws ManifoldCFException
Get a list of not-yet-processed documents to reprioritize. Documents in all jobs will be returned by this method. Up to n document descriptions will be returned.

Specified by:
getNextNotYetProcessedReprioritizationDocuments in interface IJobManager
Parameters:
currentTime - is the current time stamp for this prioritization pass. Avoid picking up any documents that are labeled with this timestamp or after.
n - is the maximum number of document descriptions desired.
Returns:
the document descriptions.
Throws:
ManifoldCFException

writeDocumentPriorities

public void writeDocumentPriorities(long currentTime,
                                    DocumentDescription[] documentDescriptions,
                                    double[] priorities)
                             throws ManifoldCFException
Save a set of document priorities. In the case where a document was eligible to have its priority set, but it no longer is eligible, then the provided priority will not be written.

Specified by:
writeDocumentPriorities in interface IJobManager
Parameters:
currentTime - is the time in milliseconds since epoch.
documentDescriptions - are the document descriptions.
priorities - are the desired priorities.
Throws:
ManifoldCFException

getExpiredDocuments

public DocumentSetAndFlags getExpiredDocuments(int n,
                                               long currentTime)
                                        throws ManifoldCFException
Get up to the next n documents to be expired. This method marks the documents whose descriptions have been returned as "being processed", or active. The same marking is used as is used for documents that have been queued for worker threads. The model is thus identical.

Specified by:
getExpiredDocuments in interface IJobManager
Parameters:
n - is the maximum number of records desired.
currentTime - is the current time.
Returns:
the array of document descriptions to expire.
Throws:
ManifoldCFException

getNextDocuments

public DocumentDescription[] getNextDocuments(int n,
                                              long currentTime,
                                              long interval,
                                              BlockingDocuments blockingDocuments,
                                              PerformanceStatistics statistics,
                                              DepthStatistics scanRecord)
                                       throws ManifoldCFException
Get up to the next n document(s) to be fetched and processed. This fetch returns records that contain the document identifier, plus all instructions pertaining to the document's handling (e.g. whether it should be refetched if the version has not changed). This method also marks the documents whose descriptions have been returned as "being processed".

Specified by:
getNextDocuments in interface IJobManager
Parameters:
n - is the maximum number of records desired.
currentTime - is the current time; some fetches do not occur until a specific time.
interval - is the number of milliseconds that this set of documents should represent (for throttling).
blockingDocuments - is the place to record documents that were encountered, are eligible for reprioritization, but could not be queued due to throttling considerations.
statistics - are the current performance statistics per connection, which are used to balance the queue stuffing so that individual connections are not overwhelmed.
scanRecord - retains the bins from all documents encountered from the query, even those that were skipped due to being overcommitted.
Returns:
the array of document descriptions to fetch and process.
Throws:
ManifoldCFException

addDocumentCriteria

protected void addDocumentCriteria(java.lang.StringBuffer sb,
                                   java.util.ArrayList list,
                                   java.lang.Long currentTimeValue,
                                   java.lang.Long currentPriorityValue)
                            throws ManifoldCFException
Throws:
ManifoldCFException

fetchAndProcessDocuments

protected void fetchAndProcessDocuments(java.util.ArrayList answers,
                                        java.lang.Long currentTimeValue,
                                        java.lang.Long currentPriorityValue,
                                        JobManager.ThrottleLimit vList,
                                        IRepositoryConnection[] connections)
                                 throws ManifoldCFException
Fetch and process documents matching the passed-in criteria.

Throws:
ManifoldCFException

checkJobActive

public boolean checkJobActive(java.lang.Long jobID)
                       throws ManifoldCFException
Verify that a specific job is indeed still active. This is used to permit abort or pause to be relatively speedy. The query done within MUST be cached in order to not cause undue performance degradation.

Specified by:
checkJobActive in interface IJobManager
Parameters:
jobID - is the job identifier.
Returns:
true if the job is in one of the "active" states.
Throws:
ManifoldCFException

checkJobBusy

public boolean checkJobBusy(java.lang.Long jobID)
                     throws ManifoldCFException
Verify whether a job is still processing documents, or no longer has any outstanding active documents.

Specified by:
checkJobBusy in interface IJobManager
Throws:
ManifoldCFException

markDocumentCompletedMultiple

public void markDocumentCompletedMultiple(DocumentDescription[] documentDescriptions)
                                   throws ManifoldCFException
Note that a job thread has completed processing of a set of documents. This method causes the state of each document to be marked as "completed".

Specified by:
markDocumentCompletedMultiple in interface IJobManager
Parameters:
documentDescriptions - are the description objects for the documents that were processed.
Throws:
ManifoldCFException

markDocumentCompleted

public void markDocumentCompleted(DocumentDescription documentDescription)
                           throws ManifoldCFException
Note that a job thread has completed processing of a document. This method causes the state of the document to be marked as "completed".

Specified by:
markDocumentCompleted in interface IJobManager
Parameters:
documentDescription - is the description object for the document that was processed.
Throws:
ManifoldCFException

markDocumentDeletedMultiple

public DocumentDescription[] markDocumentDeletedMultiple(java.lang.Long jobID,
                                                         java.lang.String[] legalLinkTypes,
                                                         DocumentDescription[] documentDescriptions,
                                                         int hopcountMethod)
                                                  throws ManifoldCFException
Note the deletion of a set of documents as a result of document processing by a job thread.

Specified by:
markDocumentDeletedMultiple in interface IJobManager
Parameters:
documentDescriptions - are the set of description objects for the documents that were processed.
hopcountMethod - describes how to handle deletions for hopcount purposes.
Returns:
the set of documents for which carrydown data was changed by this operation. These documents are likely to be requeued as a result of the change.
Throws:
ManifoldCFException

calculateAffectedDeleteCarrydownChildren

protected DocumentDescription[] calculateAffectedDeleteCarrydownChildren(java.lang.Long jobID,
                                                                         java.lang.String[] docIDHashes)
                                                                  throws ManifoldCFException
Helper method: Find the document descriptions that will be affected due to carrydown row deletions.

Throws:
ManifoldCFException

processDeleteHashSet

protected void processDeleteHashSet(java.lang.Long jobID,
                                    java.util.HashMap resultHash,
                                    java.lang.String queryPart,
                                    java.util.ArrayList list)
                             throws ManifoldCFException
Helper method: look up rows affected by a deleteRecords operation.

Throws:
ManifoldCFException

markDocumentDeleted

public DocumentDescription[] markDocumentDeleted(java.lang.Long jobID,
                                                 java.lang.String[] legalLinkTypes,
                                                 DocumentDescription documentDescription,
                                                 int hopcountMethod)
                                          throws ManifoldCFException
Note the deletion of a document as a result of document processing by a job thread.

Specified by:
markDocumentDeleted in interface IJobManager
Parameters:
documentDescription - is the description object for the document that was processed.
hopcountMethod - describes how to handle deletions for hopcount purposes.
Returns:
the set of documents for which carrydown data was changed by this operation. These documents are likely to be requeued as a result of the change.
Throws:
ManifoldCFException

requeueDocumentMultiple

public void requeueDocumentMultiple(DocumentDescription[] documentDescriptions,
                                    java.lang.Long[] executeTimes,
                                    int[] actions)
                             throws ManifoldCFException
Requeue a document for further processing in the future. This method is called after a document is processed, when the job is a "continuous" one. It is essentially equivalent to noting that the document processing is complete, except the document remains on the queue.

Specified by:
requeueDocumentMultiple in interface IJobManager
Parameters:
documentDescriptions - are the description objects for the documents that were processed.
executeTimes - are the times that the documents should be rescanned. Null indicates "never".
actions - are what should be done when the time arrives. Choices are ACTION_RESCAN or ACTION_REMOVE.
Throws:
ManifoldCFException

requeueDocument

public void requeueDocument(DocumentDescription documentDescription,
                            java.lang.Long executeTime,
                            int action)
                     throws ManifoldCFException
Requeue a document for further processing in the future. This method is called after a document is processed, when the job is a "continuous" one. It is essentially equivalent to noting that the document processing is complete, except the document remains on the queue.

Specified by:
requeueDocument in interface IJobManager
Parameters:
documentDescription - is the description object for the document that was processed.
executeTime - is the time that the document should be rescanned. Null indicates "never".
action - is what should be done when the time arrives. Choices include ACTION_RESCAN or ACTION_REMOVE.
Throws:
ManifoldCFException

resetDocumentMultiple

public void resetDocumentMultiple(DocumentDescription[] documentDescriptions,
                                  long executeTime,
                                  int action,
                                  long failTime,
                                  int failCount)
                           throws ManifoldCFException
Reset a set of documents for further processing in the future. This method is called after some unknown number of the documents were processed, but then a service interruption occurred. Note well: The logic here basically presumes that we cannot know whether the documents were indeed processed or not. If we knew for a fact that none of the documents had been handled, it would be possible to look at the document's current status and decide what the new status ought to be, based on a true rollback scenario. Such cases, however, are rare enough so that special logic is probably not worth it.

Specified by:
resetDocumentMultiple in interface IJobManager
Parameters:
documentDescriptions - are the description objects for the documents that were processed.
executeTime - is the time that the documents should be rescanned.
failTime - is the time beyond which a service interruption will be considered a hard failure.
failCount - is the number of retries beyond which a service interruption will be considered a hard failure.
Throws:
ManifoldCFException

resetCleaningDocumentMultiple

public void resetCleaningDocumentMultiple(DocumentDescription[] documentDescriptions,
                                          long checkTime)
                                   throws ManifoldCFException
Reset a set of cleaning documents for further processing in the future. This method is called after some unknown number of the documents were cleaned, but then an ingestion service interruption occurred. Note well: The logic here basically presumes that we cannot know whether the documents were indeed cleaned or not. If we knew for a fact that none of the documents had been handled, it would be possible to look at the document's current status and decide what the new status ought to be, based on a true rollback scenario. Such cases, however, are rare enough so that special logic is probably not worth it.

Specified by:
resetCleaningDocumentMultiple in interface IJobManager
Parameters:
documentDescriptions - are the description objects for the documents that were cleaned.
checkTime - is the minimum time for the next cleaning attempt.
Throws:
ManifoldCFException

resetCleaningDocument

public void resetCleaningDocument(DocumentDescription documentDescription,
                                  long checkTime)
                           throws ManifoldCFException
Reset a cleaning document back to its former state. This gets done when a deleting thread sees a service interruption, etc., from the ingestion system.

Specified by:
resetCleaningDocument in interface IJobManager
Parameters:
documentDescription - is the description of the document that was cleaned.
checkTime - is the minimum time for the next cleaning attempt.
Throws:
ManifoldCFException

resetDeletingDocumentMultiple

public void resetDeletingDocumentMultiple(DocumentDescription[] documentDescriptions,
                                          long checkTime)
                                   throws ManifoldCFException
Reset a set of deleting documents for further processing in the future. This method is called after some unknown number of the documents were deleted, but then an ingestion service interruption occurred. Note well: The logic here basically presumes that we cannot know whether the documents were indeed processed or not. If we knew for a fact that none of the documents had been handled, it would be possible to look at the document's current status and decide what the new status ought to be, based on a true rollback scenario. Such cases, however, are rare enough so that special logic is probably not worth it.

Specified by:
resetDeletingDocumentMultiple in interface IJobManager
Parameters:
documentDescriptions - are the description objects for the documents that were processed.
checkTime - is the minimum time for the next deletion attempt.
Throws:
ManifoldCFException

resetDeletingDocument

public void resetDeletingDocument(DocumentDescription documentDescription,
                                  long checkTime)
                           throws ManifoldCFException
Reset a deleting document back to its former state. This gets done when a deleting thread sees a service interruption, etc., from the ingestion system.

Specified by:
resetDeletingDocument in interface IJobManager
Parameters:
documentDescription - is the description object for the document that was being deleted.
checkTime - is the minimum time for the next deletion attempt.
Throws:
ManifoldCFException

resetDocument

public void resetDocument(DocumentDescription documentDescription,
                          long executeTime,
                          int action,
                          long failTime,
                          int failCount)
                   throws ManifoldCFException
Reset an active document back to its former state. This gets done when there's a service interruption and the document cannot be processed yet. Note well: This method formerly presumed that a perfect rollback was possible, and that there was zero chance of any processing activity occurring before it got called. That assumption appears incorrect, however, so I've opted to now presume that processing has perhaps occurred. Perfect rollback is thus no longer possible.

Specified by:
resetDocument in interface IJobManager
Parameters:
documentDescription - is the description object for the document that was processed.
executeTime - is the time that the document should be rescanned.
failTime - is the time that the document should be considered to have failed, if it has not been successfully read until then.
failCount - is the number of permitted failures before a hard error is signalled.
Throws:
ManifoldCFException
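
The failTime and failCount parameters both bound how long retries continue. One plausible way to combine them is sketched below; the isHardFailure helper, its extra parameters, and the exact comparison rules are assumptions made for illustration, not the documented behavior of this class.

```java
final class RetryPolicySketch {
  /**
   * Decide whether a further service interruption should be treated as a hard failure.
   * Assumption: either limit being exceeded is sufficient on its own.
   *
   * @param currentTime  the current time in milliseconds since epoch
   * @param failTime     the time beyond which an interruption counts as a hard failure
   * @param retriesSoFar the number of retries already attempted (hypothetical bookkeeping)
   * @param failCount    the number of retries beyond which an interruption counts as a hard failure
   */
  static boolean isHardFailure(long currentTime, long failTime, int retriesSoFar, int failCount) {
    return currentTime > failTime || retriesSoFar > failCount;
  }

  public static void main(String[] args) {
    System.out.println(isHardFailure(1000L, 2000L, 1, 3)); // within both limits
    System.out.println(isHardFailure(3000L, 2000L, 1, 3)); // past failTime
    System.out.println(isHardFailure(1000L, 2000L, 4, 3)); // past failCount
  }
}
```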

eliminateDuplicates

protected static java.lang.String[] eliminateDuplicates(java.lang.String[] docIDHashes)
Eliminate duplicates and sort the result.
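
A sorted set gives both properties at once. The sketch below is one way to get the described result, assuming natural string ordering; it is not necessarily how this class implements it.

```java
import java.util.Arrays;
import java.util.TreeSet;

final class EliminateDuplicatesSketch {
  /** Return the distinct hashes in ascending (natural string) order. */
  static String[] eliminateDuplicates(String[] docIDHashes) {
    // TreeSet drops duplicates and iterates in sorted order.
    TreeSet<String> unique = new TreeSet<>(Arrays.asList(docIDHashes));
    return unique.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] input = {"c", "a", "c", "b"};
    System.out.println(Arrays.toString(eliminateDuplicates(input))); // prints [a, b, c]
  }
}
```

Processing hashes in one consistent sorted order is a common way to keep concurrent transactions from locking rows in conflicting orders, which may be why the result is sorted; that motivation is an inference, not something this page states.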


buildReorderMap

protected static java.util.HashMap buildReorderMap(java.lang.String[] originalIDHashes,
                                                   java.lang.String[] reorderedIDHashes)
Build a reorder map, describing how to convert an original index into a reordered index.
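
A reorder map lets results computed against a sorted, de-duplicated array be mapped back to the caller's original positions. The sketch below is an illustration under assumptions: the real method returns a raw java.util.HashMap whose key and value types are not documented here, so the Integer-to-Integer typing is a guess.

```java
import java.util.HashMap;
import java.util.Map;

final class ReorderMapSketch {
  /** Map each index in originalIDHashes to the index of the same hash in reorderedIDHashes. */
  static Map<Integer, Integer> buildReorderMap(String[] originalIDHashes, String[] reorderedIDHashes) {
    // First record where each hash ended up after reordering.
    Map<String, Integer> positionByHash = new HashMap<>();
    for (int i = 0; i < reorderedIDHashes.length; i++) {
      positionByHash.put(reorderedIDHashes[i], i);
    }
    // Then translate every original index; duplicates share one reordered index.
    Map<Integer, Integer> reorderMap = new HashMap<>();
    for (int i = 0; i < originalIDHashes.length; i++) {
      Integer newIndex = positionByHash.get(originalIDHashes[i]);
      if (newIndex != null) {
        reorderMap.put(i, newIndex);
      }
    }
    return reorderMap;
  }

  public static void main(String[] args) {
    String[] original = {"c", "a", "c", "b"};
    String[] reordered = {"a", "b", "c"};
    System.out.println(buildReorderMap(original, reordered)); // prints {0=2, 1=0, 2=2, 3=1}
  }
}
```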


addDocumentsInitial

public boolean[] addDocumentsInitial(java.lang.Long jobID,
                                     java.lang.String[] legalLinkTypes,
                                     java.lang.String[] docIDHashes,
                                     java.lang.String[] docIDs,
                                     boolean overrideSchedule,
                                     int hopcountMethod,
                                     long currentTime,
                                     double[] documentPriorities,
                                     java.lang.String[][] prereqEventNames)
                              throws ManifoldCFException
Add an initial set of documents to the queue. This method is called during job startup, when the queue is being loaded. A set of document references is passed to this method, which updates the status of the document in the specified job's queue, according to specific state rules.

Specified by:
addDocumentsInitial in interface IJobManager
Parameters:
jobID - is the job identifier.
legalLinkTypes - is the set of legal link types that this connector generates.
docIDs - are the local document identifiers.
overrideSchedule - is true if any existing document schedule should be overridden.
hopcountMethod - is either accurate, nodelete, or neverdelete.
currentTime - is the current time in milliseconds since epoch.
documentPriorities - are the document priorities corresponding to the document identifiers.
prereqEventNames - are the events that must be completed before each document can be processed.
docIDHashes - are the hashes of the local document identifiers (primary key).
Returns:
an array of booleans, one per document: true if the priority value for that document was used, false otherwise.
Throws:
ManifoldCFException

addRemainingDocumentsInitial

public void addRemainingDocumentsInitial(java.lang.Long jobID,
                                         java.lang.String[] legalLinkTypes,
                                         java.lang.String[] docIDHashes,
                                         int hopcountMethod)
                                  throws ManifoldCFException
Add an initial set of remaining documents to the queue. This method is called during job startup, when the queue is being loaded, to list documents that were NOT included by calling addDocumentsInitial(). Documents listed here are simply designed to enable the framework to get rid of old, invalid seeds. They are not queued for processing.

Specified by:
addRemainingDocumentsInitial in interface IJobManager
Parameters:
jobID - is the job identifier.
legalLinkTypes - is the set of legal link types that this connector generates.
docIDHashes - are the local document identifier hashes.
hopcountMethod - is either accurate, nodelete, or neverdelete.
Throws:
ManifoldCFException

doneDocumentsInitial

public void doneDocumentsInitial(java.lang.Long jobID,
                                 java.lang.String[] legalLinkTypes,
                                 boolean isPartial,
                                 int hopcountMethod)
                          throws ManifoldCFException
Signal that a seeding pass has been done. Call this method at the end of a seeding pass. It is used to perform the bookkeeping necessary to maintain the hopcount table.

Specified by:
doneDocumentsInitial in interface IJobManager
Parameters:
jobID - is the job identifier.
legalLinkTypes - is the set of legal link types that this connector generates.
isPartial - is set if the seeds provided are only a partial list. Some connectors cannot supply a full list of seeds on every seeding iteration; this acknowledges that limitation.
hopcountMethod - describes how to handle deletions for hopcount purposes.
Throws:
ManifoldCFException

findHopCounts

public boolean[] findHopCounts(java.lang.Long jobID,
                               java.lang.String[] legalLinkTypes,
                               java.lang.String[] docIDHashes,
                               java.lang.String linkType,
                               int limit,
                               int hopcountMethod)
                        throws ManifoldCFException
Get the specified hop counts, with the limit as described.

Specified by:
findHopCounts in interface IJobManager
Parameters:
jobID - is the job identifier.
legalLinkTypes - is the set of legal link types that this connector generates.
docIDHashes - are the hashes for the set of documents to find the hopcount for.
linkType - is the kind of link to find the hopcount for.
limit - is the limit, beyond which a negative distance may be returned.
hopcountMethod - is the method for managing hopcounts that is in effect.
Returns:
a vector of booleans corresponding to the documents requested. A true value is returned if the document is within the specified limit, false otherwise.
Throws:
ManifoldCFException
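
The relationship between the limit, a possibly negative distance, and the returned booleans can be sketched as follows. The withinLimit helper and the int[] of distances are hypothetical; the sketch assumes a negative distance means "beyond the limit or unknown".

```java
import java.util.Arrays;

final class HopCountLimitSketch {
  /** Convert per-document distances into the boolean vector described above. */
  static boolean[] withinLimit(int[] distances, int limit) {
    boolean[] result = new boolean[distances.length];
    for (int i = 0; i < distances.length; i++) {
      int distance = distances[i];
      // Negative distances are treated as "not within the limit" (assumption).
      result[i] = distance >= 0 && distance <= limit;
    }
    return result;
  }

  public static void main(String[] args) {
    int[] distances = {0, 2, 3, -1};
    System.out.println(Arrays.toString(withinLimit(distances, 2))); // prints [true, true, false, false]
  }
}
```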

getAllSeeds

public java.lang.String[] getAllSeeds(java.lang.Long jobID)
                               throws ManifoldCFException
Get all the current seeds. Returns the seed document identifiers for a job.

Specified by:
getAllSeeds in interface IJobManager
Parameters:
jobID - is the job identifier.
Returns:
the document identifiers that are currently considered to be seeds.
Throws:
ManifoldCFException

addDocuments

public boolean[] addDocuments(java.lang.Long jobID,
                              java.lang.String[] legalLinkTypes,
                              java.lang.String[] docIDHashes,
                              java.lang.String[] docIDs,
                              java.lang.String parentIdentifierHash,
                              java.lang.String relationshipType,
                              int hopcountMethod,
                              java.lang.String[][] dataNames,
                              java.lang.Object[][][] dataValues,
                              long currentTime,
                              double[] documentPriorities,
                              java.lang.String[][] prereqEventNames)
                       throws ManifoldCFException
Add documents to the queue in bulk. This method is called during document processing, when a set of document references are discovered. The document references are passed to this method, which updates the status of the document(s) in the specified job's queue, according to specific state rules.

Specified by:
addDocuments in interface IJobManager
Parameters:
jobID - is the job identifier.
legalLinkTypes - is the set of legal link types that this connector generates.
docIDHashes - are the local document identifier hashes.
parentIdentifierHash - is the optional parent identifier hash of this document. Pass null if none.
relationshipType - is the optional link type between this document and its parent. Pass null if there is no relationship with a parent.
hopcountMethod - is the desired method for managing hopcounts.
dataNames - are the names of the data to carry down to the child from this parent.
dataValues - are the values to carry down to the child from this parent, corresponding to dataNames above. If CharacterInput objects are passed in here, it is the caller's responsibility to clean these up.
currentTime - is the time in milliseconds since epoch that will be recorded for this operation.
documentPriorities - are the desired document priorities for the documents.
prereqEventNames - are the events that must be completed before a document can be queued.
docIDs - are the local document identifiers.
Returns:
an array of boolean values, one per doc id, indicating whether the passed-in priority value was used (true if used).
Throws:
ManifoldCFException

addDocument

public boolean addDocument(java.lang.Long jobID,
                           java.lang.String[] legalLinkTypes,
                           java.lang.String docIDHash,
                           java.lang.String docID,
                           java.lang.String parentIdentifierHash,
                           java.lang.String relationshipType,
                           int hopcountMethod,
                           java.lang.String[] dataNames,
                           java.lang.Object[][] dataValues,
                           long currentTime,
                           double priority,
                           java.lang.String[] prereqEventNames)
                    throws ManifoldCFException
Add a document to the queue. This method is called during document processing, when a document reference is discovered. The document reference is passed to this method, which updates the status of the document in the specified job's queue, according to specific state rules.

Specified by:
addDocument in interface IJobManager
Parameters:
jobID - is the job identifier.
legalLinkTypes - is the set of legal link types that this connector generates.
docIDHash - is the local document identifier hash value.
parentIdentifierHash - is the optional parent identifier hash of this document. Pass null if none.
relationshipType - is the optional link type between this document and its parent. Pass null if there is no relationship with a parent.
hopcountMethod - is the desired method for managing hopcounts.
dataNames - are the names of the data to carry down to the child from this parent.
dataValues - are the values to carry down to the child from this parent, corresponding to dataNames above.
currentTime - is the time in milliseconds since epoch that will be recorded for this operation.
priority - is the desired document priority for the document.
prereqEventNames - are the events that must be completed before the document can be processed.
Returns:
true if the priority value was used, false otherwise.
Throws:
ManifoldCFException
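
The dataNames and dataValues parameters are parallel arrays: row i of dataValues holds the values for dataNames[i]. The following is a minimal sketch of how a caller might assemble them from a map. The class CarrydownArraysSketch and its methods are hypothetical and not part of ManifoldCF; only the array shapes come from the signature above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ManifoldCF: builds the parallel
// dataNames / dataValues arrays in the shape addDocument expects.
class CarrydownArraysSketch {
  // Data names, in the map's iteration order.
  static String[] names(Map<String, List<String>> carrydown) {
    return carrydown.keySet().toArray(new String[0]);
  }

  // Row i holds the values that belong to names(carrydown)[i].
  static Object[][] values(Map<String, List<String>> carrydown) {
    List<Object[]> rows = new ArrayList<>();
    for (List<String> valueList : carrydown.values()) {
      rows.add(valueList.toArray());
    }
    return rows.toArray(new Object[0][]);
  }
}
```

Using a map with a stable iteration order (for example a LinkedHashMap) keeps the two arrays aligned, since both are derived from the same traversal.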

finishDocuments

public DocumentDescription[] finishDocuments(java.lang.Long jobID,
                                             java.lang.String[] legalLinkTypes,
                                             java.lang.String[] parentIdentifierHashes,
                                             int hopcountMethod)
                                      throws ManifoldCFException
Complete adding child documents to the queue, for a set of documents. This method is called at the end of document processing, to help the hopcount tracking engine do its bookkeeping.

Specified by:
finishDocuments in interface IJobManager
Parameters:
jobID - is the job identifier.
legalLinkTypes - is the set of legal link types that this connector generates.
parentIdentifierHashes - are the document identifier hashes for which child link extraction just took place.
hopcountMethod - describes how to handle deletions for hopcount purposes.
Returns:
the set of documents for which carrydown data was changed by this operation. These documents are likely to be requeued as a result of the change.
Throws:
ManifoldCFException

calculateAffectedRestoreCarrydownChildren

protected DocumentDescription[] calculateAffectedRestoreCarrydownChildren(java.lang.Long jobID,
                                                                          java.lang.String[] parentIDHashes)
                                                                   throws ManifoldCFException
Helper method: Calculate the unique set of affected carrydown children resulting from a "restoreRecords" operation.

Throws:
ManifoldCFException

processParentHashSet

protected void processParentHashSet(java.lang.Long jobID,
                                    java.util.HashMap resultHash,
                                    java.lang.String queryPart,
                                    java.util.ArrayList list)
                             throws ManifoldCFException
Helper method: look up rows affected by a restoreRecords operation.

Throws:
ManifoldCFException

beginEventSequence

public boolean beginEventSequence(java.lang.String eventName)
                           throws ManifoldCFException
Begin an event sequence.

Specified by:
beginEventSequence in interface IJobManager
Parameters:
eventName - is the name of the event.
Returns:
true if the event could be created, or false if it's already there.
Throws:
ManifoldCFException

completeEventSequence

public void completeEventSequence(java.lang.String eventName)
                           throws ManifoldCFException
Complete an event sequence.

Specified by:
completeEventSequence in interface IJobManager
Parameters:
eventName - is the name of the event.
Throws:
ManifoldCFException
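
The two event calls bracket a unit of work: beginEventSequence returns false when the event already exists, and completeEventSequence ends it. The following is a minimal sketch of that calling pattern, using an in-memory stand-in. EventRegistrySketch and runExclusively are hypothetical names, and the Set-based storage is an assumption made only so the example runs on its own.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for the begin/complete event calls.
class EventRegistrySketch {
  private final Set<String> active = new HashSet<>();

  // True if the event was created, false if it is already there.
  synchronized boolean beginEventSequence(String eventName) {
    return active.add(eventName);
  }

  // Ends the event sequence by removing the event.
  synchronized void completeEventSequence(String eventName) {
    active.remove(eventName);
  }

  // Runs the work only if this caller created the event, and always
  // completes the sequence afterwards, even if the work throws.
  boolean runExclusively(String eventName, Runnable work) {
    if (!beginEventSequence(eventName)) {
      return false; // another caller already began this sequence
    }
    try {
      work.run();
      return true;
    } finally {
      completeEventSequence(eventName);
    }
  }
}
```

The try/finally block is the point of the sketch: a sequence that is begun but never completed would block every later caller that uses the same event name.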

carrydownChangeDocumentMultiple

public boolean[] carrydownChangeDocumentMultiple(DocumentDescription[] documentDescriptions,
                                                 long currentTime,
                                                 double[] docPriorities)
                                          throws ManifoldCFException
Requeue a document set because of carrydown changes. This method is called when carrydown data is modified for a set of documents. The documents must be requeued for immediate reprocessing; a document that is *already* being processed will need to be processed again.

Specified by:
carrydownChangeDocumentMultiple in interface IJobManager
Parameters:
documentDescriptions - is the set of description objects for the documents that have had their parent carrydown information changed.
docPriorities - are the document priorities to assign to the documents, if needed.
Returns:
a flag for each document priority, true if it was used, false otherwise.
Throws:
ManifoldCFException

carrydownChangeDocument

public boolean carrydownChangeDocument(DocumentDescription documentDescription,
                                       long currentTime,
                                       double docPriority)
                                throws ManifoldCFException
Requeue a document because of carrydown changes. This method is called when carrydown data is modified for a document. The document must be requeued for immediate reprocessing; if it is *already* being processed, it will need to be processed again.

Specified by:
carrydownChangeDocument in interface IJobManager
Parameters:
documentDescription - is the description object for the document that has had its parent carrydown information changed.
docPriority - is the document priority to assign to the document, if needed.
Returns:
a flag for the document priority, true if it was used, false otherwise.
Throws:
ManifoldCFException

getRandomAmount

protected long getRandomAmount()
Get a random amount of time to sleep after a transaction abort.


sleepFor

protected void sleepFor(long amt)
                 throws ManifoldCFException
Throws:
ManifoldCFException
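
The two helpers above appear intended to be used together when an aborted transaction is retried. The following is a minimal sketch of such a retry loop. Everything in it is an assumption: the AbortedException type, the 0 to 19 millisecond range, and the attempt limit are hypothetical and are not taken from the ManifoldCF source.

```java
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical retry loop with a random pause between attempts.
class RetrySketch {
  // Stand-in for "the transaction was aborted and may be retried".
  static class AbortedException extends RuntimeException {}

  private static final Random random = new Random();

  // Assumed range: 0 to 19 milliseconds.
  static long getRandomAmount() {
    return random.nextInt(20);
  }

  // Sleeps for the given number of milliseconds.
  static void sleepFor(long amt) {
    try {
      Thread.sleep(amt);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while backing off", e);
    }
  }

  // Retries the action after each abort, up to maxAttempts attempts in total.
  static <T> T runWithRetry(Supplier<T> action, int maxAttempts) {
    for (int attempt = 1; ; attempt++) {
      try {
        return action.get();
      } catch (AbortedException e) {
        if (attempt >= maxAttempts) {
          throw e; // give up and surface the abort
        }
        sleepFor(getRandomAmount()); // random pause so competing threads spread out
      }
    }
  }
}
```

Randomizing the pause makes it less likely that two threads that collided once will collide again on the retry.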

retrieveParentData

public java.lang.String[] retrieveParentData(java.lang.Long jobID,
                                             java.lang.String docIDHash,
                                             java.lang.String dataName)
                                      throws ManifoldCFException
Retrieve specific parent data for a given document.

Specified by:
retrieveParentData in interface IJobManager
Parameters:
jobID - is the job identifier.
docIDHash - is the document identifier hash value.
dataName - is the kind of data to retrieve.
Returns:
the unique data values.
Throws:
ManifoldCFException

retrieveParentDataAsFiles

public CharacterInput[] retrieveParentDataAsFiles(java.lang.Long jobID,
                                                  java.lang.String docIDHash,
                                                  java.lang.String dataName)
                                           throws ManifoldCFException
Retrieve specific parent data for a given document, returned as CharacterInput objects.

Specified by:
retrieveParentDataAsFiles in interface IJobManager
Parameters:
jobID - is the job identifier.
docIDHash - is the document identifier hash value.
dataName - is the kind of data to retrieve.
Returns:
the unique data values.
Throws:
ManifoldCFException

startJobs

public void startJobs(long currentTime,
                      java.util.ArrayList unwaitList)
               throws ManifoldCFException
Start all jobs in need of starting. This method marks all the appropriate jobs as "in progress", which is all that should be needed to start them. The start event should also be logged in the event log. To make it possible for the caller to do this logging, the set of job IDs that were started is returned in unwaitList.

Specified by:
startJobs in interface IJobManager
Parameters:
currentTime - is the current time in milliseconds since epoch.
unwaitList - is filled in with the set of job ID objects that were resumed.
Throws:
ManifoldCFException

waitJobs

public void waitJobs(long currentTime,
                     java.util.ArrayList waitList)
              throws ManifoldCFException
Put active or paused jobs in wait state, if they've exceeded their window.

Specified by:
waitJobs in interface IJobManager
Parameters:
currentTime - is the current time in milliseconds since epoch.
waitList - is filled in with the set of job IDs that were put into a wait state.
Throws:
ManifoldCFException
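
Both startJobs and waitJobs report their results by filling a list supplied by the caller, so that the caller can do the logging. The following is a minimal sketch of one scheduling pass built on that convention. The Scheduler interface is a hypothetical stand-in that copies only the two method shapes, and the assumptions that the list elements are Long job IDs and that the calls happen in this order are mine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; only the two method shapes come from
// the signatures documented above.
class JobWindowSketch {
  interface Scheduler {
    void startJobs(long currentTime, ArrayList<Long> unwaitList);
    void waitJobs(long currentTime, ArrayList<Long> waitList);
  }

  // One pass: the callee fills the lists, the caller turns them into log lines.
  static List<String> runPass(Scheduler scheduler, long currentTime) {
    ArrayList<Long> unwaitList = new ArrayList<>();
    ArrayList<Long> waitList = new ArrayList<>();
    scheduler.startJobs(currentTime, unwaitList);
    scheduler.waitJobs(currentTime, waitList);

    List<String> log = new ArrayList<>();
    for (Long jobID : unwaitList) {
      log.add("resumed job " + jobID);
    }
    for (Long jobID : waitList) {
      log.add("job " + jobID + " now waiting");
    }
    return log;
  }
}
```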

resetJobSchedule

public void resetJobSchedule(java.lang.Long jobID)
                      throws ManifoldCFException
Reset job schedule. This re-evaluates whether the job should be started now. This method would typically be called after a job's scheduling window has been changed.

Specified by:
resetJobSchedule in interface IJobManager
Parameters:
jobID - is the job identifier.
Throws:
ManifoldCFException

checkTimeMatch

protected static java.lang.Long checkTimeMatch(long startTime,
                                               long currentTimestamp,
                                               EnumeratedValues daysOfWeek,
                                               EnumeratedValues daysOfMonth,
                                               EnumeratedValues months,
                                               EnumeratedValues years,
                                               EnumeratedValues hours,
                                               EnumeratedValues minutes,
                                               java.lang.String timezone,
                                               java.lang.Long duration)
Check if the specified job parameters have a 'hit' within the specified interval.

Parameters:
startTime - is the start time.
currentTimestamp - is the end time.
daysOfWeek - is the enumerated days of the week, or null.
daysOfMonth - is the enumerated days of the month, or null.
months - is the enumerated months, or null.
years - is the enumerated years, or null.
hours - is the enumerated hours, or null.
minutes - is the enumerated minutes, or null.
Returns:
null if there is NO hit within the interval; otherwise the actual time of the hit in milliseconds from epoch is returned.
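
The contract above (null for no hit, otherwise the time of the hit) can be illustrated with a much reduced version of the check. The following sketch is not the ManifoldCF algorithm: it considers only hours and minutes, steps through the interval one minute at a time, treats both interval ends as inclusive, defaults to UTC when no time zone is given, and ignores duration. ScheduleHitSketch and firstHit are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Set;

// Hypothetical, simplified schedule check; a null set means "any value".
class ScheduleHitSketch {
  // Returns the first matching minute in [startTime, endTime] as
  // milliseconds since epoch, or null if there is no hit.
  static Long firstHit(long startTime, long endTime,
                       Set<Integer> hours, Set<Integer> minutes,
                       String timezone) {
    ZoneId zone = (timezone == null) ? ZoneId.of("UTC") : ZoneId.of(timezone);
    // Start at the first whole minute that is not before startTime.
    ZonedDateTime t = Instant.ofEpochMilli(startTime).atZone(zone)
        .truncatedTo(ChronoUnit.MINUTES);
    if (t.toInstant().toEpochMilli() < startTime) {
      t = t.plusMinutes(1);
    }
    while (t.toInstant().toEpochMilli() <= endTime) {
      boolean hourOk = hours == null || hours.contains(t.getHour());
      boolean minuteOk = minutes == null || minutes.contains(t.getMinute());
      if (hourOk && minuteOk) {
        return t.toInstant().toEpochMilli();
      }
      t = t.plusMinutes(1);
    }
    return null; // no hit within the interval
  }
}
```

Returning a boxed Long rather than a long is what lets null stand for "no hit", which matches the return type in the signature above.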

manualStart

public void manualStart(java.lang.Long jobID)
                 throws ManifoldCFException
Manually start a job. The specified job will be run REGARDLESS of the timed windows, and will not cease until complete. If the job is already running, this operation will ensure that the job does not pause when its window ends. The job can be manually paused, or manually aborted.

Specified by:
manualStart in interface IJobManager
Parameters:
jobID - is the ID of the job to start.
Throws:
ManifoldCFException

noteJobDeleteStarted

public void noteJobDeleteStarted(java.lang.Long jobID,
                                 long startTime)
                          throws ManifoldCFException
Note job delete started.

Specified by:
noteJobDeleteStarted in interface IJobManager
Parameters:
jobID - is the job id.
startTime - is the job delete start time.
Throws:
ManifoldCFException

noteJobStarted

public void noteJobStarted(java.lang.Long jobID,
                           long startTime)
                    throws ManifoldCFException
Note job started.

Specified by:
noteJobStarted in interface IJobManager
Parameters:
jobID - is the job id.
startTime - is the job start time.
Throws:
ManifoldCFException

noteJobSeeded

public void noteJobSeeded(java.lang.Long jobID,
                          long seedTime)
                   throws ManifoldCFException
Note job seeded.

Specified by:
noteJobSeeded in interface IJobManager
Parameters:
jobID - is the job id.
seedTime - is the job seed time.
Throws:
ManifoldCFException

prepareDeleteScan

public void prepareDeleteScan(java.lang.Long jobID)
                       throws ManifoldCFException
Prepare for a delete scan.

Specified by:
prepareDeleteScan in interface IJobManager
Parameters:
jobID - is the job id.
Throws:
ManifoldCFException

prepareFullScan

public void prepareFullScan(java.lang.Long jobID,
                            java.lang.String[] legalLinkTypes,
                            int hopcountMethod)
                     throws ManifoldCFException
Prepare for a full scan.

Specified by:
prepareFullScan in interface IJobManager
Parameters:
jobID - is the job id.
legalLinkTypes - are the link types allowed for the job.
hopcountMethod - describes how to handle deletions for hopcount purposes.
Throws:
ManifoldCFException

prepareIncrementalScan

public void prepareIncrementalScan(java.lang.Long jobID,
                                   java.lang.String[] legalLinkTypes,
                                   int hopcountMethod)
                            throws ManifoldCFException
Prepare for an incremental scan.

Specified by:
prepareIncrementalScan in interface IJobManager
Parameters:
jobID - is the job id.
legalLinkTypes - are the link types allowed for the job.
hopcountMethod - describes how to handle deletions for hopcount purposes.
Throws:
ManifoldCFException

manualAbort

public void manualAbort(java.lang.Long jobID)
                 throws ManifoldCFException
Manually abort a running job. The job will be stopped, and will not run again until it is automatically started based on its schedule, or manually started.

Specified by:
manualAbort in interface IJobManager
Parameters:
jobID - is the job to abort.
Throws:
ManifoldCFException

manualAbortRestart

public void manualAbortRestart(java.lang.Long jobID)
                        throws ManifoldCFException
Manually restart a running job. The job will be stopped and restarted. Any schedule affinity will be lost, until the job finishes on its own.

Specified by:
manualAbortRestart in interface IJobManager
Parameters:
jobID - is the job to abort and restart.
Throws:
ManifoldCFException

errorAbort

public boolean errorAbort(java.lang.Long jobID,
                          java.lang.String errorText)
                   throws ManifoldCFException
Abort a running job due to a fatal error condition.

Specified by:
errorAbort in interface IJobManager
Parameters:
jobID - is the job to abort.
errorText - is the error text.
Returns:
true if this is the first logged abort request for this job.
Throws:
ManifoldCFException

pauseJob

public void pauseJob(java.lang.Long jobID)
              throws ManifoldCFException
Pause a job.

Specified by:
pauseJob in interface IJobManager
Parameters:
jobID - is the job identifier to pause.
Throws:
ManifoldCFException

restartJob

public void restartJob(java.lang.Long jobID)
                throws ManifoldCFException
Restart a paused job.

Specified by:
restartJob in interface IJobManager
Parameters:
jobID - is the job identifier to restart.
Throws:
ManifoldCFException

getJobsReadyForSeeding

public JobStartRecord[] getJobsReadyForSeeding(long currentTime)
                                        throws ManifoldCFException
Get the list of jobs that are ready for seeding.

Specified by:
getJobsReadyForSeeding in interface IJobManager
Parameters:
currentTime - is the current time in milliseconds since epoch.
Returns:
jobs that are active and are running in adaptive mode. These will be seeded based on what the connector says should be added to the queue.
Throws:
ManifoldCFException

getJobsReadyForDelete

public JobStartRecord[] getJobsReadyForDelete()
                                       throws ManifoldCFException
Get the list of jobs that are ready for deletion.

Specified by:
getJobsReadyForDelete in interface IJobManager
Returns:
jobs that were in the "readyfordelete" state.
Throws:
ManifoldCFException

getJobsReadyForStartup

public JobStartRecord[] getJobsReadyForStartup()
                                        throws ManifoldCFException
Get the list of jobs that are ready for startup.

Specified by:
getJobsReadyForStartup in interface IJobManager
Returns:
jobs that were in the "readyforstartup" state. These will be marked as being in the "starting up" state.
Throws:
ManifoldCFException

inactivateJob

public void inactivateJob(java.lang.Long jobID)
                   throws ManifoldCFException
Inactivate a job, from the notification state.

Specified by:
inactivateJob in interface IJobManager
Parameters:
jobID - is the ID of the job to inactivate.
Throws:
ManifoldCFException

resetStartDeleteJob

public void resetStartDeleteJob(java.lang.Long jobID)
                         throws ManifoldCFException
Reset a job starting for delete back to "ready for delete" state.

Specified by:
resetStartDeleteJob in interface IJobManager
Parameters:
jobID - is the job id.
Throws:
ManifoldCFException

resetNotifyJob

public void resetNotifyJob(java.lang.Long jobID)
                    throws ManifoldCFException
Reset a job that is notifying back to "ready for notify" state.

Specified by:
resetNotifyJob in interface IJobManager
Parameters:
jobID - is the job id.
Throws:
ManifoldCFException

resetStartupJob

public void resetStartupJob(java.lang.Long jobID)
                     throws ManifoldCFException
Reset a starting job back to "ready for startup" state.

Specified by:
resetStartupJob in interface IJobManager
Parameters:
jobID - is the job id.
Throws:
ManifoldCFException

resetSeedJob

public void resetSeedJob(java.lang.Long jobID)
                  throws ManifoldCFException
Reset a seeding job back to "active" state.

Specified by:
resetSeedJob in interface IJobManager
Parameters:
jobID - is the job id.
Throws:
ManifoldCFException
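
The four reset methods above each move a job from an in-flight state back to the state it came from. The following sketch collects those transitions in one place. The enum constants are hypothetical names chosen to mirror the quoted state names in the descriptions; the real status values are not listed in this document.

```java
// Hypothetical state names that mirror the descriptions of
// resetStartDeleteJob, resetNotifyJob, resetStartupJob and resetSeedJob.
class ResetTransitionsSketch {
  enum State {
    READY_FOR_DELETE, STARTING_DELETE,
    READY_FOR_NOTIFY, NOTIFYING,
    READY_FOR_STARTUP, STARTING_UP,
    ACTIVE, SEEDING
  }

  // Returns the state a reset moves an in-flight job back to,
  // or the same state when no reset applies.
  static State reset(State current) {
    switch (current) {
      case STARTING_DELETE: return State.READY_FOR_DELETE;   // resetStartDeleteJob
      case NOTIFYING:       return State.READY_FOR_NOTIFY;   // resetNotifyJob
      case STARTING_UP:     return State.READY_FOR_STARTUP;  // resetStartupJob
      case SEEDING:         return State.ACTIVE;             // resetSeedJob
      default:              return current;
    }
  }
}
```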

deleteJobsReadyForDelete

public void deleteJobsReadyForDelete()
                              throws ManifoldCFException
Delete jobs in need of being deleted (which are marked "ready for delete"). This method is meant to be called periodically to perform delete processing on jobs.

Specified by:
deleteJobsReadyForDelete in interface IJobManager
Throws:
ManifoldCFException

finishJobs

public void finishJobs()
                throws ManifoldCFException
Put all eligible jobs in the "shutting down" state.

Specified by:
finishJobs in interface IJobManager
Throws:
ManifoldCFException

getJobsReadyForInactivity

public JobStartRecord[] getJobsReadyForInactivity()
                                           throws ManifoldCFException
Find the list of jobs that need to have their connectors notified of job completion.

Specified by:
getJobsReadyForInactivity in interface IJobManager
Returns:
the IDs of jobs that need their output connectors notified in order to become inactive.
Throws:
ManifoldCFException

finishJobAborts

public void finishJobAborts(long timestamp,
                            java.util.ArrayList abortJobs)
                     throws ManifoldCFException
Complete the sequence that aborts jobs and makes them runnable again.

Specified by:
finishJobAborts in interface IJobManager
Parameters:
timestamp - is the current time.
abortJobs - is the set of IJobDescription objects that were aborted (and stopped).
Throws:
ManifoldCFException

resetJobs

public void resetJobs(long currentTime,
                      java.util.ArrayList resetJobs)
               throws ManifoldCFException
Reset eligible jobs either back to the "inactive" state, or make them active again. The latter will occur if the cleanup phase of the job generated more pending documents. This method is used to pick up all jobs in the shutting down state whose purgatory or being-cleaned records have all been processed.

Specified by:
resetJobs in interface IJobManager
Parameters:
currentTime - is the current time in milliseconds since epoch.
resetJobs - is filled in with the set of IJobDescription objects that were reset.
Throws:
ManifoldCFException

getStatus

public JobStatus getStatus(java.lang.Long jobID)
                    throws ManifoldCFException
Get the status of a job.

Specified by:
getStatus in interface IJobManager
Parameters:
jobID - is the job identifier.
Returns:
the status object for the specified job.
Throws:
ManifoldCFException

getAllStatus

public JobStatus[] getAllStatus()
                         throws ManifoldCFException
Get a list of all jobs, and their status information.

Specified by:
getAllStatus in interface IJobManager
Returns:
an ordered array of job status objects.
Throws:
ManifoldCFException

getRunningJobs

public JobStatus[] getRunningJobs()
                           throws ManifoldCFException
Get a list of running jobs. This is for status reporting.

Specified by:
getRunningJobs in interface IJobManager
Returns:
an array of the job status objects.
Throws:
ManifoldCFException

getFinishedJobs

public JobStatus[] getFinishedJobs()
                            throws ManifoldCFException
Get a list of completed jobs, and their statistics.

Specified by:
getFinishedJobs in interface IJobManager
Returns:
an array of the job status objects.
Throws:
ManifoldCFException

makeJobStatus

protected JobStatus[] makeJobStatus(java.lang.String whereClause,
                                    java.util.ArrayList whereParams)
                             throws ManifoldCFException
Make a job status array from a query result.

Parameters:
whereClause - is the where clause for the jobs we are interested in.
Returns:
the status array.
Throws:
ManifoldCFException

genDocumentStatus

public IResultSet genDocumentStatus(java.lang.String connectionName,
                                    StatusFilterCriteria filterCriteria,
                                    SortOrder sortOrder,
                                    int startRow,
                                    int rowCount)
                             throws ManifoldCFException
Run a 'document status' report.

Specified by:
genDocumentStatus in interface IJobManager
Parameters:
connectionName - is the name of the connection.
filterCriteria - are the criteria used to limit the records considered for the report.
sortOrder - is the specified sort order of the final report.
startRow - is the first row to include.
rowCount - is the number of rows to include.
Returns:
the results, with the following columns: identifier, job, state, status, scheduled, action, retrycount, retrylimit. The "scheduled" column and the "retrylimit" column are long values representing a time; all other values will be user-friendly strings.
Throws:
ManifoldCFException

genQueueStatus

public IResultSet genQueueStatus(java.lang.String connectionName,
                                 StatusFilterCriteria filterCriteria,
                                 SortOrder sortOrder,
                                 BucketDescription idBucketDescription,
                                 int startRow,
                                 int rowCount)
                          throws ManifoldCFException
Run a 'queue status' report.

Specified by:
genQueueStatus in interface IJobManager
Parameters:
connectionName - is the name of the connection.
filterCriteria - are the criteria used to limit the records considered for the report.
sortOrder - is the specified sort order of the final report.
idBucketDescription - is the bucket description for generating the identifier class.
startRow - is the first row to include.
rowCount - is the number of rows to include.
Returns:
the results, with the following columns: idbucket, inactive, processing, expiring, deleting, processready, expireready, processwaiting, expirewaiting
Throws:
ManifoldCFException

addBucketExtract

protected void addBucketExtract(java.lang.StringBuffer sb,
                                java.util.ArrayList list,
                                java.lang.String columnPrefix,
                                java.lang.String columnName,
                                BucketDescription bucketDesc)
Turn a bucket description into a return column. This is complicated by the fact that the extraction code is inherently case sensitive. If a case-insensitive match is desired, the whole value is converted to lower case before doing the match.
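
The lower-casing idea described above can be shown in plain Java. This sketch is only an illustration: the documented method appends to a StringBuffer rather than matching strings directly, and BucketExtractSketch, extractBucket, and the convention of returning an empty string on no match are all hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of case-insensitive extraction built on
// a case-sensitive matcher.
class BucketExtractSketch {
  // Returns the first capture group of regex found in value, or "" if
  // there is no match. The regex must contain at least one capture group.
  // When insensitive is true the value is lower-cased first, so the regex
  // is assumed to be written in lower case.
  static String extractBucket(String value, String regex, boolean insensitive) {
    String subject = insensitive ? value.toLowerCase(Locale.ROOT) : value;
    Matcher m = Pattern.compile(regex).matcher(subject);
    return m.find() ? m.group(1) : "";
  }
}
```

Only the value is lower-cased, not the pattern, because lower-casing a regular expression would change the meaning of escapes such as \S.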


addCriteria

protected boolean addCriteria(java.lang.StringBuffer sb,
                              java.util.ArrayList list,
                              java.lang.String fieldPrefix,
                              java.lang.String connectionName,
                              StatusFilterCriteria criteria,
                              boolean whereEmitted)
                       throws ManifoldCFException
Add criteria clauses to query.

Throws:
ManifoldCFException

emitClauseStart

protected boolean emitClauseStart(java.lang.StringBuffer sb,
                                  boolean whereEmitted)
Emit a WHERE or an AND, depending on whether a WHERE has already been emitted.
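
A minimal sketch of how such a helper might be used while building a query follows. The return contract (always true, so the caller can carry the flag forward), the spacing, and the table and column names in example() are assumptions, not taken from the ManifoldCF source.

```java
// Hypothetical version of the WHERE/AND helper and one way to call it.
class ClauseStartSketch {
  // Appends WHERE for the first condition and AND for later ones.
  // Returns true so the caller can pass the flag to the next call.
  static boolean emitClauseStart(StringBuffer sb, boolean whereEmitted) {
    sb.append(whereEmitted ? " AND " : " WHERE ");
    return true;
  }

  // Builds a query with two conditions; names are illustrative only.
  static String example() {
    StringBuffer sb = new StringBuffer("SELECT id FROM jobqueue");
    boolean whereEmitted = false;
    whereEmitted = emitClauseStart(sb, whereEmitted);
    sb.append("jobid=?");
    whereEmitted = emitClauseStart(sb, whereEmitted);
    sb.append("status=?");
    return sb.toString();
  }
}
```

Threading the flag through each call means optional criteria can be added or skipped without the caller tracking which condition came first.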


addOrdering

protected void addOrdering(java.lang.StringBuffer sb,
                           java.lang.String[] completeFieldList,
                           SortOrder sort)
Add ordering.


addLimits

protected void addLimits(java.lang.StringBuffer sb,
                         int startRow,
                         int maxRowCount)
Add limit and offset.
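
A minimal sketch of appending a limit and offset to a query under construction follows. The LIMIT ... OFFSET syntax, the zero-based meaning of startRow, and the page helper are assumptions; the SQL that is actually generated may differ by database.

```java
// Hypothetical paging helper built on the addLimits signature above.
class LimitsSketch {
  // Appends a limit and offset suffix to the query under construction.
  static void addLimits(StringBuffer sb, int startRow, int maxRowCount) {
    sb.append(" LIMIT ").append(maxRowCount)
      .append(" OFFSET ").append(startRow);
  }

  // Returns baseQuery restricted to rowCount rows starting at startRow.
  static String page(String baseQuery, int startRow, int rowCount) {
    StringBuffer sb = new StringBuffer(baseQuery);
    addLimits(sb, startRow, rowCount);
    return sb.toString();
  }
}
```

The startRow and rowCount parameters of the report methods would map naturally onto these two values, which is why the limit is appended last, after any ordering clause.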