org.apache.manifoldcf.crawler.interfaces
Interface IProcessActivity

All Superinterfaces:
IAbortActivity, IEventActivity, IFingerprintActivity, IHistoryActivity, INamingActivity
All Known Implementing Classes:
WorkerThread.ProcessActivity

public interface IProcessActivity
extends IHistoryActivity, IEventActivity, IAbortActivity, IFingerprintActivity

This interface abstracts the activities that a fetched-document processor can perform.


Field Summary
static java.lang.String _rcsid
           
 
Method Summary
 void addDocumentReference(java.lang.String localIdentifier)
          Add a document description to the current job's queue.
 void addDocumentReference(java.lang.String localIdentifier, java.lang.String parentIdentifier, java.lang.String relationshipType)
          Add a document description to the current job's queue.
 void addDocumentReference(java.lang.String localIdentifier, java.lang.String parentIdentifier, java.lang.String relationshipType, java.lang.String[] dataNames, java.lang.Object[][] dataValues)
          Add a document description to the current job's queue.
 void addDocumentReference(java.lang.String localIdentifier, java.lang.String parentIdentifier, java.lang.String relationshipType, java.lang.String[] dataNames, java.lang.Object[][] dataValues, java.lang.Long originationTime)
          Add a document description to the current job's queue.
 void addDocumentReference(java.lang.String localIdentifier, java.lang.String parentIdentifier, java.lang.String relationshipType, java.lang.String[] dataNames, java.lang.Object[][] dataValues, java.lang.Long originationTime, java.lang.String[] prereqEventNames)
          Add a document description to the current job's queue.
 void deleteDocument(java.lang.String localIdentifier)
          Delete the current document from the search engine index.
 void ingestDocument(java.lang.String localIdentifier, java.lang.String version, java.lang.String documentURI, RepositoryDocument data)
          Ingest the current document.
 void recordDocument(java.lang.String localIdentifier, java.lang.String version)
          Record a document version, but don't ingest it.
 java.lang.String[] retrieveParentData(java.lang.String localIdentifier, java.lang.String dataName)
          Retrieve data passed from parents to a specified child document.
 CharacterInput[] retrieveParentDataAsFiles(java.lang.String localIdentifier, java.lang.String dataName)
          Retrieve data passed from parents to a specified child document.
 void setDocumentOriginationTime(java.lang.String localIdentifier, java.lang.Long originationTime)
          Override a document's origination time.
 void setDocumentScheduleBounds(java.lang.String localIdentifier, java.lang.Long lowerRecrawlBoundTime, java.lang.Long upperRecrawlBoundTime, java.lang.Long lowerExpireBoundTime, java.lang.Long upperExpireBoundTime)
          Override the schedule for the next time a document is crawled.
 
Methods inherited from interface org.apache.manifoldcf.crawler.interfaces.IHistoryActivity
recordActivity
 
Methods inherited from interface org.apache.manifoldcf.crawler.interfaces.IEventActivity
beginEventSequence, completeEventSequence, retryDocumentProcessing
 
Methods inherited from interface org.apache.manifoldcf.crawler.interfaces.INamingActivity
createConnectionSpecificString, createGlobalString, createJobSpecificString
 
Methods inherited from interface org.apache.manifoldcf.crawler.interfaces.IAbortActivity
checkJobStillActive
 
Methods inherited from interface org.apache.manifoldcf.crawler.interfaces.IFingerprintActivity
checkDocumentIndexable, checkMimeTypeIndexable
 

Field Detail

_rcsid

static final java.lang.String _rcsid
See Also:
Constant Field Values

Method Detail

addDocumentReference

void addDocumentReference(java.lang.String localIdentifier,
                          java.lang.String parentIdentifier,
                          java.lang.String relationshipType,
                          java.lang.String[] dataNames,
                          java.lang.Object[][] dataValues,
                          java.lang.Long originationTime,
                          java.lang.String[] prereqEventNames)
                          throws ManifoldCFException
Add a document description to the current job's queue.

Parameters:
localIdentifier - is the local document identifier to add (for the connector that fetched the document).
parentIdentifier - is the document identifier that is considered to be the "parent" of this identifier. May be null if no hopcount filtering is desired for this kind of relationship.
relationshipType - is the string describing the kind of relationship described by this reference. This must be one of the strings returned by the IRepositoryConnector method "getRelationshipTypes()". May be null.
dataNames - is the list of names of the carry-down data passed from the parent to the child. May be null. Each name is limited to 255 characters.
dataValues - are the values that correspond to the data names in the dataNames parameter. May be null only if dataNames is null. Each object must be either a String or a CharacterInput.
originationTime - is the time, in ms since epoch, that the document originated. Pass null if none or unknown.
prereqEventNames - are the names of the prerequisite events which this document requires prior to processing. Pass null if none.
Throws:
ManifoldCFException
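
The constraints on the carry-down parameters can be checked before the call is made. The following is a minimal sketch, not part of ManifoldCF: the class CarryDownCheck and its validate method are hypothetical names, and the requirement that dataValues has one row per entry in dataNames is an assumption read from "the values that correspond to the data names". The String-or-CharacterInput rule is not checked here because CharacterInput is not available in a standalone example.

```java
/** Hypothetical helper (not a ManifoldCF class): sanity-checks carry-down arrays. */
final class CarryDownCheck {
    // Limit stated in the dataNames parameter description.
    static final int MAX_NAME_LENGTH = 255;

    /** Returns null if the arrays look consistent, otherwise a description of the problem. */
    static String validate(String[] dataNames, Object[][] dataValues) {
        if (dataNames == null) {
            // No carry-down data at all; this sketch does not inspect dataValues further.
            return null;
        }
        if (dataValues == null) {
            return "dataValues may be null only if dataNames is null";
        }
        if (dataValues.length != dataNames.length) {
            // Assumption: one row of values per data name.
            return "dataNames and dataValues differ in length";
        }
        for (int i = 0; i < dataNames.length; i++) {
            if (dataNames[i] == null) {
                return "data name " + i + " is null";
            }
            if (dataNames[i].length() > MAX_NAME_LENGTH) {
                return "data name " + i + " exceeds " + MAX_NAME_LENGTH + " characters";
            }
        }
        return null;
    }
}
```

A connector might call validate(dataNames, dataValues) just before addDocumentReference and log or reject the reference when the result is non-null.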

addDocumentReference

void addDocumentReference(java.lang.String localIdentifier,
                          java.lang.String parentIdentifier,
                          java.lang.String relationshipType,
                          java.lang.String[] dataNames,
                          java.lang.Object[][] dataValues,
                          java.lang.Long originationTime)
                          throws ManifoldCFException
Add a document description to the current job's queue.

Parameters:
localIdentifier - is the local document identifier to add (for the connector that fetched the document).
parentIdentifier - is the document identifier that is considered to be the "parent" of this identifier. May be null if no hopcount filtering is desired for this kind of relationship.
relationshipType - is the string describing the kind of relationship described by this reference. This must be one of the strings returned by the IRepositoryConnector method "getRelationshipTypes()". May be null.
dataNames - is the list of names of the carry-down data passed from the parent to the child. May be null. Each name is limited to 255 characters.
dataValues - are the values that correspond to the data names in the dataNames parameter. May be null only if dataNames is null. Each object must be either a String or a CharacterInput.
originationTime - is the time, in ms since epoch, that the document originated. Pass null if none or unknown.
Throws:
ManifoldCFException

addDocumentReference

void addDocumentReference(java.lang.String localIdentifier,
                          java.lang.String parentIdentifier,
                          java.lang.String relationshipType,
                          java.lang.String[] dataNames,
                          java.lang.Object[][] dataValues)
                          throws ManifoldCFException
Add a document description to the current job's queue.

Parameters:
localIdentifier - is the local document identifier to add (for the connector that fetched the document).
parentIdentifier - is the document identifier that is considered to be the "parent" of this identifier. May be null if no hopcount filtering is desired for this kind of relationship.
relationshipType - is the string describing the kind of relationship described by this reference. This must be one of the strings returned by the IRepositoryConnector method "getRelationshipTypes()". May be null.
dataNames - is the list of names of the carry-down data passed from the parent to the child. May be null. Each name is limited to 255 characters.
dataValues - are the values that correspond to the data names in the dataNames parameter. May be null only if dataNames is null. Each object must be either a String or a CharacterInput.
Throws:
ManifoldCFException

addDocumentReference

void addDocumentReference(java.lang.String localIdentifier,
                          java.lang.String parentIdentifier,
                          java.lang.String relationshipType)
                          throws ManifoldCFException
Add a document description to the current job's queue.

Parameters:
localIdentifier - is the local document identifier to add (for the connector that fetched the document).
parentIdentifier - is the document identifier that is considered to be the "parent" of this identifier. May be null if no hopcount filtering is desired for this kind of relationship.
relationshipType - is the string describing the kind of relationship described by this reference. This must be one of the strings returned by the IRepositoryConnector method "getRelationshipTypes()". May be null.
Throws:
ManifoldCFException

addDocumentReference

void addDocumentReference(java.lang.String localIdentifier)
                          throws ManifoldCFException
Add a document description to the current job's queue. This method is equivalent to addDocumentReference(localIdentifier,null,null).

Parameters:
localIdentifier - is the local document identifier to add (for the connector that fetched the document).
Throws:
ManifoldCFException

retrieveParentData

java.lang.String[] retrieveParentData(java.lang.String localIdentifier,
                                      java.lang.String dataName)
                                      throws ManifoldCFException
Retrieve data passed from parents to a specified child document.

Parameters:
localIdentifier - is the document identifier of the document we want the recorded data for.
dataName - is the name of the data items to retrieve.
Returns:
an array containing the unique data values passed from ALL parents. Note that these are in no particular order, and there will not be any duplicates.
Throws:
ManifoldCFException
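
The return contract (unique values gathered from all parents, in no particular order) can be pictured as a set union. The sketch below only illustrates that contract; it is not how the framework stores or retrieves carry-down data, and ParentDataMerge and mergeUnique are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration (not a ManifoldCF class) of the "unique, unordered" contract. */
final class ParentDataMerge {
    /** Combines the values contributed by each parent into one duplicate-free array. */
    static String[] mergeUnique(List<String[]> valuesPerParent) {
        // A HashSet drops duplicates and makes no ordering promise,
        // which mirrors what the return description states.
        Set<String> unique = new HashSet<>();
        for (String[] values : valuesPerParent) {
            for (String value : values) {
                unique.add(value);
            }
        }
        return unique.toArray(new String[0]);
    }
}
```

Because the order is unspecified, callers that need a stable order should sort the returned array themselves.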

retrieveParentDataAsFiles

CharacterInput[] retrieveParentDataAsFiles(java.lang.String localIdentifier,
                                           java.lang.String dataName)
                                           throws ManifoldCFException
Retrieve data passed from parents to a specified child document.

Parameters:
localIdentifier - is the document identifier of the document we want the recorded data for.
dataName - is the name of the data items to retrieve.
Returns:
an array containing the unique data values passed from ALL parents. Note that these are in no particular order, and there will not be any duplicates.
Throws:
ManifoldCFException

recordDocument

void recordDocument(java.lang.String localIdentifier,
                    java.lang.String version)
                    throws ManifoldCFException,
                           ServiceInterruption
Record a document version, but don't ingest it.

Parameters:
localIdentifier - is the document identifier.
version - is the document version.
Throws:
ManifoldCFException
ServiceInterruption

ingestDocument

void ingestDocument(java.lang.String localIdentifier,
                    java.lang.String version,
                    java.lang.String documentURI,
                    RepositoryDocument data)
                    throws ManifoldCFException,
                           ServiceInterruption
Ingest the current document.

Parameters:
localIdentifier - is the document's local identifier.
version - is the version of the document, as reported by the getDocumentVersions() method of the corresponding repository connector.
documentURI - is the URI to use to retrieve this document from the search interface (and is also the unique key in the index).
data - is the document data. The data is closed after ingestion is complete.
Throws:
ManifoldCFException
ServiceInterruption

deleteDocument

void deleteDocument(java.lang.String localIdentifier)
                    throws ManifoldCFException,
                           ServiceInterruption
Delete the current document from the search engine index.

Parameters:
localIdentifier - is the document's local identifier.
Throws:
ManifoldCFException
ServiceInterruption
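
The three indexing calls (ingestDocument, recordDocument, deleteDocument) cover the possible outcomes for a processed document. The sketch below shows one plausible way a connector could choose among them. It rests on several assumptions: IndexingActions is a hypothetical stand-in that mirrors only the three signatures (with Object in place of RepositoryDocument and without the checked exceptions), and the exists and indexable flags are invented inputs for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not ManifoldCF code) of choosing among the three indexing calls. */
final class DispositionSketch {
    /** Stand-in for the relevant subset of IProcessActivity; the real methods
     *  also declare ManifoldCFException and ServiceInterruption. */
    interface IndexingActions {
        void ingestDocument(String localIdentifier, String version, String documentURI, Object data);
        void recordDocument(String localIdentifier, String version);
        void deleteDocument(String localIdentifier);
    }

    /** Assumed policy: delete missing documents, record non-indexable ones, ingest the rest. */
    static void dispose(IndexingActions activities, String id, String version,
                        String uri, Object data, boolean exists, boolean indexable) {
        if (!exists) {
            activities.deleteDocument(id);
        } else if (!indexable) {
            activities.recordDocument(id, version);
        } else {
            activities.ingestDocument(id, version, uri, data);
        }
    }

    /** Test double that remembers which call was made for which identifier. */
    static final class Recorder implements IndexingActions {
        final List<String> calls = new ArrayList<>();

        @Override
        public void ingestDocument(String localIdentifier, String version, String documentURI, Object data) {
            calls.add("ingest:" + localIdentifier);
        }

        @Override
        public void recordDocument(String localIdentifier, String version) {
            calls.add("record:" + localIdentifier);
        }

        @Override
        public void deleteDocument(String localIdentifier) {
            calls.add("delete:" + localIdentifier);
        }
    }
}
```

In a real connector the activities argument would be the IProcessActivity instance supplied by the framework, and both checked exceptions would need to be propagated or handled.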

setDocumentScheduleBounds

void setDocumentScheduleBounds(java.lang.String localIdentifier,
                               java.lang.Long lowerRecrawlBoundTime,
                               java.lang.Long upperRecrawlBoundTime,
                               java.lang.Long lowerExpireBoundTime,
                               java.lang.Long upperExpireBoundTime)
                               throws ManifoldCFException
Override the schedule for the next time a document is crawled. Calling this method allows you to set an upper recrawl bound, lower recrawl bound, upper expire bound, lower expire bound, or a combination of these, on a specific document. This method is only effective if the job is a continuous one, and if the identifier you pass in is being processed.

Parameters:
localIdentifier - is the document's local identifier.
lowerRecrawlBoundTime - is the time in ms since epoch that the reschedule time should not fall BELOW, or null if none.
upperRecrawlBoundTime - is the time in ms since epoch that the reschedule time should not rise ABOVE, or null if none.
lowerExpireBoundTime - is the time in ms since epoch that the expire time should not fall BELOW, or null if none.
upperExpireBoundTime - is the time in ms since epoch that the expire time should not rise ABOVE, or null if none.
Throws:
ManifoldCFException
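
One way to read the "should not fall BELOW" and "should not rise ABOVE" wording is as a clamp applied to whatever time the scheduler would otherwise pick. The sketch below illustrates that reading only; the framework's actual scheduling logic is not described here, and ScheduleBoundsSketch and clamp are hypothetical names.

```java
/** Hypothetical illustration (not ManifoldCF code) of applying optional time bounds. */
final class ScheduleBoundsSketch {
    /**
     * Restricts proposedTime (ms since epoch) to the given bounds.
     * A null bound means "no constraint on that side", matching the parameter docs.
     */
    static long clamp(long proposedTime, Long lowerBound, Long upperBound) {
        long result = proposedTime;
        if (lowerBound != null && result < lowerBound) {
            result = lowerBound; // the time should not fall below the lower bound
        }
        if (upperBound != null && result > upperBound) {
            result = upperBound; // the time should not rise above the upper bound
        }
        return result;
    }
}
```

The same helper applies equally to the recrawl pair and the expire pair, since both are described with the same below/above semantics.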

setDocumentOriginationTime

void setDocumentOriginationTime(java.lang.String localIdentifier,
                                java.lang.Long originationTime)
                                throws ManifoldCFException
Override a document's origination time. Use this method to signal to the framework that a document's origination time is something other than the first time it was crawled.

Parameters:
localIdentifier - is the document's local identifier.
originationTime - is the document's origination time, or null if unknown.
Throws:
ManifoldCFException
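
The originationTime argument is a nullable Long. The sketch below shows one way to produce it from a java.time.Instant. It assumes the unit is milliseconds since the epoch, as stated for the originationTime parameter of addDocumentReference; OriginationTimeSketch and toOriginationTime are hypothetical names.

```java
import java.time.Instant;

/** Hypothetical helper (not a ManifoldCF class) for building the originationTime argument. */
final class OriginationTimeSketch {
    /** Returns ms since epoch as a Long, or null when the creation instant is unknown. */
    static Long toOriginationTime(Instant created) {
        if (created == null) {
            return null; // "null if unknown", per the parameter description
        }
        return Long.valueOf(created.toEpochMilli());
    }
}
```

The result can be passed directly as the second argument of setDocumentOriginationTime.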