org.apache.manifoldcf.agents.incrementalingest
Class IncrementalIngester

java.lang.Object
  extended by org.apache.manifoldcf.core.database.BaseTable
      extended by org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester
All Implemented Interfaces:
IIncrementalIngester

public class IncrementalIngester
extends BaseTable
implements IIncrementalIngester

Incremental ingestion API implementation. This class is responsible for keeping track of what has been sent where, and also the corresponding version of each document so indexed. The space over which this takes place is defined by the individual output connection - that is, the output connection seems to "remember" what documents were handed to it. A secondary purpose of this module is to provide a mapping between the key by which a document is described internally (by an identifier hash, plus the name of an identifier space), and the way the document is identified in the output space (by the name of an output connection, plus a URI which is considered local to that output connection space).
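
The bookkeeping described above can be pictured as one row per (output connection, document key) pair. The following is a minimal in-memory sketch under that assumption, not ManifoldCF's code. `IngestLedger`, `remember`, and `uriFor` are hypothetical names, and the real class persists equivalent rows in a database table through `BaseTable`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the ingestion bookkeeping; the real
// IncrementalIngester keeps equivalent rows in a database table.
final class IngestLedger {
  // What was sent: the version string and the URI used in the output space.
  record Row(String version, String uri) {}

  // Composite key: (output connection name, internal document key).
  private record Key(String outputConnection, String docKey) {}

  private final Map<Key, Row> rows = new HashMap<>();

  // Remember that docKey was handed to outputConnection under uri.
  void remember(String outputConnection, String docKey, String version, String uri) {
    rows.put(new Key(outputConnection, docKey), new Row(version, uri));
  }

  // Map an internal key to the URI known to the output connection, or null.
  String uriFor(String outputConnection, String docKey) {
    Row row = rows.get(new Key(outputConnection, docKey));
    return row == null ? null : row.uri();
  }
}
```

Because the key includes the output connection name, the same document handed to two output connections is tracked twice, once per connection.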


Nested Class Summary
protected static class IncrementalIngester.DeleteInfo
          This class contains the information necessary to delete a document.
 
Field Summary
static java.lang.String _rcsid
           
protected static java.lang.String authorityNameField
           
protected static java.lang.String changeCountField
           
protected  IOutputConnectionManager connectionManager
           
protected static java.lang.String docKeyField
           
protected static java.lang.String docURIField
           
protected static java.lang.String firstIngestField
           
protected static java.lang.String idField
           
protected static java.lang.String lastIngestField
           
protected static java.lang.String lastOutputVersionField
           
protected static java.lang.String lastVersionField
           
protected  ILockManager lockManager
           
protected static java.lang.String outputConnNameField
           
protected  IThreadContext threadContext
           
protected static java.lang.String uriHashField
           
 
Fields inherited from class org.apache.manifoldcf.core.database.BaseTable
dbInterface, tableName
 
Constructor Summary
IncrementalIngester(IThreadContext threadContext, IDBInterface database)
          Constructor.
 
Method Summary
protected  int addOrReplaceDocument(IOutputConnection connection, java.lang.String documentURI, java.lang.String outputDescription, RepositoryDocument document, java.lang.String authorityNameString, IOutputAddActivity activities)
          Add or replace a document, using the specified output connection, via the standard pool.
 boolean checkDocumentIndexable(java.lang.String outputConnectionName, java.io.File localFile)
          Check if a file is indexable.
 boolean checkMimeTypeIndexable(java.lang.String outputConnectionName, java.lang.String mimeType)
          Check if a mime type is indexable.
 void clearAll()
          Flush all knowledge of what was ingested before.
 void deinstall()
          Uninstall the incremental ingestion manager.
protected  void deleteRowIds(java.util.ArrayList list, java.lang.String queryPart)
          Delete a chunk of row ids.
 void documentCheck(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash, long checkTime)
          Note the fact that we checked a document (and found that it did not need to be ingested, because the versions agreed).
 void documentCheckMultiple(java.lang.String outputConnectionName, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes, long checkTime)
          Note the fact that we checked a set of documents (and found that they did not need to be ingested, because the versions agreed).
 void documentDelete(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash, IOutputRemoveActivity activities)
          Delete a document from the search engine index.
 void documentDeleteMultiple(java.lang.String[] outputConnectionNames, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes, IOutputRemoveActivity activities)
          Delete multiple documents from the search engine index.
 void documentDeleteMultiple(java.lang.String outputConnectionName, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes, IOutputRemoveActivity activities)
          Delete multiple documents from the search engine index.
 boolean documentIngest(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash, java.lang.String documentVersion, java.lang.String outputVersion, java.lang.String authorityName, RepositoryDocument data, long ingestTime, java.lang.String documentURI, IOutputActivity activities)
          Ingest a document.
 void documentRecord(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash, java.lang.String documentVersion, long recordTime, IOutputActivity activities)
          Record a document version, but don't ingest it.
protected  void findRowIdsForDocIds(java.lang.String outputConnectionName, java.util.HashMap rowIDSet, java.util.ArrayList paramValues, java.lang.String paramList)
          Given values and parameters corresponding to a set of hash values, add corresponding table row id's to the output map.
protected  void findRowIdsForURIs(java.lang.String outputConnectionName, java.util.HashMap rowIDSet, java.util.HashMap uris, java.util.ArrayList hashParamValues, java.lang.String paramList)
          Given values and parameters corresponding to a set of hash values, add corresponding table row id's to the output map.
 DocumentIngestStatus getDocumentIngestData(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash)
          Look up ingestion data for a document.
protected  void getDocumentIngestDataChunk(DocumentIngestStatus[] rval, java.util.Map map, java.lang.String outputConnectionName, java.lang.String clause, java.util.ArrayList list)
          Get a chunk of document ingest data records.
 DocumentIngestStatus[] getDocumentIngestDataMultiple(java.lang.String[] outputConnectionNames, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes)
          Look up ingestion data for a SET of documents.
 DocumentIngestStatus[] getDocumentIngestDataMultiple(java.lang.String outputConnectionName, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes)
          Look up ingestion data for a SET of documents.
 long getDocumentUpdateInterval(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash)
          Calculate the average time interval between changes for a document.
 long[] getDocumentUpdateIntervalMultiple(java.lang.String outputConnectionName, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes)
          Calculate the average time interval between changes for each of a set of documents.
protected  void getDocumentURIChunk(IncrementalIngester.DeleteInfo[] rval, java.util.Map map, java.lang.String outputConnectionName, java.lang.String clause, java.util.ArrayList list)
          Get a chunk of document uris.
protected  IncrementalIngester.DeleteInfo[] getDocumentURIMultiple(java.lang.String outputConnectionName, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes)
          Find out the URIs under which a SET of documents is currently ingested.
protected  void getIntervals(long[] rval, java.lang.String outputConnectionName, java.util.ArrayList list, java.lang.String queryPart, java.util.HashMap returnMap)
          Query for and calculate the update intervals for a set of hash codes.
 void install()
          Install the incremental ingestion manager.
protected static java.lang.String makeKey(java.lang.String documentClass, java.lang.String documentHash)
          Make a key from a document class and a hash.
protected  void noteDocumentIngest(java.lang.String outputConnectionName, java.lang.String docKey, java.lang.String documentVersion, java.lang.String outputVersion, java.lang.String authorityNameString, long ingestTime, java.lang.String documentURI, java.lang.String documentURIHash)
          Note the ingestion of a document, or the "update" of a document.
protected  boolean performIngestion(IOutputConnection connection, java.lang.String docKey, java.lang.String documentVersion, java.lang.String outputVersion, java.lang.String authorityNameString, RepositoryDocument data, long ingestTime, java.lang.String documentURI, IOutputActivity activities)
          Do the actual ingestion, or just record it if there's nothing to ingest.
protected  void removeDocument(IOutputConnection connection, java.lang.String documentURI, java.lang.String outputDescription, IOutputRemoveActivity activities)
          Remove a document, using the specified output connection, via the standard pool.
 void resetOutputConnection(java.lang.String outputConnectionName)
          Reset all documents belonging to a specific output connection, because we have been informed that the target system has been reconfigured.
protected  void updateRowIds(java.util.ArrayList list, java.lang.String queryPart, long checkTime)
          Update a chunk of row ids.
 
Methods inherited from class org.apache.manifoldcf.core.database.BaseTable
addTableIndex, analyzeTable, beginTransaction, constructDistinctOnClause, constructOffsetLimitClause, constructRegexpClause, constructSubstringClause, endTransaction, getDatabaseCacheKey, getDBInterface, getMaxInClause, getMaxOrClause, getTableIndexes, getTableName, getTableSchema, getTransactionID, makeTableKey, noteModifications, performAddIndex, performAlter, performCreate, performDelete, performDrop, performInsert, performLock, performModification, performQuery, performQuery, performRemoveIndex, performUpdate, prepareRowForSave, readRow, reindexTable, signalRollback
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_rcsid

public static final java.lang.String _rcsid
See Also:
Constant Field Values

idField

protected static final java.lang.String idField
See Also:
Constant Field Values

outputConnNameField

protected static final java.lang.String outputConnNameField
See Also:
Constant Field Values

docKeyField

protected static final java.lang.String docKeyField
See Also:
Constant Field Values

docURIField

protected static final java.lang.String docURIField
See Also:
Constant Field Values

uriHashField

protected static final java.lang.String uriHashField
See Also:
Constant Field Values

lastVersionField

protected static final java.lang.String lastVersionField
See Also:
Constant Field Values

lastOutputVersionField

protected static final java.lang.String lastOutputVersionField
See Also:
Constant Field Values

changeCountField

protected static final java.lang.String changeCountField
See Also:
Constant Field Values

firstIngestField

protected static final java.lang.String firstIngestField
See Also:
Constant Field Values

lastIngestField

protected static final java.lang.String lastIngestField
See Also:
Constant Field Values

authorityNameField

protected static final java.lang.String authorityNameField
See Also:
Constant Field Values

threadContext

protected IThreadContext threadContext

lockManager

protected ILockManager lockManager

connectionManager

protected IOutputConnectionManager connectionManager

Constructor Detail

IncrementalIngester

public IncrementalIngester(IThreadContext threadContext,
                           IDBInterface database)
                    throws ManifoldCFException
Constructor.

Throws:
ManifoldCFException

Method Detail

install

public void install()
             throws ManifoldCFException
Install the incremental ingestion manager.

Specified by:
install in interface IIncrementalIngester
Throws:
ManifoldCFException

deinstall

public void deinstall()
               throws ManifoldCFException
Uninstall the incremental ingestion manager.

Specified by:
deinstall in interface IIncrementalIngester
Throws:
ManifoldCFException

clearAll

public void clearAll()
              throws ManifoldCFException
Flush all knowledge of what was ingested before.

Specified by:
clearAll in interface IIncrementalIngester
Throws:
ManifoldCFException

checkMimeTypeIndexable

public boolean checkMimeTypeIndexable(java.lang.String outputConnectionName,
                                      java.lang.String mimeType)
                               throws ManifoldCFException,
                                      ServiceInterruption
Check if a mime type is indexable.

Specified by:
checkMimeTypeIndexable in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
mimeType - is the mime type to check.
Returns:
true if the mimeType is indexable.
Throws:
ManifoldCFException
ServiceInterruption

checkDocumentIndexable

public boolean checkDocumentIndexable(java.lang.String outputConnectionName,
                                      java.io.File localFile)
                               throws ManifoldCFException,
                                      ServiceInterruption
Check if a file is indexable.

Specified by:
checkDocumentIndexable in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
localFile - is the local file to check.
Returns:
true if the local file is indexable.
Throws:
ManifoldCFException
ServiceInterruption

documentRecord

public void documentRecord(java.lang.String outputConnectionName,
                           java.lang.String identifierClass,
                           java.lang.String identifierHash,
                           java.lang.String documentVersion,
                           long recordTime,
                           IOutputActivity activities)
                    throws ManifoldCFException,
                           ServiceInterruption
Record a document version, but don't ingest it. The purpose of this method is to keep track of the frequency at which ingestion "attempts" take place. ServiceInterruption is thrown if this action must be rescheduled.

Specified by:
documentRecord in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hashed document identifier.
documentVersion - is the document version.
recordTime - is the time at which the recording took place, in milliseconds since epoch.
activities - is the object used in case a document needs to be removed from the output index as the result of this operation.
Throws:
ManifoldCFException
ServiceInterruption

documentIngest

public boolean documentIngest(java.lang.String outputConnectionName,
                              java.lang.String identifierClass,
                              java.lang.String identifierHash,
                              java.lang.String documentVersion,
                              java.lang.String outputVersion,
                              java.lang.String authorityName,
                              RepositoryDocument data,
                              long ingestTime,
                              java.lang.String documentURI,
                              IOutputActivity activities)
                       throws ManifoldCFException,
                              ServiceInterruption
Ingest a document. This ingests the document, and notes it. If this is a repeat ingestion of the document, this method also REMOVES ALL OLD METADATA. When complete, the index will contain only the metadata described by the RepositoryDocument object passed to this method. ServiceInterruption is thrown if the document ingestion must be rescheduled.

Specified by:
documentIngest in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hashed document identifier.
documentVersion - is the document version.
outputVersion - is the output version string constructed from the output specification by the output connector.
authorityName - is the name of the authority associated with the document, if any.
data - is the document data. The data is closed after ingestion is complete.
ingestTime - is the time at which the ingestion took place, in milliseconds since epoch.
documentURI - is the URI of the document, which will be used as the key of the document in the index.
activities - is an object providing a set of methods that the implementer can use to perform the operation.
Returns:
true if the ingest was ok, false if the ingest is illegal (and should not be repeated).
Throws:
ManifoldCFException
ServiceInterruption
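
Callers typically choose between documentIngest and documentCheck by comparing the versions stored for a document with freshly computed ones. The following is a hedged sketch of that comparison. It assumes the stored values come from an ingest-status lookup (such as getDocumentIngestData) and are null when nothing was ingested; `IngestDecision` and `decide` are hypothetical names.

```java
import java.util.Objects;

// Hypothetical caller-side decision: re-ingest only when something changed.
final class IngestDecision {
  enum Action { INGEST, CHECK_ONLY }

  // stored* are null when the output connection has no record of the document.
  static Action decide(String storedVersion, String storedOutputVersion,
                       String newVersion, String newOutputVersion) {
    boolean unchanged = storedVersion != null
        && Objects.equals(storedVersion, newVersion)
        && Objects.equals(storedOutputVersion, newOutputVersion);
    // CHECK_ONLY corresponds to documentCheck, INGEST to documentIngest.
    return unchanged ? Action.CHECK_ONLY : Action.INGEST;
  }
}
```

Comparing the output version as well as the document version means a change in the output specification alone is enough to trigger re-ingestion.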

performIngestion

protected boolean performIngestion(IOutputConnection connection,
                                   java.lang.String docKey,
                                   java.lang.String documentVersion,
                                   java.lang.String outputVersion,
                                   java.lang.String authorityNameString,
                                   RepositoryDocument data,
                                   long ingestTime,
                                   java.lang.String documentURI,
                                   IOutputActivity activities)
                            throws ManifoldCFException,
                                   ServiceInterruption
Do the actual ingestion, or just record it if there's nothing to ingest.

Throws:
ManifoldCFException
ServiceInterruption

documentCheckMultiple

public void documentCheckMultiple(java.lang.String outputConnectionName,
                                  java.lang.String[] identifierClasses,
                                  java.lang.String[] identifierHashes,
                                  long checkTime)
                           throws ManifoldCFException
Note the fact that we checked a set of documents (and found that they did not need to be ingested, because the versions agreed).

Specified by:
documentCheckMultiple in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - are the set of document identifier hashes.
checkTime - is the time at which the check took place, in milliseconds since epoch.
Throws:
ManifoldCFException

documentCheck

public void documentCheck(java.lang.String outputConnectionName,
                          java.lang.String identifierClass,
                          java.lang.String identifierHash,
                          long checkTime)
                   throws ManifoldCFException
Note the fact that we checked a document (and found that it did not need to be ingested, because the versions agreed).

Specified by:
documentCheck in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hashed document identifier.
checkTime - is the time at which the check took place, in milliseconds since epoch.
Throws:
ManifoldCFException

updateRowIds

protected void updateRowIds(java.util.ArrayList list,
                            java.lang.String queryPart,
                            long checkTime)
                     throws ManifoldCFException
Update a chunk of row ids.

Throws:
ManifoldCFException
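
Batch methods such as documentCheckMultiple appear to hand work to helpers like updateRowIds one chunk at a time, each chunk with its own query fragment (queryPart) and parameter list. A minimal sketch of such chunking follows. It assumes an `IN (?,...)` clause whose size is capped by a limit like the one BaseTable.getMaxInClause() reports; `InClauseBatcher` is hypothetical and takes the cap as an explicit argument.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: split many row ids into bounded IN-clause batches.
final class InClauseBatcher {
  // One batch: the SQL fragment with placeholders, plus its parameter values.
  record Batch(String queryPart, List<Long> params) {}

  static List<Batch> batch(List<Long> ids, String column, int maxInClause) {
    if (maxInClause <= 0) {
      throw new IllegalArgumentException("maxInClause must be positive");
    }
    List<Batch> batches = new ArrayList<>();
    for (int start = 0; start < ids.size(); start += maxInClause) {
      int end = Math.min(start + maxInClause, ids.size());
      List<Long> chunk = new ArrayList<>(ids.subList(start, end));
      // For a chunk of three ids this yields "id IN (?,?,?)".
      String placeholders = String.join(",", Collections.nCopies(chunk.size(), "?"));
      batches.add(new Batch(column + " IN (" + placeholders + ")", chunk));
    }
    return batches;
  }
}
```

Each resulting queryPart and parameter list would then go to a per-chunk helper in the manner of updateRowIds(list, queryPart, checkTime).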

documentDeleteMultiple

public void documentDeleteMultiple(java.lang.String[] outputConnectionNames,
                                   java.lang.String[] identifierClasses,
                                   java.lang.String[] identifierHashes,
                                   IOutputRemoveActivity activities)
                            throws ManifoldCFException,
                                   ServiceInterruption
Delete multiple documents from the search engine index.

Specified by:
documentDeleteMultiple in interface IIncrementalIngester
Parameters:
outputConnectionNames - are the names of the output connections associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - is the array of document identifier hashes of the documents.
activities - is the object to use to log the details of the deletion attempt. May be null.
Throws:
ManifoldCFException
ServiceInterruption
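
This overload takes a parallel array of output connection names. One plausible way to handle it is to regroup the documents by connection and treat each group as a single-connection delete; whether the real implementation works this way is not stated here. `DeleteGrouper` and `DocRef` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: regroup parallel arrays by output connection name.
final class DeleteGrouper {
  record DocRef(String identifierClass, String identifierHash) {}

  static Map<String, List<DocRef>> group(String[] outputConnectionNames,
                                         String[] identifierClasses,
                                         String[] identifierHashes) {
    if (outputConnectionNames.length != identifierClasses.length
        || identifierClasses.length != identifierHashes.length) {
      throw new IllegalArgumentException("parallel arrays must have equal length");
    }
    // LinkedHashMap keeps connections in first-seen order.
    Map<String, List<DocRef>> groups = new LinkedHashMap<>();
    for (int i = 0; i < outputConnectionNames.length; i++) {
      groups.computeIfAbsent(outputConnectionNames[i], k -> new ArrayList<>())
            .add(new DocRef(identifierClasses[i], identifierHashes[i]));
    }
    return groups;
  }
}
```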

documentDeleteMultiple

public void documentDeleteMultiple(java.lang.String outputConnectionName,
                                   java.lang.String[] identifierClasses,
                                   java.lang.String[] identifierHashes,
                                   IOutputRemoveActivity activities)
                            throws ManifoldCFException,
                                   ServiceInterruption
Delete multiple documents from the search engine index.

Specified by:
documentDeleteMultiple in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - is the array of document identifier hashes of the documents.
activities - is the object to use to log the details of the deletion attempt. May be null.
Throws:
ManifoldCFException
ServiceInterruption

findRowIdsForURIs

protected void findRowIdsForURIs(java.lang.String outputConnectionName,
                                 java.util.HashMap rowIDSet,
                                 java.util.HashMap uris,
                                 java.util.ArrayList hashParamValues,
                                 java.lang.String paramList)
                          throws ManifoldCFException
Given values and parameters corresponding to a set of hash values, add corresponding table row id's to the output map.

Throws:
ManifoldCFException

findRowIdsForDocIds

protected void findRowIdsForDocIds(java.lang.String outputConnectionName,
                                   java.util.HashMap rowIDSet,
                                   java.util.ArrayList paramValues,
                                   java.lang.String paramList)
                            throws ManifoldCFException
Given values and parameters corresponding to a set of hash values, add corresponding table row id's to the output map.

Throws:
ManifoldCFException

deleteRowIds

protected void deleteRowIds(java.util.ArrayList list,
                            java.lang.String queryPart)
                     throws ManifoldCFException
Delete a chunk of row ids.

Throws:
ManifoldCFException

documentDelete

public void documentDelete(java.lang.String outputConnectionName,
                           java.lang.String identifierClass,
                           java.lang.String identifierHash,
                           IOutputRemoveActivity activities)
                    throws ManifoldCFException,
                           ServiceInterruption
Delete a document from the search engine index.

Specified by:
documentDelete in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hash of the id of the document.
activities - is the object to use to log the details of the deletion attempt. May be null.
Throws:
ManifoldCFException
ServiceInterruption

getDocumentURIMultiple

protected IncrementalIngester.DeleteInfo[] getDocumentURIMultiple(java.lang.String outputConnectionName,
                                                                  java.lang.String[] identifierClasses,
                                                                  java.lang.String[] identifierHashes)
                                                           throws ManifoldCFException
Find out the URIs under which a SET of documents is currently ingested.

Parameters:
identifierHashes - is the array of document identifier hashes to check.
Returns:
the array of current document URI records, one per identifier. Null is returned for identifiers that don't exist in the index.
Throws:
ManifoldCFException

getDocumentIngestDataMultiple

public DocumentIngestStatus[] getDocumentIngestDataMultiple(java.lang.String[] outputConnectionNames,
                                                            java.lang.String[] identifierClasses,
                                                            java.lang.String[] identifierHashes)
                                                     throws ManifoldCFException
Look up ingestion data for a SET of documents.

Specified by:
getDocumentIngestDataMultiple in interface IIncrementalIngester
Parameters:
outputConnectionNames - are the names of the output connections associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - is the array of document identifier hashes to look up.
Returns:
the array of document data. Null will come back for any identifier that doesn't exist in the index.
Throws:
ManifoldCFException

getDocumentIngestDataMultiple

public DocumentIngestStatus[] getDocumentIngestDataMultiple(java.lang.String outputConnectionName,
                                                            java.lang.String[] identifierClasses,
                                                            java.lang.String[] identifierHashes)
                                                     throws ManifoldCFException
Look up ingestion data for a SET of documents.

Specified by:
getDocumentIngestDataMultiple in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - is the array of document identifier hashes to look up.
Returns:
the array of document data. Null will come back for any identifier that doesn't exist in the index.
Throws:
ManifoldCFException

getDocumentIngestData

public DocumentIngestStatus getDocumentIngestData(java.lang.String outputConnectionName,
                                                  java.lang.String identifierClass,
                                                  java.lang.String identifierHash)
                                           throws ManifoldCFException
Look up ingestion data for a document.

Specified by:
getDocumentIngestData in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hash of the id of the document.
Returns:
the current document's ingestion data, or null if the document is not currently ingested.
Throws:
ManifoldCFException

getDocumentUpdateInterval

public long getDocumentUpdateInterval(java.lang.String outputConnectionName,
                                      java.lang.String identifierClass,
                                      java.lang.String identifierHash)
                               throws ManifoldCFException
Calculate the average time interval between changes for a document. This is based on the data gathered for the document.

Specified by:
getDocumentUpdateInterval in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hash of the id of the document.
Returns:
the number of milliseconds between changes, or 0 if this cannot be calculated.
Throws:
ManifoldCFException
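
The table keeps firstIngestField, lastIngestField, and changeCountField columns, which is enough to estimate an average interval. The sketch below assumes the interval is the elapsed time between first and last ingestion divided by the change count; the exact formula ManifoldCF uses is not documented on this page, and `UpdateInterval` is hypothetical.

```java
// Hypothetical sketch of an average change interval; the formula is an
// assumption, not taken from the ManifoldCF source.
final class UpdateInterval {
  static long averageInterval(long firstIngestMillis, long lastIngestMillis, long changeCount) {
    long elapsed = lastIngestMillis - firstIngestMillis;
    // Nothing to average: return 0, the documented "cannot be calculated" value.
    if (changeCount <= 0 || elapsed <= 0) {
      return 0L;
    }
    return elapsed / changeCount;
  }
}
```

For example, a document first ingested at 1000 ms and last at 61000 ms with a change count of 3 yields 20000 ms.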

getDocumentUpdateIntervalMultiple

public long[] getDocumentUpdateIntervalMultiple(java.lang.String outputConnectionName,
                                                java.lang.String[] identifierClasses,
                                                java.lang.String[] identifierHashes)
                                         throws ManifoldCFException
Calculate the average time interval between changes for each of a set of documents. This is based on the data gathered for each document.

Specified by:
getDocumentUpdateIntervalMultiple in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - are the hashes of the ids of the documents.
Returns:
an array containing, for each document, the number of milliseconds between changes, or 0 if this cannot be calculated.
Throws:
ManifoldCFException

getIntervals

protected void getIntervals(long[] rval,
                            java.lang.String outputConnectionName,
                            java.util.ArrayList list,
                            java.lang.String queryPart,
                            java.util.HashMap returnMap)
                     throws ManifoldCFException
Query for and calculate the update intervals for a set of hash codes.

Parameters:
rval - is the array into which the calculated values are placed.
list - is the list of parameters.
queryPart - is the part of the query pertaining to the list of hash codes.
returnMap - is a mapping from document id to rval index.
Throws:
ManifoldCFException

resetOutputConnection

public void resetOutputConnection(java.lang.String outputConnectionName)
                           throws ManifoldCFException
Reset all documents belonging to a specific output connection, because we have been informed that the target system has been reconfigured. This will force all such documents to be reindexed the next time they are checked.

Specified by:
resetOutputConnection in interface IIncrementalIngester
Parameters:
outputConnectionName - is the name of the output connection associated with this action.
Throws:
ManifoldCFException
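
One way to force reindexing "the next time they are checked" is to clear the stored output version for every row belonging to the connection, so that the next version comparison cannot match. This mechanism is an assumption, sketched in memory; `ResettableLedger` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: reset a connection by forgetting stored output versions.
final class ResettableLedger {
  // output connection name -> (document key -> stored output version)
  private final Map<String, Map<String, String>> outputVersions = new HashMap<>();

  void note(String outputConnection, String docKey, String outputVersion) {
    outputVersions.computeIfAbsent(outputConnection, k -> new HashMap<>())
                  .put(docKey, outputVersion);
  }

  // True only if a non-null stored version equals the candidate version.
  boolean upToDate(String outputConnection, String docKey, String outputVersion) {
    Map<String, String> docs = outputVersions.get(outputConnection);
    String stored = docs == null ? null : docs.get(docKey);
    return stored != null && Objects.equals(stored, outputVersion);
  }

  // Keep each row but null out its version instead of deleting the row.
  void reset(String outputConnection) {
    Map<String, String> docs = outputVersions.get(outputConnection);
    if (docs != null) {
      docs.replaceAll((docKey, version) -> null);
    }
  }
}
```

Keeping the rows rather than deleting them means the ingester still knows which documents the connection holds.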

noteDocumentIngest

protected void noteDocumentIngest(java.lang.String outputConnectionName,
                                  java.lang.String docKey,
                                  java.lang.String documentVersion,
                                  java.lang.String outputVersion,
                                  java.lang.String authorityNameString,
                                  long ingestTime,
                                  java.lang.String documentURI,
                                  java.lang.String documentURIHash)
                           throws ManifoldCFException
Note the ingestion of a document, or the "update" of a document.

Parameters:
outputConnectionName - is the name of the output connection.
docKey - is the key string describing the document.
documentVersion - is a string describing the new version of the document.
outputVersion - is the version string calculated for the output connection.
authorityNameString - is the name of the relevant authority connection.
ingestTime - is the time at which the ingestion took place, in milliseconds since epoch.
documentURI - is the uri the document can be accessed at, or null (which signals that we are to record the version, but no ingestion took place).
documentURIHash - is the hash of the document uri.
Throws:
ManifoldCFException
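
The parameters above suggest an upsert: create a row the first time a document key is seen, otherwise update it, and treat a null documentURI as "version recorded, nothing indexed". The following is a hedged in-memory sketch; how changeCount and the ingest timestamps are maintained is an assumption, and `IngestNotes` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "note ingest" bookkeeping as an upsert.
final class IngestNotes {
  static final class Row {
    String version;
    String uri;          // null when only the version was recorded
    long firstIngest;
    long lastIngest;
    long changeCount;
  }

  private final Map<String, Row> rows = new HashMap<>();

  // documentUri == null: record the version without claiming anything was indexed.
  void note(String docKey, String version, long ingestTime, String documentUri) {
    Row row = rows.get(docKey);
    if (row == null) {
      row = new Row();
      row.firstIngest = ingestTime;  // set once, on first sight
      rows.put(docKey, row);
    }
    row.version = version;
    row.uri = documentUri;
    row.lastIngest = ingestTime;
    row.changeCount++;               // assumption: every note counts as a change
  }

  Row get(String docKey) {
    return rows.get(docKey);
  }
}
```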

getDocumentURIChunk

protected void getDocumentURIChunk(IncrementalIngester.DeleteInfo[] rval,
                                   java.util.Map map,
                                   java.lang.String outputConnectionName,
                                   java.lang.String clause,
                                   java.util.ArrayList list)
                            throws ManifoldCFException
Get a chunk of document uris.

Parameters:
rval - is the DeleteInfo array where the URI information should be put.
map - is the map from id to index.
clause - is the in clause for the query.
list - is the parameter list for the query.
Throws:
ManifoldCFException

getDocumentIngestDataChunk

protected void getDocumentIngestDataChunk(DocumentIngestStatus[] rval,
                                          java.util.Map map,
                                          java.lang.String outputConnectionName,
                                          java.lang.String clause,
                                          java.util.ArrayList list)
                                   throws ManifoldCFException
Get a chunk of document ingest data records.

Parameters:
rval - is the document ingest status array where the data should be put.
map - is the map from id to index.
clause - is the in clause for the query.
list - is the parameter list for the query.
Throws:
ManifoldCFException
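
The map parameter of the chunk helpers ("the map from id to index") suggests the following pattern. Query results arrive in arbitrary order and are slotted back into the caller's array, leaving null where no row exists. A minimal sketch, assuming the requested keys are distinct; `ChunkFiller` and its `Row` type are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of filling a result array via an id-to-index map.
final class ChunkFiller {
  record Row(String docKey, String version) {}

  static String[] fill(String[] requestedKeys, List<Row> queryResults) {
    // Remember where each requested key lives in the output array.
    Map<String, Integer> indexByKey = new HashMap<>();
    for (int i = 0; i < requestedKeys.length; i++) {
      indexByKey.put(requestedKeys[i], i);
    }
    // Slots for keys with no matching row stay null ("not in the index").
    String[] rval = new String[requestedKeys.length];
    for (Row row : queryResults) {
      Integer index = indexByKey.get(row.docKey());
      if (index != null) {
        rval[index] = row.version();
      }
    }
    return rval;
  }
}
```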

addOrReplaceDocument

protected int addOrReplaceDocument(IOutputConnection connection,
                                   java.lang.String documentURI,
                                   java.lang.String outputDescription,
                                   RepositoryDocument document,
                                   java.lang.String authorityNameString,
                                   IOutputAddActivity activities)
                            throws ManifoldCFException,
                                   ServiceInterruption
Add or replace a document, using the specified output connection, via the standard pool.

Throws:
ManifoldCFException
ServiceInterruption

removeDocument

protected void removeDocument(IOutputConnection connection,
                              java.lang.String documentURI,
                              java.lang.String outputDescription,
                              IOutputRemoveActivity activities)
                       throws ManifoldCFException,
                              ServiceInterruption
Remove a document, using the specified output connection, via the standard pool.

Throws:
ManifoldCFException
ServiceInterruption

makeKey

protected static java.lang.String makeKey(java.lang.String documentClass,
                                          java.lang.String documentHash)
Make a key from a document class and a hash.
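
A minimal sketch of what makeKey might do, assuming the key is the class and hash joined with a colon. The separator is not confirmed by this page, and `DocKeys` and the inverse `splitKey` are hypothetical additions.

```java
// Hypothetical sketch; the ":" separator is an assumption.
final class DocKeys {
  static String makeKey(String documentClass, String documentHash) {
    return documentClass + ":" + documentHash;
  }

  // Inverse: split at the last ':' since hashes are assumed not to contain one.
  static String[] splitKey(String key) {
    int i = key.lastIndexOf(':');
    if (i < 0) {
      throw new IllegalArgumentException("not a document key: " + key);
    }
    return new String[] { key.substring(0, i), key.substring(i + 1) };
  }
}
```

Splitting at the last separator keeps the sketch correct even if an identifier class name itself contains a colon.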