public interface IIncrementalIngester
This interface describes the incremental ingestion API.

The expected client flow for this API is:
1) Use the API to fetch a document's version.
2) Decide whether to ingest based on that version.
3) If the decision is to ingest, call the ingest method in the API.

The module described by this interface is responsible for keeping track of what has been sent where, and of the corresponding version of each document so indexed. The space over which this takes place is defined by the individual output connection; that is, each output connection in effect "remembers" what documents were handed to it.

A secondary purpose of this module is to provide a mapping between the key by which a document is described internally (an identifier hash, plus the name of an identifier space) and the way the document is identified in the output space (the name of an output connection, plus a URI which is considered local to that output connection space).
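The three-step client flow described above can be sketched with a small in-memory stand-in. This is a minimal illustration only: `VersionTrackerSketch` and its methods are hypothetical names and are not part of this interface, and a real implementation would persist its state and hand the document to an output connector.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the version bookkeeping; not the real API. */
class VersionTrackerSketch {
    // Recorded version per (outputConnectionName, identifierClass, identifierHash).
    private final Map<List<String>, String> versions = new HashMap<>();

    /** Step 1: fetch the recorded version, or null if nothing was recorded. */
    String getVersion(String outputConnectionName, String identifierClass, String identifierHash) {
        return versions.get(Arrays.asList(outputConnectionName, identifierClass, identifierHash));
    }

    /** Step 3: "ingest" the document; this sketch only records the new version. */
    void ingest(String outputConnectionName, String identifierClass, String identifierHash,
                String documentVersion) {
        versions.put(Arrays.asList(outputConnectionName, identifierClass, identifierHash),
                     documentVersion);
    }

    /** Steps 1 to 3 combined; returns true if the document was ingested. */
    boolean ingestIfChanged(String outputConnectionName, String identifierClass,
                            String identifierHash, String currentVersion) {
        String recorded = getVersion(outputConnectionName, identifierClass, identifierHash);
        // Step 2: versions agree, so there is nothing to ingest.
        if (currentVersion.equals(recorded)) {
            return false;
        }
        ingest(outputConnectionName, identifierClass, identifierHash, currentVersion);
        return true;
    }
}
```

Because the output connection name is part of the key, the same document is tracked separately for each output connection, which mirrors the per-connection space described above.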
| Field Summary | |
|---|---|
| static java.lang.String | _rcsid |
| Method Summary | |
|---|---|
| boolean | checkDocumentIndexable(java.lang.String outputConnectionName, java.io.File localFile) - Check if a file is indexable. |
| boolean | checkMimeTypeIndexable(java.lang.String outputConnectionName, java.lang.String mimeType) - Check if a mime type is indexable. |
| void | clearAll() - Flush all knowledge of what was ingested before. |
| void | deinstall() - Uninstall the incremental ingestion manager. |
| void | documentCheck(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash, long checkTime) - Note the fact that we checked a document (and found that it did not need to be ingested, because the versions agreed). |
| void | documentCheckMultiple(java.lang.String outputConnectionName, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes, long checkTime) - Note the fact that we checked a set of documents (and found that they did not need to be ingested, because the versions agreed). |
| void | documentDelete(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash, IOutputRemoveActivity activities) - Delete a document from the search engine index. |
| void | documentDeleteMultiple(java.lang.String[] outputConnectionNames, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes, IOutputRemoveActivity activities) - Delete multiple documents from the search engine index. |
| void | documentDeleteMultiple(java.lang.String outputConnectionName, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes, IOutputRemoveActivity activities) - Delete multiple documents from the search engine index. |
| boolean | documentIngest(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash, java.lang.String documentVersion, java.lang.String outputVersion, java.lang.String authorityName, RepositoryDocument data, long ingestTime, java.lang.String documentURI, IOutputActivity activities) - Ingest a document. |
| void | documentRecord(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash, java.lang.String documentVersion, long recordTime, IOutputActivity activities) - Record a document version, but don't ingest it. |
| DocumentIngestStatus | getDocumentIngestData(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash) - Look up ingestion data for a document. |
| DocumentIngestStatus[] | getDocumentIngestDataMultiple(java.lang.String[] outputConnectionNames, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes) - Look up ingestion data for a set of documents. |
| DocumentIngestStatus[] | getDocumentIngestDataMultiple(java.lang.String outputConnectionName, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes) - Look up ingestion data for a set of documents. |
| long | getDocumentUpdateInterval(java.lang.String outputConnectionName, java.lang.String identifierClass, java.lang.String identifierHash) - Calculate the average time interval between changes for a document. |
| long[] | getDocumentUpdateIntervalMultiple(java.lang.String outputConnectionName, java.lang.String[] identifierClasses, java.lang.String[] identifierHashes) - Calculate the average time interval between changes for each of a set of documents. |
| void | install() - Install the incremental ingestion manager. |
| void | resetOutputConnection(java.lang.String outputConnectionName) - Reset all documents belonging to a specific output connection, because we have learned that the target system has been reconfigured. |
| Field Detail |
|---|
static final java.lang.String _rcsid
| Method Detail |
|---|
void install()
throws ManifoldCFException
ManifoldCFException
void deinstall()
throws ManifoldCFException
ManifoldCFException
void clearAll()
throws ManifoldCFException
ManifoldCFException
boolean checkMimeTypeIndexable(java.lang.String outputConnectionName,
java.lang.String mimeType)
throws ManifoldCFException,
ServiceInterruption
outputConnectionName - is the name of the output connection associated with this action.
mimeType - is the mime type to check.
ManifoldCFException
ServiceInterruption
boolean checkDocumentIndexable(java.lang.String outputConnectionName,
java.io.File localFile)
throws ManifoldCFException,
ServiceInterruption
outputConnectionName - is the name of the output connection associated with this action.
localFile - is the local file to check.
ManifoldCFException
ServiceInterruption
void documentRecord(java.lang.String outputConnectionName,
java.lang.String identifierClass,
java.lang.String identifierHash,
java.lang.String documentVersion,
long recordTime,
IOutputActivity activities)
throws ManifoldCFException,
ServiceInterruption
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hashed document identifier.
documentVersion - is the document version.
recordTime - is the time at which the recording took place, in milliseconds since epoch.
activities - is the object used in case a document needs to be removed from the output index as the result of this operation.
ManifoldCFException
ServiceInterruption
boolean documentIngest(java.lang.String outputConnectionName,
java.lang.String identifierClass,
java.lang.String identifierHash,
java.lang.String documentVersion,
java.lang.String outputVersion,
java.lang.String authorityName,
RepositoryDocument data,
long ingestTime,
java.lang.String documentURI,
IOutputActivity activities)
throws ManifoldCFException,
ServiceInterruption
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hashed document identifier.
documentVersion - is the document version.
outputVersion - is the output version string constructed from the output specification by the output connector.
authorityName - is the name of the authority associated with the document, if any.
data - is the document data. The data is closed after ingestion is complete.
ingestTime - is the time at which the ingestion took place, in milliseconds since epoch.
documentURI - is the URI of the document, which will be used as the key of the document in the index.
activities - is an object providing a set of methods that the implementer can use to perform the operation.
ManifoldCFException
ServiceInterruption
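The documentURI parameter ties into the mapping role of this interface: internally a document is keyed by identifier class and hash, while in the output index it is keyed by a URI local to the output connection. A minimal sketch of such a mapping follows. `UriMappingSketch` and its method names are hypothetical and are not part of ManifoldCF; the real module presumably stores this information persistently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the internal-key to output-URI mapping; not the real API. */
class UriMappingSketch {
    // (outputConnectionName, identifierClass, identifierHash) -> documentURI
    private final Map<List<String>, String> uriByInternalKey = new HashMap<>();

    /** Remember which URI a document was indexed under at ingestion time. */
    void recordIngest(String outputConnectionName, String identifierClass,
                      String identifierHash, String documentURI) {
        uriByInternalKey.put(
            Arrays.asList(outputConnectionName, identifierClass, identifierHash), documentURI);
    }

    /** Look up the URI for a document, or null if it was never recorded. */
    String lookupUri(String outputConnectionName, String identifierClass, String identifierHash) {
        return uriByInternalKey.get(
            Arrays.asList(outputConnectionName, identifierClass, identifierHash));
    }

    /** Forget a document; returns the URI that would have to be removed from the index. */
    String delete(String outputConnectionName, String identifierClass, String identifierHash) {
        return uriByInternalKey.remove(
            Arrays.asList(outputConnectionName, identifierClass, identifierHash));
    }
}
```

With a mapping like this, a delete request that only carries the identifier class and hash can still find the URI under which the document was indexed.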
void documentCheckMultiple(java.lang.String outputConnectionName,
java.lang.String[] identifierClasses,
java.lang.String[] identifierHashes,
long checkTime)
throws ManifoldCFException
outputConnectionName - is the name of the output connection associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - is the set of document identifier hashes.
checkTime - is the time at which the check took place, in milliseconds since epoch.
ManifoldCFException
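The "Multiple" variants take parallel arrays. The sketch below assumes that identifierClasses[i] and identifierHashes[i] together describe document i, which the parameter descriptions suggest but do not state outright. `CheckMultipleSketch` is a hypothetical stand-in, not an implementation of this interface, and the length check is an added safeguard rather than documented behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of handling parallel identifier arrays; not the real API. */
class CheckMultipleSketch {
    // Last check time (milliseconds since epoch) per document key.
    private final Map<List<String>, Long> lastCheckTime = new HashMap<>();

    /** Record that every listed document was checked at checkTime. */
    void checkMultiple(String outputConnectionName, String[] identifierClasses,
                       String[] identifierHashes, long checkTime) {
        // Assumption: the two arrays are positionally aligned, so they must match in length.
        if (identifierClasses.length != identifierHashes.length) {
            throw new IllegalArgumentException(
                "identifierClasses and identifierHashes must have the same length");
        }
        for (int i = 0; i < identifierHashes.length; i++) {
            lastCheckTime.put(
                Arrays.asList(outputConnectionName, identifierClasses[i], identifierHashes[i]),
                checkTime);
        }
    }

    /** Return the last recorded check time, or null if the document was never checked. */
    Long getLastCheckTime(String outputConnectionName, String identifierClass,
                          String identifierHash) {
        return lastCheckTime.get(
            Arrays.asList(outputConnectionName, identifierClass, identifierHash));
    }
}
```

A caller would typically pass `System.currentTimeMillis()` as the check time, since the parameter is expressed in milliseconds since epoch.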
void documentCheck(java.lang.String outputConnectionName,
java.lang.String identifierClass,
java.lang.String identifierHash,
long checkTime)
throws ManifoldCFException
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hashed document identifier.
checkTime - is the time at which the check took place, in milliseconds since epoch.
ManifoldCFException
void documentDeleteMultiple(java.lang.String[] outputConnectionNames,
java.lang.String[] identifierClasses,
java.lang.String[] identifierHashes,
IOutputRemoveActivity activities)
throws ManifoldCFException,
ServiceInterruption
outputConnectionNames - are the names of the output connections associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - is the array of identifier hashes of the documents.
activities - is the object to use to log the details of the ingestion attempt. May be null.
ManifoldCFException
ServiceInterruption
void documentDeleteMultiple(java.lang.String outputConnectionName,
java.lang.String[] identifierClasses,
java.lang.String[] identifierHashes,
IOutputRemoveActivity activities)
throws ManifoldCFException,
ServiceInterruption
outputConnectionName - is the name of the output connection associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - is the array of identifier hashes of the documents.
activities - is the object to use to log the details of the ingestion attempt. May be null.
ManifoldCFException
ServiceInterruption
void documentDelete(java.lang.String outputConnectionName,
java.lang.String identifierClass,
java.lang.String identifierHash,
IOutputRemoveActivity activities)
throws ManifoldCFException,
ServiceInterruption
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hash of the id of the document.
activities - is the object to use to log the details of the ingestion attempt. May be null.
ManifoldCFException
ServiceInterruption
DocumentIngestStatus[] getDocumentIngestDataMultiple(java.lang.String[] outputConnectionNames,
java.lang.String[] identifierClasses,
java.lang.String[] identifierHashes)
throws ManifoldCFException
outputConnectionNames - are the names of the output connections associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - is the array of document identifier hashes to look up.
ManifoldCFException
DocumentIngestStatus[] getDocumentIngestDataMultiple(java.lang.String outputConnectionName,
java.lang.String[] identifierClasses,
java.lang.String[] identifierHashes)
throws ManifoldCFException
outputConnectionName - is the name of the output connection associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - is the array of document identifier hashes to look up.
ManifoldCFException
DocumentIngestStatus getDocumentIngestData(java.lang.String outputConnectionName,
java.lang.String identifierClass,
java.lang.String identifierHash)
throws ManifoldCFException
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hash of the id of the document.
ManifoldCFException
long[] getDocumentUpdateIntervalMultiple(java.lang.String outputConnectionName,
java.lang.String[] identifierClasses,
java.lang.String[] identifierHashes)
throws ManifoldCFException
outputConnectionName - is the name of the output connection associated with this action.
identifierClasses - are the names of the spaces in which the identifier hashes should be interpreted.
identifierHashes - are the hashes of the ids of the documents.
ManifoldCFException
long getDocumentUpdateInterval(java.lang.String outputConnectionName,
java.lang.String identifierClass,
java.lang.String identifierHash)
throws ManifoldCFException
outputConnectionName - is the name of the output connection associated with this action.
identifierClass - is the name of the space in which the identifier hash should be interpreted.
identifierHash - is the hash of the id of the document.
ManifoldCFException
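The method summary describes getDocumentUpdateInterval as calculating the average time interval between changes for a document, but this page does not say how the average is derived. One plausible reading, shown purely as an assumption, divides the time between the first and last observed change by the number of gaps between changes. `UpdateIntervalSketch`, its parameters, and the convention of returning 0 when fewer than two changes are known are all hypothetical.

```java
/** Hypothetical arithmetic sketch; the real computation is not documented here. */
class UpdateIntervalSketch {
    /**
     * Average interval in milliseconds between changes, given the first and last
     * change times (milliseconds since epoch) and the number of changes observed.
     */
    static long averageInterval(long firstChangeTime, long lastChangeTime, long changeCount) {
        // With fewer than two changes there is no interval to average (assumed convention).
        if (changeCount < 2) {
            return 0L;
        }
        // changeCount changes are separated by changeCount - 1 gaps; integer division truncates.
        return (lastChangeTime - firstChangeTime) / (changeCount - 1);
    }
}
```

For example, four changes observed between times 1000 and 7000 span three gaps, so the sketch yields an average interval of 2000 milliseconds.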
void resetOutputConnection(java.lang.String outputConnectionName)
throws ManifoldCFException
outputConnectionName - is the name of the output connection associated with this action.
ManifoldCFException