org.apache.manifoldcf.crawler.interfaces
Interface IEventActivity

All Superinterfaces:
INamingActivity
All Known Subinterfaces:
IProcessActivity, IVersionActivity
All Known Implementing Classes:
WorkerThread.ProcessActivity, WorkerThread.VersionActivity

public interface IEventActivity
extends INamingActivity

This interface abstracts from the activities that use and govern events. The purpose of this model is to allow a connector to: (a) insure that documents whose prerequisites have not been met do not get processed until those prerequisites are completed (b) guarantee that only one thread at a time deal with sequencing of documents The way it works is as follows. We define the notion of an "event", which is described by a simple string (and thus can be global, local to a connection, or local to a job, whichever is appropriate). An event is managed solely by the connector that knows about it. Effectively it can be in either of two states: "completed", or "pending". The only time the framework ever changes an event state is when the crawler is restarted, at which point all pending events are marked "completed". Documents, when they are added to the processing queue, specify the set of events on which they will block. If an event is in the "pending" state, no documents that block on that event will be processed at that time. Of course, it is possible that a document could be handed to processing just before an event entered the "pending" state - in which case it is the responsibility of the connector itself to avoid any problems or conflicts. This can usually be handled by proper handling of event signalling. More on that later. The presumed underlying model of flow inside the connector's processing method is as follows: (1) The connector examines the document in question, and decides whether it can be processed successfully or not, based on what it knows about sequencing (2) If the connector determines that the document can properly be processed, it does so, and that's it. (3) If the connector finds a sequencing-related problem, it: (a) Begins an appropriate event sequence. (b) If the framework indicates that this event is already in the "pending" state, then some other thread is already handling the event, and the connector should abort processing of the current document. (c) If the framework successfully begins the event sequence, then the connector code knows unequivocably that it is the only thread processing the event. It should take whatever action it needs to - which might be requesting special documents, for instance. [Note well: At this time, there is no way to guarantee that special documents added to the queue are in fact properly synchronized by this mechanism, so I recommend avoiding this practice, and instead handling any special document sequences without involving the queue.] (d) If the connector CANNOT successfully take the action it needs to to push the sequence along, it MUST set the event back to the "completed" state. Otherwise, the event will remain in the "pending" state until the next time the crawler is restarted. (e) If the current document cannot yet be processed, its processing should be aborted. (4) When the connector determines that the event's conditions have been met, or when it determines that an event sequence is no longer viable and has been aborted, it must set the event status to "completed". In summary, a connector may perform the following event-related actions: (a) Set an event into the "pending" state (b) Set an event into the "completed" state (c) Add a document to the queue with a specified set of prerequisite events attached (d) Request that the current document be requeued for later processing (i.e. abort processing of a document due to sequencing reasons)


Field Summary
static java.lang.String _rcsid
           
 
Method Summary
 boolean beginEventSequence(java.lang.String eventName)
          Begin an event sequence.
 void completeEventSequence(java.lang.String eventName)
          Complete an event sequence.
 void retryDocumentProcessing(java.lang.String localIdentifier)
          Abort processing a document (for sequencing reasons).
 
Methods inherited from interface org.apache.manifoldcf.crawler.interfaces.INamingActivity
createConnectionSpecificString, createGlobalString, createJobSpecificString
 

Field Detail

_rcsid

static final java.lang.String _rcsid
See Also:
Constant Field Values
Method Detail

beginEventSequence

boolean beginEventSequence(java.lang.String eventName)
                           throws ManifoldCFException
Begin an event sequence. This method should be called by a connector when a sequencing event should enter the "pending" state. If the event is already in that state, this method will return false, otherwise true. The connector has the responsibility of appropriately managing sequencing given the response status.

Parameters:
eventName - is the event name.
Returns:
false if the event is already in the "pending" state.
Throws:
ManifoldCFException

completeEventSequence

void completeEventSequence(java.lang.String eventName)
                           throws ManifoldCFException
Complete an event sequence. This method should be called to signal that an event is no longer in the "pending" state. This can mean that the prerequisite processing is completed, but it can also mean that prerequisite processing was aborted or cannot be completed. Note well: This method should not be called unless the connector is CERTAIN that an event is in progress, and that the current thread has the sole right to complete it. Otherwise, race conditions can develop which would be difficult to diagnose.

Parameters:
eventName - is the event name.
Throws:
ManifoldCFException

retryDocumentProcessing

void retryDocumentProcessing(java.lang.String localIdentifier)
                             throws ManifoldCFException
Abort processing a document (for sequencing reasons). This method should be called in order to cause the specified document to be requeued for later processing. While this is similar in some respects to the semantics of a ServiceInterruption, it is applicable to only one document at a time, and also does not specify any delay period, since it is presumed that the reason for the requeue is because of sequencing issues synchronized around an underlying event.

Parameters:
localIdentifier - is the document identifier to requeue
Throws:
ManifoldCFException