org.apache.manifoldcf.crawler.connectors.webcrawler
Class DataCache

java.lang.Object
  extended by org.apache.manifoldcf.crawler.connectors.webcrawler.DataCache

public class DataCache
extends java.lang.Object

This class caches the data fetched for specific URLs. The data is fetched early and kept so that (1) an accurate data length can be determined, and (2) a version checksum can be computed.
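
The rationale above can be illustrated with a minimal, self-contained sketch. It is not the ManifoldCF implementation: the class name BufferedFetchSketch, the in-memory buffer, and the choice of SHA-1 are all assumptions made for illustration. It only shows why reading a response fully up front yields both an exact length and a checksum that could serve as a version string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only; not part of ManifoldCF.
class BufferedFetchSketch {
  final byte[] content;   // the fully buffered response body
  final String checksum;  // hex digest of the body, usable as a version string

  BufferedFetchSketch(InputStream response) throws IOException, NoSuchAlgorithmException {
    // The digest algorithm is an assumption; the source only says "checksum".
    MessageDigest digest = MessageDigest.getInstance("SHA-1");
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    // Read the whole stream once, feeding both the buffer and the digest.
    while ((n = response.read(chunk)) != -1) {
      buffer.write(chunk, 0, n);
      digest.update(chunk, 0, n);
    }
    content = buffer.toByteArray();
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest()) {
      hex.append(String.format("%02x", b));
    }
    checksum = hex.toString();
  }

  // The length is exact because the body has already been read completely.
  long length() {
    return content.length;
  }

  public static void main(String[] args) throws Exception {
    byte[] body = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
    BufferedFetchSketch fetch = new BufferedFetchSketch(new ByteArrayInputStream(body));
    System.out.println(fetch.length() + " bytes, version " + fetch.checksum);
  }
}
```

A length taken from a response header may be missing or inaccurate, whereas a length measured after buffering cannot be. That is the trade this sketch makes in exchange for holding the data until it is consumed.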


Nested Class Summary
protected static class DataCache.DocumentData
          This class holds everything that must be known about a document as it is passed from the getDocumentVersions() phase to the processDocuments() phase.
 
Field Summary
static java.lang.String _rcsid
           
protected  java.util.HashMap cacheData
           
 
Constructor Summary
DataCache()
          Constructor.
 
Method Summary
 java.lang.String addData(org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities, java.lang.String documentIdentifier, IThrottledConnection connection)
          Add a data entry into the cache.
 void deleteData(java.lang.String documentIdentifier)
          Delete the specified data entry from the cache.
 java.lang.String getContentType(java.lang.String documentIdentifier)
          Get the content type.
 java.io.InputStream getData(java.lang.String documentIdentifier)
          Fetch a binary data entry from the cache.
 long getDataLength(java.lang.String documentIdentifier)
          Fetch binary data length.
 java.lang.String getReferralURI(java.lang.String documentIdentifier)
          Get the referral URI.
 int getResponseCode(java.lang.String documentIdentifier)
          Get the response code.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_rcsid

public static final java.lang.String _rcsid
See Also:
Constant Field Values

cacheData

protected java.util.HashMap cacheData
Constructor Detail

DataCache

public DataCache()
Constructor.

Method Detail

addData

public java.lang.String addData(org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities,
                                java.lang.String documentIdentifier,
                                IThrottledConnection connection)
                         throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Add a data entry into the cache. This method is called whenever the data from a fetch is considered interesting or useful, and will thus be passed on from the getDocumentVersions() phase to the processDocuments() phase. At the moment, that usually means a 200 or a 302 response.

Parameters:
activities - is the version activities object.
documentIdentifier - is the document identifier (URL).
connection - is the connection on which the fetch to be cached was performed.
Returns:
a "checksum" value, to use as a version string.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
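
As a rough sketch of the behavior described for addData, the following stand-in caches a response only when its code is 200 or 302 and returns a checksum to use as the version string. Everything here is hypothetical: VersionPhaseSketch and versionFor are invented names, the stand-in takes the response code and body directly (the real method takes an activities object and a throttled connection), CRC32 is an assumed checksum, and returning null for uninteresting responses is an assumption about the caller, not documented behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical stand-in for the version phase; not the ManifoldCF classes.
class VersionPhaseSketch {
  // Maps document identifier (URL) to the cached response body.
  static final Map<String, byte[]> cache = new HashMap<>();

  // Returns a version string for responses worth keeping, or null otherwise.
  static String versionFor(String documentIdentifier, int responseCode, byte[] body) {
    if (responseCode != 200 && responseCode != 302) {
      return null; // assumption: uninteresting responses are simply not cached
    }
    cache.put(documentIdentifier, body);
    CRC32 crc = new CRC32(); // assumed checksum algorithm
    crc.update(body, 0, body.length);
    return Long.toHexString(crc.getValue());
  }

  public static void main(String[] args) {
    byte[] page = "page body".getBytes(StandardCharsets.UTF_8);
    System.out.println(versionFor("http://example.com/", 200, page));
    System.out.println(versionFor("http://example.com/missing", 404, page));
  }
}
```

Because the version string depends only on the body, an unchanged document produces the same version on the next crawl, which is what lets a later phase be skipped.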

getResponseCode

public int getResponseCode(java.lang.String documentIdentifier)
Get the response code.

Parameters:
documentIdentifier - is the document identifier.
Returns:
the response code.

getContentType

public java.lang.String getContentType(java.lang.String documentIdentifier)
Get the content type.

Parameters:
documentIdentifier - is the document identifier.
Returns:
the content type, or null if there is none.

getReferralURI

public java.lang.String getReferralURI(java.lang.String documentIdentifier)
Get the referral URI.

Parameters:
documentIdentifier - is the document identifier.
Returns:
the referral URI, or null if none.
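
The three accessors above return the response code, the content type, and the referral URI, the latter two possibly null. A caller might combine them along the following lines. This is a sketch under stated assumptions: MetadataRoutingSketch and describe are hypothetical names, the values are passed in directly rather than read from a DataCache, and treating the referral URI as the target of a 302 response is an assumption that this page does not confirm.

```java
// Hypothetical illustration of null-safe handling of cached response metadata.
class MetadataRoutingSketch {
  // Decides what to do with a cached entry from its three metadata values.
  static String describe(int responseCode, String contentType, String referralURI) {
    // Assumption: a 302 entry carries its redirect target as the referral URI.
    if (responseCode == 302 && referralURI != null) {
      return "follow redirect to " + referralURI;
    }
    if (responseCode == 200) {
      // The content type may be null, so substitute a placeholder.
      String type = (contentType == null) ? "unknown type" : contentType;
      return "process document of " + type;
    }
    return "skip";
  }

  public static void main(String[] args) {
    System.out.println(describe(200, "text/html", null));
    System.out.println(describe(302, null, "http://example.com/new"));
    System.out.println(describe(404, null, null));
  }
}
```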

getDataLength

public long getDataLength(java.lang.String documentIdentifier)
Fetch binary data length.

Parameters:
documentIdentifier - is the document identifier.
Returns:
the length.

getData

public java.io.InputStream getData(java.lang.String documentIdentifier)
                            throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Fetch a binary data entry from the cache.

Parameters:
documentIdentifier - is the document identifier (URL).
Returns:
a binary data stream.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

deleteData

public void deleteData(java.lang.String documentIdentifier)
Delete the specified data entry from the cache.

Parameters:
documentIdentifier - is the document identifier (URL).
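
One plausible way to use getDataLength, getData and deleteData together in the processing phase is sketched below. StandInCache is a hypothetical in-memory substitute whose method names mirror the ones on this page; it is not DataCache, and the real class may store and release data differently. The sketch reads the stream fully, closes it, and deletes the entry in a finally block so that the cached data is released even if reading fails.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of consuming and then releasing a cached entry.
class ProcessPhaseSketch {
  // In-memory stand-in with methods named after DataCache's; not the real class.
  static class StandInCache {
    private final Map<String, byte[]> entries = new HashMap<>();
    void put(String id, byte[] body) { entries.put(id, body); }
    long getDataLength(String id) { return entries.get(id).length; }
    InputStream getData(String id) { return new ByteArrayInputStream(entries.get(id)); }
    void deleteData(String id) { entries.remove(id); }
    boolean contains(String id) { return entries.containsKey(id); }
  }

  // Reads the cached stream fully, then deletes the entry even if reading fails.
  static long consume(StandInCache cache, String id) throws IOException {
    long expected = cache.getDataLength(id);
    long seen = 0;
    try (InputStream in = cache.getData(id)) {
      byte[] chunk = new byte[4096];
      int n;
      while ((n = in.read(chunk)) != -1) {
        seen += n; // a real consumer would hand these bytes to an indexer
      }
    } finally {
      cache.deleteData(id);
    }
    if (seen != expected) {
      throw new IOException("length mismatch: " + seen + " != " + expected);
    }
    return seen;
  }

  public static void main(String[] args) throws IOException {
    StandInCache cache = new StandInCache();
    String url = "http://example.com/";
    cache.put(url, "cached page".getBytes(StandardCharsets.UTF_8));
    System.out.println("consumed " + consume(cache, url) + " bytes");
    System.out.println("still cached: " + cache.contains(url));
  }
}
```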