org.apache.manifoldcf.crawler.jobs
Class HopCount

java.lang.Object
  extended by org.apache.manifoldcf.core.database.BaseTable
      extended by org.apache.manifoldcf.crawler.jobs.HopCount

public class HopCount
extends BaseTable

This class manages the table that keeps track of hop count, and algorithmically determines this value for a document identifier upon request.


Nested Class Summary
protected static class HopCount.Answer
          This class represents an answer, which consists of both an answer value and the dependencies of that answer.
protected  class HopCount.DocumentHash
          The Document Hash structure contains the document nodes we are interested in, including those we need answers for to proceed.
protected static class HopCount.DocumentNode
          This class keeps track of the data associated with a node in the hash map.
protected static class HopCount.DocumentReference
          This class describes a document reference.
protected static class HopCount.NodeQueue
          A queue object allows document nodes to be ordered appropriately for the most efficient execution.
protected static class HopCount.NodeReference
          This class describes a node link reference.
protected static class HopCount.Question
          A class describing a document identifier and a link type, to be used in looking up the appropriate node in the hash.
 
Field Summary
static java.lang.String _rcsid
           
static int ANSWER_INFINITY
           
static int ANSWER_UNKNOWN
           
protected  HopDeleteDeps deleteDepsManager
          Hop "delete" dependencies manager.
static java.lang.String distanceField
           
static java.lang.String idField
           
protected  IntrinsicLink intrinsicLinkManager
          Intrinsic link table manager.
static java.lang.String jobIDField
           
static java.lang.String linkTypeField
           
static int MARK_DELETING
           
static int MARK_NORMAL
           
static int MARK_QUEUED
           
static java.lang.String markForDeathField
           
protected static java.util.Map markMap
           
static java.lang.String parentIDHashField
           
protected  IThreadContext threadContext
          Thread context
 
Fields inherited from class org.apache.manifoldcf.core.database.BaseTable
dbInterface, tableName
 
Constructor Summary
HopCount(IThreadContext tc, IDBInterface database)
          Constructor.
 
Method Summary
protected  void addToProcessingQueue(java.lang.Long jobID, java.lang.String[] affectedLinkTypes, java.lang.String[] documentIDHashes, HopCount.Answer[] startingAnswers, java.lang.String sourceDocumentIDHash, java.lang.String linkType, int hopcountMethod)
          Add documents to the processing queue.
 void deinstall()
          Uninstall.
 void deleteDocumentIdentifiers(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod)
          Remove a set of document identifier hashes.
 void deleteMatchingDocuments(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceTableName, java.lang.String sourceTableIDColumn, java.lang.String sourceTableJobColumn, java.lang.String sourceTableCriteria, java.util.ArrayList sourceTableParams, int hopcountMethod)
          Remove a set of document identifiers specified by criteria.
 void deleteOwner(java.lang.Long jobID)
          Delete an owner (and clean up the corresponding hopcount rows).
protected  void doDeleteInvalidation(java.lang.Long jobID, java.lang.String[] legalLinkTypes, boolean existingOnly, java.lang.String[] sourceDocumentHashes, java.lang.String sourceTableName, java.lang.String sourceTableIDColumn, java.lang.String sourceTableJobColumn, java.lang.String sourceTableCriteria, java.util.ArrayList sourceTableParams)
          Invalidate links meeting simple criteria which have a given set of source documents.
protected  void doFinish(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod)
          Method that does the work of "finishing" a set of child references.
protected  void doRecord(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod)
          Do the work of recording source-target references.
 int[] findHopCounts(java.lang.Long jobID, java.lang.String[] parentIdentifierHashes, java.lang.String linkType)
          Calculate hop counts for a set of parent identifier hashes.
 void finishParents(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] sourceDocumentHashes, int hopcountMethod)
          Complete a recalculation pass for a set of source documents.
 void finishSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod)
          Finish seed references.
protected  IResultSet getDocumentChildren(java.lang.Long jobID, java.lang.String documentIDHash)
          Get document's children.
 void install(java.lang.String jobsTable, java.lang.String jobsColumn)
          Install or upgrade.
protected  void markForDelete(java.lang.String query, java.util.ArrayList list, java.lang.String commonNewExpression, java.util.ArrayList commonNewList)
           
static java.lang.String markToString(int mark)
          Go from mark to string.
protected  void performFindMissingRecords(java.lang.String query, java.util.ArrayList list, java.util.Map matchMap)
          Limited find for missing records.
protected  void performGetCachedDistanceDeps(java.util.Map depsMap, java.lang.String query, java.util.ArrayList list)
          Do a limited fetch of cached distance dependencies.
protected  void performGetCachedDistances(HopCount.DocumentNode[] rval, java.util.Map indexMap, java.util.Map depsMap, java.lang.String query, java.util.ArrayList list)
          Do a limited fetch of cached distances.
protected  void performMarkAddDeps(java.lang.String query, java.util.ArrayList list)
          Do the work of marking add-dep-dependent links in the hopcount table.
protected  void processFind(int[] rval, java.util.Map rvalMap, java.lang.String query, java.util.ArrayList list)
          Process a portion of a find request for hopcount information.
 boolean processQueue(java.lang.Long jobID, java.lang.String[] legalLinkTypes, int hopcountMethod)
          Process a stage of the propagation queue for a job.
protected  HopCount.DocumentNode[] readCachedNodes(java.lang.Long jobID, HopCount.Question[] unansweredQuestions)
          Find the cached distance from a set of identifiers to the root.
 void recordReference(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String targetDocumentIDHash, java.lang.String linkType, int hopcountMethod)
          Record a reference from source to target.
 void recordReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String sourceDocumentIDHash, java.lang.String[] targetDocumentIDHashes, java.lang.String linkType, int hopcountMethod)
          Record a set of references from source to target.
 void recordSeedReferences(java.lang.Long jobID, java.lang.String[] legalLinkTypes, java.lang.String[] targetDocumentIDHashes, int hopcountMethod)
          Record references from a set of documents to the root.
 void reset()
          Reset, at startup time.
static int stringToMark(java.lang.String value)
          Go from string to mark.
protected  void writeCachedDistance(java.lang.Long jobID, java.lang.String[] legalLinkTypes, HopCount.DocumentNode dn, int hopcountMethod)
          Write a distance into the cache.
 
Methods inherited from class org.apache.manifoldcf.core.database.BaseTable
addTableIndex, analyzeTable, beginTransaction, constructDistinctOnClause, constructOffsetLimitClause, constructRegexpClause, constructSubstringClause, endTransaction, getDatabaseCacheKey, getDBInterface, getMaxInClause, getMaxOrClause, getTableIndexes, getTableName, getTableSchema, getTransactionID, makeTableKey, noteModifications, performAddIndex, performAlter, performCreate, performDelete, performDrop, performInsert, performLock, performModification, performQuery, performQuery, performRemoveIndex, performUpdate, prepareRowForSave, readRow, reindexTable, signalRollback
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_rcsid

public static final java.lang.String _rcsid
See Also:
Constant Field Values

ANSWER_UNKNOWN

public static final int ANSWER_UNKNOWN
See Also:
Constant Field Values

ANSWER_INFINITY

public static final int ANSWER_INFINITY
See Also:
Constant Field Values

idField

public static final java.lang.String idField
See Also:
Constant Field Values

jobIDField

public static final java.lang.String jobIDField
See Also:
Constant Field Values

linkTypeField

public static final java.lang.String linkTypeField
See Also:
Constant Field Values

parentIDHashField

public static final java.lang.String parentIDHashField
See Also:
Constant Field Values

distanceField

public static final java.lang.String distanceField
See Also:
Constant Field Values

markForDeathField

public static final java.lang.String markForDeathField
See Also:
Constant Field Values

MARK_NORMAL

public static final int MARK_NORMAL
See Also:
Constant Field Values

MARK_QUEUED

public static final int MARK_QUEUED
See Also:
Constant Field Values

MARK_DELETING

public static final int MARK_DELETING
See Also:
Constant Field Values

markMap

protected static java.util.Map markMap

intrinsicLinkManager

protected IntrinsicLink intrinsicLinkManager
Intrinsic link table manager.


deleteDepsManager

protected HopDeleteDeps deleteDepsManager
Hop "delete" dependencies manager.


threadContext

protected IThreadContext threadContext
Thread context

Constructor Detail

HopCount

public HopCount(IThreadContext tc,
                IDBInterface database)
         throws ManifoldCFException
Constructor.

Parameters:
tc - is the thread context.
database - is the database handle.
Throws:
ManifoldCFException
Method Detail

install

public void install(java.lang.String jobsTable,
                    java.lang.String jobsColumn)
             throws ManifoldCFException
Install or upgrade.

Throws:
ManifoldCFException

deinstall

public void deinstall()
               throws ManifoldCFException
Uninstall.

Throws:
ManifoldCFException

stringToMark

public static int stringToMark(java.lang.String value)
                        throws ManifoldCFException
Go from string to mark.

Parameters:
value - is the string.
Returns:
the status value.
Throws:
ManifoldCFException

markToString

public static java.lang.String markToString(int mark)
                                     throws ManifoldCFException
Go from mark to string.

Parameters:
mark - is the mark.
Returns:
the string.
Throws:
ManifoldCFException
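
The two conversions stringToMark and markToString are inverses over a small, fixed set of marks. The following is a minimal sketch of that round trip, not the class's actual code: the integer values and the single-letter codes are hypothetical (the real ones are listed under Constant Field Values and in markMap, neither of which is shown here), and IllegalArgumentException stands in for ManifoldCFException.

```java
import java.util.HashMap;
import java.util.Map;

final class MarkCodecSketch {
  // Hypothetical values; the real constants are listed under Constant Field Values.
  static final int MARK_NORMAL = 0;
  static final int MARK_QUEUED = 1;
  static final int MARK_DELETING = 2;

  // Hypothetical stored codes, playing the role of markMap.
  private static final Map<String, Integer> MARK_MAP = new HashMap<>();
  static {
    MARK_MAP.put("N", MARK_NORMAL);
    MARK_MAP.put("Q", MARK_QUEUED);
    MARK_MAP.put("D", MARK_DELETING);
  }

  // Go from stored string to mark; unknown strings are rejected.
  static int stringToMark(String value) {
    Integer mark = MARK_MAP.get(value);
    if (mark == null) {
      throw new IllegalArgumentException("Bad mark value: " + value);
    }
    return mark;
  }

  // Go from mark to stored string; unknown marks are rejected.
  static String markToString(int mark) {
    switch (mark) {
      case MARK_NORMAL: return "N";
      case MARK_QUEUED: return "Q";
      case MARK_DELETING: return "D";
      default: throw new IllegalArgumentException("Bad mark value: " + mark);
    }
  }
}
```

Rejecting unknown values in both directions, rather than defaulting to MARK_NORMAL, matches the fact that both real methods are declared to throw.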

deleteOwner

public void deleteOwner(java.lang.Long jobID)
                 throws ManifoldCFException
Delete an owner (and clean up the corresponding hopcount rows).

Throws:
ManifoldCFException

reset

public void reset()
           throws ManifoldCFException
Reset, at startup time.

Throws:
ManifoldCFException

recordSeedReferences

public void recordSeedReferences(java.lang.Long jobID,
                                 java.lang.String[] legalLinkTypes,
                                 java.lang.String[] targetDocumentIDHashes,
                                 int hopcountMethod)
                          throws ManifoldCFException
Record references from a set of documents to the root. These will be marked as "new" or "existing", and will have a null linktype.

Throws:
ManifoldCFException

finishSeedReferences

public void finishSeedReferences(java.lang.Long jobID,
                                 java.lang.String[] legalLinkTypes,
                                 int hopcountMethod)
                          throws ManifoldCFException
Finish seed references. Seed references are special in that the only source is the root.

Throws:
ManifoldCFException

recordReference

public void recordReference(java.lang.Long jobID,
                            java.lang.String[] legalLinkTypes,
                            java.lang.String sourceDocumentIDHash,
                            java.lang.String targetDocumentIDHash,
                            java.lang.String linkType,
                            int hopcountMethod)
                     throws ManifoldCFException
Record a reference from source to target. This reference will be marked as "new" or "existing".

Throws:
ManifoldCFException

recordReferences

public void recordReferences(java.lang.Long jobID,
                             java.lang.String[] legalLinkTypes,
                             java.lang.String sourceDocumentIDHash,
                             java.lang.String[] targetDocumentIDHashes,
                             java.lang.String linkType,
                             int hopcountMethod)
                      throws ManifoldCFException
Record a set of references from source to target. These references will be marked as "new" or "existing".

Throws:
ManifoldCFException

finishParents

public void finishParents(java.lang.Long jobID,
                          java.lang.String[] legalLinkTypes,
                          java.lang.String[] sourceDocumentHashes,
                          int hopcountMethod)
                   throws ManifoldCFException
Complete a recalculation pass for a set of source documents. All child links that are not marked as "new" or "existing" will be removed. At the completion of this pass, the links will have their "new" flag cleared.

Throws:
ManifoldCFException
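
The record and finish methods together describe a mark-and-sweep pass over a source document's child links: links seen during the pass are flagged, and finishing removes whatever was not flagged and then clears the flags. The sketch below models that pass in memory for a single source document. It is an illustration under assumptions, not the table logic itself: the State enum, the class name, and the in-memory map are all hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class LinkPassSketch {
  // Hypothetical per-link states; the real table's encoding is not shown here.
  enum State { BASE, NEW, EXISTING }

  // Child links of a single source document, keyed by target hash.
  private final Map<String, State> children = new HashMap<>();

  // Record a reference: unseen targets become NEW, known ones become EXISTING.
  void record(String targetHash) {
    State current = children.get(targetHash);
    if (current == null) {
      children.put(targetHash, State.NEW);
    } else if (current == State.BASE) {
      children.put(targetHash, State.EXISTING);
    }
    // NEW and EXISTING stay as they are if recorded twice in one pass.
  }

  // Finish the pass: drop links that were not re-recorded, then clear the flags.
  void finish() {
    Iterator<Map.Entry<String, State>> it = children.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, State> entry = it.next();
      if (entry.getValue() == State.BASE) {
        it.remove();
      } else {
        entry.setValue(State.BASE);
      }
    }
  }

  boolean hasChild(String targetHash) { return children.containsKey(targetHash); }

  int size() { return children.size(); }
}
```

For example, recording targets "a" and "b" in one pass and "a" and "c" in the next leaves only "a" and "c" after the second finish, because "b" was never re-recorded.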

doRecord

protected void doRecord(java.lang.Long jobID,
                        java.lang.String[] legalLinkTypes,
                        java.lang.String sourceDocumentIDHash,
                        java.lang.String[] targetDocumentIDHashes,
                        java.lang.String linkType,
                        int hopcountMethod)
                 throws ManifoldCFException
Do the work of recording source-target references.

Throws:
ManifoldCFException

deleteMatchingDocuments

public void deleteMatchingDocuments(java.lang.Long jobID,
                                    java.lang.String[] legalLinkTypes,
                                    java.lang.String sourceTableName,
                                    java.lang.String sourceTableIDColumn,
                                    java.lang.String sourceTableJobColumn,
                                    java.lang.String sourceTableCriteria,
                                    java.util.ArrayList sourceTableParams,
                                    int hopcountMethod)
                             throws ManifoldCFException
Remove a set of document identifiers specified by criteria. This will remove hopcount rows and also intrinsic links that have the specified document identifiers as sources.

Throws:
ManifoldCFException

deleteDocumentIdentifiers

public void deleteDocumentIdentifiers(java.lang.Long jobID,
                                      java.lang.String[] legalLinkTypes,
                                      java.lang.String[] sourceDocumentHashes,
                                      int hopcountMethod)
                               throws ManifoldCFException
Remove a set of document identifier hashes. This will also remove the intrinsic links that have these document identifier hashes as sources, as well as invalidating cached hop counts that depend on them.

Throws:
ManifoldCFException

findHopCounts

public int[] findHopCounts(java.lang.Long jobID,
                           java.lang.String[] parentIdentifierHashes,
                           java.lang.String linkType)
                    throws ManifoldCFException
Calculate hop counts for a set of parent identifier hashes. The values returned are only guaranteed to be an upper bound, unless the queue has recently been processed (via processQueue). -1 will be returned to indicate "infinity".

Throws:
ManifoldCFException
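
Because -1 stands for "infinity", the returned array cannot be compared against a limit with a plain numeric test. The sketch below shows one way a caller might apply a maximum-hops limit to an array shaped like the findHopCounts return value. The class, the method names, and the maxHops parameter are hypothetical and are not part of HopCount.

```java
final class HopFilterSketch {
  // The description of findHopCounts states that -1 means "infinity".
  static final int INFINITY = -1;

  // True if a returned hop count satisfies a (hypothetical) maximum-hops limit.
  static boolean withinLimit(int hopCount, int maxHops) {
    return hopCount != INFINITY && hopCount <= maxHops;
  }

  // Apply the limit to an array shaped like the findHopCounts return value.
  static boolean[] withinLimit(int[] hopCounts, int maxHops) {
    boolean[] result = new boolean[hopCounts.length];
    for (int i = 0; i < hopCounts.length; i++) {
      result[i] = withinLimit(hopCounts[i], maxHops);
    }
    return result;
  }
}
```

Since the returned values are upper bounds until the queue has been processed, a true result here is safe to act on, while a false result may only be provisional.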

processFind

protected void processFind(int[] rval,
                           java.util.Map rvalMap,
                           java.lang.String query,
                           java.util.ArrayList list)
                    throws ManifoldCFException
Process a portion of a find request for hopcount information.

Throws:
ManifoldCFException

processQueue

public boolean processQueue(java.lang.Long jobID,
                            java.lang.String[] legalLinkTypes,
                            int hopcountMethod)
                     throws ManifoldCFException
Process a stage of the propagation queue for a job.

Parameters:
jobID - is the job we need to have the hopcount propagated for.
Returns:
true if the queue is empty.
Throws:
ManifoldCFException
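
Since each call handles one stage and the return value reports whether the queue is empty, a caller that needs exact hop counts would typically repeat the call until it returns true. The loop below is a sketch of that pattern against a hypothetical QueueStage interface standing in for processQueue. The real method also takes jobID, legalLinkTypes, and hopcountMethod, and throws ManifoldCFException.

```java
final class QueueDrainSketch {
  // Hypothetical stand-in for the processQueue call; returns true once the queue is empty.
  interface QueueStage {
    boolean processStage();
  }

  // Run stages until the queue reports empty; returns how many stages ran.
  static int drain(QueueStage stage) {
    int stages = 1;
    while (!stage.processStage()) {
      stages++;
    }
    return stages;
  }
}
```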

performFindMissingRecords

protected void performFindMissingRecords(java.lang.String query,
                                         java.util.ArrayList list,
                                         java.util.Map matchMap)
                                  throws ManifoldCFException
Limited find for missing records.

Throws:
ManifoldCFException

addToProcessingQueue

protected void addToProcessingQueue(java.lang.Long jobID,
                                    java.lang.String[] affectedLinkTypes,
                                    java.lang.String[] documentIDHashes,
                                    HopCount.Answer[] startingAnswers,
                                    java.lang.String sourceDocumentIDHash,
                                    java.lang.String linkType,
                                    int hopcountMethod)
                             throws ManifoldCFException
Add documents to the processing queue. For the supplied link types and document IDs, the corresponding hopcount records will be marked as queued. If, for example, the affected link types are 'link' and 'redirect', and the specified document IDs are 'A', 'B', and 'C', then six hopcount rows will be created and/or queued. For each of these rows, this method calculates the initial distance and delete dependencies by starting with the passed-in hopcount values and dependencies for each of the affectedLinkTypes for the specified "source" document. The resulting estimates are then generated by passing these values and dependencies over the links to the target document identifiers, presuming that each link is of the supplied link type.

Parameters:
jobID - is the job the documents belong to.
affectedLinkTypes - are the set of affected link types.
documentIDHashes - are the documents to add.
startingAnswers - are the hopcounts for the documents as they are currently known.
sourceDocumentIDHash - is the source document identifier for the links from source to target documents.
linkType - is the link type for this queue addition.
hopcountMethod - is the desired method of managing hopcounts.
Throws:
ManifoldCFException
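
The row count in the example above is simply the product of the two input arrays: every document is paired with every affected link type. A minimal sketch of that enumeration follows. The class name and the "doc/linkType" key format are hypothetical and used for display only.

```java
import java.util.ArrayList;
import java.util.List;

final class QueueRowsSketch {
  // Enumerate the (documentIDHash, linkType) pairs that would be created and/or queued.
  static List<String> rowKeys(String[] affectedLinkTypes, String[] documentIDHashes) {
    List<String> keys = new ArrayList<>();
    for (String doc : documentIDHashes) {
      for (String linkType : affectedLinkTypes) {
        keys.add(doc + "/" + linkType);  // hypothetical key format, for display only
      }
    }
    return keys;
  }
}
```

With link types 'link' and 'redirect' and documents 'A', 'B', and 'C', this yields the six pairs described in the example.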

performMarkAddDeps

protected void performMarkAddDeps(java.lang.String query,
                                  java.util.ArrayList list)
                           throws ManifoldCFException
Do the work of marking add-dep-dependent links in the hopcount table.

Throws:
ManifoldCFException

doFinish

protected void doFinish(java.lang.Long jobID,
                        java.lang.String[] legalLinkTypes,
                        java.lang.String[] sourceDocumentHashes,
                        int hopcountMethod)
                 throws ManifoldCFException
Method that does the work of "finishing" a set of child references.

Throws:
ManifoldCFException

doDeleteInvalidation

protected void doDeleteInvalidation(java.lang.Long jobID,
                                    java.lang.String[] legalLinkTypes,
                                    boolean existingOnly,
                                    java.lang.String[] sourceDocumentHashes,
                                    java.lang.String sourceTableName,
                                    java.lang.String sourceTableIDColumn,
                                    java.lang.String sourceTableJobColumn,
                                    java.lang.String sourceTableCriteria,
                                    java.util.ArrayList sourceTableParams)
                             throws ManifoldCFException
Invalidate links meeting simple criteria which have a given set of source documents. This also runs a queue which is initialized with all the documents that have sources that exist in the hopcount table. The purpose of that queue is to re-establish non-infinite values for all nodes that are described in IntrinsicLinks and that are still connected to the root.

Throws:
ManifoldCFException
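
The goal stated above, re-establishing finite distances for nodes still connected to the root, can be illustrated with a plain breadth-first search over the links that remain after invalidation. This is a simplified stand-in, not the incremental queue this class runs: the in-memory link map, the ROOT marker, and the choice to count the root-to-seed link as one step are all assumptions made for the sketch.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReconnectSketch {
  static final String ROOT = "";  // hypothetical marker for the root

  // Breadth-first distances from the root over the remaining links.
  // Documents absent from the result are unreachable ("infinity").
  static Map<String, Integer> distances(Map<String, List<String>> links) {
    Map<String, Integer> dist = new HashMap<>();
    Deque<String> queue = new ArrayDeque<>();
    dist.put(ROOT, 0);
    queue.add(ROOT);
    while (!queue.isEmpty()) {
      String source = queue.remove();
      int next = dist.get(source) + 1;
      List<String> targets = links.getOrDefault(source, Collections.<String>emptyList());
      for (String target : targets) {
        if (!dist.containsKey(target)) {
          dist.put(target, next);
          queue.add(target);
        }
      }
    }
    dist.remove(ROOT);  // report documents only
    return dist;
  }
}
```

If the links whose source is a deleted document are removed from the map before the call, any target that was reachable only through that document drops out of the result, while targets with another path keep a finite distance.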

markForDelete

protected void markForDelete(java.lang.String query,
                             java.util.ArrayList list,
                             java.lang.String commonNewExpression,
                             java.util.ArrayList commonNewList)
                      throws ManifoldCFException
Throws:
ManifoldCFException

getDocumentChildren

protected IResultSet getDocumentChildren(java.lang.Long jobID,
                                         java.lang.String documentIDHash)
                                  throws ManifoldCFException
Get document's children.

Returns:
rows that contain the children. Column names are 'linktype' and 'childidentifier'.
Throws:
ManifoldCFException

readCachedNodes

protected HopCount.DocumentNode[] readCachedNodes(java.lang.Long jobID,
                                                  HopCount.Question[] unansweredQuestions)
                                           throws ManifoldCFException
Find the cached distance from a set of identifiers to the root. This is tricky, because if there is a queue assessment going on, some values are not valid. In general, one would treat a missing record as meaning "infinity". But if the record is simply invalidated at the moment, we want it to be treated as "missing". The record is therefore read even if it is marked, and the mark is then examined to decide how to treat it.

Returns:
the corresponding list of nodes, taking into account unknown distances.
Throws:
ManifoldCFException
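
The three-way decision described above (no record, marked record, valid record) can be sketched as a small function. Everything here is hypothetical: the Row type, the numeric values of the constants, and in particular the assumption that any mark other than MARK_NORMAL makes the cached distance unusable. The real rule for examining the mark is not documented on this page.

```java
final class CachedAnswerSketch {
  // Hypothetical values; the real constants are listed under Constant Field Values.
  static final int ANSWER_UNKNOWN = -2;
  static final int ANSWER_INFINITY = -1;
  static final int MARK_NORMAL = 0;

  // Hypothetical view of one cached hopcount row.
  static final class Row {
    final int distance;
    final int mark;

    Row(int distance, int mark) {
      this.distance = distance;
      this.mark = mark;
    }
  }

  // Decide what a lookup means: no row is "infinity", a marked row is "unknown".
  static int interpret(Row row) {
    if (row == null) {
      return ANSWER_INFINITY;
    }
    if (row.mark != MARK_NORMAL) {
      return ANSWER_UNKNOWN;  // assumption: any non-normal mark invalidates the cached value
    }
    return row.distance;
  }
}
```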

performGetCachedDistanceDeps

protected void performGetCachedDistanceDeps(java.util.Map depsMap,
                                            java.lang.String query,
                                            java.util.ArrayList list)
                                     throws ManifoldCFException
Do a limited fetch of cached distance dependencies.

Throws:
ManifoldCFException

performGetCachedDistances

protected void performGetCachedDistances(HopCount.DocumentNode[] rval,
                                         java.util.Map indexMap,
                                         java.util.Map depsMap,
                                         java.lang.String query,
                                         java.util.ArrayList list)
                                  throws ManifoldCFException
Do a limited fetch of cached distances.

Throws:
ManifoldCFException

writeCachedDistance

protected void writeCachedDistance(java.lang.Long jobID,
                                   java.lang.String[] legalLinkTypes,
                                   HopCount.DocumentNode dn,
                                   int hopcountMethod)
                            throws ManifoldCFException
Write a distance into the cache.

Throws:
ManifoldCFException