java.lang.Object
  org.apache.manifoldcf.core.database.BaseTable
      org.apache.manifoldcf.crawler.connectors.webcrawler.RobotsManager
public class RobotsManager
extends org.apache.manifoldcf.core.database.BaseTable
This class manages the database table into which we write robots.txt files for hosts. The data resides in the database and, up to a size limit, in cache as well. The result is a memory-limited, database-backed repository of robots.txt files that we can draw on.
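The idea of a memory-limited cache in front of a database table can be sketched as follows. This is a minimal illustration only, not the ManifoldCF implementation: the class `BoundedRobotsStoreSketch`, its methods, the in-memory `HashMap` standing in for the table, and the least-recently-used eviction policy are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; none of these names come from ManifoldCF.
class BoundedRobotsStoreSketch {
    // Stand-in for the database table: host name -> robots.txt text.
    private final Map<String, String> table = new HashMap<>();
    // Bounded in-memory cache that evicts the least recently used host.
    private final Map<String, String> cache;
    // Counts how many reads had to fall through to the backing table.
    int tableReads = 0;

    BoundedRobotsStoreSketch(final int maxCachedHosts) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxCachedHosts;
            }
        };
    }

    // Write through: update the table, then refresh the cache entry.
    void write(String hostName, String robotsText) {
        table.put(hostName, robotsText);
        cache.put(hostName, robotsText);
    }

    // Read from the cache first, falling back to the table on a miss.
    String read(String hostName) {
        String cached = cache.get(hostName);
        if (cached != null) {
            return cached;
        }
        tableReads++;
        String stored = table.get(hostName);
        if (stored != null) {
            cache.put(hostName, stored);
        }
        return stored;
    }
}
```

With a cache limit of one host, writing two hosts evicts the first from memory, but a later read of the first host still succeeds because it falls through to the backing table.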
| Nested Class Summary | |
|---|---|
| protected static class | RobotsManager.HostDescription: This is the object description for a robots host object. |
| protected static class | RobotsManager.HostExecutor: This is the executor object for locating robots host objects. |
| protected static class | RobotsManager.Record: This class represents a record in a robots.txt file. |
| protected static class | RobotsManager.RobotsCacheClass: Cache class for robots. |
| protected static class | RobotsManager.RobotsData: This is a cached data item. |
| Field Summary | |
|---|---|
| static java.lang.String | _rcsid |
| protected static java.lang.String | expirationField |
| protected static java.lang.String | hostField |
| protected static RobotsManager.RobotsCacheClass | robotsCacheClass |
| protected static java.lang.String | robotsField |
| Fields inherited from class org.apache.manifoldcf.core.database.BaseTable |
|---|
| dbInterface, tableName |
| Constructor Summary | |
|---|---|
| RobotsManager(org.apache.manifoldcf.core.interfaces.IThreadContext tc, org.apache.manifoldcf.core.interfaces.IDBInterface database) | Constructor. |
| Method Summary | |
|---|---|
| java.lang.Boolean | checkFetchAllowed(java.lang.String userAgent, java.lang.String hostName, long currentTime, java.lang.String pathString, org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities): Read robots.txt data from the cache or from the database. |
| void | deinstall(): Uninstall the manager. |
| protected static boolean | doesPathMatch(java.lang.String path, int pathIndex, java.lang.String spec, int specIndex): Recursive method for matching specification to path. |
| protected static boolean | doesPathMatch(java.lang.String path, java.lang.String spec): Check if path matches specification. |
| protected static java.lang.String | getRobotsKey(java.lang.String hostName): Construct a key which represents an individual host name. |
| void | install(): Install the manager. |
| protected static java.lang.String | makeReadable(java.lang.String inputString): Convert a string from the robots file into a readable form that does NOT contain NUL characters (since postgresql does not accept those). |
| protected RobotsManager.RobotsData | readRobotsData(java.lang.String hostName, org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities): Read robots data, if it exists. |
| void | writeRobotsData(java.lang.String hostName, long expirationTime, java.io.InputStream data): Write robots.txt, replacing any existing row. |
| Methods inherited from class org.apache.manifoldcf.core.database.BaseTable |
|---|
| addTableIndex, analyzeTable, beginTransaction, constructDistinctOnClause, constructOffsetLimitClause, constructRegexpClause, constructSubstringClause, endTransaction, getDatabaseCacheKey, getDBInterface, getMaxInClause, getMaxOrClause, getTableIndexes, getTableName, getTableSchema, getTransactionID, makeTableKey, noteModifications, performAddIndex, performAlter, performCreate, performDelete, performDrop, performInsert, performLock, performModification, performQuery, performQuery, performRemoveIndex, performUpdate, prepareRowForSave, readRow, reindexTable, signalRollback |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String _rcsid
protected static RobotsManager.RobotsCacheClass robotsCacheClass
protected static final java.lang.String hostField
protected static final java.lang.String robotsField
protected static final java.lang.String expirationField
| Constructor Detail |
|---|
public RobotsManager(org.apache.manifoldcf.core.interfaces.IThreadContext tc,
org.apache.manifoldcf.core.interfaces.IDBInterface database)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Parameters:
tc - is the thread context.
database - is the database handle.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
| Method Detail |
|---|
public void install()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public void deinstall()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public java.lang.Boolean checkFetchAllowed(java.lang.String userAgent,
java.lang.String hostName,
long currentTime,
java.lang.String pathString,
org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Parameters:
hostName - is the host for which the data is desired.
currentTime - is the time of the check.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public void writeRobotsData(java.lang.String hostName,
long expirationTime,
java.io.InputStream data)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
Parameters:
hostName - is the host.
expirationTime - is the time this data should expire.
data - is the robots data stream. May be null.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
protected static java.lang.String getRobotsKey(java.lang.String hostName)
Parameters:
hostName - is the name of the host.
protected RobotsManager.RobotsData readRobotsData(java.lang.String hostName,
org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
protected static java.lang.String makeReadable(java.lang.String inputString)
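As a rough sketch of what removing NUL characters from robots.txt text might look like, the example below replaces each NUL with a space. The replacement character is an assumption; the actual method may drop or escape NULs instead, and the class name `MakeReadableSketch` is hypothetical.

```java
// Hypothetical illustration; not the ManifoldCF implementation.
class MakeReadableSketch {
    // Assumption: each NUL character is replaced by a single space.
    static String makeReadable(String inputString) {
        StringBuilder sb = new StringBuilder(inputString.length());
        for (int i = 0; i < inputString.length(); i++) {
            char c = inputString.charAt(i);
            // Keep every character except NUL, which the database would reject.
            sb.append(c == '\0' ? ' ' : c);
        }
        return sb.toString();
    }
}
```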
protected static boolean doesPathMatch(java.lang.String path,
java.lang.String spec)
protected static boolean doesPathMatch(java.lang.String path,
int pathIndex,
java.lang.String spec,
int specIndex)
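The two doesPathMatch overloads suggest a public entry point that delegates to an index-based recursive matcher. The sketch below shows one way such a matcher could work. The specification syntax is an assumption, not confirmed by this page: it follows the common robots.txt convention where a specification is a path prefix, `*` matches any run of characters, and a trailing `$` anchors the end of the path. The class name `PathMatchSketch` is hypothetical.

```java
// Hypothetical illustration; the matching rules are assumed, not taken from ManifoldCF.
class PathMatchSketch {
    // Entry point: start matching at the beginning of both strings.
    static boolean doesPathMatch(String path, String spec) {
        return doesPathMatch(path, 0, spec, 0);
    }

    // Recursive matcher over the remaining path and specification.
    static boolean doesPathMatch(String path, int pathIndex, String spec, int specIndex) {
        // Specification exhausted: it matched as a prefix of the path.
        if (specIndex == spec.length()) {
            return true;
        }
        char s = spec.charAt(specIndex);
        // A trailing '$' requires the path to end exactly here.
        if (s == '$' && specIndex == spec.length() - 1) {
            return pathIndex == path.length();
        }
        if (s == '*') {
            // Let '*' absorb zero or more path characters, trying each split point.
            for (int i = pathIndex; i <= path.length(); i++) {
                if (doesPathMatch(path, i, spec, specIndex + 1)) {
                    return true;
                }
            }
            return false;
        }
        // Literal character: it must match the current path character.
        if (pathIndex < path.length() && path.charAt(pathIndex) == s) {
            return doesPathMatch(path, pathIndex + 1, spec, specIndex + 1);
        }
        return false;
    }
}
```

Under these assumed rules, the specification `/private` matches `/private/x.html`, and `/*.php$` matches `/a/b/c.php` but not `/a/b/c.php?x=1`.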