java.lang.Object
  org.apache.manifoldcf.core.connector.BaseConnector
    org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
      org.apache.manifoldcf.crawler.connectors.rss.RSSConnector
public class RSSConnector
This is the RSS implementation of the IRepositoryConnector interface. The connector reads an RSS document in order to seed the document queue. That document is always fetched from the same URL, which is specified in the configuration parameters. The documents subsequently crawled are not scraped for additional links; only the primary document is ingested. Redirections, on the other hand, ARE honored, so that sites that rely on them (e.g. the BBC) can be supported.
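To make the seeding step concrete, the following is a minimal sketch of pulling item links out of an RSS 2.0 feed with SAX, so the feed never has to be held in memory as a tree. It is not the connector's own code: the class `FeedLinkSketch` and the method `extractItemLinks` are hypothetical names, and the sketch assumes a plain, non-namespaced RSS 2.0 layout (`rss/channel/item/link`).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of RSSConnector.
final class FeedLinkSketch {

  // Returns the text of every <link> element found inside an <item> element.
  static List<String> extractItemLinks(byte[] feedBytes) throws Exception {
    final List<String> links = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      private boolean inItem = false;
      private StringBuilder text = null; // non-null only while inside an item's <link>

      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("item")) {
          inItem = true;
        } else if (inItem && qName.equals("link")) {
          text = new StringBuilder();
        }
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so accumulate.
        if (text != null) {
          text.append(ch, start, length);
        }
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        if (qName.equals("link") && text != null) {
          links.add(text.toString().trim());
          text = null;
        } else if (qName.equals("item")) {
          inItem = false;
        }
      }
    };
    // The default factory is not namespace-aware, so qName carries the element name.
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new ByteArrayInputStream(feedBytes), handler);
    return links;
  }

  public static void main(String[] args) throws Exception {
    String feed = "<rss version=\"2.0\"><channel><link>http://example.com/</link>"
        + "<item><title>A</title><link>http://example.com/a</link></item>"
        + "<item><title>B</title><link> http://example.com/b </link></item>"
        + "</channel></rss>";
    // Prints [http://example.com/a, http://example.com/b]; the channel-level link is skipped.
    System.out.println(extractItemLinks(feed.getBytes(StandardCharsets.UTF_8)));
  }
}
```

The channel-level `<link>` is ignored on purpose: only links that belong to feed items would become crawl candidates in this sketch.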
| Nested Class Summary | | |
|---|---|---|
| protected static class | RSSConnector.CanonicalizationPolicies | Class representing a list of canonicalization rules |
| protected static class | RSSConnector.CanonicalizationPolicy | Class representing a URL regular expression match, for the purposes of determining canonicalization policy |
| protected static class | RSSConnector.EvaluatorToken | Evaluator token. |
| protected static class | RSSConnector.EvaluatorTokenStream | Token stream. |
| protected class | RSSConnector.FeedContextClass | |
| protected class | RSSConnector.FeedItemContextClass | |
| protected static class | RSSConnector.Filter | Class that handles parsing and interpretation of the document specification. |
| protected static class | RSSConnector.MappingRule | Class representing a mapping rule |
| protected static class | RSSConnector.MappingRules | Class that represents all mappings |
| protected static class | RSSConnector.NameValue | Name/value class |
| protected class | RSSConnector.OuterContextClass | This class handles the outermost XML context for the feed document. |
| protected class | RSSConnector.RDFContextClass | |
| protected class | RSSConnector.RDFItemContextClass | |
| protected class | RSSConnector.RSSChannelContextClass | |
| protected class | RSSConnector.RSSContextClass | |
| protected class | RSSConnector.RSSItemContextClass | |
| Field Summary | | |
|---|---|---|
| static java.lang.String | _rcsid | |
| static java.lang.String | ACTIVITY_FETCH | |
| static java.lang.String | ACTIVITY_ROBOTSPARSE | |
| static java.lang.String | bandwidthParameter | Max kilobytes per second per server |
| protected static DataCache | cache | |
| static int | CHROMED_SKIP | Chromed suppression mode - skip all chromed content |
| static int | CHROMED_USE | Chromed suppression mode - use chromed content |
| static int | DECHROMED_CONTENT | Dechromed content mode - content field |
| static int | DECHROMED_DESCRIPTION | Dechromed content mode - description field |
| static int | DECHROMED_NONE | Dechromed content mode - none |
| static java.lang.String | emailParameter | Email parameter |
| protected ThrottledFetcher | fetcher | The throttled fetcher used by this instance |
| protected static java.util.Map | fetcherMap | Storage for fetcher objects |
| protected java.lang.String | from | The email address for this connector instance |
| protected boolean | isInitialized | Flag indicating whether session data is initialized |
| static java.lang.String | maxFetchesParameter | Max fetches per minute per server |
| protected int | maxOpenConnectionsPerServer | The maximum open connections |
| static java.lang.String | maxOpenParameter | Max simultaneous open connections per server |
| protected static java.util.HashMap | milTzMap | Timezone mapping from RFC822 timezones to ones understood by Java |
| protected double | minimumMillisecondsPerBytePerServer | The minimum milliseconds between bytes |
| protected long | minimumMillisecondsPerFetchPerServer | The minimum milliseconds between fetches |
| protected static java.util.HashMap | monthMap | |
| protected java.lang.String | proxyAuthDomain | Proxy auth domain |
| static java.lang.String | proxyAuthDomainParameter | Proxy auth domain |
| protected java.lang.String | proxyAuthPassword | Proxy auth password |
| static java.lang.String | proxyAuthPasswordParameter | Proxy auth password |
| protected java.lang.String | proxyAuthUsername | Proxy auth username |
| static java.lang.String | proxyAuthUsernameParameter | Proxy auth username |
| protected java.lang.String | proxyHost | The proxy host |
| static java.lang.String | proxyHostParameter | Proxy host name |
| protected int | proxyPort | The proxy port |
| static java.lang.String | proxyPortParameter | Proxy port |
| protected Robots | robots | The robots object used by this instance |
| protected static int | ROBOTS_ALL | |
| protected static int | ROBOTS_DATA | |
| protected static int | ROBOTS_NONE | |
| protected static java.util.Map | robotsMap | Storage for robots objects |
| protected int | robotsUsage | Robots usage flag |
| static java.lang.String | robotsUsageParameter | Robots usage parameter |
| protected java.lang.String | throttleGroupName | The throttle group name |
| static java.lang.String | throttleGroupParameter | The throttle group name |
| protected static java.util.Map | understoodProtocols | |
| protected java.lang.String | userAgent | The user-agent for this connector instance |
| Fields inherited from class org.apache.manifoldcf.core.connector.BaseConnector |
|---|
currentContext, params |
| Fields inherited from interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector |
|---|
JOBMODE_CONTINUOUS, JOBMODE_ONCEONLY, MODEL_ADD, MODEL_ADD_CHANGE, MODEL_ADD_CHANGE_DELETE, MODEL_ALL, MODEL_PARTIAL |
| Constructor Summary | |
|---|---|
| RSSConnector() | Constructor. |
| Method Summary | | |
|---|---|---|
| void | addSeedDocuments(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities, org.apache.manifoldcf.crawler.interfaces.DocumentSpecification spec, long startTime, long endTime) | Queue "seed" documents. |
| java.lang.String | check() | Check status of connection. |
| void | connect(org.apache.manifoldcf.core.interfaces.ConfigParams configParams) | Connect. |
| void | disconnect() | Close the connection. |
| protected static java.lang.String | doCanonicalization(RSSConnector.CanonicalizationPolicy p, java.net.URI url) | Code to canonicalize a URL. |
| java.lang.String[] | getActivitiesList() | Return the list of activities that this connector supports (i.e. |
| java.lang.String[] | getBinNames(java.lang.String documentIdentifier) | Get the bin name string for a document identifier. |
| int | getConnectorModel() | Tell the world what model this connector uses for getDocumentIdentifiers(). |
| java.lang.String[] | getDocumentVersions(java.lang.String[] documentIdentifiers, java.lang.String[] oldVersions, org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities, org.apache.manifoldcf.crawler.interfaces.DocumentSpecification spec, int jobType, boolean usesDefaultAuthority) | Get document versions given an array of document identifiers. |
| protected ThrottledFetcher | getFetcher() | Given the current parameters, find the correct throttled fetcher object (or create one if not there). |
| java.lang.String | getJSPFolder() | Return the path for the UI interface JSP elements. |
| int | getMaxDocumentRequest() | Get the maximum number of documents to amalgamate together into one batch, for this connector. |
| protected Robots | getRobots(ThrottledFetcher fetcher) | Given the current parameters, find the correct robots object (or create one if none found). |
| protected void | getSession() | Establish a session |
| protected void | handleRSSFeedSAX(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, RSSConnector.Filter filter) | Handle an RSS feed document, using SAX to limit the memory impact |
| protected boolean | isContentInteresting(org.apache.manifoldcf.crawler.interfaces.IFingerprintActivity activities, java.lang.String contentType) | Code to check if data is interesting, based on response code and content type. |
| protected static java.lang.String | makeDocumentIdentifier(RSSConnector.CanonicalizationPolicies policies, java.lang.String parentIdentifier, java.lang.String rawURL) | Convert an absolute or relative URL to a document identifier. |
| void | outputConfigurationBody(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.lang.String tabName) | Output the configuration body section. |
| void | outputConfigurationHeader(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.util.ArrayList tabsArray) | Output the configuration header section. |
| void | outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.crawler.interfaces.DocumentSpecification ds, java.lang.String tabName) | Output the specification body section. |
| void | outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.crawler.interfaces.DocumentSpecification ds, java.util.ArrayList tabsArray) | Output the specification header section. |
| protected static void | pack(java.lang.StringBuffer output, java.lang.String value, char delimiter) | Stuffer for packing a single string with an end delimiter |
| protected static void | packFixedList(java.lang.StringBuffer output, java.lang.String[] values, char delimiter) | Stuffer for packing lists of fixed length |
| protected static void | packList(java.lang.StringBuffer output, java.util.ArrayList values, char delimiter) | Stuffer for packing lists of variable length |
| protected static void | packList(java.lang.StringBuffer output, java.lang.String[] values, char delimiter) | Another stuffer for packing lists of variable length |
| protected static java.lang.Long | parseChinaDate(java.lang.String dateValue) | Parse a China Daily News date |
| protected static java.lang.Long | parseRSSDate(java.lang.String dateValue) | Parse an RSS date |
| protected static java.lang.Long | parseZuluDate(java.lang.String dateValue) | Parse an RDF date |
| void | poll() | This method is periodically called for all connectors that are connected but not in active use. |
| java.lang.String | processConfigurationPost(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, org.apache.manifoldcf.core.interfaces.ConfigParams parameters) | Process a configuration post. |
| void | processDocuments(java.lang.String[] documentIdentifiers, java.lang.String[] versions, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, org.apache.manifoldcf.crawler.interfaces.DocumentSpecification spec, boolean[] scanOnly, int jobType) | Process a set of documents. |
| java.lang.String | processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, org.apache.manifoldcf.crawler.interfaces.DocumentSpecification ds) | Process a specification post. |
| void | releaseDocumentVersions(java.lang.String[] documentIdentifiers, java.lang.String[] versions) | Free a set of documents. |
| protected static int | unpack(java.lang.StringBuffer sb, java.lang.String value, int startPosition, char delimiter) | Unstuffer for the above. |
| protected static int | unpackFixedList(java.lang.String[] output, java.lang.String value, int startPosition, char delimiter) | Unstuffer for unpacking lists of fixed length |
| protected static int | unpackList(java.util.ArrayList output, java.lang.String value, int startPosition, char delimiter) | Unstuffer for unpacking lists of variable length. |
| void | viewConfiguration(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.core.interfaces.ConfigParams parameters) | View configuration. |
| void | viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.crawler.interfaces.DocumentSpecification ds) | View specification. |
| Methods inherited from class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector |
|---|
addSeedDocuments, getDocumentIdentifiers, getDocumentIdentifiers, getDocumentVersions, getDocumentVersions, getDocumentVersions, getDocumentVersions, getRelationshipTypes, getRemainingDocumentIdentifiers, processDocuments, requestInfo |
| Methods inherited from class org.apache.manifoldcf.core.connector.BaseConnector |
|---|
clearThreadContext, deinstall, getConfiguration, install, setThreadContext |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Methods inherited from interface org.apache.manifoldcf.core.interfaces.IConnector |
|---|
clearThreadContext, deinstall, getConfiguration, install, setThreadContext |
| Field Detail |
|---|
public static final java.lang.String _rcsid
public static final java.lang.String robotsUsageParameter
public static final java.lang.String emailParameter
public static final java.lang.String bandwidthParameter
public static final java.lang.String maxOpenParameter
public static final java.lang.String maxFetchesParameter
public static final java.lang.String throttleGroupParameter
public static final java.lang.String proxyHostParameter
public static final java.lang.String proxyPortParameter
public static final java.lang.String proxyAuthDomainParameter
public static final java.lang.String proxyAuthUsernameParameter
public static final java.lang.String proxyAuthPasswordParameter
protected static final int ROBOTS_NONE
protected static final int ROBOTS_DATA
protected static final int ROBOTS_ALL
public static final int DECHROMED_NONE
public static final int DECHROMED_DESCRIPTION
public static final int DECHROMED_CONTENT
public static final int CHROMED_USE
public static final int CHROMED_SKIP
protected int robotsUsage
protected java.lang.String userAgent
protected java.lang.String from
protected long minimumMillisecondsPerFetchPerServer
protected int maxOpenConnectionsPerServer
protected double minimumMillisecondsPerBytePerServer
protected java.lang.String throttleGroupName
protected java.lang.String proxyHost
protected int proxyPort
protected java.lang.String proxyAuthDomain
protected java.lang.String proxyAuthUsername
protected java.lang.String proxyAuthPassword
protected ThrottledFetcher fetcher
protected Robots robots
protected static java.util.Map fetcherMap
protected static java.util.Map robotsMap
protected boolean isInitialized
protected static DataCache cache
protected static final java.util.Map understoodProtocols
public static final java.lang.String ACTIVITY_FETCH
public static final java.lang.String ACTIVITY_ROBOTSPARSE
protected static java.util.HashMap monthMap
protected static final java.util.HashMap milTzMap
| Constructor Detail |
|---|
public RSSConnector()
| Method Detail |
|---|
protected void getSession()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public java.lang.String[] getActivitiesList()
getActivitiesList in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
getActivitiesList in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
public int getConnectorModel()
getConnectorModel in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
getConnectorModel in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
public java.lang.String getJSPFolder()
public void connect(org.apache.manifoldcf.core.interfaces.ConfigParams configParams)
connect in interface org.apache.manifoldcf.core.interfaces.IConnector
connect in class org.apache.manifoldcf.core.connector.BaseConnector
configParams - are the configuration parameters for this connection.
Note well: There are no exceptions allowed from this call, since it is expected to mainly establish connection parameters.
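One way to honor a no-exceptions connect() is to have it only record the parameters and to defer parsing and validation until a session is first needed, which is what the getSession() method and the isInitialized flag suggest. The sketch below illustrates that pattern under stated assumptions; it is not the connector's code. The class `LazySessionSketch`, the parameter keys `maxFetchesPerMinute` and `maxKBPerSecond`, the defaults, and the use of 1 KB = 1000 bytes are all hypothetical, and `IllegalArgumentException` stands in for ManifoldCFException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a connector; names, keys and defaults are assumptions.
final class LazySessionSketch {
  private Map<String, String> params;
  private boolean isInitialized = false;
  private long minimumMillisecondsPerFetch;
  private double minimumMillisecondsPerByte;

  // Never throws on bad values: it only remembers the parameters.
  void connect(Map<String, String> configParams) {
    this.params = new HashMap<>(configParams);
    this.isInitialized = false;
  }

  // Parses and validates on first use; this is where configuration errors surface.
  void getSession() {
    if (isInitialized) {
      return;
    }
    if (params == null) {
      throw new IllegalStateException("connect() was not called");
    }
    int maxFetchesPerMinute = parsePositive("maxFetchesPerMinute", 12);
    int maxKBPerSecond = parsePositive("maxKBPerSecond", 64);
    // N fetches per minute means at least 60000/N milliseconds between fetches.
    minimumMillisecondsPerFetch = 60000L / maxFetchesPerMinute;
    // K kilobytes per second means at least 1000/(K*1000) milliseconds per byte
    // (assuming 1 KB = 1000 bytes).
    minimumMillisecondsPerByte = 1000.0 / (maxKBPerSecond * 1000.0);
    isInitialized = true;
  }

  long minimumMillisecondsPerFetch() {
    getSession();
    return minimumMillisecondsPerFetch;
  }

  double minimumMillisecondsPerByte() {
    getSession();
    return minimumMillisecondsPerByte;
  }

  private int parsePositive(String name, int defaultValue) {
    String raw = params.get(name);
    if (raw == null) {
      return defaultValue;
    }
    // NumberFormatException is a subclass of IllegalArgumentException.
    int value = Integer.parseInt(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(name + " must be positive: " + raw);
    }
    return value;
  }

  public static void main(String[] args) {
    Map<String, String> p = new HashMap<>();
    p.put("maxFetchesPerMinute", "30");
    LazySessionSketch c = new LazySessionSketch();
    c.connect(p);
    System.out.println(c.minimumMillisecondsPerFetch()); // 2000
  }
}
```

With this split, a malformed value cannot make connect() fail; the error is reported by the first operation that actually needs the session.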
public void poll()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
poll in interface org.apache.manifoldcf.core.interfaces.IConnector
poll in class org.apache.manifoldcf.core.connector.BaseConnector
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public java.lang.String check()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
check in interface org.apache.manifoldcf.core.interfaces.IConnector
check in class org.apache.manifoldcf.core.connector.BaseConnector
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public void disconnect()
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
disconnect in interface org.apache.manifoldcf.core.interfaces.IConnector
disconnect in class org.apache.manifoldcf.core.connector.BaseConnector
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public java.lang.String[] getBinNames(java.lang.String documentIdentifier)
getBinNames in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
getBinNames in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
documentIdentifier - is the document identifier.
public void addSeedDocuments(org.apache.manifoldcf.crawler.interfaces.ISeedingActivity activities,
org.apache.manifoldcf.crawler.interfaces.DocumentSpecification spec,
long startTime,
long endTime)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
addSeedDocuments in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
activities - is the interface this method should use to perform whatever framework actions are desired.
spec - is a document specification (that comes from the job).
startTime - is the beginning of the time range to consider, inclusive.
endTime - is the end of the time range to consider, exclusive.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
protected static java.lang.String makeDocumentIdentifier(RSSConnector.CanonicalizationPolicies policies,
java.lang.String parentIdentifier,
java.lang.String rawURL)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
policies - are the canonicalization policies in effect.
parentIdentifier - the identifier of the document in which the raw url was found, or null if none.
rawURL - is the raw, un-normalized and un-canonicalized url.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
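As an illustration of turning a raw, possibly relative URL into a stable identifier, here is a minimal sketch built only on java.net.URI. It is an assumption about what such a conversion could look like, not the connector's implementation: the class `DocumentIdentifierSketch` is hypothetical, the canonicalization policies are left out entirely, and the choices to drop the fragment, lowercase the scheme and host, and accept only http and https are this sketch's own.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, not part of RSSConnector; canonicalization policies are omitted.
final class DocumentIdentifierSketch {

  // Returns a normalized absolute URL string, or null if the URL is unusable.
  static String makeIdentifier(String parentIdentifier, String rawURL) {
    try {
      URI url = new URI(rawURL.trim());
      if (parentIdentifier != null) {
        // Resolve relative references against the document they were found in.
        url = new URI(parentIdentifier).resolve(url);
      }
      url = url.normalize();
      if (url.getScheme() == null || url.getHost() == null) {
        return null; // still relative, or not a server-based URL
      }
      String scheme = url.getScheme().toLowerCase(Locale.ROOT);
      if (!scheme.equals("http") && !scheme.equals("https")) {
        return null;
      }
      StringBuilder sb = new StringBuilder();
      sb.append(scheme).append("://").append(url.getHost().toLowerCase(Locale.ROOT));
      if (url.getPort() != -1) {
        sb.append(':').append(url.getPort());
      }
      String path = url.getRawPath();
      sb.append(path == null || path.isEmpty() ? "/" : path);
      if (url.getRawQuery() != null) {
        sb.append('?').append(url.getRawQuery());
      }
      // The fragment is intentionally not copied.
      return sb.toString();
    } catch (URISyntaxException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // Prints http://example.com/stories/a.html
    System.out.println(makeIdentifier("http://Example.com/feeds/news.xml", "../stories/a.html#top"));
  }
}
```

Returning null for unusable input keeps the caller's decision simple: a null identifier means the link is not queued.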
protected static java.lang.String doCanonicalization(RSSConnector.CanonicalizationPolicy p,
java.net.URI url)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.net.URISyntaxException
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.net.URISyntaxException
public java.lang.String[] getDocumentVersions(java.lang.String[] documentIdentifiers,
java.lang.String[] oldVersions,
org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities,
org.apache.manifoldcf.crawler.interfaces.DocumentSpecification spec,
int jobType,
boolean usesDefaultAuthority)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
getDocumentVersions in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
getDocumentVersions in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
documentIdentifiers - is the array of local document identifiers, as understood by this connector.
oldVersions - is the corresponding array of version strings that have been saved for the document identifiers. A null value indicates that this is a first-time fetch, while an empty string indicates that the previous document had an empty version string.
activities - is the interface this method should use to perform whatever framework actions are desired.
spec - is the current document specification for the current job. If there is a dependency on this specification, then the version string should include the pertinent data, so that reingestion will occur when the specification changes. This is primarily useful for metadata.
jobType - is an integer describing how the job is being run, whether continuous or once-only.
usesDefaultAuthority - will be true only if the authority in use for these documents is the default one.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
public void processDocuments(java.lang.String[] documentIdentifiers,
java.lang.String[] versions,
org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities,
org.apache.manifoldcf.crawler.interfaces.DocumentSpecification spec,
boolean[] scanOnly,
int jobType)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
processDocuments in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
processDocuments in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
documentIdentifiers - is the set of document identifiers to process.
activities - is the interface this method should use to queue up new document references and ingest documents.
spec - is the document specification.
scanOnly - is an array corresponding to the document identifiers. It is set to true to indicate when the processing should only find other references, and should not actually call the ingestion methods.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
public void releaseDocumentVersions(java.lang.String[] documentIdentifiers,
java.lang.String[] versions)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
releaseDocumentVersions in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
releaseDocumentVersions in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
documentIdentifiers - is the set of document identifiers.
versions - is the corresponding set of version identifiers (individual identifiers may be null).
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public void outputConfigurationHeader(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
java.util.ArrayList tabsArray)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
outputConfigurationHeader in interface org.apache.manifoldcf.core.interfaces.IConnector
outputConfigurationHeader in class org.apache.manifoldcf.core.connector.BaseConnector
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
public void outputConfigurationBody(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
java.lang.String tabName)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
public java.lang.String processConfigurationPost(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
processConfigurationPost in interface org.apache.manifoldcf.core.interfaces.IConnector
processConfigurationPost in class org.apache.manifoldcf.core.connector.BaseConnector
threadContext - is the local thread context.
variableContext - is the set of variables available from the post, including binary file post information.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public void viewConfiguration(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
viewConfiguration in interface org.apache.manifoldcf.core.interfaces.IConnector
viewConfiguration in class org.apache.manifoldcf.core.connector.BaseConnector
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
public void outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
org.apache.manifoldcf.crawler.interfaces.DocumentSpecification ds,
java.util.ArrayList tabsArray)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
outputSpecificationHeader in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
outputSpecificationHeader in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
out - is the output to which any HTML should be sent.
ds - is the current document specification for this job.
tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
public void outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
org.apache.manifoldcf.crawler.interfaces.DocumentSpecification ds,
java.lang.String tabName)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
public java.lang.String processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
org.apache.manifoldcf.crawler.interfaces.DocumentSpecification ds)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
processSpecificationPost in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
processSpecificationPost in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
variableContext - contains the post data, including binary file-upload information.
ds - is the current document specification for this job.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
public void viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
org.apache.manifoldcf.crawler.interfaces.DocumentSpecification ds)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
java.io.IOException
viewSpecification in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
viewSpecification in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
out - is the output to which any HTML should be sent.
ds - is the current document specification for this job.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
protected void handleRSSFeedSAX(java.lang.String documentIdentifier,
org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities,
RSSConnector.Filter filter)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
protected static java.lang.Long parseZuluDate(java.lang.String dateValue)
protected static java.lang.Long parseChinaDate(java.lang.String dateValue)
protected static java.lang.Long parseRSSDate(java.lang.String dateValue)
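The three date parsers above each return a java.lang.Long. As a rough sketch of what parsing two of these formats can look like with the standard java.time API, consider the following. This is not the connector's implementation (which, judging by monthMap and milTzMap, parses by hand): the class `FeedDateSketch` is hypothetical, returning null for unparseable input and returning milliseconds since the epoch are assumptions, and named zones such as "EST" are not handled here because `RFC_1123_DATE_TIME` accepts only "GMT" or a numeric offset.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of RSSConnector.
final class FeedDateSketch {

  // RFC 822/1123 style, e.g. "Tue, 03 Jun 2008 11:05:30 GMT" or "... +0100".
  // Returns milliseconds since the epoch, or null if the value cannot be parsed.
  static Long parseRssDate(String dateValue) {
    if (dateValue == null) {
      return null;
    }
    try {
      return OffsetDateTime.parse(dateValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
          .toInstant().toEpochMilli();
    } catch (DateTimeParseException e) {
      return null;
    }
  }

  // ISO 8601 "Zulu" style, e.g. "2008-06-03T11:05:30Z".
  // Returns milliseconds since the epoch, or null if the value cannot be parsed.
  static Long parseZuluDate(String dateValue) {
    if (dateValue == null) {
      return null;
    }
    try {
      return Instant.parse(dateValue.trim()).toEpochMilli();
    } catch (DateTimeParseException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseRssDate("Thu, 01 Jan 1970 00:00:00 GMT")); // 0
    System.out.println(parseZuluDate("1970-01-01T00:00:01Z"));          // 1000
  }
}
```

A hand-rolled parser with a zone-name table, as the milTzMap field suggests, would be needed to accept feeds that use abbreviations like "EST" or "PDT".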
public int getMaxDocumentRequest()
getMaxDocumentRequest in interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
getMaxDocumentRequest in class org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
protected boolean isContentInteresting(org.apache.manifoldcf.crawler.interfaces.IFingerprintActivity activities,
java.lang.String contentType)
throws org.apache.manifoldcf.agents.interfaces.ServiceInterruption,
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
org.apache.manifoldcf.core.interfaces.ManifoldCFException
protected static void pack(java.lang.StringBuffer output,
java.lang.String value,
char delimiter)
protected static int unpack(java.lang.StringBuffer sb,
java.lang.String value,
int startPosition,
char delimiter)
protected static void packFixedList(java.lang.StringBuffer output,
java.lang.String[] values,
char delimiter)
protected static int unpackFixedList(java.lang.String[] output,
java.lang.String value,
int startPosition,
char delimiter)
protected static void packList(java.lang.StringBuffer output,
java.util.ArrayList values,
char delimiter)
protected static void packList(java.lang.StringBuffer output,
java.lang.String[] values,
char delimiter)
protected static int unpackList(java.util.ArrayList output,
java.lang.String value,
int startPosition,
char delimiter)
output - is the array to fill with the unpacked data.
value - is the value to unpack.
startPosition - is the place to start the unpack.
delimiter - is the character to use between values.
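The pack and unpack families above serialize strings and lists into a single delimited string and read them back from a start position, returning the position after what was consumed. The sketch below shows one way such a scheme can work. It is an assumption, not the connector's wire format: the class `PackSketch` is hypothetical, the backslash escaping and the leading element count for variable-length lists are this sketch's own choices, and it uses StringBuilder and List&lt;String&gt; where the signatures above use StringBuffer and ArrayList.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of RSSConnector. The delimiter must not be a backslash.
final class PackSketch {

  // Appends value followed by the delimiter, escaping backslashes and delimiters in the value.
  static void pack(StringBuilder output, String value, char delimiter) {
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (c == '\\' || c == delimiter) {
        output.append('\\');
      }
      output.append(c);
    }
    output.append(delimiter);
  }

  // Reads one packed value into sb, starting at startPosition.
  // Returns the position just after the terminating delimiter.
  static int unpack(StringBuilder sb, String value, int startPosition, char delimiter) {
    int i = startPosition;
    while (i < value.length()) {
      char c = value.charAt(i++);
      if (c == '\\' && i < value.length()) {
        sb.append(value.charAt(i++)); // escaped character, taken literally
      } else if (c == delimiter) {
        break;
      } else {
        sb.append(c);
      }
    }
    return i;
  }

  // Variable-length list: the element count is packed first, then each element.
  static void packList(StringBuilder output, List<String> values, char delimiter) {
    pack(output, Integer.toString(values.size()), delimiter);
    for (String v : values) {
      pack(output, v, delimiter);
    }
  }

  // Reads the count, then that many elements, appending them to output.
  // Returns the position just after the last element consumed.
  static int unpackList(List<String> output, String value, int startPosition, char delimiter) {
    StringBuilder sb = new StringBuilder();
    int pos = unpack(sb, value, startPosition, delimiter);
    int count = Integer.parseInt(sb.toString());
    for (int i = 0; i < count; i++) {
      sb.setLength(0);
      pos = unpack(sb, value, pos, delimiter);
      output.add(sb.toString());
    }
    return pos;
  }

  public static void main(String[] args) {
    StringBuilder packed = new StringBuilder();
    packList(packed, Arrays.asList("a+b", "c\\d", ""), '+');
    List<String> back = new ArrayList<>();
    unpackList(back, packed.toString(), 0, '+');
    System.out.println(back.equals(Arrays.asList("a+b", "c\\d", ""))); // true
  }
}
```

Because every unpack call returns the next read position, several values and lists can be packed back to back into one string and read out in the same order.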
protected ThrottledFetcher getFetcher()
protected Robots getRobots(ThrottledFetcher fetcher)