Package org.apache.manifoldcf.crawler.connectors.webcrawler

Interface Summary
AuthenticationCredentials This interface describes immutable classes which represent authentication information for all kinds of authentication.
FormData This interface describes the form data gleaned from an HTML page.
FormDataElement This interface describes individual form data elements, for form submission.
IDiscoveredLinkHandler This interface describes the functionality needed by a link extractor to note a discovered link.
IHTMLHandler This interface describes the functionality needed by an HTML processor in order to handle an HTML document.
IMetaTagHandler This interface describes the functionality needed by a parser to handle metadata tags.
IRedirectionHandler This interface describes the functionality needed by a redirection processor in order to handle a redirection.
IThrottledConnection This interface represents an established connection to a URL.
IXMLHandler This interface describes the functionality needed by an XML processor in order to handle an XML document.
LoginCookies This interface describes cookies obtained during sequential authentication.
LoginParameters This interface describes login parameters to be used to submit a page during sequential authentication.
PageCredentials This interface describes immutable classes which represent authentication information for page-based authentication.
SequenceCredentials This interface describes immutable classes which represent authentication information for sequence-based authentication.
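
The handler interfaces above follow a callback pattern: a parser or extractor walks the document and reports what it finds to a handler supplied by the caller. The sketch below illustrates that pattern only. `LinkSink`, `noteDiscoveredLink`, and `HrefExtractorSketch` are hypothetical names invented for this example; they are not the signatures of IDiscoveredLinkHandler or any other type in this package, and the regular expression is a deliberately naive stand-in for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical callback interface (an assumption for illustration,
// NOT the real IDiscoveredLinkHandler signature).
interface LinkSink {
    void noteDiscoveredLink(String rawURL);
}

final class HrefExtractorSketch {
    // Naive matcher for double-quoted href attributes; a real crawler
    // would use a proper HTML tokenizer instead.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Reports every href value found in the text to the supplied handler.
    static void extract(String html, LinkSink sink) {
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            sink.noteDiscoveredLink(m.group(1));
        }
    }

    // Convenience wrapper: collects the reported links into a list, in document order.
    static List<String> collect(String html) {
        List<String> links = new ArrayList<>();
        extract(html, links::add);
        return links;
    }
}
```

Keeping the extractor ignorant of what happens to each link means the same extraction code can feed a queue, a filter, or a test list, depending on which handler the caller passes in.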
 

Class Summary
BasicParseState This class represents the basic, outermost parse state.
CookieManager This class manages the database table into which we write cookies.
CookieManager.CookiesCacheClass Cache class for cookies.
CookieManager.CookiesDescription This is the object description for a session key object.
CookieManager.CookiesExecutor This is the executor object for locating cookie session objects.
CookieManager.DynamicCookieSet This is a set of cookies, built dynamically.
CookieSet This class represents a set of cookies.
CredentialsDescription This class describes credential information pulled from a configuration.
CredentialsDescription.BasicCredential Basic type credentials
CredentialsDescription.CredentialsItem Class representing an individual credential item.
CredentialsDescription.LoginParameterIterator LoginParameter iterator
CredentialsDescription.NTLMCredential NTLM-style credentials
CredentialsDescription.SessionCredential Session credentials
CredentialsDescription.SessionCredentialItem Session credential helper class
CredentialsDescription.SessionCredentialParameter Session credential parameter class
DataCache This class is a cache of a specific URL's data.
DataCache.DocumentData This class represents everything we need to know about a document that's getting passed from the getDocumentVersions() phase to the processDocuments() phase.
DNSManager This class manages the database table into which we write DNS entries for hosts.
DNSManager.DNSCacheClass Cache class for DNS entries.
DNSManager.DNSInfo This is a cached data item.
DNSManager.HostDescription This is the object description for a DNS host object.
DNSManager.HostExecutor This is the executor object for locating DNS host objects.
FormDataAccumulator This class accumulates form data and allows overrides.
FormDataAccumulator.FormItemIterator Iterator over FormItems.
FormItem This class provides an individual data item.
FormParseState This class interprets the tag stream generated by the BasicParseState class, and keeps track of the form tags.
LinkParseState This class recognizes and interprets all links.
MetaParseState This class recognizes and interprets all meta tags.
RobotsManager This class manages the database table into which we write robots.txt files for hosts.
RobotsManager.HostDescription This is the object description for a robots host object.
RobotsManager.HostExecutor This is the executor object for locating robots host objects.
RobotsManager.Record This class represents a record in a robots.txt file.
RobotsManager.RobotsCacheClass Cache class for robots.
RobotsManager.RobotsData This is a cached data item.
ScriptParseState This class interprets the tag stream generated by the BasicParseState class, and causes script sections to be skipped.
ThrottleDescription This class describes complex throttling criteria pulled from a configuration.
ThrottleDescription.ThrottleItem Class representing an individual throttle item.
ThrottledFetcher This class uses httpclient to fetch content from web servers.
ThrottledFetcher.ConnectionBin Connection pool for a bin.
ThrottledFetcher.DataRecorder This class takes care of recording data and results for posterity.
ThrottledFetcher.DataSession Helper class for the above
ThrottledFetcher.SocketCreateThread Create a secure socket in a thread, so that we can "give up" after a while if the socket fails to connect.
ThrottledFetcher.ThrottleBin Throttles for a bin.
ThrottledFetcher.ThrottledConnection Throttled connections.
ThrottledFetcher.ThrottledConnection.ExecuteMethodThread  
ThrottledFetcher.ThrottledInputstream This class throttles an input stream based on the specified byte rate parameters.
ThrottledFetcher.WebSecureSocketFactory HTTPClient secure socket factory, which implements SecureProtocolSocketFactory
TrustsDescription This class describes trust information pulled from a configuration.
TrustsDescription.TrustsItem Class representing an individual trust item.
WebcrawlerConfig Constants for the Webcrawler connector configuration.
WebcrawlerConnector This is the Web Crawler implementation of the IRepositoryConnector interface.
WebcrawlerConnector.CanonicalizationPolicies Class representing a list of canonicalization rules
WebcrawlerConnector.CanonicalizationPolicy Class representing a URL regular expression match, for the purposes of determining canonicalization policy
WebcrawlerConnector.DocumentURLFilter This class describes the URL filtering information obtained from a digested DocumentSpecification.
WebcrawlerConnector.NameValue Name/value class
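
The summary above mentions an input stream that is throttled by byte-rate parameters. As a rough illustration of that general idea, the sketch below wraps any InputStream and sleeps after each read so that the average rate never exceeds a configured limit. `RateLimitedInputStreamSketch`, `minMillisPerByte`, and `drainTimed` are hypothetical names chosen for this example; the actual ThrottledFetcher.ThrottledInputstream coordinates with per-bin throttles and may work quite differently.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of byte-rate throttling; not the package's implementation.
final class RateLimitedInputStreamSketch extends FilterInputStream {
    private final double minMillisPerByte; // assumed parameter: minimum time budget per byte
    private final long startMillis;
    private long bytesRead;

    RateLimitedInputStreamSketch(InputStream in, double minMillisPerByte) {
        super(in);
        this.minMillisPerByte = minMillisPerByte;
        this.startMillis = System.currentTimeMillis();
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) pace(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) pace(n);
        return n;
    }

    // Sleeps until enough wall-clock time has passed for all bytes read so far.
    private void pace(int n) throws IOException {
        bytesRead += n;
        long earliest = startMillis + (long) (bytesRead * minMillisPerByte);
        long wait = earliest - System.currentTimeMillis();
        if (wait > 0) {
            try {
                Thread.sleep(wait);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
    }

    // Usage helper: reads all of data through the throttle in 10-byte chunks
    // and returns { total bytes read, elapsed milliseconds }.
    static long[] drainTimed(byte[] data, double minMillisPerByte) {
        long start = System.currentTimeMillis();
        long total = 0;
        try (InputStream in = new RateLimitedInputStreamSketch(
                new ByteArrayInputStream(data), minMillisPerByte)) {
            byte[] buf = new byte[10];
            int n;
            while ((n = in.read(buf)) > 0) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { total, System.currentTimeMillis() - start };
    }
}
```

Pacing against the stream's start time, rather than sleeping a fixed amount per read, lets a slow upstream source "earn back" time: the wrapper only sleeps when the cumulative rate is actually ahead of the limit.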
 

Exception Summary
ThrottledFetcher.PoolException Pool exception class
ThrottledFetcher.WaitException Wait exception class