java.lang.Object
  org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.DocumentURLFilter
protected static class WebcrawlerConnector.DocumentURLFilter
This class describes the URL filtering information obtained from a digested DocumentSpecification.
| Field Summary | |
|---|---|
| protected WebcrawlerConnector.CanonicalizationPolicies | canonicalizationPolicies: Canonicalization policies. |
| protected java.util.ArrayList | excludePatterns: The list of exclude patterns. |
| protected java.util.ArrayList | includePatterns: The list of include patterns. |
| protected java.util.HashMap | seedHosts: The map of seed hosts used to limit URLs, if non-null. |
| Constructor Summary | |
|---|---|
| WebcrawlerConnector.DocumentURLFilter(org.apache.manifoldcf.crawler.interfaces.DocumentSpecification spec) | Process a document specification to produce a filter. |
| Method Summary | |
|---|---|
| WebcrawlerConnector.CanonicalizationPolicies | getCanonicalizationPolicies(): Get canonicalization policies. |
| boolean | isDocumentAndHostLegal(java.lang.String url): Check if both a document and host are legal. |
| boolean | isDocumentLegal(java.lang.String url): Check if the document identifier is legal. |
| boolean | isHostLegal(java.lang.String host): Check if a host is legal. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
| protected java.util.ArrayList includePatterns |
| protected java.util.ArrayList excludePatterns |
| protected java.util.HashMap seedHosts |
| protected WebcrawlerConnector.CanonicalizationPolicies canonicalizationPolicies |
| Constructor Detail |
|---|
| public WebcrawlerConnector.DocumentURLFilter(org.apache.manifoldcf.crawler.interfaces.DocumentSpecification spec) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException |

Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException

| Method Detail |
|---|
| public boolean isDocumentAndHostLegal(java.lang.String url) |
| public boolean isHostLegal(java.lang.String host) |
| public boolean isDocumentLegal(java.lang.String url) |
| public WebcrawlerConnector.CanonicalizationPolicies getCanonicalizationPolicies() |
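
The page above lists the fields and method signatures but not how the checks are evaluated. The following is a minimal, self-contained sketch of how a filter with include patterns, exclude patterns, and an optional seed-host restriction could behave. It is not the ManifoldCF implementation. The class name `UrlFilterSketch`, its constructor arguments, and the matching rules (a URL is legal when at least one include pattern is found in it and no exclude pattern is, and a null seed-host set disables host limiting) are assumptions made for illustration; only the three method names are taken from the summary.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only; NOT the ManifoldCF DocumentURLFilter.
class UrlFilterSketch {
    private final List<Pattern> includePatterns = new ArrayList<>();
    private final List<Pattern> excludePatterns = new ArrayList<>();
    // Assumption: null means "do not limit URLs by seed host".
    private final Set<String> seedHosts;

    UrlFilterSketch(List<String> includes, List<String> excludes, Set<String> hosts) {
        for (String regex : includes) {
            includePatterns.add(Pattern.compile(regex));
        }
        for (String regex : excludes) {
            excludePatterns.add(Pattern.compile(regex));
        }
        if (hosts == null) {
            seedHosts = null;
        } else {
            // Host names are case-insensitive, so store them lowercased.
            seedHosts = new HashSet<>();
            for (String host : hosts) {
                seedHosts.add(host.toLowerCase(Locale.ROOT));
            }
        }
    }

    // A host is legal if no seed-host limit is set or the host is a seed host.
    boolean isHostLegal(String host) {
        if (seedHosts == null) {
            return true;
        }
        return host != null && seedHosts.contains(host.toLowerCase(Locale.ROOT));
    }

    // Assumed rule: some include pattern must be found, and no exclude pattern may be.
    boolean isDocumentLegal(String url) {
        return matchesAny(includePatterns, url) && !matchesAny(excludePatterns, url);
    }

    // Combines the pattern check with the host check on the URL's host part.
    boolean isDocumentAndHostLegal(String url) {
        if (!isDocumentLegal(url)) {
            return false;
        }
        if (seedHosts == null) {
            return true;
        }
        try {
            return isHostLegal(new URI(url).getHost());
        } catch (URISyntaxException e) {
            // An unparseable URL has no usable host, so reject it.
            return false;
        }
    }

    private static boolean matchesAny(List<Pattern> patterns, String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, a filter built with the include pattern `^https?://`, the exclude pattern `\.pdf$`, and the single seed host `example.com` would accept `http://example.com/index.html` in `isDocumentAndHostLegal`, but reject both `http://example.com/file.pdf` (excluded) and `http://other.org/index.html` (host not in the seed set).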