public interface IndexingFilter extends Pluggable, org.apache.hadoop.conf.Configurable
| Modifier and Type | Field and Description |
|---|---|
| static String | X_POINT_ID: The name of the extension point. |
| Modifier and Type | Method and Description |
|---|---|
| NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks): Adds fields or otherwise modifies the document that will be indexed for a parse. |
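
As a rough illustration of the `filter` contract summarized above, the following sketch shows a filter that adds one field to the document and returns it. It is not taken from Nutch itself. To compile without Nutch or Hadoop on the classpath, it declares simplified stand-ins for `NutchDocument`, `Parse`, `Text`, `CrawlDatum`, `Inlinks`, `IndexingException` and `IndexingFilter`. The stand-in interface omits `Pluggable` and `Configurable`, which a real implementation must also satisfy. The class name `UrlLengthIndexingFilter`, the field name `urlLength` and the helper `urlLengthFor` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexingFilterSketch {

    // --- Simplified stand-ins (assumptions, NOT the real Nutch/Hadoop classes) ---

    static class NutchDocument {
        private final Map<String, List<Object>> fields = new LinkedHashMap<>();

        // Appends a value to the named field, creating the field if needed.
        void add(String name, Object value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        // Returns the first value of the named field, or null if absent.
        Object getFieldValue(String name) {
            List<Object> values = fields.get(name);
            return values == null ? null : values.get(0);
        }
    }

    static class Text {
        private final String value;
        Text(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    static class Parse {}
    static class CrawlDatum {}
    static class Inlinks {}

    static class IndexingException extends Exception {
        IndexingException(String message) { super(message); }
    }

    // Stand-in for the interface documented on this page (method only).
    interface IndexingFilter {
        NutchDocument filter(NutchDocument doc, Parse parse, Text url,
                             CrawlDatum datum, Inlinks inlinks) throws IndexingException;
    }

    // --- Hypothetical filter: records the length of the page URL ---

    static class UrlLengthIndexingFilter implements IndexingFilter {
        @Override
        public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
                                    CrawlDatum datum, Inlinks inlinks) throws IndexingException {
            if (url == null) {
                throw new IndexingException("url must not be null");
            }
            // Add a field to the document that will be indexed.
            doc.add("urlLength", url.toString().length());
            return doc;
        }
    }

    // Runs the hypothetical filter on an empty document and returns the added field.
    static Object urlLengthFor(String url) {
        try {
            NutchDocument doc = new UrlLengthIndexingFilter().filter(
                new NutchDocument(), new Parse(), new Text(url),
                new CrawlDatum(), new Inlinks());
            return doc.getFieldValue("urlLength");
        } catch (IndexingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(urlLengthFor("http://example.org/"));
    }
}
```

In an actual plugin, the stand-in types would be replaced by the corresponding Nutch and Hadoop classes, and the filter would also implement the configuration accessors inherited from `org.apache.hadoop.conf.Configurable`.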
static final String X_POINT_ID
NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page URL
datum - crawl datum for the page
inlinks - page inlinks
Throws:
IndexingException

Copyright © 2014 The Apache Software Foundation