public class AnchorIndexingFilter extends Object implements IndexingFilter

See anchorIndexingFilter.deduplicate in nutch-default.xml.

| Modifier and Type | Field and Description |
|---|---|
| static org.slf4j.Logger | LOG |

Inherited field: X_POINT_ID

| Constructor and Description |
|---|
| AnchorIndexingFilter() |

| Modifier and Type | Method and Description |
|---|---|
| NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks): The AnchorIndexingFilter filter object, which supports boolean configuration settings for the deduplication of anchors. |
| org.apache.hadoop.conf.Configuration | getConf(): Get the Configuration object. |
| void | setConf(org.apache.hadoop.conf.Configuration conf): Set the Configuration object. |
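
The interplay between setConf and filter summarized above can be illustrated with a minimal, stand-alone sketch. It is not the Nutch implementation: the class name AnchorFilterSketch is hypothetical, a plain Map stands in for org.apache.hadoop.conf.Configuration, and a String array stands in for the anchors carried by Inlinks. The default of false for the flag and the case-insensitive comparison are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone sketch: none of these types are the Nutch or Hadoop classes.
class AnchorFilterSketch {
    // Property name taken from the documentation above.
    static final String DEDUP_KEY = "anchorIndexingFilter.deduplicate";

    private Map<String, String> conf = new HashMap<>();
    private boolean deduplicate;

    // Mirrors the setConf/getConf pair; a plain Map stands in for Configuration.
    void setConf(Map<String, String> conf) {
        this.conf = conf;
        // Assumption: the flag is treated as false when the property is absent.
        this.deduplicate = Boolean.parseBoolean(conf.getOrDefault(DEDUP_KEY, "false"));
    }

    Map<String, String> getConf() {
        return conf;
    }

    // Returns the anchor values that would be added to the document's anchor field.
    List<String> filter(String[] anchors) {
        List<String> kept = new ArrayList<>();
        if (anchors == null) {
            return kept; // no inbound anchors, nothing to index
        }
        Set<String> seen = new HashSet<>();
        for (String anchor : anchors) {
            if (anchor == null) {
                continue;
            }
            // Assumption: duplicates are detected case-insensitively.
            if (deduplicate && !seen.add(anchor.toLowerCase(Locale.ROOT))) {
                continue; // an equivalent anchor was already kept
            }
            kept.add(anchor);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DEDUP_KEY, "true");

        AnchorFilterSketch sketch = new AnchorFilterSketch();
        sketch.setConf(conf);

        String[] anchors = {"Apache Nutch", "apache nutch", "Crawler"};
        System.out.println(sketch.filter(anchors)); // prints [Apache Nutch, Crawler]
    }
}
```

With the flag left unset, the same call would keep all three anchors. The trade-off in this sketch is that deduplication discards how often a given anchor text occurs, which may matter if anchor frequency feeds into ranking.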

public void setConf(org.apache.hadoop.conf.Configuration conf)

Set the Configuration object.

Specified by: setConf in interface org.apache.hadoop.conf.Configurable

public org.apache.hadoop.conf.Configuration getConf()

Get the Configuration object.

Specified by: getConf in interface org.apache.hadoop.conf.Configurable

public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException

The AnchorIndexingFilter filter object, which supports boolean configuration settings for the deduplication of anchors. See anchorIndexingFilter.deduplicate in nutch-default.xml.

Specified by: filter in interface IndexingFilter

Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered for anchor text
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text

Throws:
IndexingException

Copyright © 2014 The Apache Software Foundation