public class BasicIndexingFilter extends Object implements IndexingFilter
The domain field is added depending on indexer.add.domain in nutch-default.xml.
title is truncated as per indexer.max.title.length in nutch-default.xml.
(As per NUTCH-1004, a zero-length title is not added)
content is truncated as per indexer.max.content.length in nutch-default.xml.

| Modifier and Type | Field and Description |
|---|---|
| static org.slf4j.Logger | LOG |
| | X_POINT_ID |

| Constructor and Description |
|---|
| BasicIndexingFilter() |

| Modifier and Type | Method and Description |
|---|---|
| NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks): The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields. |
| org.apache.hadoop.conf.Configuration | getConf(): Get the Configuration object |
| void | setConf(org.apache.hadoop.conf.Configuration conf): Set the Configuration object |
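
The truncation rules described above can be illustrated with a small standalone sketch. It does not use any Nutch classes: `TruncationSketch`, `truncate`, and `buildFields` are hypothetical names introduced only for illustration, a plain `Map` stands in for `NutchDocument`, and the handling of a negative limit as "no truncation" is an assumption rather than something this page states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not a Nutch class.
class TruncationSketch {

    // Cuts text down to at most maxLength characters.
    // Assumption: a negative limit means "do not truncate".
    static String truncate(String text, int maxLength) {
        if (maxLength >= 0 && text.length() > maxLength) {
            return text.substring(0, maxLength);
        }
        return text;
    }

    // Builds the "content" and "title" fields in a plain map.
    // The two limits play the role of indexer.max.title.length and
    // indexer.max.content.length; they are passed in directly here
    // instead of being read from a Configuration object.
    static Map<String, String> buildFields(String title, String content,
                                           int maxTitleLength, int maxContentLength) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("content", truncate(content, maxContentLength));

        String shortTitle = truncate(title, maxTitleLength);
        // A zero-length title is not added (compare NUTCH-1004).
        if (!shortTitle.isEmpty()) {
            fields.put("title", shortTitle);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Title limited to 6 characters, content left untruncated.
        System.out.println(buildFields("Apache Nutch", "Some page text", 6, -1));
    }
}
```

In this sketch an empty title simply leaves the `title` key out of the map, which mirrors the note that a zero-length title is not added.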

public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException

The BasicIndexingFilter filter object, which supports a few configuration settings for adding basic searchable fields. See indexer.add.domain, indexer.max.title.length, and indexer.max.content.length in nutch-default.xml.

Specified by: filter in interface IndexingFilter

Parameters:
doc - The NutchDocument object
parse - The relevant Parse object passing through the filter
url - URL to be filtered for anchor text
datum - The CrawlDatum entry
inlinks - The Inlinks containing anchor text

Throws: IndexingException

public void setConf(org.apache.hadoop.conf.Configuration conf)

Set the Configuration object

Specified by: setConf in interface org.apache.hadoop.conf.Configurable

public org.apache.hadoop.conf.Configuration getConf()

Get the Configuration object

Specified by: getConf in interface org.apache.hadoop.conf.Configurable

Copyright © 2014 The Apache Software Foundation