public class LanguageIndexingFilter extends Object implements IndexingFilter
An IndexingFilter that adds a lang (language) field to the document.
It tries to find the language of the document by:
- checking whether HTMLLanguageParser added some language information
- checking whether a Content-Language HTTP header can be found

| Constructor and Description |
|---|
| LanguageIndexingFilter() Constructs a new Language Indexing Filter. |
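
The lookup order in the class description can be read as a simple fallback chain. The sketch below is only an illustration of that idea in plain Java and is not the Nutch implementation: the class `LangFieldSketch`, the method `resolveLanguage`, the `"unknown"` default, and the use of a `Map` in place of a `NutchDocument` are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in; none of these names come from the Nutch API.
public class LangFieldSketch {

    /**
     * Returns the first non-blank language hint, checking the parser-supplied
     * value before the Content-Language header value. The "unknown" default
     * is an assumption of this sketch.
     */
    static String resolveLanguage(String parserLang, String contentLanguageHeader) {
        if (isUsable(parserLang)) {
            return parserLang.trim();
        }
        if (isUsable(contentLanguageHeader)) {
            return contentLanguageHeader.trim();
        }
        return "unknown";
    }

    /** A hint is usable when it is neither null nor blank. */
    static boolean isUsable(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        // A Map stands in for the indexed document's fields.
        Map<String, String> doc = new LinkedHashMap<>();
        // No parser hint here, so the header value is used.
        doc.put("lang", resolveLanguage(null, "en"));
        System.out.println(doc); // prints {lang=en}
    }
}
```

Checking the sources in a fixed order keeps the result deterministic when more than one hint is present.
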
| Modifier and Type | Method and Description |
|---|---|
| NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) Adds fields or otherwise modifies the document that will be indexed for a parse. |
| org.apache.hadoop.conf.Configuration | getConf() |
| void | setConf(org.apache.hadoop.conf.Configuration conf) |
public LanguageIndexingFilter()
public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Specified by: filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page
inlinks - page inlinks
Throws: IndexingException

public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by: setConf in interface org.apache.hadoop.conf.Configurable

public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable

Copyright © 2014 The Apache Software Foundation