public class FeedIndexingFilter extends Object implements IndexingFilter
IndexingFilter implementation that pulls the relevant extracted Metadata fields out of the RSS feeds and into the index.

| Modifier and Type | Field and Description |
|---|---|
| static String | dateFormatStr |

Fields inherited from interface IndexingFilter: X_POINT_ID

| Constructor and Description |
|---|
| FeedIndexingFilter() |

| Modifier and Type | Method and Description |
|---|---|
| NutchDocument | filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index. |
| org.apache.hadoop.conf.Configuration | getConf() |
| void | setConf(org.apache.hadoop.conf.Configuration conf) Sets the Configuration object used to configure this IndexingFilter. |
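
The field-copying step described for filter can be illustrated with a minimal sketch. It does not use the Nutch classes: plain maps stand in for the parse metadata and the NutchDocument, and the class name, method name, and key strings are hypothetical, chosen only to mirror the field names listed above. The real metadata keys and the real NutchDocument API are not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the filter's field-copying step; this is not Nutch code.
final class FeedFieldCopySketch {

    // Assumed keys, named after the fields listed in the method summary.
    // The actual key strings used by the plugin are not shown on this page.
    static final String[] FEED_KEYS = {
        "FEED_AUTHOR", "FEED_TAGS", "FEED_PUBLISHED", "FEED_UPDATED", "FEED"
    };

    // Copies every listed key that is present in parseMeta into doc and returns doc.
    static Map<String, List<String>> copyFeedFields(
            Map<String, List<String>> doc, Map<String, List<String>> parseMeta) {
        for (String key : FEED_KEYS) {
            List<String> values = parseMeta.get(key);
            if (values == null || values.isEmpty()) {
                continue; // nothing was extracted for this field
            }
            doc.computeIfAbsent(key, k -> new ArrayList<>()).addAll(values);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> meta = new LinkedHashMap<>();
        meta.put("FEED_AUTHOR", Arrays.asList("Jane Doe"));
        meta.put("FEED_TAGS", Arrays.asList("nutch", "rss"));
        meta.put("unrelated", Arrays.asList("ignored"));

        Map<String, List<String>> doc = copyFeedFields(new LinkedHashMap<>(), meta);
        System.out.println(doc);
    }
}
```

Keys outside the listed set are left out of the document, which matches the summary's wording that only the relevant feed fields are passed on for indexing.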
public static final String dateFormatStr
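
This page lists dateFormatStr without its value or a description. As a rough sketch of how a date pattern constant of this kind might be applied to a feed timestamp before indexing, the example below uses java.text.SimpleDateFormat. The pattern string, class name, and method name are assumptions made for illustration and are not taken from this page.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; not part of FeedIndexingFilter.
final class FeedDateFormatSketch {

    // Assumed pattern for illustration only; the value of dateFormatStr is not shown here.
    static final String ASSUMED_DATE_FORMAT = "yyyyMMddHHmm";

    // Formats an epoch timestamp (milliseconds) in UTC using the assumed pattern.
    static String formatForIndex(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat(ASSUMED_DATE_FORMAT, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // The Unix epoch formatted in UTC
        System.out.println(formatForIndex(0L)); // prints 197001010000
    }
}
```

A new SimpleDateFormat is created on each call because the class is not thread-safe, and sharing one instance across indexing threads would need external synchronization.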

public NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException
Extracts the relevant fields (FEED_AUTHOR, FEED_TAGS, FEED_PUBLISHED, FEED_UPDATED, FEED) and sends them to the Indexer for indexing within the Nutch index.
Specified by: filter in interface IndexingFilter
Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page
inlinks - page inlinks
Throws: IndexingException

public org.apache.hadoop.conf.Configuration getConf()
Specified by: getConf in interface org.apache.hadoop.conf.Configurable
Returns: the Configuration object used to configure this IndexingFilter.

public void setConf(org.apache.hadoop.conf.Configuration conf)
Sets the Configuration object used to configure this IndexingFilter.
Specified by: setConf in interface org.apache.hadoop.conf.Configurable
Parameters:
conf - The Configuration object used to configure this IndexingFilter.

Copyright © 2014 The Apache Software Foundation