| Interface | Description |
|---|---|
| IndexingFilter | Extension point for indexing. |
| IndexWriter | |

| Class | Description |
|---|---|
| CleaningJob | Scans CrawlDB for entries with status DB_GONE (404) or DB_DUPLICATE and sends delete requests to indexers for those documents. |
| CleaningJob.DBFilter | |
| CleaningJob.DeleterReducer | |
| IndexerMapReduce | |
| IndexerOutputFormat | |
| IndexingFilters | Creates and caches IndexingFilter implementing plugins. |
| IndexingFiltersChecker | Reads and parses a URL and runs the indexers on it. |
| IndexingJob | Generic indexer which relies on the plugins implementing IndexWriter. |
| IndexWriters | Creates and caches IndexWriter implementing plugins. |
| NutchDocument | A NutchDocument is the unit of indexing. |
| NutchField | Represents a multi-valued field with a weight. |
| NutchIndexAction | A NutchIndexAction is the new unit of indexing, holding the document and action information. |

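
The selection rule described for CleaningJob can be illustrated with a small, self-contained sketch. It is not the Nutch implementation: the `CleaningSketch` class, its `Status` enum, and the `urlsToDelete` method are hypothetical stand-ins, and an in-memory map takes the place of CrawlDB. It only shows the idea of picking out entries marked DB_GONE or DB_DUPLICATE as candidates for deletion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CleaningSketch {
    // Hypothetical stand-ins for CrawlDB statuses; not the Nutch constants.
    enum Status { DB_FETCHED, DB_GONE, DB_DUPLICATE }

    /** Returns the URLs whose status marks them for deletion from the index. */
    static List<String> urlsToDelete(Map<String, Status> crawlDb) {
        List<String> deletes = new ArrayList<>();
        for (Map.Entry<String, Status> entry : crawlDb.entrySet()) {
            Status status = entry.getValue();
            if (status == Status.DB_GONE || status == Status.DB_DUPLICATE) {
                deletes.add(entry.getKey());
            }
        }
        return deletes;
    }

    public static void main(String[] args) {
        // An insertion-ordered map stands in for CrawlDB in this sketch.
        Map<String, Status> crawlDb = new LinkedHashMap<>();
        crawlDb.put("http://example.org/a", Status.DB_FETCHED);
        crawlDb.put("http://example.org/b", Status.DB_GONE);
        crawlDb.put("http://example.org/c", Status.DB_DUPLICATE);
        // In the real job, each selected URL would become a delete request to the indexers.
        System.out.println(urlsToDelete(crawlDb));
    }
}
```

In the real job, the selected entries would be turned into delete requests sent to the configured indexers; the sketch stops at the selection step.
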
| Exception | Description |
|---|---|
| IndexingException | |

Copyright © 2014 The Apache Software Foundation