| Package | Description |
|---|---|
| org.apache.nutch.protocol.http.api | Common API used by HTTP plugins (http, httpclient) |
| org.apache.nutch.urlfilter.api | |

| Package | Description |
|---|---|
| org.apache.nutch.protocol.file | Protocol plugin which supports retrieving local file resources. |
| org.apache.nutch.protocol.ftp | Protocol plugin which supports retrieving documents via the ftp protocol. |
| org.apache.nutch.protocol.http | Protocol plugin which supports retrieving documents via the http protocol. |
| org.apache.nutch.protocol.httpclient | Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server. |

| Package | Description |
|---|---|
| org.apache.nutch.net.urlnormalizer.basic | |
| org.apache.nutch.net.urlnormalizer.pass | |
| org.apache.nutch.net.urlnormalizer.regex | |

| Package | Description |
|---|---|
| org.apache.nutch.scoring.link | |
| org.apache.nutch.scoring.opic | |
| org.apache.nutch.scoring.tld | Top Level Domain Scoring plugin. |
| org.apache.nutch.scoring.urlmeta | URL Meta Tag Scoring Plugin |

| Package | Description |
|---|---|
| org.apache.nutch.parse.headings | |

| Package | Description |
|---|---|
| org.apache.nutch.indexer.anchor | An indexing plugin for inbound anchor text. |
| org.apache.nutch.indexer.basic | A basic indexing plugin. |
| org.apache.nutch.indexer.feed | |
| org.apache.nutch.indexer.metadata | |
| org.apache.nutch.indexer.staticfield | A simple plugin called at indexing that adds fields with static data. |
| org.apache.nutch.indexer.subcollection | |
| org.apache.nutch.indexer.tld | Top Level Domain Indexing plugin. |
| org.apache.nutch.indexer.urlmeta | URL Meta Tag Indexing Plugin |

| Package | Description |
|---|---|
| org.apache.nutch.indexwriter.solr | |

| Package | Description |
|---|---|
| org.apache.nutch.analysis.lang | Text document language identifier. |
| org.apache.nutch.collection | Subcollection is a subset of an index. |
| org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |

Copyright © 2014 The Apache Software Foundation