public class HttpRobotRulesParser extends RobotRulesParser

This class is used for parsing robots rules for URLs belonging to the HTTP protocol. It extends the generic RobotRulesParser class and contains the HTTP-protocol-specific implementation for obtaining the robots file.

| Modifier and Type | Field and Description |
|---|---|
| protected boolean | allowForbidden |
| static org.slf4j.Logger | LOG |

Fields inherited from class RobotRulesParser: agentNames, CACHE, EMPTY_RULES, FORBID_ALL_RULES

| Constructor and Description |
|---|
| HttpRobotRulesParser(org.apache.hadoop.conf.Configuration conf) |
| Modifier and Type | Method and Description |
|---|---|
| crawlercommons.robots.BaseRobotRules | getRobotRulesSet(Protocol http, URL url) — For hosts whose robots rules have not yet been cached, sends an HTTP request to the host corresponding to the URL passed, retrieves the robots file, parses the rules, and caches the rules object to avoid re-work in the future. |
Methods inherited from class RobotRulesParser: getConf, getRobotRulesSet, main, parseRules, setConf

public static final org.slf4j.Logger LOG
protected boolean allowForbidden
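The allowForbidden flag is not documented here; judging by its name, it most likely controls whether a 403 (Forbidden) response for robots.txt is treated as "allow all" rather than "forbid all". In Nutch this is conventionally set via the http.robots.403.allow configuration property — an assumption to verify against your Nutch version's nutch-default.xml:

```
<!-- Hypothetical nutch-site.xml fragment; the property name is an
     assumption based on the allowForbidden field, not confirmed here. -->
<property>
  <name>http.robots.403.allow</name>
  <value>true</value>
  <description>If fetching robots.txt returns 403 (Forbidden),
  treat the site as allowed instead of forbidding all URLs.</description>
</property>
```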
public HttpRobotRulesParser(org.apache.hadoop.conf.Configuration conf)
public crawlercommons.robots.BaseRobotRules getRobotRulesSet(Protocol http, URL url)
For hosts whose robots rules have not yet been cached, sends an HTTP request to the host corresponding to the URL passed, retrieves the robots file, parses the rules, and caches the rules object to avoid re-work in the future.

Overrides: getRobotRulesSet in class RobotRulesParser

Parameters: http - The Protocol object; url - URL

Returns: BaseRobotRules object for the rules

Copyright © 2014 The Apache Software Foundation
For hosts whose robots rules have not yet been cached, sends an HTTP request to the host corresponding to the URL