org.apache.manifoldcf.crawler.connectors.webcrawler
Class WebcrawlerConnector.ProcessActivityHTMLHandler

java.lang.Object
  extended by org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityLinkHandler
      extended by org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityHTMLHandler
All Implemented Interfaces:
IDiscoveredLinkHandler, IHTMLHandler, IMetaTagHandler
Enclosing class:
WebcrawlerConnector

protected class WebcrawlerConnector.ProcessActivityHTMLHandler
extends WebcrawlerConnector.ProcessActivityLinkHandler
implements IHTMLHandler

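Handler for HTML parse events. It receives notifications for meta tags, form elements, and discovered links (A HREF, LINK HREF, IMG SRC, FRAME SRC), and reports whether the document should be indexed.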


Field Summary
 
Fields inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityLinkHandler
activities, contextDescription, documentIdentifier, filter, linkType
 
Constructor Summary
WebcrawlerConnector.ProcessActivityHTMLHandler(java.lang.String documentIdentifier, org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter)
          Constructor.
 
Method Summary
 void noteAHREF(java.lang.String rawURL)
          Note discovered A HREF
 void noteFormEnd()
          Note the end of a form
 void noteFormInput(java.util.Map inputAttributes)
          Note an input tag
 void noteFormStart(java.util.Map formAttributes)
          Note the start of a form
 void noteFRAMESRC(java.lang.String rawURL)
          Note discovered FRAME SRC
 void noteIMGSRC(java.lang.String rawURL)
          Note discovered IMG SRC
 void noteLINKHREF(java.lang.String rawURL)
          Note discovered LINK HREF
 void noteMetaTag(java.util.Map metaAttributes)
          Note a meta tag
 boolean shouldIndex()
          Decide whether we should index.
 
Methods inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.ProcessActivityLinkHandler
noteDiscoveredLink
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.manifoldcf.crawler.connectors.webcrawler.IDiscoveredLinkHandler
noteDiscoveredLink
 

Constructor Detail

WebcrawlerConnector.ProcessActivityHTMLHandler

public WebcrawlerConnector.ProcessActivityHTMLHandler(java.lang.String documentIdentifier,
                                                      org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities,
                                                      WebcrawlerConnector.DocumentURLFilter filter)
Constructor.

Method Detail

shouldIndex

public boolean shouldIndex()
Decide whether we should index.


noteMetaTag

public void noteMetaTag(java.util.Map metaAttributes)
                 throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a meta tag

Specified by:
noteMetaTag in interface IMetaTagHandler
Parameters:
metaAttributes - the attributes that belong to the meta tag.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
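
This page does not say how noteMetaTag and shouldIndex relate. One plausible reading is that meta tags seen during parsing feed the later indexing decision, for example through a robots directive. The sketch below illustrates that idea only. RobotsMetaSketch is a hypothetical stand-in, not the ManifoldCF class, and it assumes attribute names arrive as lower-case String keys.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of ManifoldCF.
class RobotsMetaSketch {
    // Indexing is allowed until a meta tag says otherwise.
    private boolean allowIndex = true;

    // Same shape as noteMetaTag(Map metaAttributes); keys are assumed lower-case.
    void noteMetaTag(Map<String, String> metaAttributes) {
        String name = metaAttributes.get("name");
        String content = metaAttributes.get("content");
        if (name == null || content == null) {
            return;
        }
        if (!name.trim().equalsIgnoreCase("robots")) {
            return;
        }
        // The content value is a comma-separated list of directives.
        for (String directive : content.toLowerCase(Locale.ROOT).split(",")) {
            String d = directive.trim();
            if (d.equals("noindex") || d.equals("none")) {
                allowIndex = false;
            }
        }
    }

    // Same shape as shouldIndex(): reports the decision accumulated so far.
    boolean shouldIndex() {
        return allowIndex;
    }
}
```

In this sketch a caller would pass every meta tag to noteMetaTag while parsing and consult shouldIndex only once the document has been fully parsed.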

noteFormStart

public void noteFormStart(java.util.Map formAttributes)
                   throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the start of a form

Specified by:
noteFormStart in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteFormInput

public void noteFormInput(java.util.Map inputAttributes)
                   throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note an input tag

Specified by:
noteFormInput in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteFormEnd

public void noteFormEnd()
                 throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the end of a form

Specified by:
noteFormEnd in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
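
The three form callbacks suggest an event sequence: noteFormStart, then zero or more noteFormInput calls, then noteFormEnd. A minimal sketch of a handler that collects those events follows. FormEventSketch and its Form type are hypothetical names introduced for illustration, the maps are assumed to hold String attribute names and values, and nothing here reflects what the connector actually does with form data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not part of ManifoldCF.
class FormEventSketch {
    // One completed form: its own attributes plus the attributes of each input seen.
    static final class Form {
        final Map<String, String> attributes;
        final List<Map<String, String>> inputs = new ArrayList<>();

        Form(Map<String, String> attributes) {
            this.attributes = new LinkedHashMap<>(attributes);
        }
    }

    private Form current;                       // form being parsed, or null
    private final List<Form> completed = new ArrayList<>();

    void noteFormStart(Map<String, String> formAttributes) {
        current = new Form(formAttributes);
    }

    void noteFormInput(Map<String, String> inputAttributes) {
        // Inputs that appear outside any form are ignored in this sketch.
        if (current != null) {
            current.inputs.add(new LinkedHashMap<>(inputAttributes));
        }
    }

    void noteFormEnd() {
        if (current != null) {
            completed.add(current);
            current = null;
        }
    }

    List<Form> completedForms() {
        return completed;
    }
}
```

Copying the attribute maps is a defensive choice in the sketch, in case a parser reuses the same map instance between callbacks.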

noteAHREF

public void noteAHREF(java.lang.String rawURL)
               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered A HREF

Specified by:
noteAHREF in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteLINKHREF

public void noteLINKHREF(java.lang.String rawURL)
                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered LINK HREF

Specified by:
noteLINKHREF in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteIMGSRC

public void noteIMGSRC(java.lang.String rawURL)
                throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered IMG SRC

Specified by:
noteIMGSRC in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteFRAMESRC

public void noteFRAMESRC(java.lang.String rawURL)
                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered FRAME SRC

Specified by:
noteFRAMESRC in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
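
The four link callbacks share one signature, and the class inherits noteDiscoveredLink together with the linkType and contextDescription fields. That hints at a delegation pattern: each callback sets the context and forwards the raw URL to the inherited method. The sketch below shows that pattern under stated assumptions. LinkHandlerSketch, its nested classes, and the string values used for linkType and contextDescription are all hypothetical, and the real class may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not part of ManifoldCF.
class LinkHandlerSketch {
    // Plays the role of the parent link handler: owns the shared fields.
    static class BaseLinkHandler {
        protected String linkType;
        protected String contextDescription;
        final List<String> recorded = new ArrayList<>();

        void noteDiscoveredLink(String rawURL) {
            // Blank URLs are skipped; a real handler would also resolve and filter them.
            if (rawURL == null || rawURL.trim().isEmpty()) {
                return;
            }
            recorded.add(linkType + " [" + contextDescription + "] " + rawURL.trim());
        }
    }

    // Plays the role of the HTML handler: one thin method per tag type.
    static class HtmlHandler extends BaseLinkHandler {
        void noteAHREF(String rawURL)    { note("link", "a href", rawURL); }
        void noteLINKHREF(String rawURL) { note("link", "link href", rawURL); }
        void noteIMGSRC(String rawURL)   { note("embedded", "img src", rawURL); }
        void noteFRAMESRC(String rawURL) { note("embedded", "frame src", rawURL); }

        private void note(String type, String context, String rawURL) {
            linkType = type;
            contextDescription = context;
            noteDiscoveredLink(rawURL);
        }
    }
}
```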