org.apache.manifoldcf.crawler.connectors.webcrawler
Class WebcrawlerConnector.FindHTMLFormHandler

java.lang.Object
  extended by org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.FindHandler
      extended by org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.FindHTMLFormHandler
All Implemented Interfaces:
IDiscoveredLinkHandler, IHTMLHandler, IMetaTagHandler
Enclosing class:
WebcrawlerConnector

protected class WebcrawlerConnector.FindHTMLFormHandler
extends WebcrawlerConnector.FindHandler
implements IHTMLHandler

This class is the handler for HTML form parsing during state transitions.
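The callback protocol this class implements can be illustrated with a small, self-contained state machine. The sketch below is hypothetical and simplified, not the ManifoldCF source. SketchFormHandler is an invented name. Attribute maps are typed Map<String, String> instead of the raw java.util.Map, and a list of name/value pairs stands in for FormDataAccumulator. It also assumes the form is chosen by a regex find() on its "name" attribute and that the first matching form wins. The two list fields mirror the roles suggested by currentFormData and discoveredFormData.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for the form tracking in FindHTMLFormHandler.
// Assumptions (not taken from ManifoldCF): a form is selected by a regex find()
// on its "name" attribute, only the first matching form is kept, and form data
// is a plain list of name/value pairs instead of FormDataAccumulator.
class SketchFormHandler {
  private final Pattern formNamePattern;
  // Pairs for the matching form currently being parsed; null outside such a form.
  private List<String[]> currentFormData = null;
  // Pairs from the first completed matching form; null until one is found.
  private List<String[]> discoveredFormData = null;

  SketchFormHandler(Pattern formNamePattern) {
    this.formNamePattern = formNamePattern;
  }

  void noteFormStart(Map<String, String> formAttributes) {
    if (discoveredFormData != null)
      return; // a form was already found; ignore later ones
    String name = formAttributes.get("name");
    if (name != null && formNamePattern.matcher(name).find())
      currentFormData = new ArrayList<>();
  }

  void noteFormInput(Map<String, String> inputAttributes) {
    if (currentFormData == null)
      return; // input outside a matching form
    String name = inputAttributes.get("name");
    if (name == null)
      return; // unnamed controls are not submitted
    currentFormData.add(new String[] {name, inputAttributes.getOrDefault("value", "")});
  }

  void noteFormEnd() {
    if (currentFormData != null) {
      discoveredFormData = currentFormData; // promote the completed form
      currentFormData = null;
    }
  }

  List<String[]> getFormData() {
    return discoveredFormData;
  }

  public static void main(String[] args) {
    SketchFormHandler h = new SketchFormHandler(Pattern.compile("^login$"));
    h.noteFormStart(Map.of("name", "search")); // does not match: ignored
    h.noteFormInput(Map.of("name", "q"));
    h.noteFormEnd();
    h.noteFormStart(Map.of("name", "login"));
    h.noteFormInput(Map.of("name", "user", "value", "alice"));
    h.noteFormInput(Map.of("type", "submit")); // unnamed: skipped
    h.noteFormEnd();
    for (String[] pair : h.getFormData())
      System.out.println(pair[0] + "=" + pair[1]); // prints user=alice
  }
}
```

Keeping a separate "current" accumulator means that data from a form which does not match, or which is never closed, never reaches the result returned by getFormData().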


Field Summary
protected  FormDataAccumulator currentFormData
           
protected  FormDataAccumulator discoveredFormData
           
protected  java.util.regex.Pattern formNamePattern
           
 
Fields inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.FindHandler
parentURI, targetURI
 
Constructor Summary
WebcrawlerConnector.FindHTMLFormHandler(java.lang.String parentURI, java.util.regex.Pattern formNamePattern)
           
 
Method Summary
 void applyFormOverrides(LoginParameters lp)
           
 FormData getFormData()
           
 void noteAHREF(java.lang.String rawURL)
          Note discovered A HREF
 void noteFormEnd()
          Note the end of a form
 void noteFormInput(java.util.Map inputAttributes)
          Note an input tag
 void noteFormStart(java.util.Map formAttributes)
          Note the start of a form
 void noteFRAMESRC(java.lang.String rawURL)
          Note discovered FRAME SRC
 void noteIMGSRC(java.lang.String rawURL)
          Note discovered IMG SRC
 void noteLINKHREF(java.lang.String rawURL)
          Note discovered LINK HREF
 void noteMetaTag(java.util.Map metaAttributes)
          Note a meta tag
 
Methods inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.FindHandler
getTargetURI, noteDiscoveredLink
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.manifoldcf.crawler.connectors.webcrawler.IDiscoveredLinkHandler
noteDiscoveredLink
 

Field Detail

formNamePattern

protected java.util.regex.Pattern formNamePattern

discoveredFormData

protected FormDataAccumulator discoveredFormData

currentFormData

protected FormDataAccumulator currentFormData

Constructor Detail

WebcrawlerConnector.FindHTMLFormHandler

public WebcrawlerConnector.FindHTMLFormHandler(java.lang.String parentURI,
                                               java.util.regex.Pattern formNamePattern)
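The constructor's formNamePattern is an ordinary java.util.regex.Pattern. This page does not say whether the connector tests it with Matcher.find() (substring match) or Matcher.matches() (whole-name match). Anchoring the expression with ^ and $ requires an exact form name under either interpretation. The demo class below is hypothetical and only compares the two standard matching styles.

```java
import java.util.regex.Pattern;

// Hypothetical demo comparing the two standard ways a form-name pattern could
// be tested. Which one the connector uses is not stated on this page.
class FormNamePatternDemo {
  // True if the pattern occurs anywhere in the name (Matcher.find()).
  static boolean matchesAnywhere(Pattern p, String formName) {
    return p.matcher(formName).find();
  }

  // True only if the pattern matches the entire name (Matcher.matches()).
  static boolean matchesWhole(Pattern p, String formName) {
    return p.matcher(formName).matches();
  }

  public static void main(String[] args) {
    Pattern loose = Pattern.compile("login");
    Pattern anchored = Pattern.compile("^login$");
    for (String name : new String[] {"login", "loginForm", "sso-login"}) {
      System.out.println(name
          + " find(loose)=" + matchesAnywhere(loose, name)
          + " find(anchored)=" + matchesAnywhere(anchored, name)
          + " matches(loose)=" + matchesWhole(loose, name));
    }
  }
}
```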

Method Detail

applyFormOverrides

public void applyFormOverrides(LoginParameters lp)
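This page does not document how applyFormOverrides uses its LoginParameters argument. One plausible reading, offered here only as an assumption, is that the login configuration supplies override values for form fields selected by a name regex, for example to fill in a username and password. The sketch below models that idea with invented types (FormField, FieldOverride). It replaces the values of matching fields only and does not add fields that the form lacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of applying login overrides to discovered form data.
// Assumptions: an override pairs a field-name regex with a replacement value;
// every matching field takes that value (the last matching override wins);
// fields missing from the form are not added.
class FormOverrideSketch {
  record FormField(String name, String value) {}
  record FieldOverride(Pattern nameRegex, String value) {}

  static List<FormField> applyOverrides(List<FormField> fields, List<FieldOverride> overrides) {
    List<FormField> result = new ArrayList<>();
    for (FormField f : fields) {
      String value = f.value();
      for (FieldOverride o : overrides) {
        if (o.nameRegex().matcher(f.name()).find())
          value = o.value();
      }
      result.add(new FormField(f.name(), value));
    }
    return result;
  }

  public static void main(String[] args) {
    List<FormField> form = List.of(
        new FormField("username", ""),
        new FormField("password", ""),
        new FormField("csrf", "abc123")); // hidden token keeps its discovered value
    List<FieldOverride> overrides = List.of(
        new FieldOverride(Pattern.compile("^username$"), "crawler"),
        new FieldOverride(Pattern.compile("^password$"), "secret"));
    for (FormField f : applyOverrides(form, overrides))
      System.out.println(f.name() + "=" + f.value());
  }
}
```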

getFormData

public FormData getFormData()

noteMetaTag

public void noteMetaTag(java.util.Map metaAttributes)
                 throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a meta tag

Specified by:
noteMetaTag in interface IMetaTagHandler
Parameters:
metaAttributes - the attributes that belong to the tag.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteFormStart

public void noteFormStart(java.util.Map formAttributes)
                   throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the start of a form

Specified by:
noteFormStart in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
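noteFormStart receives the form tag's attributes as a raw java.util.Map. The hypothetical sketch below (FormStartSketch and its record are invented names) pulls out what a form submission needs. It assumes String keys whose case may vary, so the lookup is case-insensitive. It also applies the HTML defaults: a missing or empty action means the form submits to the document's own URL, and a missing or unrecognized method means GET. Resolving a relative action against the page URI is not shown.

```java
import java.util.Map;

// Hypothetical sketch of reading <form> attributes from the raw map that
// noteFormStart(java.util.Map) receives. Assumes String keys whose case may
// vary; the defaults applied are those of HTML, not confirmed ManifoldCF behavior.
class FormStartSketch {
  record FormStart(String name, String action, String method) {}

  // Case-insensitive attribute lookup; returns null when the attribute is absent.
  static String attr(Map<?, ?> attrs, String key) {
    for (Map.Entry<?, ?> e : attrs.entrySet()) {
      if (e.getKey() instanceof String k && k.equalsIgnoreCase(key) && e.getValue() != null)
        return e.getValue().toString();
    }
    return null;
  }

  static FormStart readFormStart(Map<?, ?> formAttributes, String documentURI) {
    String name = attr(formAttributes, "name");
    String action = attr(formAttributes, "action");
    if (action == null || action.isEmpty())
      action = documentURI; // HTML: a missing or empty action targets the document itself
    String method = attr(formAttributes, "method");
    // HTML: a missing or unrecognized method means GET ("dialog" is not handled here).
    method = (method != null && method.equalsIgnoreCase("post")) ? "POST" : "GET";
    return new FormStart(name, action, method);
  }

  public static void main(String[] args) {
    FormStart fs = readFormStart(Map.of("NAME", "login", "Method", "post"),
        "https://example.com/signin");
    System.out.println(fs.name() + " " + fs.method() + " " + fs.action());
  }
}
```

Typing the parameter as Map<?, ?> lets the helper accept the raw Map that the documented signature uses without unchecked casts.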

noteFormInput

public void noteFormInput(java.util.Map inputAttributes)
                   throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note an input tag

Specified by:
noteFormInput in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
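Not every input tag contributes to a submitted form. This page does not state which inputs noteFormInput keeps, so the hypothetical filter below follows a simplified version of the HTML submission rules. Controls without a name and disabled controls are skipped. Checkboxes and radio buttons count only when checked, defaulting to the value "on". Button-like and file inputs are ignored. Lower-case attribute keys are assumed.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical filter deciding whether an <input> tag, as reported to
// noteFormInput(java.util.Map), contributes a name/value pair. It follows a
// simplified reading of HTML form-submission rules and assumes lower-case
// attribute keys; it is not ManifoldCF's actual logic.
class FormInputSketch {
  static Optional<Map.Entry<String, String>> toPair(Map<String, String> input) {
    String name = input.get("name");
    if (name == null || name.isEmpty())
      return Optional.empty(); // unnamed controls are not submitted
    if (input.containsKey("disabled"))
      return Optional.empty(); // disabled controls are not submitted
    String type = input.getOrDefault("type", "text").toLowerCase(Locale.ROOT);
    switch (type) {
      case "checkbox":
      case "radio":
        if (!input.containsKey("checked"))
          return Optional.empty(); // unchecked boxes and radios are skipped
        return Optional.of(new SimpleEntry<>(name, input.getOrDefault("value", "on")));
      case "submit":
      case "reset":
      case "button":
      case "image":
      case "file":
        return Optional.empty(); // simplification: buttons and uploads are ignored
      default:
        return Optional.of(new SimpleEntry<>(name, input.getOrDefault("value", "")));
    }
  }

  public static void main(String[] args) {
    System.out.println(toPair(Map.of("name", "user", "value", "alice"))); // Optional[user=alice]
    System.out.println(toPair(Map.of("name", "go", "type", "submit")));   // Optional.empty
  }
}
```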

noteFormEnd

public void noteFormEnd()
                 throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the end of a form

Specified by:
noteFormEnd in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteAHREF

public void noteAHREF(java.lang.String rawURL)
               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered A HREF

Specified by:
noteAHREF in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteLINKHREF

public void noteLINKHREF(java.lang.String rawURL)
                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered LINK HREF

Specified by:
noteLINKHREF in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteIMGSRC

public void noteIMGSRC(java.lang.String rawURL)
                throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered IMG SRC

Specified by:
noteIMGSRC in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

noteFRAMESRC

public void noteFRAMESRC(java.lang.String rawURL)
                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered FRAME SRC

Specified by:
noteFRAMESRC in interface IHTMLHandler
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
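The four link callbacks (noteAHREF, noteLINKHREF, noteIMGSRC, noteFRAMESRC) receive rawURL as it is written in the page, which is often relative. Because FindHandler carries a parentURI field, such links presumably need resolving against the page they came from, but the connector's exact normalization is not documented here. The hypothetical helper below uses java.net.URI.resolve, drops the fragment, and rejects non-HTTP(S) links. The last two choices are assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: resolve a raw link, as passed to noteAHREF, noteLINKHREF,
// noteIMGSRC or noteFRAMESRC, against the URI of the page it was found on.
// Dropping fragments and rejecting non-HTTP(S) links are assumptions of this
// sketch, not documented connector behavior.
class LinkResolveSketch {
  static String resolve(String parentURI, String rawURL) {
    URI resolved;
    try {
      resolved = new URI(parentURI).resolve(rawURL.trim());
    } catch (URISyntaxException | IllegalArgumentException e) {
      return null; // unparseable base or link (e.g. one containing a raw space)
    }
    String scheme = resolved.getScheme();
    if (scheme == null
        || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https")))
      return null; // e.g. mailto: or javascript: links
    String s = resolved.toString();
    int hash = s.indexOf('#');
    return hash >= 0 ? s.substring(0, hash) : s; // the fragment is never sent to a server
  }

  public static void main(String[] args) {
    String page = "https://example.com/docs/index.html";
    System.out.println(resolve(page, "page2.html"));           // https://example.com/docs/page2.html
    System.out.println(resolve(page, "/login#top"));           // https://example.com/login
    System.out.println(resolve(page, "mailto:a@example.com")); // null
  }
}
```

java.net.URI follows RFC 2396 rather than RFC 3986, so a few edge cases, such as query-only references like "?page=2", can resolve differently than they do in browsers.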