org.apache.manifoldcf.crawler.connectors.webcrawler
Interface IThrottledConnection

All Known Implementing Classes:
ThrottledFetcher.ThrottledConnection

public interface IThrottledConnection

This interface represents an established connection to a URL.


Field Summary
static java.lang.String _rcsid
static int FETCH_BAD_URI
static int FETCH_CIRCULAR_REDIRECT
static int FETCH_INTERRUPTED
static int FETCH_IO_ERROR
static int FETCH_NOT_TRIED
static int FETCH_SEQUENCE_ERROR
static int FETCH_UNKNOWN_ERROR
 
Method Summary
 void beginFetch(java.lang.String fetchType)
          Begin the fetch process.
 void close()
          Close the connection.
 void doneFetch(org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities)
          Done with the fetch.
 void executeFetch(java.lang.String urlPath, java.lang.String userAgent, java.lang.String from, int connectionTimeoutMilliseconds, int socketTimeoutMilliseconds, boolean redirectOK, java.lang.String host, FormData formData, LoginCookies loginCookies)
          Execute the fetch and get the return code.
 LoginCookies getLastFetchCookies()
          Get the last fetch cookies.
 java.io.InputStream getResponseBodyStream()
          Get the response input stream.
 int getResponseCode()
          Get the http response code.
 java.lang.String getResponseHeader(java.lang.String headerName)
          Get a specified response header, if it exists.
 void noteInterrupted(java.lang.Throwable e)
          Note that the connection fetch was interrupted by something.
 

Field Detail

_rcsid

static final java.lang.String _rcsid
See Also:
Constant Field Values

FETCH_NOT_TRIED

static final int FETCH_NOT_TRIED
See Also:
Constant Field Values

FETCH_CIRCULAR_REDIRECT

static final int FETCH_CIRCULAR_REDIRECT
See Also:
Constant Field Values

FETCH_BAD_URI

static final int FETCH_BAD_URI
See Also:
Constant Field Values

FETCH_SEQUENCE_ERROR

static final int FETCH_SEQUENCE_ERROR
See Also:
Constant Field Values

FETCH_IO_ERROR

static final int FETCH_IO_ERROR
See Also:
Constant Field Values

FETCH_INTERRUPTED

static final int FETCH_INTERRUPTED
See Also:
Constant Field Values

FETCH_UNKNOWN_ERROR

static final int FETCH_UNKNOWN_ERROR
See Also:
Constant Field Values

Method Detail

beginFetch

void beginFetch(java.lang.String fetchType)
                throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Begin the fetch process.

Parameters:
fetchType - is a short string describing the kind of fetch being requested. It is used solely for logging purposes.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

executeFetch

void executeFetch(java.lang.String urlPath,
                  java.lang.String userAgent,
                  java.lang.String from,
                  int connectionTimeoutMilliseconds,
                  int socketTimeoutMilliseconds,
                  boolean redirectOK,
                  java.lang.String host,
                  FormData formData,
                  LoginCookies loginCookies)
                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                         org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Execute the fetch and get the return code. This method uses the standard logging mechanism to keep track of the fetch attempt. It signals failures as follows: it throws ServiceInterruption if a dynamic error occurs, throws ManifoldCFException if a fatal error occurs, and throws nothing if a standard protocol error occurs. Note that, for proxies and similar intermediaries, this fetch request is expected to handle whatever redirections are needed to support them.

Parameters:
urlPath - is the path part of the url, e.g. "/robots.txt"
userAgent - is the value of the userAgent header to use.
from - is the value of the from header to use.
connectionTimeoutMilliseconds - is the maximum number of milliseconds to wait on socket connect.
socketTimeoutMilliseconds - is the maximum number of milliseconds to wait on socket read.
redirectOK - should be set to true if you want redirects to be automatically followed.
host - is the value to use as the "Host" header, or null to use the default.
formData - describes additional form arguments and how to fetch the page.
loginCookies - describes the cookies that should be in effect for this page fetch.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
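
The method descriptions on this page suggest a call order of beginFetch, executeFetch, the response getters, doneFetch, and finally close. The following is a minimal sketch of that order, not the connector's actual usage. SketchConnection, RecordingConnection, and FetchLifecycleSketch are hypothetical stand-ins that mirror only a subset of this interface (checked exceptions and most executeFetch parameters are omitted) so that the example compiles without the ManifoldCF libraries. Placing doneFetch and close in finally blocks is also an assumption about sensible cleanup, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a subset of IThrottledConnection; not the real interface.
interface SketchConnection {
  void beginFetch(String fetchType);
  void executeFetch(String urlPath, String userAgent, boolean redirectOK);
  int getResponseCode();
  void doneFetch();
  void close();
}

// Hypothetical in-memory implementation that records the order of calls.
class RecordingConnection implements SketchConnection {
  final List<String> calls = new ArrayList<>();
  private int responseCode = -1; // placeholder for "not tried"; the real value is not given here

  public void beginFetch(String fetchType) { calls.add("beginFetch:" + fetchType); }

  public void executeFetch(String urlPath, String userAgent, boolean redirectOK) {
    calls.add("executeFetch:" + urlPath);
    responseCode = 200; // pretend the server answered 200 OK
  }

  public int getResponseCode() { calls.add("getResponseCode"); return responseCode; }

  public void doneFetch() { calls.add("doneFetch"); }

  public void close() { calls.add("close"); }
}

class FetchLifecycleSketch {
  // Runs one fetch and releases the connection even if a step throws.
  static int fetchOnce(SketchConnection connection, String urlPath) {
    try {
      connection.beginFetch("robots"); // label used for logging only
      try {
        connection.executeFetch(urlPath, "ExampleCrawler/1.0", true);
        return connection.getResponseCode();
      } finally {
        connection.doneFetch(); // assumed: always report the fetch as finished
      }
    } finally {
      connection.close(); // assumed: always end the server connection
    }
  }
}
```

With the real interface, the same shape would additionally need to handle ManifoldCFException and ServiceInterruption, and to pass an IVersionActivity to doneFetch.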

getResponseCode

int getResponseCode()
                    throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                           org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the http response code.

Returns:
the response code. This is either an HTTP response code, or one of the FETCH_* codes defined in this interface.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
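
Because the returned value may be either an HTTP status or one of the FETCH_* codes, a caller would typically check for the special codes first. The sketch below illustrates one way to do that. ResponseCodeSketch and describe are hypothetical names, and the numeric constants are placeholders chosen for the example, since this page does not list the real values; real code should reference the constants on IThrottledConnection directly.

```java
class ResponseCodeSketch {
  // Placeholder values for illustration only; the real constants live on
  // IThrottledConnection and their numeric values are not shown on this page.
  static final int FETCH_NOT_TRIED = -1;
  static final int FETCH_IO_ERROR = -2;
  static final int FETCH_INTERRUPTED = -3;

  // Maps a response code to a short human-readable description.
  static String describe(int code) {
    switch (code) {
      case FETCH_NOT_TRIED:
        return "no fetch attempted";
      case FETCH_IO_ERROR:
        return "I/O error";
      case FETCH_INTERRUPTED:
        return "interrupted";
      default:
        // Anything in the standard HTTP status range is treated as an HTTP code.
        if (code >= 100 && code <= 599) {
          return "HTTP " + code;
        }
        return "other status " + code;
    }
  }
}
```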

getLastFetchCookies

LoginCookies getLastFetchCookies()
                                 throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the last fetch cookies.

Returns:
the cookies now in effect from the last fetch.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

getResponseHeader

java.lang.String getResponseHeader(java.lang.String headerName)
                                   throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                          org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get a specified response header, if it exists.

Parameters:
headerName - is the name of the header.
Returns:
the header value, or null if it doesn't exist.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

getResponseBodyStream

java.io.InputStream getResponseBodyStream()
                                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                 org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the response input stream. It is the responsibility of the caller to close this stream when done.

Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
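
Since the caller is responsible for closing the stream, a try-with-resources block is a convenient way to guarantee that it is closed. The following is a minimal sketch; BodyStreamSketch and readBody are hypothetical names, and decoding the body as UTF-8 is an assumption made for the example (a real caller would normally take the charset from the response headers).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class BodyStreamSketch {
  // Reads the whole stream into a string and closes it, even if a read fails.
  static String readBody(InputStream bodyStream) throws IOException {
    try (InputStream in = bodyStream) {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      byte[] chunk = new byte[8192];
      int n;
      while ((n = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, n);
      }
      // Assumption: the body is UTF-8 text.
      return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
  }
}
```

In real use, the argument would be the stream returned by getResponseBodyStream().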

noteInterrupted

void noteInterrupted(java.lang.Throwable e)
Note that the connection fetch was interrupted by something.


doneFetch

void doneFetch(org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities)
               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Done with the fetch. Call this when the fetch has been completed. A log entry will be generated describing what was done.

Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

close

void close()
           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Close the connection. Call this to end this server connection.

Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException