org.apache.manifoldcf.crawler.connectors.rss
Class ThrottledFetcher.ThrottledConnection

java.lang.Object
  extended by org.apache.manifoldcf.crawler.connectors.rss.ThrottledFetcher.ThrottledConnection
All Implemented Interfaces:
IThrottledConnection
Enclosing class:
ThrottledFetcher

protected static class ThrottledFetcher.ThrottledConnection
extends java.lang.Object
implements IThrottledConnection

This class represents an established connection to a URL.


Nested Class Summary
protected static class ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
           
 
Field Summary
protected  org.apache.commons.httpclient.MultiThreadedHttpConnectionManager connectionManager
          The connection pool (max size 1)
protected  int connectionTimeoutMilliseconds
          Connection timeout in milliseconds
protected  ThrottledFetcher.DataSession dataSession
          Hack added to record all access data from the current crawler
protected  long fetchCounter
          The number of bytes read so far in the current fetch
protected  org.apache.commons.httpclient.HttpMethodBase fetchMethod
          The method object
protected  java.lang.String fetchType
          The kind of fetch we are doing
protected  int maxOpenConnectionsPerServer
          The maximum open connections per server
protected  double minimumMillisecondsPerBytePerServer
          The desired connection bandwidth, expressed as the minimum number of milliseconds per byte
protected  long minimumMillisecondsPerFetchPerServer
          The minimum time between fetches, in milliseconds
protected  java.lang.String myUrl
          The current URL being fetched
protected  ThrottledFetcher.Server server
          The server object we use to track connections and fetches.
protected  long startFetchTime
          The start-fetch time
protected  int statusCode
          The status code fetched, if any
protected  java.lang.Throwable throwable
          The error trace, if any
 
Fields inherited from interface org.apache.manifoldcf.crawler.connectors.rss.IThrottledConnection
_rcsid, FETCH_BAD_URI, FETCH_CIRCULAR_REDIRECT, FETCH_IO_ERROR, FETCH_NOT_TRIED, FETCH_SEQUENCE_ERROR, FETCH_UNKNOWN_ERROR, STATUS_NOCHANGE, STATUS_OK, STATUS_PAGEERROR, STATUS_SITEERROR
 
Constructor Summary
ThrottledFetcher.ThrottledConnection(ThrottledFetcher.Server server, double minimumMillisecondsPerBytePerServer, int maxOpenConnectionsPerServer, long minimumMillisecondsPerFetchPerServer, int connectionTimeoutMilliseconds, int connectionLimit)
          Constructor.
 
Method Summary
 void beginFetch(java.lang.String fetchType)
          Begin the fetch process.
 void close()
          Close the connection.
 void doneFetch(org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities)
          Done with the fetch.
 int executeFetch(java.lang.String protocol, int port, java.lang.String urlPath, java.lang.String userAgent, java.lang.String from, java.lang.String proxyHost, int proxyPort, java.lang.String proxyAuthDomain, java.lang.String proxyAuthUsername, java.lang.String proxyAuthPassword, java.lang.String lastETag, java.lang.String lastModified)
          Execute the fetch and get the return code.
 java.io.InputStream getResponseBodyStream()
          Get the response input stream.
 int getResponseCode()
          Get the HTTP response code.
 java.lang.String getResponseHeader(java.lang.String headerName)
          Get a specified response header, if it exists.
 void logFetchCount(int count)
          Log the fetch of a number of bytes.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

minimumMillisecondsPerBytePerServer

protected double minimumMillisecondsPerBytePerServer
The desired connection bandwidth, expressed as the minimum number of milliseconds per byte


maxOpenConnectionsPerServer

protected int maxOpenConnectionsPerServer
The maximum open connections per server


minimumMillisecondsPerFetchPerServer

protected long minimumMillisecondsPerFetchPerServer
The minimum time between fetches, in milliseconds
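
The throttle fields above express rates as minimum times. As a rough illustration of that arithmetic, and not ManifoldCF's actual implementation, the sketch below computes how long a caller would still have to wait so that the bytes fetched so far stay within the configured bandwidth, and so that successive fetches are spaced apart. The class ThrottleMath and its method names are hypothetical.

```java
/** Hypothetical helper illustrating the throttle arithmetic; not part of ManifoldCF. */
final class ThrottleMath {
    private ThrottleMath() {}

    /**
     * Milliseconds still to wait so that bytesSoFar bytes take at least
     * bytesSoFar * minimumMillisecondsPerByte milliseconds in total.
     */
    static long bandwidthDelay(double minimumMillisecondsPerByte, long bytesSoFar, long elapsedMillis) {
        long earliestAllowed = (long) Math.ceil(minimumMillisecondsPerByte * bytesSoFar);
        return Math.max(0L, earliestAllowed - elapsedMillis);
    }

    /** Milliseconds still to wait before the next fetch may start. */
    static long fetchIntervalDelay(long minimumMillisecondsPerFetch, long lastFetchStart, long now) {
        return Math.max(0L, lastFetchStart + minimumMillisecondsPerFetch - now);
    }

    public static void main(String[] args) {
        // 0.5 ms per byte corresponds to 2000 bytes per second.
        System.out.println(bandwidthDelay(0.5, 10_000, 3_000));      // 2000
        System.out.println(fetchIntervalDelay(1_000, 5_000, 5_400)); // 600
    }
}
```

A value of 0 from either method means the limit is already satisfied and no wait is needed.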


server

protected ThrottledFetcher.Server server
The server object we use to track connections and fetches.


fetchMethod

protected org.apache.commons.httpclient.HttpMethodBase fetchMethod
The method object


startFetchTime

protected long startFetchTime
The start-fetch time


throwable

protected java.lang.Throwable throwable
The error trace, if any


myUrl

protected java.lang.String myUrl
The current URL being fetched


statusCode

protected int statusCode
The status code fetched, if any


fetchType

protected java.lang.String fetchType
The kind of fetch we are doing


fetchCounter

protected long fetchCounter
The number of bytes read so far in the current fetch


connectionManager

protected org.apache.commons.httpclient.MultiThreadedHttpConnectionManager connectionManager
The connection pool (max size 1)


connectionTimeoutMilliseconds

protected int connectionTimeoutMilliseconds
Connection timeout in milliseconds


dataSession

protected ThrottledFetcher.DataSession dataSession
Hack added to record all access data from the current crawler

Constructor Detail

ThrottledFetcher.ThrottledConnection

public ThrottledFetcher.ThrottledConnection(ThrottledFetcher.Server server,
                                            double minimumMillisecondsPerBytePerServer,
                                            int maxOpenConnectionsPerServer,
                                            long minimumMillisecondsPerFetchPerServer,
                                            int connectionTimeoutMilliseconds,
                                            int connectionLimit)
                                     throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Constructor.

Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

Method Detail

beginFetch

public void beginFetch(java.lang.String fetchType)
                throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Begin the fetch process.

Specified by:
beginFetch in interface IThrottledConnection
Parameters:
fetchType - is a short string describing the kind of fetch being requested. It is used solely for logging purposes.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

logFetchCount

public void logFetchCount(int count)
Log the fetch of a number of bytes.


executeFetch

public int executeFetch(java.lang.String protocol,
                        int port,
                        java.lang.String urlPath,
                        java.lang.String userAgent,
                        java.lang.String from,
                        java.lang.String proxyHost,
                        int proxyPort,
                        java.lang.String proxyAuthDomain,
                        java.lang.String proxyAuthUsername,
                        java.lang.String proxyAuthPassword,
                        java.lang.String lastETag,
                        java.lang.String lastModified)
                 throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Execute the fetch and get the return code. This method uses the standard logging mechanism to keep track of the fetch attempt. It also signals the following three conditions: ServiceInterruption (if a dynamic error occurs), OK, or a static error code (for a condition where retry is not likely to be helpful). The actual HTTP error code is NOT returned by this method.

Specified by:
executeFetch in interface IThrottledConnection
Parameters:
protocol - is the protocol to use to perform the access, e.g. "http"
port - is the port to use to perform the access, where -1 means "use the default"
urlPath - is the path part of the url, e.g. "/robots.txt"
userAgent - is the value of the User-Agent header to use.
from - is the value of the From header to use.
proxyHost - is the proxy host, or null if none.
proxyPort - is the proxy port, or -1 if none.
proxyAuthDomain - is the proxy authentication domain, or null.
proxyAuthUsername - is the proxy authentication user name, or null.
proxyAuthPassword - is the proxy authentication password, or null.
lastETag - is the requested lastETag header value.
lastModified - is the requested lastModified header value.
Returns:
the status code: success, static error, or dynamic error.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
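
The methods of this class are meant to be called in sequence. The sketch below shows one plausible ordering (beginFetch, executeFetch, response access, doneFetch, close), with -1 passed for the default port and null for an absent proxy, as the parameter descriptions above allow. Everything here is a stand-in so the example runs without ManifoldCF: SketchConnection, StubConnection, FetchLifecycleSketch, the STATUS_OK value, the URL path, and the header values are all hypothetical, checked exceptions are omitted, and passing null for lastETag and lastModified is assumed to mean "no previous values known".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for IThrottledConnection, reduced to what the sketch needs. */
interface SketchConnection {
    int STATUS_OK = 0; // assumed value, for illustration only

    void beginFetch(String fetchType);
    int executeFetch(String protocol, int port, String urlPath, String userAgent, String from,
                     String proxyHost, int proxyPort, String proxyAuthDomain,
                     String proxyAuthUsername, String proxyAuthPassword,
                     String lastETag, String lastModified);
    int getResponseCode();
    InputStream getResponseBodyStream();
    void doneFetch(Object activities); // the real method takes an IVersionActivity
    void close();
}

/** In-memory stub that records the order of calls and serves a fixed body. */
final class StubConnection implements SketchConnection {
    final StringBuilder calls = new StringBuilder();

    private void record(String name) {
        if (calls.length() > 0) calls.append(',');
        calls.append(name);
    }

    public void beginFetch(String fetchType) { record("beginFetch"); }
    public int executeFetch(String protocol, int port, String urlPath, String userAgent, String from,
                            String proxyHost, int proxyPort, String proxyAuthDomain,
                            String proxyAuthUsername, String proxyAuthPassword,
                            String lastETag, String lastModified) {
        record("executeFetch");
        return STATUS_OK;
    }
    public int getResponseCode() { record("getResponseCode"); return 200; }
    public InputStream getResponseBodyStream() {
        record("getResponseBodyStream");
        return new ByteArrayInputStream("<rss/>".getBytes(StandardCharsets.UTF_8));
    }
    public void doneFetch(Object activities) { record("doneFetch"); }
    public void close() { record("close"); }
}

final class FetchLifecycleSketch {
    /** Returns the body as text, or null if the fetch did not succeed. */
    static String fetch(SketchConnection connection) {
        try {
            connection.beginFetch("FEED");
            try {
                int status = connection.executeFetch("http", -1, "/feed.xml",
                        "ExampleAgent/1.0", "crawler@example.com",
                        null, -1, null, null, null, // no proxy
                        null, null);                // no previous ETag or Last-Modified (assumption)
                if (status != SketchConnection.STATUS_OK || connection.getResponseCode() != 200) {
                    return null;
                }
                // The caller owns the stream, so close it when done (readAllBytes needs Java 9+).
                try (InputStream body = connection.getResponseBodyStream()) {
                    return new String(body.readAllBytes(), StandardCharsets.UTF_8);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            } finally {
                connection.doneFetch(null); // always log the outcome of a started fetch
            }
        } finally {
            connection.close(); // always release the server connection
        }
    }

    public static void main(String[] args) {
        StubConnection stub = new StubConnection();
        System.out.println(fetch(stub));
        System.out.println(stub.calls);
    }
}
```

The nested finally blocks are a design choice of the sketch: they keep doneFetch and close paired with beginFetch and the connection itself, even when the fetch fails partway through.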

getResponseCode

public int getResponseCode()
                    throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                           org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the HTTP response code.

Specified by:
getResponseCode in interface IThrottledConnection
Returns:
the response code. This is either an HTTP response code, or one of the FETCH_ codes inherited from IThrottledConnection.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

getResponseBodyStream

public java.io.InputStream getResponseBodyStream()
                                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                 org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the response input stream. It is the responsibility of the caller to close this stream when done.

Specified by:
getResponseBodyStream in interface IThrottledConnection
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
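
Because the caller is responsible for closing the returned stream, a try-with-resources block is a natural fit. The sketch below drains a stream in chunks, closes it, and reports each chunk size to a callback. The class BodyReaderSketch is hypothetical, and the IntConsumer callback is only an assumed stand-in for a method such as logFetchCount(int); this page does not say who is expected to call that method.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.IntConsumer;

/** Hypothetical helper; not part of ManifoldCF. */
final class BodyReaderSketch {
    private BodyReaderSketch() {}

    /** Reads the stream to the end, closes it, and returns the total number of bytes read. */
    static long drain(InputStream body, IntConsumer byteCountLogger) {
        long total = 0L;
        byte[] buffer = new byte[8192];
        // try-with-resources closes the stream even if a read fails.
        try (InputStream in = body) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                byteCountLogger.accept(n); // report the size of this chunk
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] logged = {0L};
        long total = drain(new ByteArrayInputStream(new byte[20_000]), n -> logged[0] += n);
        System.out.println(total + " bytes read, " + logged[0] + " bytes logged");
    }
}
```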

getResponseHeader

public java.lang.String getResponseHeader(java.lang.String headerName)
                                   throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                          org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get a specified response header, if it exists.

Specified by:
getResponseHeader in interface IThrottledConnection
Parameters:
headerName - is the name of the header.
Returns:
the header value, or null if it doesn't exist.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

doneFetch

public void doneFetch(org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities)
               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Done with the fetch. Call this when the fetch has been completed. A log entry will be generated describing what was done.

Specified by:
doneFetch in interface IThrottledConnection
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

close

public void close()
           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Close the connection. Call this to end this server connection.

Specified by:
close in interface IThrottledConnection
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException