org.apache.manifoldcf.crawler.connectors.webcrawler
Class ThrottledFetcher.ThrottledConnection

java.lang.Object
  extended by org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher.ThrottledConnection
All Implemented Interfaces:
IThrottledConnection
Enclosing class:
ThrottledFetcher

protected static class ThrottledFetcher.ThrottledConnection
extends java.lang.Object
implements IThrottledConnection

Throttled connections. Each instance of a connection describes the bins to which it belongs, along with the actual open connection itself, and the last time the connection was used.


Nested Class Summary
protected static class ThrottledFetcher.ThrottledConnection.ExecuteMethodThread
           
 
Field Summary
protected  PageCredentials authentication
          Authentication
protected  ThrottledFetcher.ConnectionBin[] connectionBinArray
          The connection has resolved pointers to the ConnectionBin structures that manage pool maximums.
protected  org.apache.commons.httpclient.MultiThreadedHttpConnectionManager connManager
          The HTTP connection manager.
protected  ThrottledFetcher.DataSession dataSession
          Hack added to record all access data from current crawler
protected  long fetchCounter
          The number of bytes read so far in the current fetch
protected  org.apache.commons.httpclient.HttpMethodBase fetchMethod
          The method object
protected  java.lang.String fetchType
          The kind of fetch we are doing
protected  long inactiveTime
          If not active, this is when it went inactive
protected  boolean isActive
          Is the connection considered "active"?
protected  LoginCookies lastFetchCookies
          The cookies from the last fetch
protected  double[] minMillisecondsPerByte
          These are the bandwidth limits, per bin
protected  org.apache.commons.httpclient.protocol.ProtocolFactory myFactory
           
protected  java.lang.String myUrl
          The current URL being fetched
protected  int port
          Port
protected  java.lang.String protocol
          Protocol
protected  org.apache.commons.httpclient.protocol.ProtocolSocketFactory secureSocketFactory
          Protocol socket factory
protected  java.lang.String server
          Server
protected  long startFetchTime
          The start of the current fetch
protected  int statusCode
          The status code fetched, if any
protected  ThrottledFetcher.ThrottleBin[] throttleBinArray
          The connection has resolved pointers to the ThrottleBin structures that help manage bandwidth throttling.
protected  java.lang.Throwable throwable
          The error trace, if any
protected  org.apache.manifoldcf.core.interfaces.IKeystoreManager trustStore
          Trust store
protected  java.lang.String trustStoreString
          Trust store string
 
Fields inherited from interface org.apache.manifoldcf.crawler.connectors.webcrawler.IThrottledConnection
_rcsid, FETCH_BAD_URI, FETCH_CIRCULAR_REDIRECT, FETCH_INTERRUPTED, FETCH_IO_ERROR, FETCH_NOT_TRIED, FETCH_SEQUENCE_ERROR, FETCH_UNKNOWN_ERROR
 
Constructor Summary
ThrottledFetcher.ThrottledConnection(java.lang.String protocol, java.lang.String server, int port, PageCredentials authentication, org.apache.commons.httpclient.protocol.ProtocolFactory myFactory, java.lang.String trustStoreString, ThrottledFetcher.ConnectionBin[] connectionBins)
          Constructor.
 
Method Summary
 void activate()
          Activate the connection.
 void beginFetch(java.lang.String fetchType)
          Begin the fetch process.
 void beginRead(int len)
          Begin a read operation, from within a stream
 void close()
          Close the connection.
protected  void destroy()
          Destroy the connection forever
 void doneFetch(org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities)
          Done with the fetch.
 void endRead(int origLen, int actualAmt)
          End a read operation, from within a stream
 void executeFetch(java.lang.String urlPath, java.lang.String userAgent, java.lang.String from, int connectionTimeoutMilliseconds, int socketTimeoutMilliseconds, boolean redirectOK, java.lang.String host, FormData formData, LoginCookies loginCookies)
          Execute the fetch and get the return code.
 boolean flushIdleConnections(long idleTimeout)
          Do periodic bookkeeping.
 LoginCookies getLastFetchCookies()
          Get the last fetch cookies.
 java.io.InputStream getResponseBodyStream()
          Get the response input stream.
 int getResponseCode()
          Get the HTTP response code.
 java.lang.String getResponseHeader(java.lang.String headerName)
          Get a specified response header, if it exists.
 void logFetchCount(int count)
          Log the fetch of a number of bytes, from within a stream.
 boolean matches(ThrottledFetcher.ConnectionBin[] bins, java.lang.String protocol, java.lang.String server, int port, PageCredentials authentication, java.lang.String trustStoreString)
          See if this instance matches a given server and port.
 void mustHaveReference(ThrottledFetcher.ConnectionBin cb)
           
 void noteInterrupted(java.lang.Throwable e)
          Note that the connection fetch was interrupted by something.
 void setup(ThrottleDescription description)
          Set up the connection.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

connectionBinArray

protected ThrottledFetcher.ConnectionBin[] connectionBinArray
The connection has resolved pointers to the ConnectionBin structures that manage pool maximums. These are ONLY valid when the connection is actually in the pool.


throttleBinArray

protected ThrottledFetcher.ThrottleBin[] throttleBinArray
The connection has resolved pointers to the ThrottleBin structures that help manage bandwidth throttling.


minMillisecondsPerByte

protected double[] minMillisecondsPerByte
These are the bandwidth limits, per bin. As the field name indicates, each limit is a minimum number of milliseconds per byte.
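
As a rough illustration of how a per-bin limit of this kind can be turned into a delay, the sketch below treats each array entry as a minimum number of milliseconds per byte and lets the strictest bin decide. This reading is inferred from the field name only. The class ThrottleMathSketch and both of its methods are hypothetical and are not part of ManifoldCF.

```java
// Hypothetical helper, not part of ManifoldCF: min-milliseconds-per-byte arithmetic.
final class ThrottleMathSketch {

    // The strictest bin is the one that demands the most milliseconds per byte.
    static double strictestLimit(double[] minMillisecondsPerByte) {
        double strictest = 0.0;
        for (double limit : minMillisecondsPerByte) {
            strictest = Math.max(strictest, limit);
        }
        return strictest;
    }

    // How much longer to wait so that bytesFetched bytes take at least the minimum time.
    static long requiredWaitMillis(long bytesFetched, long elapsedMillis,
                                   double[] minMillisecondsPerByte) {
        double minElapsed = bytesFetched * strictestLimit(minMillisecondsPerByte);
        long wait = (long) Math.ceil(minElapsed) - elapsedMillis;
        return Math.max(0L, wait);
    }

    public static void main(String[] args) {
        double[] limits = {0.5, 2.0};
        // 1000 bytes at 2.0 ms/byte need 2000 ms; 500 ms have already passed.
        System.out.println(requiredWaitMillis(1000, 500, limits)); // prints 1500
    }
}
```

Taking the maximum means a connection that belongs to several bins never exceeds the slowest rate any of them allows.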


isActive

protected boolean isActive
Is the connection considered "active"?


inactiveTime

protected long inactiveTime
If not active, this is when it went inactive


protocol

protected java.lang.String protocol
Protocol


server

protected java.lang.String server
Server


port

protected int port
Port


authentication

protected PageCredentials authentication
Authentication


trustStore

protected org.apache.manifoldcf.core.interfaces.IKeystoreManager trustStore
Trust store


trustStoreString

protected java.lang.String trustStoreString
Trust store string


connManager

protected org.apache.commons.httpclient.MultiThreadedHttpConnectionManager connManager
The HTTP connection manager. The pool has size 1.


fetchMethod

protected org.apache.commons.httpclient.HttpMethodBase fetchMethod
The method object


throwable

protected java.lang.Throwable throwable
The error trace, if any


myUrl

protected java.lang.String myUrl
The current URL being fetched


statusCode

protected int statusCode
The status code fetched, if any


fetchType

protected java.lang.String fetchType
The kind of fetch we are doing


fetchCounter

protected long fetchCounter
The number of bytes read so far in the current fetch


startFetchTime

protected long startFetchTime
The start of the current fetch


lastFetchCookies

protected LoginCookies lastFetchCookies
The cookies from the last fetch


secureSocketFactory

protected org.apache.commons.httpclient.protocol.ProtocolSocketFactory secureSocketFactory
Protocol socket factory


myFactory

protected org.apache.commons.httpclient.protocol.ProtocolFactory myFactory

dataSession

protected ThrottledFetcher.DataSession dataSession
Hack added to record all access data from current crawler

Constructor Detail

ThrottledFetcher.ThrottledConnection

public ThrottledFetcher.ThrottledConnection(java.lang.String protocol,
                                            java.lang.String server,
                                            int port,
                                            PageCredentials authentication,
                                            org.apache.commons.httpclient.protocol.ProtocolFactory myFactory,
                                            java.lang.String trustStoreString,
                                            ThrottledFetcher.ConnectionBin[] connectionBins)
Constructor. Create a connection with a specific server and port, and register it as active against all bins.

Method Detail

mustHaveReference

public void mustHaveReference(ThrottledFetcher.ConnectionBin cb)

matches

public boolean matches(ThrottledFetcher.ConnectionBin[] bins,
                       java.lang.String protocol,
                       java.lang.String server,
                       int port,
                       PageCredentials authentication,
                       java.lang.String trustStoreString)
See if this instance matches a given server and port.


activate

public void activate()
Activate the connection.


setup

public void setup(ThrottleDescription description)
Set up the connection. This allows us to feed all bins the correct bandwidth limit info.


flushIdleConnections

public boolean flushIdleConnections(long idleTimeout)
Do periodic bookkeeping.

Returns:
true if the connection is no longer valid, and can be removed.
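
The page does not say how the bookkeeping decides that a connection can be removed. The sketch below shows one plausible shape for such a check, built only on the isActive and inactiveTime fields documented on this page. It is an assumption for illustration, not the ManifoldCF implementation; the class IdleTrackerSketch, its deactivate method, and the explicit clock argument are all hypothetical.

```java
// Hypothetical sketch, not the ManifoldCF implementation.
final class IdleTrackerSketch {
    private boolean isActive = true;
    private long inactiveTime = -1L;

    // Mark the connection as in use again.
    void activate() {
        isActive = true;
        inactiveTime = -1L;
    }

    // Mark the connection as idle and remember when that happened.
    void deactivate(long nowMillis) {
        isActive = false;
        inactiveTime = nowMillis;
    }

    // Returns true when the connection has been idle for at least idleTimeout.
    boolean flushIdle(long nowMillis, long idleTimeout) {
        if (isActive) {
            return false; // active connections are never flushed
        }
        return nowMillis - inactiveTime >= idleTimeout;
    }

    public static void main(String[] args) {
        IdleTrackerSketch tracker = new IdleTrackerSketch();
        tracker.deactivate(1_000L);
        System.out.println(tracker.flushIdle(1_500L, 1_000L)); // prints false
        System.out.println(tracker.flushIdle(2_000L, 1_000L)); // prints true
    }
}
```

Passing the current time in as a parameter keeps the check deterministic, which makes it easy to test without sleeping.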

logFetchCount

public void logFetchCount(int count)
Log the fetch of a number of bytes, from within a stream.


beginRead

public void beginRead(int len)
               throws java.lang.InterruptedException
Begin a read operation, from within a stream

Throws:
java.lang.InterruptedException

endRead

public void endRead(int origLen,
                    int actualAmt)
End a read operation, from within a stream
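
beginRead and endRead are described as being called "from within a stream". A minimal sketch of what such a stream wrapper could look like follows. The interface ReadThrottleSketch is a hypothetical stand-in that copies only the two signatures documented here. The wrapper class, the demo class, and the choice to report 0 bytes to endRead at end of stream are assumptions, not documented behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the two read callbacks documented above.
interface ReadThrottleSketch {
    void beginRead(int len) throws InterruptedException;
    void endRead(int origLen, int actualAmt);
}

// Wraps a stream so that every read is bracketed by beginRead/endRead.
final class ThrottledInputStreamSketch extends InputStream {
    private final InputStream in;
    private final ReadThrottleSketch throttle;

    ThrottledInputStreamSketch(InputStream in, ReadThrottleSketch throttle) {
        this.in = in;
        this.throttle = throttle;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        try {
            throttle.beginRead(len); // may block until bandwidth is available
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while waiting to read");
        }
        int amt = -1;
        try {
            amt = in.read(b, off, len);
            return amt;
        } finally {
            // Assumption: report 0 bytes at end of stream or on failure.
            throttle.endRead(len, Math.max(amt, 0));
        }
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

final class ReadThrottleDemo {
    // Reads 5 bytes through the wrapper; returns {bytes requested, bytes delivered}.
    static int[] run() {
        final int[] totals = new int[2];
        ReadThrottleSketch counting = new ReadThrottleSketch() {
            public void beginRead(int len) { }
            public void endRead(int origLen, int actualAmt) {
                totals[0] += origLen;
                totals[1] += actualAmt;
            }
        };
        byte[] buf = new byte[4];
        try (InputStream s = new ThrottledInputStreamSketch(
                new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}), counting)) {
            while (s.read(buf, 0, buf.length) != -1) { }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return totals;
    }

    public static void main(String[] args) {
        int[] totals = run();
        System.out.println(totals[0] + " requested, " + totals[1] + " delivered");
    }
}
```

Calling endRead in a finally block keeps the two calls paired even when the underlying read throws, so a throttle that reserved bandwidth in beginRead can always release the unused part.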


destroy

protected void destroy()
Destroy the connection forever


beginFetch

public void beginFetch(java.lang.String fetchType)
                throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Begin the fetch process.

Specified by:
beginFetch in interface IThrottledConnection
Parameters:
fetchType - is a short string describing the kind of fetch being requested. This is used solely for logging purposes.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

executeFetch

public void executeFetch(java.lang.String urlPath,
                         java.lang.String userAgent,
                         java.lang.String from,
                         int connectionTimeoutMilliseconds,
                         int socketTimeoutMilliseconds,
                         boolean redirectOK,
                         java.lang.String host,
                         FormData formData,
                         LoginCookies loginCookies)
                  throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                         org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Execute the fetch and get the return code. This method uses the standard logging mechanism to keep track of the fetch attempt. It signals the outcome as follows: it throws ServiceInterruption if a dynamic error occurs, throws ManifoldCFException if a fatal error occurs, and throws nothing if a standard protocol error occurs. Note that, for proxies and similar intermediaries, the idea is for this fetch request to handle whatever redirections are needed to support them.

Specified by:
executeFetch in interface IThrottledConnection
Parameters:
urlPath - is the path part of the url, e.g. "/robots.txt"
userAgent - is the value of the User-Agent header to use.
from - is the value of the From header to use.
connectionTimeoutMilliseconds - is the maximum number of milliseconds to wait on socket connect.
socketTimeoutMilliseconds - is the maximum number of milliseconds to wait on socket read.
redirectOK - should be set to true if you want redirects to be automatically followed.
host - is the value to use as the "Host" header, or null to use the default.
formData - describes additional form arguments and how to fetch the page.
loginCookies - describes the cookies that should be in effect for this page fetch.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
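
The three outcomes described for executeFetch (a ServiceInterruption, a ManifoldCFException, or a normal return that leaves a code to inspect) suggest a caller shaped roughly like the sketch below. To stay self-contained it does not use the real ManifoldCF types: TransientErrorSketch stands in for ServiceInterruption, FatalErrorSketch for ManifoldCFException, and FetchSketch is a hypothetical, reduced interface with a shortened executeFetch signature. The returned strings and the 2xx check are illustrative choices only.

```java
// Hypothetical stand-in for ServiceInterruption.
class TransientErrorSketch extends Exception {
    TransientErrorSketch(String message) { super(message); }
}

// Hypothetical stand-in for ManifoldCFException.
class FatalErrorSketch extends Exception {
    FatalErrorSketch(String message) { super(message); }
}

// Hypothetical, reduced view of the fetch calls used here.
interface FetchSketch {
    void executeFetch(String urlPath) throws FatalErrorSketch, TransientErrorSketch;
    int getResponseCode();
}

// Stub that either throws a configured error or reports a response code.
final class StubFetchSketch implements FetchSketch {
    private final Exception toThrow;
    private final int code;

    StubFetchSketch(Exception toThrow, int code) {
        this.toThrow = toThrow;
        this.code = code;
    }

    public void executeFetch(String urlPath) throws FatalErrorSketch, TransientErrorSketch {
        if (toThrow instanceof TransientErrorSketch) throw (TransientErrorSketch) toThrow;
        if (toThrow instanceof FatalErrorSketch) throw (FatalErrorSketch) toThrow;
    }

    public int getResponseCode() {
        return code;
    }
}

final class FetchOutcomeSketch {
    // Maps the three documented outcomes onto a caller-side decision.
    static String classify(FetchSketch fetch, String urlPath) {
        try {
            fetch.executeFetch(urlPath);
        } catch (TransientErrorSketch e) {
            return "retry later";   // dynamic error
        } catch (FatalErrorSketch e) {
            return "abort";         // fatal error
        }
        // No exception: a protocol-level problem, if any, shows up in the code.
        int code = fetch.getResponseCode();
        return (code >= 200 && code < 300) ? "ok" : "inspect code " + code;
    }

    public static void main(String[] args) {
        System.out.println(classify(new StubFetchSketch(null, 200), "/robots.txt"));
        System.out.println(classify(new StubFetchSketch(null, 404), "/robots.txt"));
        System.out.println(classify(
            new StubFetchSketch(new TransientErrorSketch("busy"), 0), "/robots.txt"));
    }
}
```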

getResponseCode

public int getResponseCode()
                    throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                           org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the HTTP response code.

Specified by:
getResponseCode in interface IThrottledConnection
Returns:
the response code. This is either an HTTP response code, or one of the FETCH_ codes inherited from IThrottledConnection.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

getLastFetchCookies

public LoginCookies getLastFetchCookies()
                                 throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                        org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the last fetch cookies.

Specified by:
getLastFetchCookies in interface IThrottledConnection
Returns:
the cookies now in effect from the last fetch.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

getResponseHeader

public java.lang.String getResponseHeader(java.lang.String headerName)
                                   throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                          org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get a specified response header, if it exists.

Specified by:
getResponseHeader in interface IThrottledConnection
Parameters:
headerName - is the name of the header.
Returns:
the header value, or null if it doesn't exist.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

getResponseBodyStream

public java.io.InputStream getResponseBodyStream()
                                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                                 org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the response input stream. It is the responsibility of the caller to close this stream when done.

Specified by:
getResponseBodyStream in interface IThrottledConnection
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

noteInterrupted

public void noteInterrupted(java.lang.Throwable e)
Note that the connection fetch was interrupted by something.

Specified by:
noteInterrupted in interface IThrottledConnection

doneFetch

public void doneFetch(org.apache.manifoldcf.crawler.interfaces.IVersionActivity activities)
               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Done with the fetch. Call this when the fetch has been completed. A log entry will be generated describing what was done.

Specified by:
doneFetch in interface IThrottledConnection
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

close

public void close()
           throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Close the connection. Call this to end this server connection.

Specified by:
close in interface IThrottledConnection
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
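
Taken together, the method descriptions imply a call order of beginFetch, executeFetch, reading and closing the response stream, doneFetch, and finally close. The sketch below records that order against a hypothetical, heavily reduced connection. RecordingConnectionSketch is not the real class: its methods take simplified arguments, throw no ManifoldCF exceptions, and return a canned two-byte body. The nesting of the finally blocks is an assumption about sensible usage, not something this page states.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced connection that records the order in which it is called.
final class RecordingConnectionSketch {
    final List<String> calls = new ArrayList<>();

    void beginFetch(String fetchType) { calls.add("beginFetch"); }

    void executeFetch(String urlPath) { calls.add("executeFetch"); }

    InputStream getResponseBodyStream() {
        calls.add("getResponseBodyStream");
        return new ByteArrayInputStream(new byte[] {'o', 'k'});
    }

    void doneFetch() { calls.add("doneFetch"); }

    void close() { calls.add("close"); }
}

final class FetchLifecycleSketch {
    // Fetches one path and returns the number of body bytes read.
    static int fetchLength(RecordingConnectionSketch connection, String urlPath) {
        try {
            connection.beginFetch("example");
            try {
                connection.executeFetch(urlPath);
                int total = 0;
                // The caller owns the stream, so close it before finishing the fetch.
                try (InputStream body = connection.getResponseBodyStream()) {
                    while (body.read() != -1) {
                        total++;
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return total;
            } finally {
                connection.doneFetch(); // pairs with beginFetch
            }
        } finally {
            connection.close(); // ends this server connection
        }
    }

    public static void main(String[] args) {
        RecordingConnectionSketch connection = new RecordingConnectionSketch();
        System.out.println(fetchLength(connection, "/robots.txt")); // prints 2
        System.out.println(connection.calls);
    }
}
```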