java.lang.Object
  org.apache.manifoldcf.crawler.connectors.rss.ThrottledFetcher.Server
protected class ThrottledFetcher.Server
This class represents the throttling state kept for a single server.

In order to calculate the effective "burst" fetches per second and bytes per second, we need to have some idea what the window is. For example, a long hiatus from fetching could cause overuse of the server when fetching resumes, if the window length is too long.

One solution to this problem would be to keep a list of the individual fetches as records. Then, we could "expire" a fetch by discarding the old record. However, this consumes a lot of memory for all but the smallest intervals.

Another, better, solution is to hook into the start and end of individual fetches. These will, presumably, occur at the fastest possible rate without long pauses spent doing something else. The only complication is that fetches may well overlap, so we need to "reference count" the fetches to know when to reset the counters.

For "fetches per second", we can simply make sure we "schedule" the next fetch at an appropriate time, rather than keep records around. The overall rate may therefore be somewhat less than the specified rate, but that's perfectly acceptable.

For the "maximum open connections" limit, the best thing would be to establish a separate MultiThreadedConnectionPool for each Server. Then, the limit would be automatic.

Some notes on the algorithms used to limit server bandwidth impact
==================================================================

In a single connection case, the algorithm we'd want to use works like this. On the first chunk of a series, the total length of time and the number of bytes are recorded. Then, prior to each subsequent chunk, a calculation is done which attempts to hit the bandwidth target by the end of the chunk read, using the rate of the first chunk access as a way of estimating how long it will take to fetch those next n bytes.
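The single-connection calculation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the ManifoldCF code: the class name `SingleConnectionPacer` and its methods are hypothetical, times are passed in explicitly rather than read from a clock, and the bandwidth target is assumed to be expressed as a minimum number of milliseconds per byte.

```java
/** Illustrative sketch only; names are hypothetical, not the ManifoldCF implementation. */
final class SingleConnectionPacer {
    private final double targetMsPerByte;   // bandwidth target, as minimum ms per byte
    private long seriesStartTime = -1L;     // ms; -1 until the first chunk is recorded
    private long totalBytesRead = 0L;       // bytes read so far in this series
    private double rateEstimate = 0.0;      // ms/byte measured on the first chunk

    SingleConnectionPacer(double targetMsPerByte) {
        this.targetMsPerByte = targetMsPerByte;
    }

    /** Record elapsed time and size of the first chunk of a series. */
    void recordFirstChunk(long startTime, long endTime, int byteCount) {
        if (byteCount <= 0) {
            throw new IllegalArgumentException("byteCount must be positive");
        }
        seriesStartTime = startTime;
        totalBytesRead = byteCount;
        rateEstimate = (double) (endTime - startTime) / (double) byteCount;
    }

    /** How long to wait (ms) so the next chunk ends no earlier than the target allows. */
    long waitBeforeChunk(long now, int byteCount) {
        if (seriesStartTime < 0L) {
            throw new IllegalStateException("first chunk not recorded yet");
        }
        // Earliest time by which (bytes so far + this chunk) may have been read.
        double earliestAllowedEnd =
            seriesStartTime + (totalBytesRead + byteCount) * targetMsPerByte;
        // When the chunk would end if started now, using the first-chunk rate.
        double estimatedEnd = now + byteCount * rateEstimate;
        return Math.max(0L, (long) Math.ceil(earliestAllowedEnd - estimatedEnd));
    }

    /** Account for a completed chunk after the first one. */
    void recordChunk(int actualCount) {
        totalBytesRead += actualCount;
    }

    public static void main(String[] args) {
        SingleConnectionPacer pacer = new SingleConnectionPacer(1.0); // 1 ms per byte
        pacer.recordFirstChunk(0L, 256L, 1024);                       // 0.25 ms per byte
        System.out.println(pacer.waitBeforeChunk(256L, 1024));        // prints 1536
    }
}
```

In this sketch a caller that arrives late (after a long pause) gets a wait of zero, because the estimated end time already lies past the earliest allowed end time.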
For a multi-connection case, which this is, it's harder to come up with a good maximum bandwidth estimate, and harder still to "hit the target", because simultaneous fetches will intrude. The strategy is therefore:

1) The first chunk of any series should proceed without interference from other connections to the same server. The goal here is to get a decent quality estimate without any possibility of overwhelming the server.

2) The bandwidth of the first chunk is treated as the "maximum bandwidth per connection". That is, if other connections are going on, we can presume that each connection will use at most the bandwidth that the first fetch took. Thus, by generating end-time estimates based on this number, we are actually being conservative and using less server bandwidth.

3) For chunks that have started but not finished, we keep track of their size and estimated elapsed time in order to schedule when new chunks from other connections can start.
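The three-part strategy above might look something like the following single-threaded sketch. Everything here is an assumption made for illustration: the class `MultiConnectionThrottleSketch`, the `WAIT_FOR_ESTIMATE` sentinel, and the explicit `now` parameters are hypothetical, and the real class blocks the calling thread rather than returning a delay. Field names are borrowed from the field summary on this page only to make the correspondence easier to follow.

```java
/** Illustrative, single-threaded sketch; not the ManifoldCF implementation. */
final class MultiConnectionThrottleSketch {
    /** Returned when another connection is still measuring the first chunk. */
    static final long WAIT_FOR_ESTIMATE = -1L;

    private boolean estimateValid = false;
    private boolean estimateInProgress = false;
    private long seriesStartTime = 0L;
    private long totalBytesRead = 0L;   // includes reads still in progress
    private double rateEstimate = 0.0;  // ms/byte, taken from the first chunk

    /** Returns the delay in ms before the read may start, or WAIT_FOR_ESTIMATE. */
    long beginRead(long now, int byteCount, double minimumMillisecondsPerByte) {
        if (!estimateValid) {
            if (estimateInProgress) {
                return WAIT_FOR_ESTIMATE;   // step 1: do not intrude on the first chunk
            }
            estimateInProgress = true;      // this read becomes the first chunk
            seriesStartTime = now;
            totalBytesRead = byteCount;
            return 0L;
        }
        // Step 3: count this chunk as outstanding before it starts.
        totalBytesRead += byteCount;
        // Step 2: assume the chunk runs at the rate measured on the first chunk.
        double estimatedEnd = now + byteCount * rateEstimate;
        double earliestAllowedEnd =
            seriesStartTime + totalBytesRead * minimumMillisecondsPerByte;
        return Math.max(0L, (long) Math.ceil(earliestAllowedEnd - estimatedEnd));
    }

    /** Corrects the byte total for short reads and captures the first-chunk rate. */
    void endRead(long now, int originalCount, int actualCount) {
        totalBytesRead += (long) actualCount - originalCount;
        if (estimateInProgress) {
            estimateInProgress = false;
            if (actualCount > 0) {
                rateEstimate = (double) (now - seriesStartTime) / actualCount;
                estimateValid = true;
            }
        }
    }

    long totalBytesRead() {
        return totalBytesRead;
    }

    public static void main(String[] args) {
        MultiConnectionThrottleSketch s = new MultiConnectionThrottleSketch();
        System.out.println(s.beginRead(0L, 1024, 1.0));    // first chunk: prints 0
        System.out.println(s.beginRead(10L, 1024, 1.0));   // must wait: prints -1
        s.endRead(256L, 1024, 1024);                       // estimate: 0.25 ms/byte
        System.out.println(s.beginRead(256L, 1024, 1.0));  // prints 1536
        System.out.println(s.beginRead(256L, 1024, 1.0));  // prints 2560
    }
}
```

The last two calls show the effect of step 3: the second overlapping read is delayed further because the bytes of the read already in progress are counted against the target.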
| Field Summary | |
|---|---|
| protected boolean | estimateInProgress: Flag indicating whether rate estimation is in progress |
| protected boolean | estimateValid: Flag indicating whether a rate estimate is needed |
| protected java.lang.Integer | firstChunkLock: This object is used to gate access while the first chunk is being read |
| protected long | nextFetchTime: This is the time of the next allowed fetch (in ms since epoch) |
| protected int | outstandingConnections: Outstanding connection counter |
| protected double | rateEstimate: The inverse rate estimate of the first fetch, in ms/byte |
| protected int | refCount: Reference count for bandwidth variables |
| protected long | seriesStartTime: The start time of this series |
| protected java.lang.String | serverName: The FQDN of the server |
| protected long | totalBytesRead: Total actual bytes read in this series; this includes fetches in progress |
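As a rough illustration of how a next-fetch time and a reference count can work together, here is a minimal sketch. It is an assumption-laden model, not the ManifoldCF code: the class `FetchScheduleSketch`, the `noteBytes` helper, the explicit `now` parameter, and the choice to reset the byte counter when the count reaches zero are all hypothetical.

```java
/** Illustrative sketch only; names and reset policy are assumptions. */
final class FetchScheduleSketch {
    private long nextFetchTime = 0L;   // earliest time (ms since epoch) the next fetch may start
    private int refCount = 0;          // number of fetches currently in progress
    private long totalBytesRead = 0L;  // bandwidth counter for the current series

    /** Reserves the next fetch slot and returns how long the caller should sleep (ms). */
    synchronized long beginFetch(long now, long minimumMillisecondsPerFetch) {
        long startTime = Math.max(now, nextFetchTime);
        nextFetchTime = startTime + minimumMillisecondsPerFetch;
        refCount++;
        return startTime - now;
    }

    /** Adds bytes to the series counter. */
    synchronized void noteBytes(int count) {
        totalBytesRead += count;
    }

    /** Ends a fetch; the counters are reset only when no fetch overlaps any more. */
    synchronized void endFetch() {
        if (refCount == 0) {
            throw new IllegalStateException("endFetch without beginFetch");
        }
        refCount--;
        if (refCount == 0) {
            totalBytesRead = 0L;
        }
    }

    synchronized int refCount() {
        return refCount;
    }

    synchronized long totalBytesRead() {
        return totalBytesRead;
    }

    public static void main(String[] args) {
        FetchScheduleSketch s = new FetchScheduleSketch();
        System.out.println(s.beginFetch(1000L, 500L)); // prints 0
        System.out.println(s.beginFetch(1100L, 500L)); // prints 400
        System.out.println(s.beginFetch(5000L, 500L)); // prints 0
    }
}
```

Because each slot is scheduled from the later of "now" and the previous slot, idle time earns no credit, which is why the achieved rate can fall somewhat below the configured one.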
| Constructor Summary | |
|---|---|
| ThrottledFetcher.Server(java.lang.String serverName) | Constructor |
| Method Summary | |
|---|---|
| void | beginFetch(long minimumMillisecondsPerFetchPerServer): Note the start of a fetch operation. |
| void | beginRead(int byteCount, double minimumMillisecondsPerBytePerServer): Note the start of an individual byte read of a specified size. |
| void | discard(): Discard this server. |
| void | endFetch(): Note the end of a fetch operation. |
| void | endRead(int originalCount, int actualCount): Note the end of an individual read from the server. |
| java.lang.String | getServerName(): Get the fqdn of the server |
| void | registerConnection(int maxOutstandingConnections): Register an outstanding connection (and wait until it can be obtained before proceeding) |
| void | releaseConnection(): Release an outstanding connection back into the pool |
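The method names suggest a bracketing call order around each connection, fetch, and read. The sketch below shows one plausible ordering against a hypothetical stand-in interface, `ServerThrottle`, which copies the method names listed above but is not the ManifoldCF class (for instance, it omits the `ManifoldCFException` declared by `registerConnection`). The ordering itself, and passing 0 as the actual count at end of stream, are assumptions.

```java
/** Hypothetical stand-in for the throttling calls; not the ManifoldCF class. */
interface ServerThrottle {
    void registerConnection(int maxOutstandingConnections);
    void beginFetch(long minimumMillisecondsPerFetchPerServer) throws InterruptedException;
    void beginRead(int byteCount, double minimumMillisecondsPerBytePerServer)
        throws InterruptedException;
    void endRead(int originalCount, int actualCount);
    void endFetch();
    void releaseConnection();
}

final class ThrottledReadSketch {
    /** Reads the whole stream in chunks, bracketing every step with throttle calls. */
    static long fetch(ServerThrottle server, java.io.InputStream in, int chunkSize,
                      int maxConnections, long minMsPerFetch, double minMsPerByte)
            throws java.io.IOException, InterruptedException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long total = 0L;
        byte[] buffer = new byte[chunkSize];
        server.registerConnection(maxConnections);
        try {
            server.beginFetch(minMsPerFetch);
            try {
                while (true) {
                    server.beginRead(buffer.length, minMsPerByte);
                    int actual = -1;
                    try {
                        actual = in.read(buffer, 0, buffer.length);
                    } finally {
                        // Report requested versus actual size, even on failure.
                        server.endRead(buffer.length, Math.max(actual, 0));
                    }
                    if (actual < 0) {
                        break;  // end of stream
                    }
                    total += actual;
                }
            } finally {
                server.endFetch();
            }
        } finally {
            server.releaseConnection();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        RecordingThrottle server = new RecordingThrottle();
        java.io.InputStream in = new java.io.ByteArrayInputStream(new byte[5]);
        long total = fetch(server, in, 4, 2, 1000L, 0.01);
        System.out.println(total + " bytes: " + server.log);
    }
}

/** Stub that only records the order of calls; it performs no throttling. */
final class RecordingThrottle implements ServerThrottle {
    final StringBuilder log = new StringBuilder();
    public void registerConnection(int max) { log.append("registerConnection;"); }
    public void beginFetch(long ms) { log.append("beginFetch;"); }
    public void beginRead(int count, double msPerByte) { log.append("beginRead;"); }
    public void endRead(int originalCount, int actualCount) {
        log.append("endRead(").append(originalCount).append(',')
           .append(actualCount).append(");");
    }
    public void endFetch() { log.append("endFetch;"); }
    public void releaseConnection() { log.append("releaseConnection;"); }
}
```

The nested try/finally blocks keep each begin call paired with its end call even if a read throws, which matters for any implementation that counts outstanding fetches or connections.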
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected java.lang.String serverName
protected long nextFetchTime
protected int refCount
protected double rateEstimate
protected boolean estimateValid
protected boolean estimateInProgress
protected long seriesStartTime
protected long totalBytesRead
protected java.lang.Integer firstChunkLock
protected int outstandingConnections
| Constructor Detail |
|---|
public ThrottledFetcher.Server(java.lang.String serverName)
| Method Detail |
|---|
public java.lang.String getServerName()
public void registerConnection(int maxOutstandingConnections)
throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Throws: org.apache.manifoldcf.core.interfaces.ManifoldCFException
public void releaseConnection()
public void beginFetch(long minimumMillisecondsPerFetchPerServer)
throws java.lang.InterruptedException
Throws: java.lang.InterruptedException
public void endFetch()
public void beginRead(int byteCount,
double minimumMillisecondsPerBytePerServer)
throws java.lang.InterruptedException
Throws: java.lang.InterruptedException
public void endRead(int originalCount,
int actualCount)
public void discard()