org.apache.manifoldcf.agents.output.gts
Class GTSConnector

java.lang.Object
  extended by org.apache.manifoldcf.core.connector.BaseConnector
      extended by org.apache.manifoldcf.agents.output.BaseOutputConnector
          extended by org.apache.manifoldcf.agents.output.gts.GTSConnector
All Implemented Interfaces:
org.apache.manifoldcf.agents.interfaces.IOutputConnector, org.apache.manifoldcf.core.interfaces.IConnector

public class GTSConnector
extends org.apache.manifoldcf.agents.output.BaseOutputConnector

This is the output connector for the MetaCarta appliance. It establishes a notion of collection(s) a document is ingested into, as well as the idea of a document template for the output.


Nested Class Summary
protected static class GTSConnector.ReaderListener
          Reader listener object that extracts the app name
 
Field Summary
static java.lang.String _rcsid
           
protected static int DT_COMPOUND_DOC
           
protected static int DT_MSEXCEL
           
protected static int DT_MSOUTLOOK
           
protected static int DT_MSPOWERPOINT
           
protected static int DT_MSWORD
           
protected static int DT_PDF
           
protected static int DT_TEXT
           
protected static int DT_UNKNOWN
           
protected static int DT_ZERO
           
static java.lang.String INGEST_ACTIVITY
          Ingestion activity
protected static java.lang.String[] ingestableMimeTypeArray
           
protected static java.util.Map ingestableMimeTypeMap
           
protected  HttpPoster poster
          Local data
static java.lang.String REMOVE_ACTIVITY
          Document removal activity
 
Fields inherited from class org.apache.manifoldcf.core.connector.BaseConnector
currentContext, params
 
Fields inherited from interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
DOCUMENTSTATUS_ACCEPTED, DOCUMENTSTATUS_REJECTED
 
Constructor Summary
GTSConnector()
          Constructor.
 
Method Summary
 int addOrReplaceDocument(java.lang.String documentURI, java.lang.String outputDescription, org.apache.manifoldcf.agents.interfaces.RepositoryDocument document, java.lang.String authorityNameString, org.apache.manifoldcf.agents.interfaces.IOutputAddActivity activities)
          Add (or replace) a document in the output data store using the connector.
 java.lang.String check()
          Test the connection.
 boolean checkDocumentIndexable(java.io.File localFile)
          Pre-determine whether a document (passed here as a File object) is indexable by this connector.
 boolean checkMimeTypeIndexable(java.lang.String mimeType)
          Detect if a mime type is indexable or not.
 void connect(org.apache.manifoldcf.core.interfaces.ConfigParams configParameters)
          Connect.
 void disconnect()
          Close the connection.
protected static int fingerprint(java.io.File file)
          Fingerprint a file! Pass in the name of the (local) temporary file that we should be looking at.
 java.lang.String[] getActivitiesList()
          Return the list of activities that this connector supports (i.e. writes into the log).
protected static java.lang.String getAppName(java.io.File documentPath)
          Get a binary document's APPNAME field, or return null if the document does not seem to be an OLE compound document.
 java.lang.String getJSPFolder()
          Return the path for the UI interface JSP elements.
 java.lang.String getOutputDescription(org.apache.manifoldcf.agents.interfaces.OutputSpecification spec)
          Get an output version string, given an output specification.
protected  void getSession()
          Set up a session
protected static java.lang.String hexprint(byte x)
           
protected static boolean isStrange(byte x)
          Check if character is not typical ASCII.
protected static boolean isText(byte[] beginChunk, int chunkLength)
          Test to see if a document is text or not.
protected static boolean isWhiteSpace(byte x)
          Check if a byte is a whitespace character.
protected static char nibbleprint(int x)
           
 void outputConfigurationBody(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.lang.String tabName)
          Output the configuration body section.
 void outputConfigurationHeader(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.util.ArrayList tabsArray)
          Output the configuration header section.
 void outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.agents.interfaces.OutputSpecification os, java.lang.String tabName)
          Output the specification body section.
 void outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.agents.interfaces.OutputSpecification os, java.util.ArrayList tabsArray)
          Output the specification header section.
protected static void pack(java.lang.StringBuffer output, java.lang.String value, char delimiter)
          Stuffer for packing a single string with an end delimiter
protected static void packFixedList(java.lang.StringBuffer output, java.lang.String[] values, char delimiter)
          Stuffer for packing lists of fixed length
protected static void packList(java.lang.StringBuffer output, java.util.ArrayList values, char delimiter)
          Stuffer for packing lists of variable length
protected static void packList(java.lang.StringBuffer output, java.lang.String[] values, char delimiter)
          Another stuffer for packing lists of variable length
 java.lang.String processConfigurationPost(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
          Process a configuration post.
 java.lang.String processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, org.apache.manifoldcf.agents.interfaces.OutputSpecification os)
          Process a specification post.
protected static int recognizeApp(java.lang.String appName)
          Translate a string application name to one of the kinds of documents we care about.
 void removeDocument(java.lang.String documentURI, java.lang.String outputDescription, org.apache.manifoldcf.agents.interfaces.IOutputRemoveActivity activities)
          Remove a document using the connector.
protected static int unpack(java.lang.StringBuffer sb, java.lang.String value, int startPosition, char delimiter)
          Unstuffer for the above.
protected static int unpackFixedList(java.lang.String[] output, java.lang.String value, int startPosition, char delimiter)
          Unstuffer for unpacking lists of fixed length
protected static int unpackList(java.util.ArrayList output, java.lang.String value, int startPosition, char delimiter)
          Unstuffer for unpacking lists of variable length.
 void viewConfiguration(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
          View configuration.
 void viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out, org.apache.manifoldcf.agents.interfaces.OutputSpecification os)
          View specification.
 
Methods inherited from class org.apache.manifoldcf.agents.output.BaseOutputConnector
noteJobComplete, requestInfo
 
Methods inherited from class org.apache.manifoldcf.core.connector.BaseConnector
clearThreadContext, deinstall, getConfiguration, install, poll, setThreadContext
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.manifoldcf.core.interfaces.IConnector
clearThreadContext, deinstall, getConfiguration, install, poll, setThreadContext
 

Field Detail

_rcsid

public static final java.lang.String _rcsid
See Also:
Constant Field Values

INGEST_ACTIVITY

public static final java.lang.String INGEST_ACTIVITY
Ingestion activity

See Also:
Constant Field Values

REMOVE_ACTIVITY

public static final java.lang.String REMOVE_ACTIVITY
Document removal activity

See Also:
Constant Field Values

DT_UNKNOWN

protected static final int DT_UNKNOWN
See Also:
Constant Field Values

DT_COMPOUND_DOC

protected static final int DT_COMPOUND_DOC
See Also:
Constant Field Values

DT_MSWORD

protected static final int DT_MSWORD
See Also:
Constant Field Values

DT_MSEXCEL

protected static final int DT_MSEXCEL
See Also:
Constant Field Values

DT_MSPOWERPOINT

protected static final int DT_MSPOWERPOINT
See Also:
Constant Field Values

DT_MSOUTLOOK

protected static final int DT_MSOUTLOOK
See Also:
Constant Field Values

DT_TEXT

protected static final int DT_TEXT
See Also:
Constant Field Values

DT_ZERO

protected static final int DT_ZERO
See Also:
Constant Field Values

DT_PDF

protected static final int DT_PDF
See Also:
Constant Field Values

poster

protected HttpPoster poster
Local data


ingestableMimeTypeArray

protected static final java.lang.String[] ingestableMimeTypeArray

ingestableMimeTypeMap

protected static final java.util.Map ingestableMimeTypeMap
Constructor Detail

GTSConnector

public GTSConnector()
Constructor.

Method Detail

getActivitiesList

public java.lang.String[] getActivitiesList()
Return the list of activities that this connector supports (i.e. writes into the log).

Specified by:
getActivitiesList in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
Overrides:
getActivitiesList in class org.apache.manifoldcf.agents.output.BaseOutputConnector
Returns:
the list.

getJSPFolder

public java.lang.String getJSPFolder()
Return the path for the UI JSP elements. This method should return the name of the folder, under the /output/ area, where the appropriate JSPs can be found. The name should NOT have a slash in it.

Returns:
the folder part

connect

public void connect(org.apache.manifoldcf.core.interfaces.ConfigParams configParameters)
Connect.

Specified by:
connect in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides:
connect in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
configParameters - is the set of configuration parameters, which in this case describe the target appliance, basic auth configuration, etc. (This formerly came out of the ini file.)

disconnect

public void disconnect()
                throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Close the connection. Call this before discarding the connection.

Specified by:
disconnect in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides:
disconnect in class org.apache.manifoldcf.core.connector.BaseConnector
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

getSession

protected void getSession()
                   throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Set up a session

Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

check

public java.lang.String check()
                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Test the connection. Returns a string describing the connection integrity.

Specified by:
check in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides:
check in class org.apache.manifoldcf.core.connector.BaseConnector
Returns:
the connection's status as a displayable string.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

checkMimeTypeIndexable

public boolean checkMimeTypeIndexable(java.lang.String mimeType)
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                      org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Detect if a mime type is indexable or not. This method is used by participating repository connectors to filter out unusable documents before they are passed to this output connector.

Specified by:
checkMimeTypeIndexable in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
Overrides:
checkMimeTypeIndexable in class org.apache.manifoldcf.agents.output.BaseOutputConnector
Parameters:
mimeType - is the mime type of the document.
Returns:
true if the mime type is indexable by this connector.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
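
A minimal sketch of how such a MIME type pre-filter could work, assuming the check is a lookup in a map built from a fixed array (the roles of ingestableMimeTypeArray and ingestableMimeTypeMap suggest this, but it is not confirmed here). The class name, the listed MIME types, and the parameter stripping and case folding are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class MimeTypeFilterSketch {
    // Hypothetical contents: the real ingestableMimeTypeArray values are not listed in this page.
    static final String[] INGESTABLE = { "application/pdf", "application/msword", "text/plain" };

    // Map built once from the array, used only for fast membership checks.
    static final Map<String, String> INGESTABLE_MAP = new HashMap<>();
    static {
        for (String type : INGESTABLE) {
            INGESTABLE_MAP.put(type, type);
        }
    }

    // Returns true if the base MIME type (parameters removed, lower-cased) is in the map.
    static boolean isIndexable(String mimeType) {
        if (mimeType == null) {
            return false;
        }
        int semi = mimeType.indexOf(';');
        String base = (semi >= 0 ? mimeType.substring(0, semi) : mimeType)
            .trim().toLowerCase(Locale.ROOT);
        return INGESTABLE_MAP.containsKey(base);
    }
}
```

With these assumed contents, isIndexable("text/plain; charset=UTF-8") is accepted and isIndexable("image/png") is rejected.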

checkDocumentIndexable

public boolean checkDocumentIndexable(java.io.File localFile)
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                      org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Pre-determine whether a document (passed here as a File object) is indexable by this connector. This method is used by participating repository connectors to help reduce the number of unmanageable documents that are passed to this output connector in advance of an actual transfer. This hook is provided mainly to support search engines that only handle a small set of accepted file types.

Specified by:
checkDocumentIndexable in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
Overrides:
checkDocumentIndexable in class org.apache.manifoldcf.agents.output.BaseOutputConnector
Parameters:
localFile - is the local file to check.
Returns:
true if the file is indexable.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

getOutputDescription

public java.lang.String getOutputDescription(org.apache.manifoldcf.agents.interfaces.OutputSpecification spec)
                                      throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Get an output version string, given an output specification. The output version string is used to uniquely describe the pertinent details of the output specification and the configuration, to allow the Connector Framework to determine whether a document will need to be output again. Note that the contents of the document cannot be considered by this method, and that a different version string (defined in IRepositoryConnector) is used to describe the version of the actual document. This method presumes that the connector object has been configured, and it is thus able to communicate with the output data store should that be necessary.

Parameters:
spec - is the current output specification for the job that is doing the crawling.
Returns:
a string, of unlimited length, which uniquely describes output configuration and specification in such a way that if two such strings are equal, the document will not need to be sent again to the output data store.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
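
To illustrate the equality contract described above, here is a hedged sketch of an output description built from a list of collections and a document template, the two notions this connector introduces. The string format, the sorting of collection names, and every name in the block are assumptions; the connector's real format is not documented on this page.

```java
import java.util.Arrays;

final class OutputDescriptionSketch {
    // Hypothetical format: a count, the sorted collection names, then the template,
    // each field escaped and terminated by '+'.
    static String describe(String[] collections, String template) {
        String[] sorted = collections.clone();
        Arrays.sort(sorted); // assumption: order of collections does not matter
        StringBuilder sb = new StringBuilder();
        sb.append(sorted.length).append('+');
        for (String c : sorted) {
            sb.append(escape(c)).append('+');
        }
        sb.append(escape(template)).append('+');
        return sb.toString();
    }

    // Escape the escape character first, then the delimiter.
    static String escape(String v) {
        return v.replace("\\", "\\\\").replace("+", "\\+");
    }

    // Equal descriptions mean the document does not need to be sent again.
    static boolean needsReoutput(String previousDescription, String currentDescription) {
        return !currentDescription.equals(previousDescription);
    }
}
```

Because the description is compared only for equality, any deterministic encoding works as long as every setting that affects the output is included in it.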

addOrReplaceDocument

public int addOrReplaceDocument(java.lang.String documentURI,
                                java.lang.String outputDescription,
                                org.apache.manifoldcf.agents.interfaces.RepositoryDocument document,
                                java.lang.String authorityNameString,
                                org.apache.manifoldcf.agents.interfaces.IOutputAddActivity activities)
                         throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Add (or replace) a document in the output data store using the connector. This method presumes that the connector object has been configured, and it is thus able to communicate with the output data store should that be necessary. The OutputSpecification is *not* provided to this method. The goal is consistency: any output must agree with the output description, because the description was part of what determined whether output should take place. This method may therefore need to decode the output description string in order to determine what should be done.

Parameters:
documentURI - is the URI of the document. The URI is presumed to be the unique identifier which the output data store will use to process and serve the document. This URI is constructed by the repository connector which fetches the document, and is thus universal across all output connectors.
outputDescription - is the description string that was constructed for this document by the getOutputDescription() method.
document - is the document data to be processed (handed to the output data store).
authorityNameString - is the name of the authority responsible for authorizing any access tokens passed in with the repository document. May be null.
activities - is the handle to an object that the implementer of an output connector may use to perform operations, such as logging processing activity.
Returns:
the document status (accepted or permanently rejected).
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

removeDocument

public void removeDocument(java.lang.String documentURI,
                           java.lang.String outputDescription,
                           org.apache.manifoldcf.agents.interfaces.IOutputRemoveActivity activities)
                    throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                           org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Remove a document using the connector. Note that the last outputDescription is included, since it may be necessary for the connector to use such information to know how to properly remove the document.

Parameters:
documentURI - is the URI of the document. The URI is presumed to be the unique identifier which the output data store will use to process and serve the document. This URI is constructed by the repository connector which fetches the document, and is thus universal across all output connectors.
outputDescription - is the last description string that was constructed for this document by the getOutputDescription() method above.
activities - is the handle to an object that the implementer of an output connector may use to perform operations, such as logging processing activity.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption

outputConfigurationHeader

public void outputConfigurationHeader(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                      org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                      org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
                                      java.util.ArrayList tabsArray)
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                      java.io.IOException
Output the configuration header section. This method is called in the head section of the connector's configuration page. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the configuration editing HTML.

Specified by:
outputConfigurationHeader in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides:
outputConfigurationHeader in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

outputConfigurationBody

public void outputConfigurationBody(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                    org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                    org.apache.manifoldcf.core.interfaces.ConfigParams parameters,
                                    java.lang.String tabName)
                             throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                    java.io.IOException
Output the configuration body section. This method is called in the body section of the connector's configuration page. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is "editconnection".

Specified by:
outputConfigurationBody in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides:
outputConfigurationBody in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
tabName - is the current tab name.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

processConfigurationPost

public java.lang.String processConfigurationPost(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                                                 org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
                                                 org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
                                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Process a configuration post. This method is called at the start of the connector's configuration page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the configuration parameters accordingly. The name of the posted form is "editconnection".

Specified by:
processConfigurationPost in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides:
processConfigurationPost in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
threadContext - is the local thread context.
variableContext - is the set of variables available from the post, including binary file post information.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
Returns:
null if all is well, or a string error message if there is an error that should prevent saving of the connection (and cause a redirection to an error page).
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
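
The return convention (null on success, an error message otherwise) can be sketched as follows. This is not the connector's code: plain maps stand in for IPostParameters and ConfigParams so the example is self-contained, and the form field name "serverName" and the error text are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigPostSketch {
    // 'posted' stands in for the form variables, 'parameters' for the configuration parameters.
    static String processPost(Map<String, String> posted, Map<String, String> parameters) {
        String server = posted.get("serverName"); // hypothetical form field
        if (server == null) {
            // Field was not part of this post (for example, a different tab was shown): no change.
            return null;
        }
        server = server.trim();
        if (server.isEmpty()) {
            // A non-null return value blocks saving and leads to the error page.
            return "Server name must not be empty";
        }
        parameters.put("serverName", server);
        return null;
    }
}
```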

viewConfiguration

public void viewConfiguration(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext,
                              org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                              org.apache.manifoldcf.core.interfaces.ConfigParams parameters)
                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                              java.io.IOException
View configuration. This method is called in the body section of the connector's view configuration page. Its purpose is to present the connection information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags.

Specified by:
viewConfiguration in interface org.apache.manifoldcf.core.interfaces.IConnector
Overrides:
viewConfiguration in class org.apache.manifoldcf.core.connector.BaseConnector
Parameters:
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

outputSpecificationHeader

public void outputSpecificationHeader(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                      org.apache.manifoldcf.agents.interfaces.OutputSpecification os,
                                      java.util.ArrayList tabsArray)
                               throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                      java.io.IOException
Output the specification header section. This method is called in the head section of a job page which has selected an output connection of the current type. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the job editing HTML.

Specified by:
outputSpecificationHeader in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
Overrides:
outputSpecificationHeader in class org.apache.manifoldcf.agents.output.BaseOutputConnector
Parameters:
out - is the output to which any HTML should be sent.
os - is the current output specification for this job.
tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

outputSpecificationBody

public void outputSpecificationBody(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                                    org.apache.manifoldcf.agents.interfaces.OutputSpecification os,
                                    java.lang.String tabName)
                             throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                                    java.io.IOException
Output the specification body section. This method is called in the body section of a job page which has selected an output connection of the current type. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is "editjob".

Specified by:
outputSpecificationBody in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
Overrides:
outputSpecificationBody in class org.apache.manifoldcf.agents.output.BaseOutputConnector
Parameters:
out - is the output to which any HTML should be sent.
os - is the current output specification for this job.
tabName - is the current tab name.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

processSpecificationPost

public java.lang.String processSpecificationPost(org.apache.manifoldcf.core.interfaces.IPostParameters variableContext,
                                                 org.apache.manifoldcf.agents.interfaces.OutputSpecification os)
                                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Process a specification post. This method is called at the start of a job's edit or view page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the output specification accordingly. The name of the posted form is "editjob".

Specified by:
processSpecificationPost in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
Overrides:
processSpecificationPost in class org.apache.manifoldcf.agents.output.BaseOutputConnector
Parameters:
variableContext - contains the post data, including binary file-upload information.
os - is the current output specification for this job.
Returns:
null if all is well, or a string error message if there is an error that should prevent saving of the job (and cause a redirection to an error page).
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

viewSpecification

public void viewSpecification(org.apache.manifoldcf.core.interfaces.IHTTPOutput out,
                              org.apache.manifoldcf.agents.interfaces.OutputSpecification os)
                       throws org.apache.manifoldcf.core.interfaces.ManifoldCFException,
                              java.io.IOException
View specification. This method is called in the body section of a job's view page. Its purpose is to present the output specification information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags.

Specified by:
viewSpecification in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
Overrides:
viewSpecification in class org.apache.manifoldcf.agents.output.BaseOutputConnector
Parameters:
out - is the output to which any HTML should be sent.
os - is the current output specification for this job.
Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException

pack

protected static void pack(java.lang.StringBuffer output,
                           java.lang.String value,
                           char delimiter)
Stuffer for packing a single string with an end delimiter


unpack

protected static int unpack(java.lang.StringBuffer sb,
                            java.lang.String value,
                            int startPosition,
                            char delimiter)
Unstuffer for the above.
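
A sketch of what a delimiter-terminated pack/unpack pair with these signatures might look like. The escaping scheme (a backslash before any backslash or delimiter inside the value) is an assumption, since this page does not describe the encoding; the class name is hypothetical.

```java
final class PackSketch {
    // Append 'value' to 'output', escaping backslashes and delimiters, then add the end delimiter.
    static void pack(StringBuffer output, String value, char delimiter) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' || c == delimiter) {
                output.append('\\');
            }
            output.append(c);
        }
        output.append(delimiter);
    }

    // Read one packed value starting at 'startPosition' into 'sb'.
    // Returns the position just past the terminating delimiter.
    static int unpack(StringBuffer sb, String value, int startPosition, char delimiter) {
        int i = startPosition;
        while (i < value.length()) {
            char c = value.charAt(i++);
            if (c == '\\' && i < value.length()) {
                sb.append(value.charAt(i++)); // escaped character is taken literally
            } else if (c == delimiter) {
                break;
            } else {
                sb.append(c);
            }
        }
        return i;
    }
}
```

Returning the next position lets a caller unpack several values in sequence from one string by feeding each result into the next call.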


packFixedList

protected static void packFixedList(java.lang.StringBuffer output,
                                    java.lang.String[] values,
                                    char delimiter)
Stuffer for packing lists of fixed length


unpackFixedList

protected static int unpackFixedList(java.lang.String[] output,
                                     java.lang.String value,
                                     int startPosition,
                                     char delimiter)
Unstuffer for unpacking lists of fixed length


packList

protected static void packList(java.lang.StringBuffer output,
                               java.util.ArrayList values,
                               char delimiter)
Stuffer for packing lists of variable length


packList

protected static void packList(java.lang.StringBuffer output,
                               java.lang.String[] values,
                               char delimiter)
Another stuffer for packing lists of variable length


unpackList

protected static int unpackList(java.util.ArrayList output,
                                java.lang.String value,
                                int startPosition,
                                char delimiter)
Unstuffer for unpacking lists of variable length.

Parameters:
output - is the array into which to write the unpacked result.
value - is the value to unpack.
startPosition - is the place to start the unpack.
delimiter - is the character to use between values.
Returns:
the next position beyond the end of the list.
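
One way a variable-length list could be packed so that unpackList can return "the next position beyond the end of the list" is to write the element count first, followed by each element. The sketch below assumes that layout and a backslash escaping scheme; neither is confirmed by this page, and the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PackListSketch {
    // Single-value helpers (assumed encoding: backslash escapes, delimiter terminates).
    static void pack(StringBuffer out, String value, char d) {
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == d) out.append('\\');
            out.append(c);
        }
        out.append(d);
    }

    static int unpack(StringBuffer out, String value, int pos, char d) {
        while (pos < value.length()) {
            char c = value.charAt(pos++);
            if (c == '\\' && pos < value.length()) out.append(value.charAt(pos++));
            else if (c == d) break;
            else out.append(c);
        }
        return pos;
    }

    // Assumed layout: element count, then each element, all delimiter-terminated.
    static void packList(StringBuffer out, List<String> values, char d) {
        pack(out, Integer.toString(values.size()), d);
        for (String v : values) pack(out, v, d);
    }

    // Appends the unpacked elements to 'output'; returns the position after the list.
    // Throws NumberFormatException if the count field is not a number.
    static int unpackList(List<String> output, String value, int startPosition, char d) {
        StringBuffer sb = new StringBuffer();
        int pos = unpack(sb, value, startPosition, d);
        int count = Integer.parseInt(sb.toString());
        for (int i = 0; i < count; i++) {
            sb.setLength(0);
            pos = unpack(sb, value, pos, d);
            output.add(sb.toString());
        }
        return pos;
    }
}
```

Writing the count up front means the reader never has to scan for an end-of-list marker, so further packed data can follow the list in the same string.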

fingerprint

protected static int fingerprint(java.io.File file)
                          throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Fingerprint a file! Pass in the name of the (local) temporary file that we should be looking at. This method will read it as needed until the file has been identified (or found to remain "unknown"). The code here has been lifted algorithmically from products/ShareCrawler/Fingerprinter.pas.

Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
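
As a rough illustration of fingerprinting by leading bytes, the sketch below reads the start of a file and compares it against two well-known signatures: the OLE compound document header and the "%PDF" marker. It is not the algorithm from Fingerprinter.pas, which is not reproduced here. The numeric values given to the DT_* constants and the class name are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class FingerprintSketch {
    // Hypothetical values: the real DT_* constant values are not shown on this page.
    static final int DT_UNKNOWN = 0, DT_COMPOUND_DOC = 1, DT_ZERO = 2, DT_PDF = 3;

    // Standard 8-byte OLE compound file signature and the PDF header marker.
    static final byte[] OLE_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    static final byte[] PDF_MAGIC = { '%', 'P', 'D', 'F' };

    // Classify from the first 'length' bytes of a document.
    static int classify(byte[] head, int length) {
        if (length == 0) return DT_ZERO;
        if (startsWith(head, length, OLE_MAGIC)) return DT_COMPOUND_DOC;
        if (startsWith(head, length, PDF_MAGIC)) return DT_PDF;
        return DT_UNKNOWN;
    }

    static boolean startsWith(byte[] data, int length, byte[] magic) {
        if (length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }

    // Read only as many bytes as the longest signature needs, then classify.
    static int fingerprint(File file) throws IOException {
        byte[] head = new byte[OLE_MAGIC.length];
        int total = 0;
        try (InputStream in = new FileInputStream(file)) {
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) break; // end of file reached before the buffer filled
                total += n;
            }
        }
        return classify(head, total);
    }
}
```

A compound document result would then need a further step, such as reading the application name, to tell Word, Excel, PowerPoint, and Outlook files apart.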

getAppName

protected static java.lang.String getAppName(java.io.File documentPath)
                                      throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Get a binary document's APPNAME field, or return null if the document does not seem to be an OLE compound document.

Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException

recognizeApp

protected static int recognizeApp(java.lang.String appName)
Translate a string application name to one of the kinds of documents we care about.
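
A hedged sketch of such a translation, assuming it is a case-insensitive substring match on the application name. The matching rule, the numeric constant values, and the class name are assumptions; the strings the real method recognizes are not listed on this page.

```java
import java.util.Locale;

final class RecognizeAppSketch {
    // Hypothetical values: the real DT_* constant values are not shown on this page.
    static final int DT_UNKNOWN = 0, DT_MSWORD = 1, DT_MSEXCEL = 2,
                     DT_MSPOWERPOINT = 3, DT_MSOUTLOOK = 4;

    // Map an application name (for example from an OLE document's APPNAME field)
    // to a document type, or DT_UNKNOWN if nothing matches.
    static int recognizeApp(String appName) {
        if (appName == null) return DT_UNKNOWN;
        String name = appName.toLowerCase(Locale.ROOT);
        if (name.contains("word")) return DT_MSWORD;
        if (name.contains("excel")) return DT_MSEXCEL;
        if (name.contains("powerpoint")) return DT_MSPOWERPOINT;
        if (name.contains("outlook")) return DT_MSOUTLOOK;
        return DT_UNKNOWN;
    }
}
```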


isText

protected static boolean isText(byte[] beginChunk,
                                int chunkLength)
Test to see if a document is text or not. The first n bytes are passed in, and this code returns "true" if it thinks they represent text. The code has been lifted algorithmically from products/Sharecrawler/Fingerprinter.pas, which was based on "perldoc -f -T".
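
The heuristic that "perldoc -f -T" documents can be sketched as follows: a chunk containing a zero byte is treated as binary, and otherwise it is binary when more than a third of its bytes are "strange" (control codes other than whitespace, or bytes with the high bit set). The exact character classes and thresholds used by this connector are not stated here, so the helper definitions below are assumptions, and the UTF-8 check that newer Perl versions perform is omitted.

```java
final class IsTextSketch {
    // Assumed whitespace set: space, tab, newline, carriage return, form feed.
    static boolean isWhiteSpace(byte x) {
        return x == ' ' || x == '\t' || x == '\n' || x == '\r' || x == '\f';
    }

    // Assumed definition of "strange": non-whitespace control codes, DEL, or high-bit bytes.
    static boolean isStrange(byte x) {
        int v = x & 0xFF;
        return (v < 32 && !isWhiteSpace(x)) || v > 126;
    }

    // Returns true if the first 'chunkLength' bytes look like text.
    static boolean isText(byte[] beginChunk, int chunkLength) {
        int length = Math.min(chunkLength, beginChunk.length);
        int strange = 0;
        for (int i = 0; i < length; i++) {
            byte b = beginChunk[i];
            if (b == 0) {
                return false; // any zero byte marks the chunk as binary
            }
            if (isStrange(b)) {
                strange++;
            }
        }
        // Binary when more than a third of the bytes are strange; an empty chunk counts as text.
        return strange * 3 <= length;
    }
}
```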


isStrange

protected static boolean isStrange(byte x)
Check if character is not typical ASCII.


isWhiteSpace

protected static boolean isWhiteSpace(byte x)
Check if a byte is a whitespace character.


hexprint

protected static java.lang.String hexprint(byte x)

nibbleprint

protected static char nibbleprint(int x)