gate.creole.ml.maxent
Class MaxentWrapper

java.lang.Object
  extended by gate.creole.ml.maxent.MaxentWrapper
All Implemented Interfaces:
ActionsPublisher, MLEngine

public class MaxentWrapper
extends Object
implements MLEngine, ActionsPublisher

Wrapper class for the Maxent machine learning algorithm.

See Also:
Maxent homepage

Nested Class Summary
protected  class MaxentWrapper.LoadModelAction
          This reloads a file that was previously saved using the SaveModelAction class.
protected  class MaxentWrapper.SaveModelAction
          This allows the model, including its parameters, to be saved to a file.
 
Field Summary
protected  List actionsList
           
protected  double confidenceThreshold
           
protected  int cutoff
          The following members are set by the part of the config file, and control the parameters used for training the model, and for classifying instances.
protected  boolean datasetChanged
          Marks whether the dataset was changed since the last time the classifier was built.
protected  DatasetDefintion datasetDefinition
           
(package private)  boolean DEBUG
           
protected  int iterations
           
protected  opennlp.maxent.MaxentModel maxentClassifier
          The Maxent classifier used by this wrapper
protected  org.jdom.Element optionsElement
          The JDom element containing the options for this wrapper.
protected  ProcessingResource owner
           
protected  StatusListener sListener
           
protected  boolean smoothing
           
protected  double smoothingObservation
           
protected  List trainingData
          This List stores all the data that has been collected.
protected  boolean verbose
           
 
Constructor Summary
MaxentWrapper()
          This constructor sets up the action list so that these actions (loading and saving models) will be available from a context menu in the GUI.
 
Method Summary
 void addTrainingInstance(List attributeValues)
          This is called to add a new training instance to the data set collected in this wrapper object.
 List batchClassifyInstances(List instances)
          Some wrappers allow batch classification, but this one does not, so calling this method always throws an exception to inform the user.
private  void checkDatasetDefinition()
          Tests that the attributes specified in the DatasetDefinition are valid for maxent.
 Object classifyInstance(List attributeValues)
          Decide on the outcome for the instance, based on the values of all the maxent features.
 void cleanUp()
          No clean-up is needed for this wrapper; this method exists only because it is part of the interface.
private  void extractAndCheckOptions()
          Extract the options from the stored Element, and verify that they are all valid.
 List getActions()
          Gets the list of actions that can be performed on this resource.
 DatasetDefintion getDatasetDefinition()
           
 void init()
          Initialises the classifier and prepares for running.
private  void initialiseAndTrainClassifier()
          This method first sets the static parameters of GIS to reflect those specified in the configuration file, then it trains the model using the data collected up to this point, and stores the model in maxentClassifier.
 void load(InputStream is)
          Loads the state of this engine from previously saved data.
(package private)  void markIndicesOnFeatures(List attributeValues)
          Annotate the features (but not the outcome), by prepending the index of their location in the list of attributes, followed by a colon.
 void save(OutputStream os)
          Saves the state of the engine for reuse at a later time.
private  void setConfidenceThreshold(org.jdom.Element optionsElem)
          See if a confidence threshold is specified in the config file.
private  void setCutoff(org.jdom.Element optionsElem)
          See if a cutoff is specified in the config file.
 void setDatasetDefinition(DatasetDefintion definition)
          Set the data set definition for this classifier.
private  void setIterations(org.jdom.Element optionsElem)
          See if the number of iterations to perform during training is specified in the config file.
 void setOptions(org.jdom.Element optionsElem)
          Take a representation of the part of the XML configuration file which corresponds to , and store it.
 void setOwnerPR(ProcessingResource pr)
          Registers the PR using the engine with the engine itself.
private  void setSmoothing(org.jdom.Element optionsElem)
          Set the smoothing field appropriately, depending on whether is specified in the configuration file.
private  void setSmoothingObservation(org.jdom.Element optionsElem)
          Set the smoothing observation field appropriately, depending on what value is specified for in the configuration file.
private  void setVerbose(org.jdom.Element optionsElem)
          Set the verbose field appropriately, depending on whether is specified in the configuration file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEBUG

boolean DEBUG

datasetDefinition

protected DatasetDefintion datasetDefinition

maxentClassifier

protected opennlp.maxent.MaxentModel maxentClassifier
The Maxent classifier used by this wrapper


trainingData

protected List trainingData
This List stores all the data that has been collected. Each item is a List of Strings, each of which is an attribute. In maxent terms, these are the features and the outcome; the position of the outcome can be found by referring to the datasetDefinition object.
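Because each stored instance mixes the features with the outcome, it has to be split before it can be handed to maxent. The following is a minimal sketch of that split, assuming the outcome's position is already known as a plain int (here the hypothetical parameter outcomeIndex, standing in for whatever the datasetDefinition object reports). The class InstanceSplitter and its methods are illustrative only, not part of this wrapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: separates one stored instance into maxent context and outcome.
// outcomeIndex is an assumed stand-in for the position given by the dataset definition.
final class InstanceSplitter {
    private InstanceSplitter() {}

    // Returns every attribute except the one at outcomeIndex, preserving order.
    static String[] context(List<String> instance, int outcomeIndex) {
        List<String> features = new ArrayList<>();
        for (int i = 0; i < instance.size(); i++) {
            if (i != outcomeIndex) {
                features.add(instance.get(i));
            }
        }
        return features.toArray(new String[0]);
    }

    // Returns the attribute at outcomeIndex, i.e. the maxent outcome.
    static String outcome(List<String> instance, int outcomeIndex) {
        return instance.get(outcomeIndex);
    }
}
```

For the instance ["true", "false", "PERSON"] with the outcome at position 2, the context is ["true", "false"] and the outcome is "PERSON".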


optionsElement

protected org.jdom.Element optionsElement
The JDom element containing the options for this wrapper.


datasetChanged

protected boolean datasetChanged
Marks whether the dataset was changed since the last time the classifier was built.


actionsList

protected List actionsList

owner

protected ProcessingResource owner

sListener

protected StatusListener sListener

cutoff

protected int cutoff
The following members are set by the part of the config file, and control the parameters used for training the model, and for classifying instances. They are initialised with their default values, but may be changed when setOptions is called.


confidenceThreshold

protected double confidenceThreshold

iterations

protected int iterations

verbose

protected boolean verbose

smoothing

protected boolean smoothing

smoothingObservation

protected double smoothingObservation

Constructor Detail

MaxentWrapper

public MaxentWrapper()
This constructor sets up the action list so that these actions (loading and saving models) will be available from a context menu in the GUI. There is no option to load or save data sets, as maxent does not support this. If data sets need to be saved, this can be done using weka.wrapper instead.

Method Detail

cleanUp

public void cleanUp()
No clean-up is needed for this wrapper; this method exists only because it is part of the interface.

Specified by:
cleanUp in interface MLEngine

batchClassifyInstances

public List batchClassifyInstances(List instances)
                            throws ExecutionException
Some wrappers allow batch classification, but this one does not, so calling this method always throws an exception to inform the user.

Specified by:
batchClassifyInstances in interface MLEngine
Parameters:
instances - This parameter is not used.
Returns:
Nothing is ever returned - an exception is always thrown.
Throws:
ExecutionException
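Since batch classification always fails here, a caller that wants to work with several engines may need to fall back to classifying one instance at a time. The sketch below shows that pattern under stated assumptions: SimpleEngine is a hypothetical minimal interface (not GATE's MLEngine), and the engine signals "no batch support" with an unchecked UnsupportedOperationException, whereas this wrapper actually throws ExecutionException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal engine interface; GATE's MLEngine is not used here.
interface SimpleEngine {
    List<Object> batchClassifyInstances(List<List<String>> instances);
    Object classifyInstance(List<String> attributeValues);
}

final class BatchFallback {
    private BatchFallback() {}

    // Try batch classification first; if the engine rejects it (as MaxentWrapper
    // always does), classify the instances one at a time instead.
    static List<Object> classifyAll(SimpleEngine engine, List<List<String>> instances) {
        try {
            return engine.batchClassifyInstances(instances);
        } catch (UnsupportedOperationException e) {
            List<Object> results = new ArrayList<>();
            for (List<String> instance : instances) {
                results.add(engine.classifyInstance(instance));
            }
            return results;
        }
    }
}
```

Catching only the specific "unsupported" signal, rather than every exception, keeps genuine classification failures visible to the caller.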

setOptions

public void setOptions(org.jdom.Element optionsElem)
Take a representation of the part of the XML configuration file which corresponds to , and store it.

Specified by:
setOptions in interface MLEngine
Parameters:
optionsElem - the JDom element containing the options from the configuration.
Throws:
GateException

extractAndCheckOptions

private void extractAndCheckOptions()
                             throws ResourceInstantiationException
Extract the options from the stored Element, and verify that they are all valid. Store them in the class's fields.

Throws:
ResourceInstantiationException

setVerbose

private void setVerbose(org.jdom.Element optionsElem)
Set the verbose field appropriately, depending on whether is specified in the configuration file.


setSmoothing

private void setSmoothing(org.jdom.Element optionsElem)
Set the smoothing field appropriately, depending on whether is specified in the configuration file.


setSmoothingObservation

private void setSmoothingObservation(org.jdom.Element optionsElem)
                              throws ResourceInstantiationException
Set the smoothing observation field appropriately, depending on what value is specified for in the configuration file.

Throws:
ResourceInstantiationException

setConfidenceThreshold

private void setConfidenceThreshold(org.jdom.Element optionsElem)
                             throws ResourceInstantiationException
See if a confidence threshold is specified in the config file. If it is, set the confidenceThreshold field; otherwise set it to its default value.

Throws:
ResourceInstantiationException

setCutoff

private void setCutoff(org.jdom.Element optionsElem)
                throws ResourceInstantiationException
See if a cutoff is specified in the config file. If it is, set the cutoff field; otherwise set cutoff to its default value.

Throws:
ResourceInstantiationException

setIterations

private void setIterations(org.jdom.Element optionsElem)
                    throws ResourceInstantiationException
See if the number of iterations to perform during training is specified in the config file. If it is, set the iterations field; otherwise set it to its default value, 10.

Throws:
ResourceInstantiationException
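The "use the configured value, else the default" logic shared by setIterations and setCutoff can be sketched as follows. This is a hypothetical illustration, not the wrapper's code: the options are modelled as a Map<String, String> instead of a JDOM Element, the option name "ITERATIONS" is assumed, IllegalArgumentException stands in for ResourceInstantiationException, and rejecting non-positive values is an assumed validity rule. Only the default of 10 comes from the description above.

```java
import java.util.Map;

// Hypothetical stand-in for setIterations: options come from a Map, not a JDOM Element.
final class IterationsOption {
    static final int DEFAULT_ITERATIONS = 10; // default documented for setIterations

    private IterationsOption() {}

    static int parseIterations(Map<String, String> options) {
        String raw = options.get("ITERATIONS"); // assumed option name
        if (raw == null) {
            return DEFAULT_ITERATIONS; // not specified: fall back to the default
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("iterations is not an integer: " + raw, e);
        }
        if (value <= 0) { // assumed validity rule
            throw new IllegalArgumentException("iterations must be positive: " + raw);
        }
        return value;
    }
}
```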

addTrainingInstance

public void addTrainingInstance(List attributeValues)
This is called to add a new training instance to the data set collected in this wrapper object.

Specified by:
addTrainingInstance in interface MLEngine
Parameters:
attributeValues - A list of String objects, each of which corresponds to an attribute value. For boolean attributes the values will be true or false.

markIndicesOnFeatures

void markIndicesOnFeatures(List attributeValues)
Annotate the features (but not the outcome), by prepending the index of their location in the list of attributes, followed by a colon. This is because all features are true or false, but it is important that maxent does not confuse a true in one position with a true in another when, for example, calculating the cutoff.

Parameters:
attributeValues - a list of String objects listing all the feature values and the outcome value for an instance.
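The index-prefixing step described above can be sketched as a small in-place transformation. In this hypothetical version the outcome position is passed explicitly as outcomeIndex (an assumption; the real method works it out from the dataset definition), and the list is assumed to be mutable.

```java
import java.util.List;

// Hypothetical sketch of index-prefixing: "true" at position 0 and "true" at
// position 2 become the distinct maxent features "0:true" and "2:true".
final class FeatureIndexer {
    private FeatureIndexer() {}

    // Modifies attributeValues in place; the outcome at outcomeIndex is left untouched.
    static void markIndicesOnFeatures(List<String> attributeValues, int outcomeIndex) {
        for (int i = 0; i < attributeValues.size(); i++) {
            if (i != outcomeIndex) {
                attributeValues.set(i, i + ":" + attributeValues.get(i));
            }
        }
    }
}
```

Note that the index is the position in the full attribute list, so feature numbers skip over the outcome's position: ["true", "PERSON", "true"] with the outcome at position 1 becomes ["0:true", "PERSON", "2:true"].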

setDatasetDefinition

public void setDatasetDefinition(DatasetDefintion definition)
Set the data set definition for this classifier.

Specified by:
setDatasetDefinition in interface MLEngine
Parameters:
definition - A specification of the types and allowable values of all the attributes, as specified in the part of the configuration file.

checkDatasetDefinition

private void checkDatasetDefinition()
                             throws ResourceInstantiationException
Tests that the attributes specified in the DatasetDefinition are valid for maxent, that is, that all attributes except the class attribute are boolean and that the class attribute is boolean or nominal, as required by the maxent implementation used.

Throws:
ResourceInstantiationException
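The validity rule can be expressed as a single pass over the attribute types. This sketch uses a hypothetical AttrType enum and a plain classIndex instead of GATE's dataset definition classes, and throws IllegalArgumentException where the wrapper throws ResourceInstantiationException.

```java
import java.util.List;

// Hypothetical attribute types; GATE's own attribute classes are not used here.
enum AttrType { BOOLEAN, NOMINAL, NUMERIC }

final class MaxentDatasetCheck {
    private MaxentDatasetCheck() {}

    // Every non-class attribute must be boolean; the class attribute must be
    // boolean or nominal. Throws on the first attribute that breaks the rule.
    static void check(List<AttrType> types, int classIndex) {
        for (int i = 0; i < types.size(); i++) {
            AttrType t = types.get(i);
            if (i == classIndex) {
                if (t != AttrType.BOOLEAN && t != AttrType.NOMINAL) {
                    throw new IllegalArgumentException("class attribute must be boolean or nominal, got " + t);
                }
            } else if (t != AttrType.BOOLEAN) {
                throw new IllegalArgumentException("attribute " + i + " must be boolean, got " + t);
            }
        }
    }
}
```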

initialiseAndTrainClassifier

private void initialiseAndTrainClassifier()
This method first sets the static parameters of GIS to reflect those specified in the configuration file, then it trains the model using the data collected up to this point, and stores the model in maxentClassifier.


classifyInstance

public Object classifyInstance(List attributeValues)
                        throws ExecutionException
Decide on the outcome for the instance, based on the values of all the maxent features. N.B. Unless this method has been called before and no new data has been added since, the model is trained when it is called, so a call to this method may take a long time to execute.

Specified by:
classifyInstance in interface MLEngine
Parameters:
attributeValues - A list of all the attributes, including the one that corresponds to the maxent outcome (the attribute). The value of the outcome is arbitrary.
Returns:
A string value giving the nominal value of the outcome or, if the outcome is boolean, a java String with value "true" or "false".
Throws:
ExecutionException
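The lazy retraining described for classifyInstance can be sketched with a "dataset changed" flag that addTrainingInstance sets and classification clears. In this hypothetical sketch the "model" is only a placeholder that predicts the most frequent outcome; it is NOT maxent, and it ignores the instance's features. Only the flag-driven retraining pattern is the point, and trainCount exists purely to make that pattern observable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of lazy retraining; the placeholder model predicts the
// majority outcome and is not a maxent model.
final class LazyClassifier {
    private final List<List<String>> trainingData = new ArrayList<>();
    private final int outcomeIndex;
    private boolean datasetChanged = false;
    private String model; // placeholder model: the majority outcome
    int trainCount = 0;   // number of training runs, for illustration

    LazyClassifier(int outcomeIndex) {
        this.outcomeIndex = outcomeIndex;
    }

    void addTrainingInstance(List<String> attributeValues) {
        trainingData.add(new ArrayList<>(attributeValues));
        datasetChanged = true; // the next classification must retrain
    }

    String classifyInstance(List<String> attributeValues) {
        if (model == null || datasetChanged) {
            train(); // potentially slow; runs only when there is new data
            datasetChanged = false;
        }
        return model;
    }

    private void train() {
        trainCount++;
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        for (List<String> row : trainingData) {
            String o = row.get(outcomeIndex);
            int c = counts.merge(o, 1, Integer::sum);
            if (best == null || c > counts.get(best)) {
                best = o;
            }
        }
        model = best;
    }
}
```

Repeated calls with no new data reuse the trained model; adding one more instance makes the next call pay the training cost again.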

init

public void init()
          throws GateException
Initialises the classifier and prepares for running. Before calling this method, the datasetDefinition and optionsElement fields should have been set using calls to the appropriate methods.

Specified by:
init in interface MLEngine
Throws:
GateException - If it is not possible to initialise the classifier for any reason.

load

public void load(InputStream is)
          throws IOException
Loads the state of this engine from previously saved data.

Parameters:
is - An open InputStream from which the model will be loaded.
Throws:
IOException

save

public void save(OutputStream os)
          throws IOException
Saves the state of the engine for reuse at a later time.

Parameters:
os - An open output stream to which the model will be saved.
Throws:
IOException

getActions

public List getActions()
Gets the list of actions that can be performed on this resource.

Specified by:
getActions in interface ActionsPublisher
Returns:
a List of Action objects (or null values)

setOwnerPR

public void setOwnerPR(ProcessingResource pr)
Registers the PR using the engine with the engine itself.

Specified by:
setOwnerPR in interface MLEngine
Parameters:
pr - the processing resource that owns this engine.

getDatasetDefinition

public DatasetDefintion getDatasetDefinition()