gate.creole.tokeniser.chinesetokeniser
Class Segmenter
java.lang.Object
gate.creole.tokeniser.chinesetokeniser.Segmenter
- public class Segmenter
- extends Object
Title: Segmenter.java
Description: This class segments Chinese text by adding extra spaces
Company: University of Sheffield
- Author:
- Erik E. Peterson - modified by Niraj Aswani
Constructor Summary
Segmenter(int charform, boolean loadwordfile)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
zhwords
private TreeMap zhwords
csurname
private TreeSet csurname
cforeign
private TreeSet cforeign
cnumbers
private TreeSet cnumbers
cnotname
private TreeSet cnotname
debug
private boolean debug
TRAD
public static final int TRAD
- See Also:
- Constant Field Values
SIMP
public static final int SIMP
- See Also:
- Constant Field Values
BOTH
public static final int BOTH
- See Also:
- Constant Field Values
marks
private ArrayList marks
Segmenter
public Segmenter(int charform,
boolean loadwordfile)
loadset
private void loadset(TreeSet targetset,
String sourcefile)
- Loads a set of character data from sourcefile into targetset
isNumber
public boolean isNumber(String testword)
isAllForeign
public boolean isAllForeign(String testword)
isNotCJK
public boolean isNotCJK(String testword)
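The page does not document how isNotCJK decides its result. As a rough illustration only, a check of this kind could be written with the standard Character.UnicodeScript API. The class and method names below are hypothetical, and the sketch assumes that "not CJK" means "contains no Han characters", which may differ from what Segmenter actually does.

```java
// Hypothetical sketch, not the GATE implementation.
final class CjkCheckSketch {

    // Returns true when no code point in the text belongs to the Han script.
    // Assumption: only Han characters count as CJK here; kana and Hangul do not.
    static boolean containsNoHan(String text) {
        return text.codePoints()
                .noneMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        System.out.println(containsNoHan("GATE 8"));        // Latin letters and digits only
        System.out.println(containsNoHan("\u4e2d\u6587"));  // two Han characters
    }
}
```

Iterating over code points rather than chars keeps the check correct for Han characters outside the Basic Multilingual Plane.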
stemWord
public String stemWord(String word)
segmentLine
public String segmentLine(String cline,
String separator)
addword
public void addword(String newword)
getMarks
public ArrayList getMarks()
- Returns the marks at which the segmenter added spaces
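The relationship between addword, segmentLine and getMarks can be pictured with a small standalone sketch. It is not the Segmenter source: the class name, the greedy longest-match strategy, and the choice to record marks as offsets into the output string are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of dictionary-based segmentation; not the GATE implementation.
final class LongestMatchSketch {
    private final Set<String> words = new TreeSet<>();
    private final List<Integer> marks = new ArrayList<>();
    private int maxLen = 1;

    // Adds a word to the dictionary and tracks the longest entry seen so far.
    void addWord(String word) {
        words.add(word);
        maxLen = Math.max(maxLen, word.length());
    }

    // Greedy longest match: at each position take the longest dictionary word,
    // falling back to a single char, and put the separator between the pieces.
    // Indexing is by UTF-16 char, so this assumes text within the BMP.
    String segment(String line, String separator) {
        marks.clear();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < line.length()) {
            int end = Math.min(line.length(), i + maxLen);
            while (end > i + 1 && !words.contains(line.substring(i, end))) {
                end--;
            }
            if (i > 0) {
                marks.add(out.length()); // output offset where the separator starts
                out.append(separator);
            }
            out.append(line, i, end);
            i = end;
        }
        return out.toString();
    }

    // Returns a copy of the offsets recorded by the most recent segment call.
    List<Integer> getMarks() {
        return new ArrayList<>(marks);
    }
}
```

With the two dictionary words "\u4e2d\u56fd" and "\u4eba\u6c11", the five-character input "\u4e2d\u56fd\u4eba\u6c11\u597d" yields three pieces separated by spaces, and the recorded marks are 2 and 5. Recording offsets lets a caller map positions in the segmented text back to the original.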
segmentData
public String segmentData(String fileContents,
String encoding)