org.carrot2.filter.lingo.common
Class CarrotLibTokenizerPreprocessingStrategy

java.lang.Object
  extended by org.carrot2.filter.lingo.common.CarrotLibTokenizerPreprocessingStrategy
All Implemented Interfaces:
PreprocessingStrategy

public final class CarrotLibTokenizerPreprocessingStrategy
extends Object
implements PreprocessingStrategy

A preprocessing strategy that uses the internal tokenizer, language map, and stemmers from the new Carrot2 core.

Author:
Dawid Weiss, Stanisław Osiński
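
The description above names three ingredients: a tokenizer, a per-language map, and stemmers. The sketch below shows one way such a pipeline could be wired together in plain Java. It is an illustration only: the class PreprocessingSketch, its methods, the toy suffix-stripping stemmer, and the per-language stop word map are all hypothetical and are not taken from Carrot2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Carrot2.
class PreprocessingSketch {

    /** Splits text on anything that is not a letter or digit and lower-cases each token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Toy suffix-stripping stemmer, standing in for a real per-language stemmer. */
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "es", "s"}) {
            if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    /**
     * Tokenizes the text, drops the stop words registered for the given
     * language code (assumed map layout: language code to stop word set),
     * and stems whatever remains.
     */
    static List<String> preprocess(String text, String language,
                                   Map<String, Set<String>> stopWordsByLanguage) {
        Set<String> stopWords = stopWordsByLanguage.getOrDefault(language, Set.of());
        List<String> stems = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (!stopWords.contains(token)) {
                stems.add(stem(token));
            }
        }
        return stems;
    }
}
```

Under these assumptions, PreprocessingSketch.preprocess("The clusters of documents", "en", Map.of("en", Set.of("the", "of"))) yields the list [cluster, document]. The actual strategy operates on Snippet objects and a LanguageTokenizer rather than on raw strings.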

Field Summary
protected  Map caseCheck
           
protected  Map inflectedFreqSets
           
protected  Map inflectedSets
           
protected  Map languages
           
protected static org.apache.log4j.Logger logger
          Logger
protected  Set lowCaseWords
           
protected  Map nonStopWordSets
           
protected  Set queryWords
           
protected  Map stemSets
          Linguistic information
protected  Map stopWordSets
           
protected  Set strongWords
           
 
Constructor Summary
CarrotLibTokenizerPreprocessingStrategy()
           
 
Method Summary
 Snippet[] preprocess(AbstractClusteringContext clusteringContext)
           
protected  Snippet preprocess(Snippet snippet, LanguageTokenizer tokenizer)
          Preprocesses (cleans) a single snippet using the given tokenizer.
protected  Snippet stemming(Snippet snippet)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static final org.apache.log4j.Logger logger
Logger


stemSets

protected Map stemSets
Linguistic information


inflectedSets

protected Map inflectedSets

stopWordSets

protected Map stopWordSets

nonStopWordSets

protected Map nonStopWordSets

languages

protected Map languages

strongWords

protected Set strongWords

queryWords

protected Set queryWords

lowCaseWords

protected Set lowCaseWords

caseCheck

protected Map caseCheck

inflectedFreqSets

protected Map inflectedFreqSets
Constructor Detail

CarrotLibTokenizerPreprocessingStrategy

public CarrotLibTokenizerPreprocessingStrategy()
Method Detail

preprocess

public Snippet[] preprocess(AbstractClusteringContext clusteringContext)
Specified by:
preprocess in interface PreprocessingStrategy

preprocess

protected Snippet preprocess(Snippet snippet,
                             LanguageTokenizer tokenizer)
Preprocesses (cleans) a single snippet using the given tokenizer.


stemming

protected Snippet stemming(Snippet snippet)
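
This page does not document what stemming does internally. As a loosely related illustration, the sketch below shows the kind of bookkeeping that field names such as stemSets, inflectedSets and inflectedFreqSets hint at: grouping inflected word forms under a shared stem and counting how often each form occurs. That reading of the field names is an assumption, and the class StemGroupingSketch, its methods, and the toy stemmer are hypothetical rather than part of Carrot2.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the Carrot2 implementation.
class StemGroupingSketch {

    /** Toy stemmer: strips a trailing "s" from words longer than three characters. */
    static String stem(String word) {
        return (word.length() > 3 && word.endsWith("s"))
                ? word.substring(0, word.length() - 1)
                : word;
    }

    /** Maps each stem to the inflected forms seen for it, with occurrence counts. */
    static Map<String, Map<String, Integer>> groupByStem(List<String> words) {
        Map<String, Map<String, Integer>> byStem = new TreeMap<>();
        for (String word : words) {
            byStem.computeIfAbsent(stem(word), k -> new TreeMap<>())
                  .merge(word, 1, Integer::sum);
        }
        return byStem;
    }

    /** Returns the most frequent form; on a tie, the first one in the map's iteration order. */
    static String mostFrequentForm(Map<String, Integer> forms) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : forms.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

Keeping the per-form counts makes it possible to display the most common surface form for a stem instead of the bare stem itself, which is one plausible reason for tracking inflected-form frequencies.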


Copyright (c) Dawid Weiss, Stanisław Osiński