org.carrot2.filter.normalizer
Interface CaseNormalizer

All Known Implementing Classes:
SimpleCaseNormalizer, SmartCaseNormalizer

public interface CaseNormalizer

Brings the case of all tokens in the titles and snippets of all input tokenized documents to one common form. This process can be thought of as 'stemming for case'. All input tokens must implement the StringTypedToken interface. The input documents are modified in place: their tokens are overwritten with case-normalized versions. Token types are preserved. The full text of documents is not supported. This class is not thread-safe.

Version:
$Revision: 2122 $
Author:
Stanislaw Osinski

Method Summary
 void addDocument(TokenizedDocument document)
          Adds a document to the normalization engine.
 void clear()
          Clears this instance so that it can be reused with another set of documents.
 List getNormalizedDocuments()
          Returns a List of case-normalized documents.
 

Method Detail

clear

void clear()
Clears this instance so that it can be reused with another set of documents.


addDocument

void addDocument(TokenizedDocument document)
Adds a document to the normalization engine.

Throws:
IllegalStateException - when an attempt is made to add documents after getNormalizedDocuments() has been called.

getNormalizedDocuments

List getNormalizedDocuments()
Returns a List of case-normalized documents. After a successful call to this method, no documents can be added until this case normalizer is cleared using the clear() method. Note: it is in this method that the documents' tokens get modified.

Returns:
a List of case-normalized documents
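
Example (illustrative only): the add / normalize / clear lifecycle described above can be sketched as follows. This is not Carrot2 code. ToyCaseNormalizer is a hypothetical stand-in that represents each document as a plain list of String tokens instead of a TokenizedDocument, and it simply lower-cases every token, which is an assumption made for the sketch and not the behaviour of SimpleCaseNormalizer or SmartCaseNormalizer. Only the call order and the IllegalStateException contract are taken from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CaseNormalizerLifecycleSketch {

    /** Hypothetical stand-in for a CaseNormalizer; documents are mutable token lists. */
    static class ToyCaseNormalizer {
        private final List<List<String>> documents = new ArrayList<>();
        private boolean normalized = false;

        void addDocument(List<String> document) {
            // Mirrors the documented contract: no additions after getNormalizedDocuments().
            if (normalized) {
                throw new IllegalStateException("call clear() before adding more documents");
            }
            documents.add(document);
        }

        List<List<String>> getNormalizedDocuments() {
            if (!normalized) {
                // Tokens are overwritten in place here, not in addDocument().
                // Lower-casing is an assumption of this sketch only.
                for (List<String> doc : documents) {
                    doc.replaceAll(token -> token.toLowerCase(Locale.ROOT));
                }
                normalized = true;
            }
            // Return a copy of the outer list so a later clear() does not empty it.
            return new ArrayList<>(documents);
        }

        void clear() {
            documents.clear();
            normalized = false;
        }
    }

    /** Walks through the lifecycle and records what happens at each step. */
    static List<String> demo() {
        List<String> trace = new ArrayList<>();
        ToyCaseNormalizer normalizer = new ToyCaseNormalizer();

        List<String> doc = new ArrayList<>(Arrays.asList("Data", "MINING"));
        normalizer.addDocument(doc);
        trace.add("before: " + doc);

        normalizer.getNormalizedDocuments();
        trace.add("after: " + doc); // the input document itself was modified

        try {
            normalizer.addDocument(new ArrayList<>());
            trace.add("late add accepted");
        } catch (IllegalStateException e) {
            trace.add("late add rejected");
        }

        normalizer.clear(); // the instance can now be reused
        normalizer.addDocument(new ArrayList<>(Arrays.asList("Second", "Batch")));
        trace.add("reused: " + normalizer.getNormalizedDocuments());
        return trace;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Because normalization overwrites the input tokens, callers that still need the original casing would have to copy their documents before adding them.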


Copyright (c) Dawid Weiss, Stanislaw Osinski