org.carrot2.tools.odp.index
Class CatidPrimaryTopicIndexBuilder
java.lang.Object
org.xml.sax.helpers.DefaultHandler
org.carrot2.tools.odp.common.ODPAbstractSaxHandler
org.carrot2.tools.odp.index.CatidPrimaryTopicIndexBuilder
- All Implemented Interfaces:
- ObservableTopicIndexBuilder, PrimaryTopicIndexBuilder, PropertyProvider, ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class CatidPrimaryTopicIndexBuilder
- extends ODPAbstractSaxHandler
- implements PrimaryTopicIndexBuilder
Builds a CatidPrimaryTopicIndexBuilder based on the ODP Topic's
catid attribute. Indices created by this class are instances
of CatidPrimaryTopicIndexBuilder.
Content files (one file contains one topic along with all its external pages)
are laid out in a hierarchical structure of file system directories
corresponding to topic 'paths' in the original ODP structure, e.g.
Top/World/Poland/Komputery. A maximum depth can be specified for the file
system directory structure; topics beyond that depth are saved in a flat
list of files (each file is named after the topic's catid). As ODP topic
'paths' can contain problematic UTF-8 characters, each element of the path is
mapped to an integer number.
This index builder is not thread-safe.
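The directory layout described above can be illustrated with a small, self-contained sketch. It is not the Carrot2 implementation: the class TopicPathLayout and its method locationFor are hypothetical names, and two details are assumptions made for illustration only, namely that a given path element always maps to the same integer, and that topics deeper than the limit are placed in the directory at the depth limit.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Carrot2) sketching the integer-based directory layout. */
final class TopicPathLayout {
    // Assumption: one global mapping from path element name to integer.
    private final Map<String, Integer> elementIds = new HashMap<>();
    private final int maxDepth;

    TopicPathLayout(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /** Returns a relative file location for a topic, using its catid as the file name. */
    String locationFor(String topicPath, String catid) {
        String[] elements = topicPath.split("/");
        StringBuilder location = new StringBuilder();
        // Only the first maxDepth path elements become directories; deeper
        // topics share the directory at the depth limit (flat list of files).
        int depth = Math.min(elements.length, maxDepth);
        for (int i = 0; i < depth; i++) {
            Integer id = elementIds.get(elements[i]);
            if (id == null) {
                // Assign the next unused integer to a path element seen for the first time.
                id = elementIds.size();
                elementIds.put(elements[i], id);
            }
            location.append(id).append('/');
        }
        return location.append(catid).toString();
    }

    public static void main(String[] args) {
        TopicPathLayout layout = new TopicPathLayout(3);
        // Prints "0/1/2/12345": Top, World and Poland become directories 0, 1 and 2.
        System.out.println(layout.locationFor("Top/World/Poland/Komputery", "12345"));
    }
}
```

Using integers for directory names sidesteps file system issues with non-ASCII path elements, while the catid keeps file names unique within the flat part of the layout. Like the builder itself, this sketch keeps mutable state and is not thread-safe.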
- Version:
- $Revision: 2122 $
- Author:
- Stanislaw Osinski
Methods inherited from class org.carrot2.tools.odp.common.ODPAbstractSaxHandler
addTopicIndexBuilderListener, characters, endDocument, endElement, fireTopicIndexed, getDoubleProperty, getIntProperty, getProperty, initalizeParser, removeTopicIndexBuilderListener, setDoubleProperty, setIntProperty, setProperty, startElement
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CatidPrimaryTopicIndexBuilder
public CatidPrimaryTopicIndexBuilder()
create
public PrimaryTopicIndex create(InputStream inputStream,
TopicSerializer topicSerializer,
Collection topicIndexBuilders)
throws IOException,
ClassNotFoundException
- Description copied from interface:
PrimaryTopicIndexBuilder
- Creates a PrimaryTopicIndex for the given ODP RDF content data. This
method must also create the underlying file and directory structure.
- Specified by:
create in interface PrimaryTopicIndexBuilder
- Parameters:
inputStream - an InputStream associated with the ODP RDF
content file to be indexed.
topicSerializer - a TopicSerializer to be used to store
topic data.
topicIndexBuilders - a collection of TopicIndexBuilders to
be executed after a primary index entry has been created for a
topic.
- Throws:
IOException
ClassNotFoundException
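Because this class extends DefaultHandler, create presumably reads the ODP RDF input through SAX callbacks. The following is a minimal sketch of that parsing style, not the Carrot2 code: CatidCollector and collect are hypothetical names, the input is a simplified fragment rather than a real ODP dump, and it assumes that catid appears as a child element of Topic.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler (not part of Carrot2) that collects catid values. */
final class CatidCollector extends DefaultHandler {
    final List<String> catids = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <catid> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("catid".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver element text in several chunks, so accumulate them.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("catid".equals(qName)) {
            catids.add(text.toString().trim());
            text = null;
        }
    }

    /** Parses the stream and returns all catid values in document order. */
    static List<String> collect(InputStream in) {
        CatidCollector handler = new CatidCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (Exception e) {
            // Wrapped to keep the sketch short; real code would propagate IOException.
            throw new IllegalStateException(e);
        }
        return handler.catids;
    }

    public static void main(String[] args) {
        // Simplified stand-in for ODP RDF content (assumed structure).
        String xml = "<RDF><Topic id=\"Top/Arts\"><catid>2</catid></Topic></RDF>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(collect(in));
    }
}
```

A streaming handler of this kind keeps memory use independent of the size of the input, which matters for large RDF content files.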
index
protected void index(Topic topic)
throws IOException
- Specified by:
index in class ODPAbstractSaxHandler
- Parameters:
topic - the Topic to be indexed
- Throws:
IOException
setTopicSerializer
public void setTopicSerializer(TopicSerializer topicSerializer)
- Sets this CatidPrimaryTopicIndexBuilder's
topicSerializer.
- Parameters:
topicSerializer - the TopicSerializer to be used by this builder
Copyright (c) Dawid Weiss, Stanislaw Osinski