org.carrot2.tools.odp.index
Class CatidPrimaryTopicIndexBuilder

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by org.carrot2.tools.odp.common.ODPAbstractSaxHandler
          extended by org.carrot2.tools.odp.index.CatidPrimaryTopicIndexBuilder
All Implemented Interfaces:
ObservableTopicIndexBuilder, PrimaryTopicIndexBuilder, PropertyProvider, ContentHandler, DTDHandler, EntityResolver, ErrorHandler

public class CatidPrimaryTopicIndexBuilder
extends ODPAbstractSaxHandler
implements PrimaryTopicIndexBuilder

Builds a primary topic index based on the ODP topic's catid attribute. Indices created by this class implement PrimaryTopicIndex. Content files (each file contains one topic along with all its external pages) are laid out in a hierarchy of file system directories that mirrors topic 'paths' in the original ODP structure, e.g. Top/World/Poland/Komputery. A maximum directory depth can be specified; topics below that depth are saved in a flat list of files, each named after the topic's catid. Because ODP topic 'paths' can contain problematic UTF-8 characters, each element of the path is mapped to an integer. This index builder is not thread-safe.
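
The directory layout described above can be sketched roughly as follows. This is an illustration only, not the Carrot2 implementation: the class name TopicPathLayoutSketch, the method directoryFor, and the numbering scheme (integers assigned in order of first appearance) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not part of the Carrot2 API. */
public class TopicPathLayoutSketch {
    // Assumed scheme: each distinct path element gets the next free integer.
    private final Map<String, Integer> elementIds = new HashMap<>();
    private final int maxDepth;

    public TopicPathLayoutSketch(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /**
     * Returns the relative directory for a topic path such as
     * "Top/World/Poland/Komputery". Only the first maxDepth elements
     * produce directories; deeper topics share the directory at maxDepth.
     */
    public String directoryFor(String topicPath) {
        String[] elements = topicPath.split("/");
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < elements.length && i < maxDepth; i++) {
            Integer id = elementIds.get(elements[i]);
            if (id == null) {
                id = elementIds.size();
                elementIds.put(elements[i], id);
            }
            parts.add(String.valueOf(id));
        }
        return String.join("/", parts);
    }
}
```

Under these assumptions, with a maximum depth of 2, both Top/World/Poland/Komputery and Top/World/Deutsch resolve to the same directory, and each topic's file inside it would be named after its catid.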

Version:
$Revision: 2122 $
Author:
Stanislaw Osinski

Field Summary
 
Fields inherited from class org.carrot2.tools.odp.common.ODPAbstractSaxHandler
currentExternalPage, currentTopic, propertyHelper, stringBuffer, topicIndexBuilderListeners
 
Constructor Summary
CatidPrimaryTopicIndexBuilder()
           
 
Method Summary
 PrimaryTopicIndex create(InputStream inputStream, TopicSerializer topicSerializer, Collection topicIndexBuilders)
          Creates a PrimaryTopicIndex for the given ODP RDF content data.
protected  void index(Topic topic)
           
 void setTopicSerializer(TopicSerializer topicSerializer)
          Sets this CatidPrimaryTopicIndexBuilder's topicSerializer.
 
Methods inherited from class org.carrot2.tools.odp.common.ODPAbstractSaxHandler
addTopicIndexBuilderListener, characters, endDocument, endElement, fireTopicIndexed, getDoubleProperty, getIntProperty, getProperty, initalizeParser, removeTopicIndexBuilderListener, setDoubleProperty, setIntProperty, setProperty, startElement
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.carrot2.util.PropertyProvider
getDoubleProperty, getIntProperty, getProperty, setDoubleProperty, setIntProperty, setProperty
 

Constructor Detail

CatidPrimaryTopicIndexBuilder

public CatidPrimaryTopicIndexBuilder()
Method Detail

create

public PrimaryTopicIndex create(InputStream inputStream,
                                TopicSerializer topicSerializer,
                                Collection topicIndexBuilders)
                         throws IOException,
                                ClassNotFoundException
Description copied from interface: PrimaryTopicIndexBuilder
Creates a PrimaryTopicIndex for the given ODP RDF content data. This method must also create the underlying file and directory structure.

Specified by:
create in interface PrimaryTopicIndexBuilder
Parameters:
inputStream - an InputStream associated with the ODP RDF content file to be indexed
topicSerializer - a TopicSerializer to be used to store topic data
topicIndexBuilders - a collection of TopicIndexBuilders to be executed after a primary index entry has been created for a topic
Throws:
IOException
ClassNotFoundException
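
Since this class extends DefaultHandler (through ODPAbstractSaxHandler), the input stream is presumably consumed by a SAX parse. The sketch below shows that general technique using only JDK classes; it is not the Carrot2 implementation. The class CatidSaxSketch and its parse method are hypothetical, and the element and attribute names (Topic, catid, r:id) are assumptions about the ODP RDF dump format.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration; not part of the Carrot2 API. */
public class CatidSaxSketch extends DefaultHandler {
    private final Map<Integer, String> topicIdsByCatid = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String currentTopicId;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        text.setLength(0); // start collecting text for this element
        if ("Topic".equals(qName)) {
            // Assumed: the topic path is stored in the r:id attribute
            currentTopicId = attributes.getValue("r:id");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("catid".equals(qName) && currentTopicId != null) {
            int catid = Integer.parseInt(text.toString().trim());
            topicIdsByCatid.put(catid, currentTopicId);
        } else if ("Topic".equals(qName)) {
            currentTopicId = null;
        }
    }

    /** Parses the stream and returns a catid-to-topic-path map. */
    public static Map<Integer, String> parse(InputStream in) throws Exception {
        CatidSaxSketch handler = new CatidSaxSketch();
        // The default factory is not namespace-aware, so qName is "Topic", "r:id", etc.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, handler);
        return handler.topicIdsByCatid;
    }
}
```

A real builder would write each topic through its TopicSerializer rather than keep a map in memory; the map is used here only to keep the example self-contained.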

index

protected void index(Topic topic)
              throws IOException
Specified by:
index in class ODPAbstractSaxHandler
Parameters:
topic - the topic to be indexed
Throws:
IOException

setTopicSerializer

public void setTopicSerializer(TopicSerializer topicSerializer)
Sets this CatidPrimaryTopicIndexBuilder's topicSerializer.

Parameters:
topicSerializer - the TopicSerializer to be used to store topic data


Copyright (c) Dawid Weiss, Stanislaw Osinski