edu.jhu.nlp.wikipedia
Class WikiXMLSAXParser

java.lang.Object
  extended by edu.jhu.nlp.wikipedia.WikiXMLParser
      extended by edu.jhu.nlp.wikipedia.WikiXMLSAXParser

public class WikiXMLSAXParser
extends WikiXMLParser

A SAX Parser for Wikipedia XML dumps.


Field Summary
 
Fields inherited from class edu.jhu.nlp.wikipedia.WikiXMLParser
currentPage
 
Constructor Summary
WikiXMLSAXParser(java.lang.String fileName)
           
 
Method Summary
 WikiPageIterator getIterator()
          This parser is event driven, so it can't provide a page iterator.
 void parse()
          The main parse method.
static void parseWikipediaDump(java.lang.String dumpFile, PageCallbackHandler handler)
          A convenience method for the Wikipedia SAX interface.
 void setPageCallback(PageCallbackHandler handler)
          Set a callback handler.
 
Methods inherited from class edu.jhu.nlp.wikipedia.WikiXMLParser
getInputSource, notifyPage
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WikiXMLSAXParser

public WikiXMLSAXParser(java.lang.String fileName)
Method Detail

setPageCallback

public void setPageCallback(PageCallbackHandler handler)
                     throws java.lang.Exception
Set a callback handler. The callback is executed each time a page is detected in the stream. Custom handlers must implement PageCallbackHandler.

Specified by:
setPageCallback in class WikiXMLParser
Parameters:
handler - the callback handler to execute for each page detected in the stream
Throws:
java.lang.Exception
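
The callback model described above can be illustrated with a minimal sketch that uses only the JDK's own SAX API. It is not this class's implementation: TitleHandler, SaxPageCallbackSketch, and the assumption that each page element contains a title element are hypothetical stand-ins chosen for the example. The sketch shows a handler being invoked once for every page element that closes in the stream.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxPageCallbackSketch {

    /** Hypothetical stand-in for a page callback; not the library's PageCallbackHandler. */
    interface TitleHandler {
        void process(String title);
    }

    /** Streams the XML and invokes the handler once per closed page element. */
    static void parse(String xml, TitleHandler handler) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String title;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                text.setLength(0); // start collecting text for the newly opened element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    title = text.toString().trim();
                } else if ("page".equals(qName)) {
                    handler.process(title); // one callback per page
                    title = null;
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki><page><title>Alpha</title></page>"
                   + "<page><title>Beta</title></page></mediawiki>";
        parse(xml, title -> System.out.println("page: " + title));
    }
}
```

Because the handler runs while the stream is still being read, nothing beyond the current page needs to be held in memory, which is the usual reason to choose an event-driven parser for large dumps.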

parse

public void parse()
           throws java.lang.Exception
The main parse method. Parses the dump and executes the callback handler for each page detected in the stream.

Specified by:
parse in class WikiXMLParser
Throws:
java.lang.Exception

getIterator

public WikiPageIterator getIterator()
                             throws java.lang.Exception
This parser is event driven, so it can't provide a page iterator.

Specified by:
getIterator in class WikiXMLParser
Returns:
an iterator to the list of pages
Throws:
java.lang.Exception
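
When iteration over pages is still wanted from an event-driven parser, one common workaround is a callback that buffers what it receives. The sketch below is only an illustration under stated assumptions: TitleHandler, CollectingHandler, and pushAll are hypothetical names, and pushAll merely simulates a parser pushing pages to a handler. None of them are part of this library.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class CollectingCallbackSketch {

    /** Hypothetical stand-in for a page callback; not the library's PageCallbackHandler. */
    interface TitleHandler {
        void process(String title);
    }

    /** Buffers every title it is handed so the caller can iterate afterwards. */
    static final class CollectingHandler implements TitleHandler, Iterable<String> {
        private final List<String> titles = new ArrayList<>();

        @Override
        public void process(String title) {
            titles.add(title);
        }

        @Override
        public Iterator<String> iterator() {
            return titles.iterator();
        }
    }

    /** Simulates an event-driven parse by pushing each title to the handler. */
    static void pushAll(List<String> source, TitleHandler handler) {
        for (String title : source) {
            handler.process(title);
        }
    }

    public static void main(String[] args) {
        CollectingHandler collected = new CollectingHandler();
        pushAll(List.of("Alpha", "Beta"), collected);
        for (String title : collected) {
            System.out.println(title);
        }
    }
}
```

Buffering trades away the memory advantage of streaming, so this approach is only reasonable for small dumps or for handlers that keep a compact summary of each page.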

parseWikipediaDump

public static void parseWikipediaDump(java.lang.String dumpFile,
                                      PageCallbackHandler handler)
                               throws java.lang.Exception
A convenience method for the Wikipedia SAX interface.

Parameters:
dumpFile - path to the Wikipedia dump
handler - callback handler used for parsing
Throws:
java.lang.Exception