edu.jhu.nlp.wikipedia
Class WikiXMLSAXParser
java.lang.Object
edu.jhu.nlp.wikipedia.WikiXMLParser
edu.jhu.nlp.wikipedia.WikiXMLSAXParser
public class WikiXMLSAXParser
- extends WikiXMLParser
A SAX Parser for Wikipedia XML dumps.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WikiXMLSAXParser
public WikiXMLSAXParser(java.lang.String fileName)
setPageCallback
public void setPageCallback(PageCallbackHandler handler)
throws java.lang.Exception
- Sets the callback handler. The callback is invoked each time a
page is detected in the stream. Custom handlers implement
PageCallbackHandler.
- Specified by:
setPageCallback
in class WikiXMLParser
- Parameters:
handler
- the callback handler to invoke for each page
- Throws:
java.lang.Exception
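
The per-page callback pattern described above can be illustrated with the JDK's own SAX API. This is a minimal sketch, not this library's implementation: the TitleCallback interface, the PageCallbackSketch class, and the sample element names (page, title) are assumptions made for illustration, and the real PageCallbackHandler may look different.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PageCallbackSketch {

    /** Hypothetical stand-in for a page callback; not the library's PageCallbackHandler. */
    public interface TitleCallback {
        void onPage(String title);
    }

    /** Streams the XML and invokes the callback once per closed page element. */
    public static void parse(String xml, TitleCallback callback) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();
                private String title;

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    text.setLength(0); // start collecting text for the new element
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if ("title".equals(qName)) {
                        title = text.toString().trim();
                    } else if ("page".equals(qName)) {
                        callback.onPage(title); // fire the callback once per page
                        title = null;
                    }
                }
            });
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki><page><title>Alpha</title></page>"
                   + "<page><title>Beta</title></page></mediawiki>";
        List<String> seen = new ArrayList<>();
        parse(xml, seen::add);
        System.out.println(seen);
    }
}
```

Because the handler only keeps the current page's state, memory use stays flat regardless of dump size, which is the usual reason to prefer a SAX-style parser for large dumps.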
parse
public void parse()
throws java.lang.Exception
- The main parse method.
- Specified by:
parse
in class WikiXMLParser
- Throws:
java.lang.Exception
getIterator
public WikiPageIterator getIterator()
throws java.lang.Exception
- This parser is event driven, so it
can't provide a page iterator.
- Specified by:
getIterator
in class WikiXMLParser
- Returns:
- an iterator to the list of pages
- Throws:
java.lang.Exception
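
When iterator-style access is needed on top of a push-style (event-driven) source, one possible workaround is to buffer the pushed items in the callback and iterate over the buffer afterwards, at the cost of holding everything in memory. The sketch below assumes nothing about this library: pushPages is a hypothetical stand-in for any callback-driven parser, and page titles are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class CollectingCallbackSketch {

    /** Hypothetical push-style source: calls the consumer once per page title. */
    static void pushPages(Consumer<String> callback) {
        for (String title : new String[] {"Alpha", "Beta", "Gamma"}) {
            callback.accept(title);
        }
    }

    /** Buffers every pushed item so the caller can pull them through an Iterator. */
    static Iterator<String> collect(Consumer<Consumer<String>> source) {
        List<String> buffer = new ArrayList<>();
        source.accept(buffer::add); // the callback simply appends to the buffer
        return buffer.iterator();
    }

    public static void main(String[] args) {
        Iterator<String> it = collect(CollectingCallbackSketch::pushPages);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

For a full Wikipedia dump the buffer may not fit in memory, so doing the work inside the callback is usually the safer choice.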
parseWikipediaDump
public static void parseWikipediaDump(java.lang.String dumpFile,
PageCallbackHandler handler)
throws java.lang.Exception
- A convenience method for the Wikipedia SAX interface.
- Parameters:
dumpFile
- path to the Wikipedia dump
handler
- callback handler used for parsing
- Throws:
java.lang.Exception