edu.jhu.nlp.wikipedia
Class WikiXMLDOMParser

java.lang.Object
  extended by edu.jhu.nlp.wikipedia.WikiXMLParser
      extended by edu.jhu.nlp.wikipedia.WikiXMLDOMParser

public class WikiXMLDOMParser
extends WikiXMLParser

A memory-efficient parser for easy access to Wikipedia XML dumps in native and compressed XML formats.
Typical pattern of use:

WikiXMLDOMParser wxp = new WikiXMLDOMParser("enwiki-latest-pages-articles.xml");
wxp.setPageCallback(...);
wxp.parse();

or

WikiXMLDOMParser wxp = new WikiXMLDOMParser("enwiki-latest-pages-articles.xml");
wxp.parse();
WikiPageIterator it = wxp.getIterator();
...
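
The description above mentions compressed dumps without saying how they are opened. The following is only a sketch of one way to do it with the JDK alone, not the library's implementation: DumpInputSketch and its open method are hypothetical names, only gzip is handled (the JDK ships no bzip2 decoder), and the UTF-8 encoding is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of edu.jhu.nlp.wikipedia.
class DumpInputSketch {
    // Returns an InputSource for a plain or gzip-compressed XML file,
    // choosing the decoder from the file name suffix.
    static InputSource open(Path file) throws IOException {
        InputStream in = new BufferedInputStream(Files.newInputStream(file));
        if (file.getFileName().toString().endsWith(".gz")) {
            in = new GZIPInputStream(in); // decompress on the fly
        }
        InputSource source = new InputSource(in);
        source.setEncoding("UTF-8"); // assumption: the dump is UTF-8 encoded
        return source;
    }
}
```

Selecting the decoder by suffix keeps the sketch short; a more robust variant might inspect the first bytes of the file instead.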


Field Summary
 
Fields inherited from class edu.jhu.nlp.wikipedia.WikiXMLParser
currentPage
 
Constructor Summary
WikiXMLDOMParser(java.lang.String fileName)
           
 
Method Summary
 WikiPageIterator getIterator()
           
 void parse()
          The main parse method.
 void setPageCallback(PageCallbackHandler handler)
          Set a callback handler.
 
Methods inherited from class edu.jhu.nlp.wikipedia.WikiXMLParser
getInputSource, notifyPage
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WikiXMLDOMParser

public WikiXMLDOMParser(java.lang.String fileName)

Method Detail

setPageCallback

public void setPageCallback(PageCallbackHandler handler)
                     throws java.lang.Exception
Set a callback handler. The callback is executed every time a page instance is detected in the stream. Custom handlers are implementations of PageCallbackHandler.

Specified by:
setPageCallback in class WikiXMLParser
Parameters:
handler - the callback handler to execute for each page detected in the stream
Throws:
java.lang.Exception
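
To make the callback pattern concrete, here is a minimal, JDK-only sketch of invoking a handler once per page element found in a DOM. It does not use this library: TitleHandler and PageCallbackSketch are hypothetical stand-ins, and the handler receives only the page title rather than a full page object.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a page callback; not the library's PageCallbackHandler.
interface TitleHandler {
    void process(String title);
}

class PageCallbackSketch {
    // Builds a DOM from the XML text and calls the handler once per <page> element.
    static void parse(String xml, TitleHandler handler) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList pages = doc.getElementsByTagName("page");
        for (int i = 0; i < pages.getLength(); i++) {
            Element page = (Element) pages.item(i);
            NodeList titles = page.getElementsByTagName("title");
            // A page without a <title> child is reported with an empty title.
            String title = titles.getLength() > 0 ? titles.item(0).getTextContent() : "";
            handler.process(title);
        }
    }

    // Convenience wrapper: a handler that simply collects every title it is given.
    static List<String> collectTitles(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        parse(xml, out::add);
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki><page><title>Alpha</title></page>"
                   + "<page><title>Beta</title></page></mediawiki>";
        parse(xml, title -> System.out.println("page: " + title));
    }
}
```

Because the handler is a single-method interface, callers can pass a lambda or a method reference, which keeps per-page logic separate from the traversal.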

getIterator

public WikiPageIterator getIterator()
                             throws java.lang.Exception
Specified by:
getIterator in class WikiXMLParser
Returns:
an iterator over the list of pages
Throws:
java.lang.Exception
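
The parse-then-iterate pattern can be sketched with the JDK alone. This is an illustration under stated assumptions, not the library's code: PageIteratorSketch is a hypothetical class, it stores page titles only, and it returns a standard java.util.Iterator rather than a WikiPageIterator.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration of "parse first, then iterate"; not the library's API.
class PageIteratorSketch {
    private final List<String> titles = new ArrayList<>();

    // Parse step: reads the whole document and records one entry per <page>.
    void parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList pages = doc.getElementsByTagName("page");
        for (int i = 0; i < pages.getLength(); i++) {
            Element page = (Element) pages.item(i);
            NodeList titleNodes = page.getElementsByTagName("title");
            titles.add(titleNodes.getLength() > 0 ? titleNodes.item(0).getTextContent() : "");
        }
    }

    // Iteration step: yields nothing until parse has been called.
    Iterator<String> getIterator() {
        return titles.iterator();
    }

    public static void main(String[] args) throws Exception {
        PageIteratorSketch sketch = new PageIteratorSketch();
        sketch.parse("<mediawiki><page><title>Alpha</title></page></mediawiki>");
        for (Iterator<String> it = sketch.getIterator(); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```

In this sketch every page is held in memory once parse returns, so the iterator suits dumps that fit in the heap, while a callback avoids keeping pages around after they are processed.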

parse

public void parse()
           throws java.lang.Exception
The main parse method.

Specified by:
parse in class WikiXMLParser
Throws:
java.lang.Exception