edu.jhu.nlp.wikipedia
Class WikiXMLDOMParser
java.lang.Object
edu.jhu.nlp.wikipedia.WikiXMLParser
edu.jhu.nlp.wikipedia.WikiXMLDOMParser
public class WikiXMLDOMParser
- extends WikiXMLParser
A memory-efficient parser for easy access to Wikipedia XML dumps in native and compressed XML formats.
Typical pattern of use:
WikiXMLDOMParser wxp = new WikiXMLDOMParser("enwiki-latest-pages-articles.xml");
wxp.setPageCallback(...);
wxp.parse();
or
WikiXMLDOMParser wxp = new WikiXMLDOMParser("enwiki-latest-pages-articles.xml");
wxp.parse();
WikiPageIterator it = wxp.getIterator();
...
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WikiXMLDOMParser
public WikiXMLDOMParser(java.lang.String fileName)
setPageCallback
public void setPageCallback(PageCallbackHandler handler)
throws java.lang.Exception
- Sets a callback handler. The callback is executed every time a page instance is detected in the stream. Custom handlers are implementations of PageCallbackHandler.
- Specified by:
setPageCallback
in class WikiXMLParser
- Parameters:
handler - the callback handler to execute for each page detected in the stream
- Throws:
java.lang.Exception
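The callback pattern described for setPageCallback can be illustrated with a minimal, self-contained sketch. It does not use the library itself: Page and Handler are hypothetical stand-ins for the library's page type and for PageCallbackHandler, and the method name process is an assumption that this page does not confirm. The parse method here only mimics the parser's role of invoking the handler once per detected page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CallbackSketch {

    // Hypothetical stand-in for the library's page type; only a title is modeled.
    public static class Page {
        private final String title;

        public Page(String title) {
            this.title = title;
        }

        public String getTitle() {
            return title;
        }
    }

    // Hypothetical stand-in for PageCallbackHandler. The method name
    // "process" is an assumption, not something this page documents.
    public interface Handler {
        void process(Page page);
    }

    // A custom handler that records the trimmed title of every page it receives.
    public static class TitleCollector implements Handler {
        public final List<String> titles = new ArrayList<>();

        @Override
        public void process(Page page) {
            titles.add(page.getTitle().trim());
        }
    }

    // Mimics the parser's role: invoke the handler once per detected page.
    public static void parse(List<Page> stream, Handler handler) {
        for (Page page : stream) {
            handler.process(page);
        }
    }

    // Convenience wrapper: build pages from raw titles and run them through a collector.
    public static List<String> collectTitles(List<String> rawTitles) {
        List<Page> stream = new ArrayList<>();
        for (String rawTitle : rawTitles) {
            stream.add(new Page(rawTitle));
        }
        TitleCollector collector = new TitleCollector();
        parse(stream, collector);
        return collector.titles;
    }

    public static void main(String[] args) {
        System.out.println(collectTitles(Arrays.asList(" Anarchism ", "Autism")));
    }
}
```

With the real parser, the equivalent handler would be passed to setPageCallback before calling parse, as in the first usage pattern in the class description.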
getIterator
public WikiPageIterator getIterator()
throws java.lang.Exception
- Specified by:
getIterator
in class WikiXMLParser
- Returns:
- an iterator to the list of pages
- Throws:
java.lang.Exception
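The iterator-based pattern can be sketched the same way. The PageIterator class below is a hypothetical stand-in for WikiPageIterator, and the method names hasMorePages and nextPage are assumptions, since this page does not list that class's methods. Pages are reduced to their titles to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IteratorSketch {

    // Hypothetical stand-in for WikiPageIterator; method names are assumed.
    public static class PageIterator {
        private final List<String> titles;
        private int next = 0;

        public PageIterator(List<String> titles) {
            this.titles = new ArrayList<>(titles);
        }

        // True while at least one page has not been returned yet.
        public boolean hasMorePages() {
            return next < titles.size();
        }

        // Returns the next page (here, just its title) and advances the cursor.
        public String nextPage() {
            return titles.get(next++);
        }
    }

    // Walks the iterator to the end and counts the pages it yields.
    public static int countPages(PageIterator it) {
        int count = 0;
        while (it.hasMorePages()) {
            it.nextPage();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        PageIterator it = new PageIterator(Arrays.asList("Alpha", "Beta", "Gamma"));
        System.out.println(countPages(it));
    }
}
```

With the real parser, the iterator would be obtained from getIterator after parse has completed, as in the second usage pattern in the class description.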
parse
public void parse()
throws java.lang.Exception
- The main parse method. Parses the dump file whose name was passed to the constructor.
- Specified by:
parse
in class WikiXMLParser
- Throws:
java.lang.Exception