Working With Large Documents


If an XML document is so large that unmarshaling it in pieces would perform better than unmarshaling it all at once, use class rwsf::XmlReader to read only the elements you need. rwsf::XmlReader is an efficient pull parser: your code advances it one node at a time and can skip elements it does not need.

To unmarshal part of a document, use the isElementNext() and readNextNode() methods to locate the element of interest. The code sample below advances through the XML document until the reader is positioned at an element named targetType in no namespace. It then unmarshals this element into an instance of tns::TargetType, a class generated by HydraExpress:

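#include <iostream>
#include <string>
#include <rwsf/core/XmlName.h>
#include <rwsf/core/XmlReader.h>
#include <rwsf/core/XmlParseException.h>
// ...plus the header HydraExpress generated for tns::TargetType

std::string xmlData =
    "<ns:rootElement xmlns:ns=\"www.mynamespace.com\">"
    "<firstElement>firstData</firstElement>"
    "<targetType><size>12</size><weight>23</weight></targetType>"
    // ...
    "<lastElement>lastData</lastElement>"
    "</ns:rootElement>";

rwsf::XmlReader reader(xmlData);
tns::TargetType data;                                        // 1
rwsf::XmlName target = tns::TargetType::DefaultElementName;

try {
    while (!reader.isElementNext(target) && !reader.eof()) {  // 2
        reader.readNextNode();                                // 3
    }
    if (reader.eof()) {
        throw rwsf::XmlParseException("EOF trying to read " +
                                      target.getQualifiedName());
    }
    tns::TargetType::unmarshal(reader, data, target);         // 4
}
catch (const rwsf::XmlParseException& e) {
    std::cerr << "Error unmarshaling : " << e.what() << std::endl;
    // return or rethrow...
}

// 1 Constructs data, an empty instance of class tns::TargetType.

// 2 Loops until the reader is positioned just before an element whose name matches target, stopping early if the end of the document is reached. The check after the loop reports a missing element as an error instead of looping forever.

// 3 Advances the reader past the next node in the document.

// 4 Populates data from reader.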
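The rwsf classes are only available inside HydraExpress, so the following sketch models the same skip-then-unmarshal pattern with a deliberately tiny, hypothetical reader. ToyReader, its readSimpleElement() helper, the TargetType struct, and readFirstTarget() are illustrative names, not rwsf API, and ToyReader is not a real XML parser: it ignores comments, CDATA, entities, and namespaces, and assumes attribute values contain no '>' character.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy pull reader, NOT rwsf::XmlReader. It only distinguishes
// tags from text and is meant to illustrate the control flow.
class ToyReader {
public:
    explicit ToyReader(std::string xml) : xml_(std::move(xml)) {}

    bool eof() const { return pos_ >= xml_.size(); }

    // True if the next node is a start tag (or empty-element tag) named `name`.
    bool isElementNext(const std::string& name) const {
        if (eof() || xml_[pos_] != '<') return false;
        std::size_t end = xml_.find_first_of(" />", pos_ + 1);
        if (end == std::string::npos) return false;
        return xml_.compare(pos_ + 1, end - pos_ - 1, name) == 0;
    }

    // Skips exactly one node: a whole tag, or a run of text up to the next tag.
    void readNextNode() {
        if (eof()) return;
        if (xml_[pos_] == '<') {
            std::size_t gt = xml_.find('>', pos_);
            pos_ = (gt == std::string::npos) ? xml_.size() : gt + 1;
        } else {
            std::size_t lt = xml_.find('<', pos_);
            pos_ = (lt == std::string::npos) ? xml_.size() : lt;
        }
    }

    // Reads <name>text</name> (text only, no child elements) and returns text.
    std::string readSimpleElement(const std::string& name) {
        if (!isElementNext(name))
            throw std::runtime_error("expected <" + name + ">");
        readNextNode();                                   // consume <name>
        std::size_t lt = xml_.find('<', pos_);
        if (lt == std::string::npos)
            throw std::runtime_error("unterminated <" + name + ">");
        std::string text = xml_.substr(pos_, lt - pos_);
        pos_ = lt;
        readNextNode();                                   // consume </name>
        return text;
    }

private:
    std::string xml_;
    std::size_t pos_ = 0;
};

// Hypothetical stand-in for a HydraExpress-generated class.
struct TargetType {
    int size = 0;
    int weight = 0;

    // Consumes <targetType><size>..</size><weight>..</weight></targetType>.
    static void unmarshal(ToyReader& reader, TargetType& out) {
        reader.readNextNode();                                    // <targetType>
        out.size = std::stoi(reader.readSimpleElement("size"));
        out.weight = std::stoi(reader.readSimpleElement("weight"));
        reader.readNextNode();                                    // </targetType>
    }
};

// Mirrors steps 2-4 of the sample: skip nodes until targetType (or the end of
// the input) is next, then unmarshal it. Returns false if it is not found.
bool readFirstTarget(const std::string& xml, TargetType& out) {
    ToyReader reader(xml);
    while (!reader.isElementNext("targetType") && !reader.eof())
        reader.readNextNode();
    if (reader.eof()) return false;
    TargetType::unmarshal(reader, out);
    return true;
}
```

For the document in the sample (without the elided elements), readFirstTarget() returns true and fills in size 12 and weight 23; for a document with no targetType element it returns false rather than spinning at the end of the input.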

A more general solution is to create a helper class that handles the iteration. The class template below reads specific elements from an XML reader. Its interface is deliberately simple and follows that of the Servlet class rwsf::Enumeration<T>:

#include <rwsf/core/XmlName.h>
#include <rwsf/core/XmlReader.h>
#include <rwsf/core/XmlParseException.h>

// Iterates over every element named elementName in a document, unmarshaling
// each one into an instance of the HydraExpress-generated class T.
template <class T>
class XmlDocumentIterator {
public:
    XmlDocumentIterator(rwsf::XmlReader& input,
                        const rwsf::XmlName& elementName = getDefaultName())
        : reader_(input),
          defaultName_(elementName)
    {}

    // Positions the reader at the next matching element; false at end of input.
    bool hasMoreElements() {
        advanceReader();
        return !reader_.eof();
    }

    // Unmarshals and returns the next matching element.
    T getNextElement() {
        if (!hasMoreElements()) {
            throw rwsf::XmlParseException("EOF trying to read " +
                                          defaultName_.getQualifiedName());
        }
        T temp;
        T::unmarshal(reader_.readElement(defaultName_), temp, defaultName_);
        return temp;
    }

private:
    rwsf::XmlReader& reader_;
    const rwsf::XmlName defaultName_;

    // Skips nodes until a matching element is next or the input is exhausted.
    void advanceReader() {
        while (!reader_.isElementNext(defaultName_) && !reader_.eof()) {
            reader_.readNextNode();
        }
    }

    static rwsf::XmlName getDefaultName() {
        return T::DefaultElementName;
    }
};
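The template relies on only a few reader operations (isElementNext(), readNextNode(), eof(), and a way to unmarshal the current element), so its control flow can be exercised without HydraExpress. The sketch below is a hypothetical test harness, not HydraExpress code: StubReader models a document as a flat list of element names, DocumentIterator adds a second template parameter for the reader type and throws std::runtime_error in place of rwsf::XmlParseException, and Item plays the role of a generated class. It also makes one requirement visible: in this sketch, unmarshal() must consume the element, otherwise hasMoreElements() would find the same element again and the loop would never end.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an XML pull reader: each node is one element name.
class StubReader {
public:
    explicit StubReader(std::vector<std::string> nodes) : nodes_(std::move(nodes)) {}
    bool eof() const { return pos_ >= nodes_.size(); }
    bool isElementNext(const std::string& name) const {
        return !eof() && nodes_[pos_] == name;
    }
    void readNextNode() { if (!eof()) ++pos_; }
    std::size_t position() const { return pos_; }
private:
    std::vector<std::string> nodes_;
    std::size_t pos_ = 0;
};

// Same shape as XmlDocumentIterator, but generic over the reader type.
template <class T, class Reader>
class DocumentIterator {
public:
    explicit DocumentIterator(Reader& input,
                              std::string name = T::DefaultElementName)
        : reader_(input), name_(std::move(name)) {}

    bool hasMoreElements() {
        advanceReader();
        return !reader_.eof();
    }

    T getNextElement() {
        if (!hasMoreElements())
            throw std::runtime_error("EOF trying to read " + name_);
        T temp;
        T::unmarshal(reader_, temp, name_);   // must consume the element
        return temp;
    }

private:
    void advanceReader() {
        while (!reader_.isElementNext(name_) && !reader_.eof())
            reader_.readNextNode();
    }

    Reader& reader_;
    const std::string name_;
};

// Hypothetical "generated" class: records where in the stream it was found.
struct Item {
    static const std::string DefaultElementName;
    std::size_t index = 0;

    static void unmarshal(StubReader& reader, Item& out, const std::string&) {
        out.index = reader.position();
        reader.readNextNode();                // consume the matched element
    }
};
const std::string Item::DefaultElementName = "item";
```

Iterating over the nodes root, item, other, item, item, end yields Items found at positions 1, 3, and 4; a further getNextElement() call then throws because the reader is at the end of the input.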

To use the template, construct an XmlDocumentIterator specialized on the HydraExpress-generated class to be read:

std::string xmlContents = rwsf::rwsfReadFile("input.xml");
rwsf::XmlReader reader(xmlContents);
XmlDocumentIterator<TargetNode> it(reader);

while (it.hasMoreElements()) {
    TargetNode node = it.getNextElement();
    // Process node
}
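The usage sample loads the whole file into a std::string with rwsf::rwsfReadFile before handing it to rwsf::XmlReader. If that helper is not available in your build, the sketch below is a plain standard-library equivalent; the name readWholeFile is hypothetical. Note that the document text is still held in memory in full; what the iterator saves is building an object for every element at once.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads an entire file into a std::string, playing the
// role of rwsf::rwsfReadFile in the usage sample above.
std::string readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);   // binary: no newline translation
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With this helper, the first line of the usage sample would read std::string xmlContents = readWholeFile("input.xml"); and the rest of the loop is unchanged.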