Working With Large Documents
If an XML document is so large that unmarshaling it in pieces would perform better than unmarshaling it all at once, use class rwsf::XmlReader to read specific elements from the document. rwsf::XmlReader is an efficient pull parser.
To unmarshal part of a document, use the methods isElementNext() and readNextNode() to locate the element of interest. The code sample below iterates over the XML document until the reader is positioned at an element named targetType in the unnamed namespace. The sample then unmarshals this element into an instance of TargetType, a class generated by HydraExpress:
 
std::string xmlData =
"<ns:rootElement xmlns:ns='www.mynamespace.com'>"
"<firstElement>firstData</firstElement>"
"<targetType><size>12</size><weight>23</weight></targetType>"
...
"<lastElement>lastData</lastElement>"
"</ns:rootElement>";
rwsf::XmlReader reader(xmlData);
tns::TargetType data; // 1
 
rwsf::XmlName target = tns::TargetType::DefaultElementName;
 
try {
    while (reader.isElementNext(target) == false) { // 2
        reader.readNextNode(); // 3
    }
    tns::TargetType::unmarshal(reader, data, target); // 4
}
catch (const rwsf::XmlParseException& e) {
    std::cerr << "Error unmarshaling: " << e.what() << std::endl;
    // return or rethrow...
}
//1 Constructs data, an empty instance of class TargetType.
//2 Loops until the reader is positioned just before an element with a name matching target.
//3 Advances the reader over the next node in the document.
//4 Populates data from reader.
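The loop above assumes that the target element is present in the document. As a minimal sketch of a guarded search, the code below stops when the input runs out instead of reading past the end. StubReader is a hypothetical stand-in that imitates only the three reader calls used in this section (isElementNext(), readNextNode(), and eof()) over a flat list of element names. It is not part of HydraExpress, and the real rwsf::XmlReader may behave differently at the end of input. findElement is likewise an illustrative helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pull reader; NOT the rwsf::XmlReader API.
// It walks a flat list of element names instead of parsing real XML.
class StubReader {
public:
    explicit StubReader(std::vector<std::string> names)
        : names_(std::move(names)), pos_(0) {}

    // True when the reader is positioned just before an element called name.
    bool isElementNext(const std::string& name) const {
        return !eof() && names_[pos_] == name;
    }

    // Skips the next node; calling this at end of input is a logic error.
    void readNextNode() {
        assert(!eof());
        ++pos_;
    }

    bool eof() const { return pos_ >= names_.size(); }

private:
    std::vector<std::string> names_;
    std::size_t pos_;
};

// Advances reader until it sits just before target.
// Returns false if the input ends first, so the caller can skip unmarshaling.
template <class Reader>
bool findElement(Reader& reader, const std::string& target) {
    while (!reader.isElementNext(target)) {
        if (reader.eof()) {
            return false;
        }
        reader.readNextNode();
    }
    return true;
}
```

A caller would invoke unmarshal only when findElement returns true, and report a missing element otherwise.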
A more general solution is to create a helper class to handle the iteration. The sample below shows a class template for reading specific elements from an XML reader. The interface is deliberately simple, and follows the interface of the Servlet class rwsf::Enumeration<T>:
 
#include <rwsf/core/XmlName.h>
#include <rwsf/core/XmlReader.h>
#include <rwsf/core/XmlParseException.h>
 
template <class T>
class XmlDocumentIterator {
public:
    XmlDocumentIterator(rwsf::XmlReader& input,
                        const rwsf::XmlName& elementName = getDefaultName()) :
        reader_(input),
        defaultName_(elementName)
    {}

    bool
    hasMoreElements() {
        advanceReader();
        return !reader_.eof();
    }

    T
    getNextElement() {
        if (hasMoreElements() == false) {
            throw rwsf::XmlParseException("EOF trying to read " +
                                          defaultName_.getQualifiedName());
        }
        T temp;
        T::unmarshal(reader_.readElement(defaultName_),
                     temp, defaultName_);
        return temp;
    }

private:
    rwsf::XmlReader& reader_;
    const rwsf::XmlName defaultName_;

    void
    advanceReader() {
        while (reader_.isElementNext(defaultName_) == false &&
               reader_.eof() == false) {
            reader_.readNextNode();
        }
    }

    static rwsf::XmlName
    getDefaultName() {
        return T::DefaultElementName;
    }
};
To use the template, construct an XmlDocumentIterator instantiated with the HydraExpress-generated class to be read:
 
std::string xmlContents = rwsf::rwsfReadFile("input.xml");
rwsf::XmlReader reader(xmlContents);
XmlDocumentIterator<TargetNode> it(reader);
 
while (it.hasMoreElements()) {
    TargetNode node = it.getNextElement();
    // Process node
}
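The enumeration loop can also be wrapped in a small generic helper, so that only one element is held at a time while a callback processes it. The sketch below is illustrative only. forEachElement is a hypothetical helper, not a HydraExpress function. It works with any type that offers hasMoreElements() and getNextElement(). VectorEnumeration is a stand-in used to exercise the helper without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Applies handler to each element produced by an enumeration-style iterator,
// that is, any type offering hasMoreElements() and getNextElement().
// Returns the number of elements processed.
template <class Iterator, class Handler>
std::size_t forEachElement(Iterator& it, Handler handler) {
    std::size_t count = 0;
    while (it.hasMoreElements()) {
        handler(it.getNextElement());
        ++count;
    }
    return count;
}

// Hypothetical stand-in used only to exercise forEachElement without
// HydraExpress; it enumerates a fixed list of integers.
class VectorEnumeration {
public:
    explicit VectorEnumeration(std::vector<int> values)
        : values_(std::move(values)), pos_(0) {}

    bool hasMoreElements() const { return pos_ < values_.size(); }

    int getNextElement() {
        assert(hasMoreElements());
        return values_[pos_++];
    }

private:
    std::vector<int> values_;
    std::size_t pos_;
};
```

Assuming the XmlDocumentIterator template shown earlier, the same helper could take the iterator it in place of the explicit while loop, with the handler receiving each TargetNode in turn.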