HydraExpress™ C++ API Reference Guide

rwsf::XmlReader Class Reference

A simple XML pull-parser that implements reference semantics.

#include <rwsf/core/XmlReader.h>

Inheritance diagram for rwsf::XmlReader:
rwsf::HandleBase

Public Types

enum  NodeType {
  StartTag, EndTag, EmptyTag, Data,
  Unknown
}
 

Public Member Functions

 XmlReader ()
 
 XmlReader (const char *buf, size_t length)
 
 XmlReader (const unsigned char *buf, size_t length)
 
 XmlReader (const std::string &document)
 
void addNamespace (const rwsf::XmlNamespace &ns)
 
bool eof ()
 
rwsf::XmlReader getElementReader (const rwsf::XmlReaderName &name=rwsf::XmlReaderName::Empty)
 
std::string getEncoding () const
 
bool getExpandAttributeReference () const
 
bool getExpandCommentReference () const
 
bool getExpandContentReference () const
 
rwsf::XmlAttributeSet getLastAttributes () const
 
std::string getLastContent () const
 
rwsf::XmlName getLastName () const
 
NodeType getLastNodeType () const
 
std::string getPrefixForURI (const std::string &uri) const
 
std::string getStandalone () const
 
std::string getURIForPrefix (const std::string &prefix) const
 
std::string getVersion () const
 
bool hasEncoding () const
 
bool hasStandalone () const
 
bool isElementNext (const rwsf::XmlName &name)
 
bool isElementNext (const std::string &name)
 
std::string readElement (const rwsf::XmlName &name=NullName)
 
std::string readElement (const std::string &name)
 
void readElementEnd (const rwsf::XmlName &name)
 
void readElementEnd ()
 
rwsf::XmlAttributeSet readElementStart (const rwsf::XmlName &name)
 
void readElementStart ()
 
std::string readElementValue ()
 
void readNextNode ()
 
std::string readWellFormedElement (const rwsf::XmlName &name=NullName)
 
void setExpandAttributeReference (bool expandReference)
 
void setExpandCommentReference (bool expandComment)
 
void setExpandContentReference (bool expandReference)
 
- Public Member Functions inherited from rwsf::HandleBase
bool isValid (void) const
 
bool operator!= (const HandleBase &second) const
 
bool operator== (const HandleBase &second) const
 

Static Public Attributes

static rwsf::XmlName NullName
 

Additional Inherited Members

- Protected Member Functions inherited from rwsf::HandleBase
 HandleBase (void)
 
 HandleBase (StaticCtor)
 
 HandleBase (BodyBase *body)
 
 HandleBase (const HandleBase &second)
 
virtual ~HandleBase (void)
 
BodyBase & body (void) const
 
HandleBase & operator= (const HandleBase &second)
 

Detailed Description

Class rwsf::XmlReader is a simple XML pull-parser. The XML document is typically parsed element by element using readElement(), or by iteratively calling readElementStart(), readElementValue(), and readElementEnd(). On each read, an XmlReader instance sets its internal state with information about the content it just read. Member functions getLastNodeType(), getLastName(), and getLastContent() can then be used to retrieve portions of the rwsf::XmlReader's state.
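
The read-then-inspect pattern described above can be illustrated with a minimal, self-contained sketch. MiniReader below is a hypothetical toy class, not part of HydraExpress. It only mimics the idea that each read updates the state reported by the getLast...() accessors, and it ignores attributes, comments, namespaces, encodings, and real error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy reader for illustration only; it is NOT rwsf::XmlReader.
class MiniReader {
public:
    enum NodeType { StartTag, EndTag, EmptyTag, Data, Unknown };

    explicit MiniReader(const std::string& doc)
        : doc_(doc), pos_(0), type_(Unknown) {}

    bool eof() const { return pos_ >= doc_.size(); }

    // Reads one tag or one run of character data and records what was read.
    void readNextNode() {
        assert(!eof());
        name_.clear();
        content_.clear();
        if (doc_[pos_] != '<') {
            // Character data runs up to the next tag or the end of input.
            std::size_t end = doc_.find('<', pos_);
            if (end == std::string::npos) end = doc_.size();
            content_ = doc_.substr(pos_, end - pos_);
            type_ = Data;
            pos_ = end;
            return;
        }
        std::size_t close = doc_.find('>', pos_);
        assert(close != std::string::npos);  // toy: assumes every tag is closed
        std::string inner = doc_.substr(pos_ + 1, close - pos_ - 1);
        pos_ = close + 1;
        if (!inner.empty() && inner[0] == '/') {
            type_ = EndTag;
            name_ = inner.substr(1);
        } else if (!inner.empty() && inner[inner.size() - 1] == '/') {
            type_ = EmptyTag;
            name_ = inner.substr(0, inner.size() - 1);
        } else {
            type_ = StartTag;
            name_ = inner;
        }
    }

    NodeType getLastNodeType() const { return type_; }
    std::string getLastName() const { return name_; }
    std::string getLastContent() const { return content_; }

private:
    std::string doc_;
    std::size_t pos_;
    NodeType type_;
    std::string name_;
    std::string content_;
};
```

Calling readNextNode() repeatedly on this toy reader and checking getLastNodeType() after each call walks the input one node at a time, which is the same calling pattern the paragraph above describes.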

Note
This class uses reference semantics in which an instance of this class represents a reference to an implementation class.
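
As a rough illustration of what reference semantics implies for copies, the sketch below uses std::shared_ptr to model a handle that refers to a shared implementation object. The names Handle and Body are hypothetical, and this reference does not show how rwsf::HandleBase is actually implemented, so treat the code as an analogy only.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical implementation ("body") class that owns the real state.
struct Body {
    std::string document;
};

// Hypothetical handle: copying it copies the reference, not the body.
class Handle {
public:
    Handle() {}  // no body attached: an invalid handle

    explicit Handle(const std::string& doc) : body_(std::make_shared<Body>()) {
        body_->document = doc;
    }

    bool isValid() const { return static_cast<bool>(body_); }

    // Two handles compare equal when they refer to the same body.
    bool operator==(const Handle& other) const { return body_ == other.body_; }
    bool operator!=(const Handle& other) const { return !(*this == other); }

    Body& body() const {
        assert(isValid());  // precondition: the handle must refer to a body
        return *body_;
    }

private:
    std::shared_ptr<Body> body_;
};
```

In this model, a change made through one copy of a handle is visible through every other copy, because all copies refer to the same body.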

rwsf::XmlReader throws an exception of type rwsf::XmlParseException when it encounters XML that is not well-formed. The rwsf::XmlParseException exception contains a description of the error and the line and column number of the source document where the error occurred.
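
To make the line and column information concrete, here is a small sketch of how a position could be derived from a byte offset into the source text. Position and locate are hypothetical names, and the 1-based numbering is an assumption; this reference does not state how rwsf::XmlParseException counts lines and columns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical position type; numbering is assumed to start at 1.
struct Position {
    std::size_t line;
    std::size_t column;
};

// Walks the text up to 'offset', counting newlines to find the line and
// the distance from the last newline to find the column.
Position locate(const std::string& text, std::size_t offset) {
    assert(offset <= text.size());
    Position p = {1, 1};
    for (std::size_t i = 0; i < offset; ++i) {
        if (text[i] == '\n') {
            ++p.line;
            p.column = 1;
        } else {
            ++p.column;
        }
    }
    return p;
}
```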

rwsf::XmlReader can parse documents in the encodings UTF-8, UTF-16(BE), UTF-16LE, US-ASCII, and ISO-8859-1. In addition, if the rwsf_icu library is present, rwsf::XmlReader can also convert from any character encoding supported by ICU.

Please see the XML Binding Development Guide for further information on conversions and custom converters.

Note
rwsf::XmlReader converts all documents to UTF-8 regardless of the encoding of the source document.

Currently, rwsf::XmlReader provides support only for reading elements and their content. No support for reading processing instructions, DOCTYPE declarations, or entity declarations is provided.

Member Enumeration Documentation

Enumeration of different node types in XML.

Enumerator
StartTag 

An XML start tag; e.g., <customer>

EndTag 

An XML end tag; e.g., </customer>

EmptyTag 

An empty XML tag; e.g., <customer/>

Data 

Data that is the content of an element, not including any tags; e.g., John Doe.

Unknown 

Set before the reader has read an element from the document.

Constructor & Destructor Documentation

rwsf::XmlReader::XmlReader ( )

Default constructor. Constructs an invalid reader.

rwsf::XmlReader::XmlReader ( const char *  buf,
size_t  length 
)

Constructs a reader from the document pointed to by buf, which is length bytes long. Parses the prolog of the document if found, and determines the document encoding both from the encoding= specifier in the optional XML declaration, and from a guess based on the first few bytes of the document. Upon construction, the reader is placed before the first tag in the document.

rwsf::XmlReader::XmlReader ( const unsigned char *  buf,
size_t  length 
)

Constructs a reader from the document pointed to by buf, which is length bytes long. Parses the prolog of the document if found, and determines the document encoding both from the encoding= specifier in the optional XML declaration, and from a guess based on the first few bytes of the document. Upon construction, the reader is placed before the first tag in the document.

rwsf::XmlReader::XmlReader ( const std::string &  document)

Convenience constructor for converting from an std::string. Constructs a reader from the XML document in document. Parses the prolog of the document if found, and determines the encoding used by document, both from the encoding= specifier in the optional XML declaration, and from a guess based on the first few bytes of the document. Upon construction, the reader is placed before the first tag in the document.

Member Function Documentation

void rwsf::XmlReader::addNamespace ( const rwsf::XmlNamespace &  ns)

Adds ns to the list of namespaces known by the reader. This method is useful when parsing document fragments where namespaces are declared outside the scope of the fragment.

bool rwsf::XmlReader::eof ( )

Returns true if at the end of the current document; false otherwise.

rwsf::XmlReader rwsf::XmlReader::getElementReader ( const rwsf::XmlReaderName &  name = rwsf::XmlReaderName::Empty)

Returns a new rwsf::XmlReader instance for the current element, as if the current element in its entirety were this new document's root. This new reader copies the state of the parent reader, but its internal cursor is set to the beginning of the element, so that functions like readElementStart(), readElementValue(), etc. return the current element's information. The parent reader will have its cursor advanced past the element, so any of the parent reader's read() functions return the next element's information instead.

Exceptions
rwsf::XmlParseException  The current element's name is not the provided name.

std::string rwsf::XmlReader::getEncoding ( ) const

Returns the name of the encoding of the original source document, either from the XML declaration's "encoding=" declaration, or as automatically sensed from the first few bytes of the XML document.
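
The idea of sensing an encoding from the first few bytes can be sketched as follows. The checks are modeled on the autodetection heuristics in Appendix F of the XML 1.0 specification; whether rwsf::XmlReader applies exactly these rules is an assumption, and sniffEncoding is a hypothetical helper, not a HydraExpress function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: guesses an encoding name from the leading bytes.
// UCS-4 variants and EBCDIC are deliberately left out of this sketch.
std::string sniffEncoding(const unsigned char* buf, std::size_t length) {
    assert(buf != nullptr || length == 0);
    if (length >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";     // UTF-8 byte order mark
    if (length >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";  // big-endian byte order mark
    if (length >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";  // little-endian byte order mark
    if (length >= 4 && buf[0] == 0x00 && buf[1] == 0x3C &&
        buf[2] == 0x00 && buf[3] == 0x3F)
        return "UTF-16BE";  // "<?" in big-endian order, no byte order mark
    if (length >= 4 && buf[0] == 0x3C && buf[1] == 0x00 &&
        buf[2] == 0x3F && buf[3] == 0x00)
        return "UTF-16LE";  // "<?" in little-endian order, no byte order mark
    return "UTF-8";         // XML's default when nothing else matches
}
```

A guess like this only narrows the candidates. An explicit encoding= specifier in the XML declaration, when present, names the encoding directly.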

bool rwsf::XmlReader::getExpandAttributeReference ( ) const

Returns true if the reader expands entity references in attributes, false otherwise. See setExpandAttributeReference() for an example of usage.

bool rwsf::XmlReader::getExpandCommentReference ( ) const

Returns true if the reader expands comments found in XML content, false otherwise. See setExpandCommentReference() for an example of usage.

bool rwsf::XmlReader::getExpandContentReference ( ) const

Returns true if the reader expands XML references in content, false otherwise. See setExpandContentReference() for an example of usage.

rwsf::XmlAttributeSet rwsf::XmlReader::getLastAttributes ( ) const

Returns the set of attributes associated with the last node read of type rwsf::XmlReader::StartTag.

std::string rwsf::XmlReader::getLastContent ( ) const

Returns the last content read, for nodes of type rwsf::XmlReader::Data. This value is undefined if the last node read was not of type rwsf::XmlReader::Data.

Note
The content will be encoded in UTF-8, regardless of the encoding of the source document.

rwsf::XmlName rwsf::XmlReader::getLastName ( ) const

Returns the name of the last node read. This value is undefined if the last node read was of type rwsf::XmlReader::Data.

NodeType rwsf::XmlReader::getLastNodeType ( ) const

Returns the type of the last node read. See NodeType for more information on the NodeType enumeration.

std::string rwsf::XmlReader::getPrefixForURI ( const std::string &  uri) const

Looks up the provided uri in the current list of namespaces and returns the corresponding prefix. If the current list of namespaces does not contain the uri, returns the empty string.

std::string rwsf::XmlReader::getStandalone ( ) const

Returns the value of the source document's "standalone=" declaration if it exists, the empty string otherwise.

std::string rwsf::XmlReader::getURIForPrefix ( const std::string &  prefix) const

Looks up the provided prefix in the current list of namespaces and returns the corresponding URI. If the current list of namespaces does not contain the prefix, returns the empty string.
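
The two lookups, getPrefixForURI() and getURIForPrefix(), behave like queries on a prefix-to-URI table that yield an empty string on a miss. The sketch below models that contract with std::map. NamespaceTable and its add() function are hypothetical, and the reader's internal storage is not described in this reference.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical namespace table that mirrors the documented lookup contract.
class NamespaceTable {
public:
    void add(const std::string& prefix, const std::string& uri) {
        assert(!uri.empty());  // toy rule: an empty string means "not found"
        uriByPrefix_[prefix] = uri;
    }

    // Returns the URI bound to 'prefix', or the empty string if unknown.
    std::string getURIForPrefix(const std::string& prefix) const {
        std::map<std::string, std::string>::const_iterator it =
            uriByPrefix_.find(prefix);
        return it == uriByPrefix_.end() ? std::string() : it->second;
    }

    // Returns a prefix bound to 'uri', or the empty string if unknown.
    std::string getPrefixForURI(const std::string& uri) const {
        std::map<std::string, std::string>::const_iterator it;
        for (it = uriByPrefix_.begin(); it != uriByPrefix_.end(); ++it) {
            if (it->second == uri) return it->first;
        }
        return std::string();
    }

private:
    std::map<std::string, std::string> uriByPrefix_;
};
```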

std::string rwsf::XmlReader::getVersion ( ) const

Returns the value of the source document's "version=" declaration if it exists, the empty string otherwise.

bool rwsf::XmlReader::hasEncoding ( ) const

Returns true if the source XML document explicitly specifies an encoding. Returns false if the document does not specify an encoding, i.e. the encoding was automatically determined from the first few bytes of the XML document.

bool rwsf::XmlReader::hasStandalone ( ) const

Returns true if a "standalone=" declaration exists in the source document's XML declaration.

bool rwsf::XmlReader::isElementNext ( const rwsf::XmlName &  name)

Returns true if name is the next element.

Note
If a qualified name is required for name, name must be an instance of XmlName. Any element or type name used in an std::string is considered an unqualified local name, even if it contains a namespace prefix and/or URI.

bool rwsf::XmlReader::isElementNext ( const std::string &  name)

Returns true if name is the next element.

Note
If a qualified name is required for name, name must be an instance of XmlName. Any element or type name used in an std::string is considered an unqualified local name, even if it contains a namespace prefix and/or URI.

std::string rwsf::XmlReader::readElement ( const rwsf::XmlName &  name = NullName)

Reads in the next element from the current document and returns the entire element. A name can be provided, in which case the element's name must match, or an exception is thrown.

This method returns the entire XML for the element, rooted at the element (in other words, the element's start and end tag will be a part of the resulting string). Also returned is all content and child tags with their content. In effect, the method grabs the element wholesale and gives it to you in string form.

Note
The returned string will always be encoded in UTF-8, regardless of the original source encoding.

If a qualified name is required for name, name must be an instance of XmlName. Any element or type name used in an std::string is considered an unqualified local name, even if it contains a namespace prefix and/or URI.

Exceptions
rwsf::XmlParseException  The current element's name is not the provided name.
rwsf::XmlParseException  The element's XML is invalid or malformed.

std::string rwsf::XmlReader::readElement ( const std::string &  name)

Reads in the next element from the current document and returns the entire element. A name can be provided, in which case the element's name must match, or an exception is thrown.

This method returns the entire XML for the element, rooted at the element (in other words, the element's start and end tag will be a part of the resulting string). Also returned is all content and child tags with their content. In effect, the method grabs the element wholesale and gives it to you in string form.

Note
The returned string will always be encoded in UTF-8, regardless of the original source encoding.

If a qualified name is required for name, name must be an instance of XmlName. Any element or type name used in an std::string is considered an unqualified local name, even if it contains a namespace prefix and/or URI.

Exceptions
rwsf::XmlParseException  The current element's name is not the provided name.
rwsf::XmlParseException  The element's XML is invalid or malformed.

void rwsf::XmlReader::readElementEnd ( const rwsf::XmlName &  name)

Reads the next node in the document. If the node is not an end tag matching name, throws an exception of type rwsf::XmlParseException.

Exceptions
rwsf::XmlParseException  The current element's name is not the provided name.
rwsf::XmlParseException  The next tag is not an end tag.

void rwsf::XmlReader::readElementEnd ( )

Reads the next node in the document. If the node is not an end tag, throws an exception.

Exceptions
rwsf::XmlParseException  The next tag is not an end tag.

rwsf::XmlAttributeSet rwsf::XmlReader::readElementStart ( const rwsf::XmlName &  name)

Reads the next node in the document. If the node is not a start tag, or the node's name does not match name, throws an exception of type rwsf::XmlParseException. Returns any attributes found inside the tag.

Exceptions
rwsf::XmlParseException  The current element's name is not the provided name.
rwsf::XmlParseException  The next tag is not a start tag.

void rwsf::XmlReader::readElementStart ( )

Reads the next node in the document. If the node is not a start tag, throws an exception.

Exceptions
rwsf::XmlParseException  The next tag is not a start tag.

std::string rwsf::XmlReader::readElementValue ( )

Reads and returns the next element content from the document. The element's start and end tags are not included in the returned string. If getExpandCommentReference() returns false, comments are omitted from the output; otherwise they are included. If getExpandContentReference() returns false, the output retains entity references (&lt;, &gt;, etc.); otherwise they are expanded.

See also
getExpandCommentReference() and getExpandContentReference() for more information on comment and entity reference expansion.

void rwsf::XmlReader::readNextNode ( )

Reads the next start tag, empty tag, end tag, or content from the document. Use getLastNodeType(), getLastName(), and getLastContent() to retrieve information on what was read. If a well-formedness error is encountered while reading the document, an exception of type rwsf::XmlParseException is thrown.

Note
This method is not typically used directly. It is used by other methods such as readElementStart(), readElementValue(), and so on.

std::string rwsf::XmlReader::readWellFormedElement ( const rwsf::XmlName &  name = NullName)

This method functions exactly like readElement(), except that it adds namespace declarations to the element's start tag to allow the element to be well formed. This includes namespaces declared on parent elements that are in use by this element or one of its children. You can expect that the element alone will be able to resolve its namespaces internally, even if they were declared external to this element.

Note
The returned string will always be encoded in UTF-8, regardless of the original source encoding.

Exceptions
rwsf::XmlParseException  The current element's name is not the provided name.
rwsf::XmlParseException  The element's XML is invalid or malformed.

void rwsf::XmlReader::setExpandAttributeReference ( bool  expandReference)

Sets whether the reader expands entity references in attributes. For example, when expandReference is true (the default), the reader converts the attribute value, like so:

3 &lt; 4

to:

3 < 4

void rwsf::XmlReader::setExpandCommentReference ( bool  expandComment)

Sets whether the reader expands comments found in XML content. The default is expandComment = false.

When expandComment is true, the reader keeps comments in the element value returned from readElement(). For example, the element:

<elem><!-- My Comment --></elem>

is returned unchanged. If expandComment is false (the default), the same element is returned as:

<elem></elem>

void rwsf::XmlReader::setExpandContentReference ( bool  expandReference)

Sets whether the reader expands entity references in content. For example, when expandReference is true (the default), the reader converts the element value returned from readElement(), like so:

<elem>5 &lt; 20</elem>

to:

<elem>5 < 20</elem>
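
The expansion of entity references illustrated above can be sketched for the five entities predefined by XML. The helper names replaceAll and expandPredefinedEntities are hypothetical, and the sketch makes no claim about how rwsf::XmlReader performs expansion or whether it also handles numeric character references.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces every occurrence of 'from' in 's' with 'to', scanning left to
// right and never rescanning text that was just inserted.
void replaceAll(std::string& s, const std::string& from, const std::string& to) {
    assert(!from.empty());  // an empty pattern would never advance
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();
    }
}

// Hypothetical helper: expands only the five predefined XML entities.
std::string expandPredefinedEntities(std::string text) {
    replaceAll(text, "&lt;", "<");
    replaceAll(text, "&gt;", ">");
    replaceAll(text, "&quot;", "\"");
    replaceAll(text, "&apos;", "'");
    // "&amp;" goes last so that "&amp;lt;" becomes "&lt;" rather than "<".
    replaceAll(text, "&amp;", "&");
    return text;
}
```

The ordering matters: expanding "&amp;" first would turn an escaped reference such as "&amp;lt;" into "&lt;" and then, incorrectly, into "<".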

Member Data Documentation

rwsf::XmlName rwsf::XmlReader::NullName
static

Static constant rwsf::XmlName that contains an empty prefix and an empty namespace URI.

Copyright © 2020 Rogue Wave Software, Inc. All Rights Reserved.
Rogue Wave is a registered trademark of Rogue Wave Software, Inc. in the United States and other countries, and HydraExpress is a trademark of Rogue Wave Software. All other trademarks are the property of their respective owners.