How HydraExpress Performs Character Conversions for Non-UTF-8 Encodings
This section discusses how HydraExpress and its data parser deal with non-UTF-8 character encodings.
Character Encoding in XML Schemas and WSDLs
For XML Schemas and WSDLs with a character encoding other than UTF-8, the internal parser automatically converts the string data to UTF-8 from the encoding specified in the document’s prolog (see “Character Encoding in an XML Prolog”).
NOTE >> When providing non-UTF-8 schemas and WSDLs to HydraExpress, ensure that all characters in the document match the encoding stated in the document’s prolog.
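The conversion described above depends on the encoding named in the prolog, so it can help to check what a document actually declares before passing it to HydraExpress. The following is a minimal sketch of such a check, not HydraExpress code: declaredEncoding is a hypothetical helper, and it assumes the document bytes are ASCII-compatible and carry no byte order mark. When no encoding is declared, it returns UTF-8, which is the XML default in the absence of a byte order mark.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the HydraExpress API: returns the value of
// the encoding pseudo-attribute in the XML declaration, or "UTF-8" when the
// document has no declaration or the declaration names no encoding.
// Assumes ASCII-compatible input without a byte order mark.
std::string declaredEncoding(const std::string& xml) {
    const std::string fallback = "UTF-8";

    // The XML declaration, when present, must be the first thing in the document.
    if (xml.compare(0, 5, "<?xml") != 0) {
        return fallback;
    }
    const std::size_t declEnd = xml.find("?>");
    if (declEnd == std::string::npos) {
        return fallback;
    }
    const std::string decl = xml.substr(0, declEnd);

    const std::size_t key = decl.find("encoding");
    if (key == std::string::npos) {
        return fallback;
    }
    // The value may be wrapped in single or double quotes.
    const std::size_t open = decl.find_first_of("\"'", key);
    if (open == std::string::npos) {
        return fallback;
    }
    const std::size_t close = decl.find(decl[open], open + 1);
    if (close == std::string::npos) {
        return fallback;
    }
    return decl.substr(open + 1, close - open - 1);
}
```

A caller could compare the returned name with the encoding the file was actually saved in and flag a mismatch before running the code generator.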
HydraExpress also supports non-ASCII XML element and attribute names. These names are handled in one of two ways:
• If a customized mapping is defined to map the non-ASCII names to ASCII values, and this mapping is provided in a HydraExpress project file to the code generator, the binding uses the mapped element and attribute names. Here is an example mapping:
<rwsf-codegen-project>
  <options>
    <option name='project-name' value='MyMapping'/>
    ...
  </options>
  <mappings>
    <name xsdtype="element" xsdname="ãf³" name="i18nElement"/>
    <name xsdtype="attribute" xsdname="èç3" name="i18nAttribute"/>
  </mappings>
</rwsf-codegen-project>
• If no customized mapping is supplied, the binding assigns element and attribute names based on the pattern Member<number>.
Non-ASCII element and attribute names are preserved when an XML instance document is unmarshaled and later marshaled back into an XML document.
Character Encoding in Generated Classes
Classes created by HydraExpress parse XML using class rwsf::XmlReader. An instance of rwsf::XmlReader converts XML source to UTF-8, regardless of the original encoding. For details on the encodings that the reader supports, see the entry for rwsf::XmlReader in the HydraExpress C++ API Reference Guide.
HydraExpress produces XML documents in UTF-8.
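To make the idea of converting source text to UTF-8 concrete, here is a minimal sketch of transcoding a single encoding, ISO-8859-1, to UTF-8. It is an illustration only and does not show how rwsf::XmlReader is implemented: latin1ToUtf8 is a hypothetical function, and ISO-8859-1 is chosen because each of its bytes maps directly to the Unicode code point with the same value.

```cpp
#include <string>

// Illustration only, not HydraExpress code: transcodes ISO-8859-1 bytes to UTF-8.
std::string latin1ToUtf8(const std::string& input) {
    std::string out;
    out.reserve(input.size() * 2);  // worst case: every byte becomes two bytes
    for (unsigned char byte : input) {
        if (byte < 0x80) {
            // ASCII bytes are encoded identically in UTF-8.
            out.push_back(static_cast<char>(byte));
        } else {
            // Code points U+0080 through U+00FF need two bytes: 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            out.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return out;
}
```

For example, the ISO-8859-1 byte 0xE9 (é) becomes the two UTF-8 bytes 0xC3 0xA9, while plain ASCII text passes through unchanged.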