#include <rw/wstring.h>
RWWString a;
Class RWWString offers very powerful and convenient facilities for manipulating wide character strings.
This string class manipulates wide characters of the fundamental type wchar_t. These characters are generally two or four bytes, and can be used to encode richer code sets than the classic "char" type. Because wchar_t characters are all the same size, indexing is fast.
Conversions to and from multibyte and ASCII forms are provided by the RWWString constructors, and by the RWWString member functions isAscii(), toAscii(), and toMultiByte().
Stream operations implicitly translate to and from the multibyte stream representation. That is, on output, wide character strings are converted into multibyte strings, while on input they are converted back into wide character strings. Hence, the external representation of wide character strings is usually as multibyte character strings, saving storage space and making interfaces with devices (which usually expect multibyte strings) easier.
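The following is a minimal sketch of the wide/multibyte round trip described above. It uses only the Standard C library functions wcstombs() and mbstowcs() on std::wstring and std::string, not Tools.h++ itself. The helper names toMultiByteSketch and fromMultiByteSketch are hypothetical. Unlike RWWString, both helpers stop at the first embedded null, and the result depends on the current C locale.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: wide -> multibyte, in the spirit of toMultiByte().
std::string toMultiByteSketch(const std::wstring& w) {
    // Worst case: every wide character expands to MB_CUR_MAX bytes.
    std::vector<char> buf(w.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), w.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::string(); // unconvertible character
    return std::string(buf.data(), n);
}

// Hypothetical helper: multibyte -> wide, in the spirit of the multiByte constructor.
std::wstring fromMultiByteSketch(const std::string& mb) {
    // A multibyte string never yields more wide characters than it has bytes.
    std::vector<wchar_t> buf(mb.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), mb.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::wstring(); // invalid sequence
    return std::wstring(buf.data(), n);
}
```

An empty result is returned on conversion failure here purely to keep the sketch short; real code would probably report the error.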
RWWStrings tolerate embedded nulls.
Parameters of type "const wchar_t*" must not be passed a value of zero. This is detected in the debug version of the library.
The class is implemented using a technique called copy on write. With this technique, the copy constructor and assignment operators still reference the old object and hence are very fast. An actual copy is made only when a "write" is performed, that is, when the object is about to be changed. The net result is excellent performance, but with easy-to-understand copy semantics.
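The copy-on-write idea can be sketched as follows. This is not the RWWString implementation; it is a single-threaded illustration built on std::shared_ptr, and the class name CowWString and its members are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical copy-on-write wrapper (single-threaded sketch only).
class CowWString {
public:
    explicit CowWString(const std::wstring& s)
        : data_(std::make_shared<std::wstring>(s)) {}

    // The compiler-generated copy constructor and assignment operator
    // copy only the shared pointer, so copies are cheap.

    const std::wstring& str() const { return *data_; }

    // True while both objects still reference the same buffer.
    bool sharesWith(const CowWString& other) const { return data_ == other.data_; }

    // A "write": make a private copy first if the buffer is shared.
    void set(std::size_t i, wchar_t c) {
        detach();
        data_->at(i) = c; // at() throws std::out_of_range on a bad index
    }

private:
    void detach() {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::wstring>(*data_);
    }
    std::shared_ptr<std::wstring> data_;
};
```

After `CowWString b = a;` the two objects share one buffer; the first call to `b.set(...)` gives b its own copy and leaves a untouched.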
A separate RWWSubString class supports substring extraction and modification operations.
Simple
#include <rw/rstream.h>
#include <rw/wstring.h>

int main(){
  RWWString a(L"There is no joy in Beantown");
  a.subString(L"Beantown") = L"Redmond";
  cout << a << endl;
  return 0;
}
Program output:
There is no joy in Redmond
enum RWWString::caseCompare { exact, ignoreCase };
Used to specify whether comparisons, searches, and hashing functions should use case-sensitive (exact) or case-insensitive (ignoreCase) semantics.
enum RWWString::multiByte_ { multiByte };
Allow conversion from multibyte character strings to wide character strings. See constructor below.
enum RWWString::ascii_ { ascii };
Allow conversion from ASCII character strings to wide character strings. See constructor below.
RWWString();
Creates a string of length zero (the null string).
RWWString(const wchar_t* cs);
Creates a string from the wide character string cs. The created string will copy the data pointed to by cs, up to the first terminating null.
RWWString(const wchar_t* cs, size_t N);
Constructs a string from the character string cs. The created string will copy the data pointed to by cs. Exactly N characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N* sizeof(wchar_t) bytes or N wide characters long.
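The difference between the null-terminated and the counted constructor can be illustrated with std::wstring, which offers the same two forms. This is an analogy in standard C++, not Tools.h++ code; the helper names untilNull and exactlyN are hypothetical.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Copies up to (not including) the first null, like RWWString(const wchar_t*).
std::wstring untilNull(const wchar_t* cs) {
    return std::wstring(cs);
}

// Copies exactly n wide characters, embedded nulls included,
// like RWWString(const wchar_t*, size_t). The buffer must hold at least n.
std::wstring exactlyN(const wchar_t* cs, std::size_t n) {
    return std::wstring(cs, n);
}
```

With the buffer `L"ab\0cd"`, untilNull() yields a string of length 2 while exactlyN(buf, 5) yields a string of length 5. A C function such as wcslen() applied to the longer string's data still stops at the embedded null, which is why length() rather than null termination defines the string's extent.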
RWWString(RWSize_T ic);
Creates a string of length zero (the null string). The string's capacity (that is, the size it can grow to without resizing) is given by the parameter ic.
RWWString(const RWWString& str);
Copy constructor. The created string will copy str's data.
RWWString(const RWWSubString& ss);
Conversion from sub-string. The created string will copy the substring represented by ss.
RWWString(char c);
Constructs a string containing the single character c.
RWWString(char c, size_t N);
Constructs a string containing the character c repeated N times.
RWWString(const char* mbcs, multiByte_ mb);
Construct a wide character string from the multibyte character string contained in mbcs. The conversion is done using the Standard C library function ::mbstowcs(). This constructor can be used as follows:
RWWString a("\306\374\315\313\306\374", multiByte);
RWWString(const char* acs, ascii_ asc);
Construct a wide character string from the ASCII character string contained in acs. The conversion is done by simply stripping the high-order bit and, hence, is much faster than the more general constructor given immediately above. For this conversion to be successful, you must be certain that the string contains only ASCII characters. This can be confirmed (if necessary) using RWCString::isAscii(). This constructor can be used as follows:
RWWString a("An ASCII character string", ascii);
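The ASCII conversion described above amounts to widening each byte after masking off its high-order bit. The sketch below shows that idea in standard C++ (C++11 or later); it is not the library's code, and the names isAsciiSketch and widenAsciiSketch are hypothetical.

```cpp
#include <string>

// True if every byte is in the 7-bit ASCII range.
bool isAsciiSketch(const std::string& s) {
    for (char c : s)
        if (static_cast<unsigned char>(c) > 0x7F) return false;
    return true;
}

// Widen byte by byte, dropping the high-order bit of each byte.
// Only meaningful when isAsciiSketch(s) is true.
std::wstring widenAsciiSketch(const std::string& s) {
    std::wstring out;
    out.reserve(s.size());
    for (char c : s)
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c) & 0x7F));
    return out;
}
```

Applied to a non-ASCII byte such as '\306', the mask silently produces a different character, which is why the check should come first when the input is not known to be ASCII.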
RWWString(const char* cs, size_t N, multiByte_ mb);
RWWString(const char* cs, size_t N, ascii_ asc);
These two constructors are similar to the two constructors immediately above except that they copy exactly N characters, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N bytes long.
operator const wchar_t*() const;
Access to the RWWString's data as a null terminated wide string. This datum is owned by the RWWString and may not be deleted or changed. If the RWWString object itself changes or goes out of scope, the pointer value previously returned will become invalid. While the string is null-terminated, note that its length is still given by the member function length(). That is, it may contain embedded nulls.
RWWString& operator=(const char* cs);
Assignment operator. Copies the null-terminated character string pointed to by cs into self. Returns a reference to self.
RWWString& operator=(const RWWString& str);
Assignment operator. The string will copy str's data. Returns a reference to self.
RWWString& operator=(const RWWSubString& sub);
Assignment operator. The string will copy sub's data. Returns a reference to self.
RWWString& operator+=(const wchar_t* cs);
Append the null-terminated character string pointed to by cs to self. Returns a reference to self.
RWWString& operator+=(const RWWString& str);
Append the string str to self. Returns a reference to self.
wchar_t& operator[](size_t i);
wchar_t operator[](size_t i) const;
Return the ith character. The first variant can be used as an lvalue. The index i must be between 0 and the length of the string less one. Bounds checking is performed -- if the index is out of range then an exception of type RWBoundsErr will be thrown.
wchar_t& operator()(size_t i);
wchar_t operator()(size_t i) const;
Return the ith character. The first variant can be used as an lvalue. The index i must be between 0 and the length of the string less one. Bounds checking is performed if the pre-processor macro RWBOUNDS_CHECK has been defined before including <rw/wstring.h>. In this case, if the index is out of range, then an exception of type RWBoundsErr will be thrown.
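The distinction between always-checked and optionally-checked indexing has a rough analogue in std::wstring, where at() always checks and operator[] does not. The sketch below uses that analogue; it throws std::out_of_range rather than RWBoundsErr, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Always-checked access, similar in spirit to RWWString::operator[].
wchar_t checkedAt(const std::wstring& s, std::size_t i) {
    return s.at(i); // throws std::out_of_range when i >= s.size()
}

// Convenience wrapper for callers who want a yes/no answer.
bool indexInRange(const std::wstring& s, std::size_t i) {
    try {
        checkedAt(s, i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```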
RWWSubString operator()(size_t start, size_t len);
const RWWSubString operator()(size_t start, size_t len) const;
Substring operator. Returns an RWWSubString of self with length len, starting at index start. The first variant can be used as an lvalue. The sum of start plus len must be less than or equal to the string length. If the library was built using the RWDEBUG flag, and start and len are out of range, then an exception of type RWBoundsErr will be thrown.
RWWString& append(const wchar_t* cs);
Append a copy of the null-terminated wide character string pointed to by cs to self. Returns a reference to self.
RWWString& append(const wchar_t* cs, size_t N);
Append a copy of the wide character string cs to self. Exactly N wide characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N*sizeof(wchar_t) bytes long. Returns a reference to self.
RWWString& append(const RWWString& cstr);
Append a copy of the string cstr to self. Returns a reference to self.
RWWString& append(const RWWString& cstr, size_t N);
Append the first N characters or the length of cstr (whichever is less) of cstr to self. Returns a reference to self.
size_t binaryStoreSize() const;
Returns the number of bytes necessary to store the object using the global function:
RWFile& operator<<(RWFile&, const RWWString&);
size_t capacity() const;
Return the current capacity of self. This is the number of characters the string can hold without resizing.
size_t capacity(size_t capac);
Hint to the implementation to change the capacity of self to capac. Returns the actual capacity.
int collate(const RWWString& str) const;
int collate(const wchar_t* str) const;
Returns an int less than, greater than, or equal to zero, according to the result of calling the Standard C library function ::wcscoll() on self and the argument str. This supports locale-dependent collation.
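Locale-dependent collation of this kind can be sketched with the Standard C function wcscoll() applied to std::wstring data. This is an illustration, not the library's code; collateSketch is a hypothetical name, and the comparison stops at the first embedded null.

```cpp
#include <cwchar>
#include <string>

// Negative, zero, or positive according to the collation order
// of the current LC_COLLATE locale.
int collateSketch(const std::wstring& a, const std::wstring& b) {
    return std::wcscoll(a.c_str(), b.c_str());
}
```

In the default "C" locale the ordering matches a plain code-point comparison; after a call such as setlocale(LC_COLLATE, "") the result follows the user's locale instead.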
int compareTo(const RWWString& str, caseCompare = RWWString::exact) const;
int compareTo(const wchar_t* str, caseCompare = RWWString::exact) const;
Returns an int less than, greater than, or equal to zero, according to the result of calling the Standard C library function ::memcmp() on self and the argument str. Case sensitivity is according to the caseCompare argument, and may be RWWString::exact or RWWString::ignoreCase.
RWBoolean contains(const RWWString& cs, caseCompare = RWWString::exact) const;
RWBoolean contains(const wchar_t* str, caseCompare = RWWString::exact) const;
Pattern matching. Returns TRUE if cs occurs in self. Case sensitivity is according to the caseCompare argument, and may be RWWString::exact or RWWString::ignoreCase.
const wchar_t* data() const;
Access to the RWWString's data as a null terminated string. This datum is owned by the RWWString and may not be deleted or changed. If the RWWString object itself changes or goes out of scope, the pointer value previously returned will become invalid. While the string is null-terminated, note that its length is still given by the member function length(). That is, it may contain embedded nulls.
size_t first(wchar_t c) const;
Returns the index of the first occurrence of the wide character c in self. Returns RW_NPOS if there is no such character or if there is an embedded null prior to finding c.
size_t first(wchar_t c, size_t) const;
Returns the index of the first occurrence of the wide character c in self. Continues to search past embedded nulls. Returns RW_NPOS if there is no such character.
size_t first(const wchar_t* str) const;
Returns the index of the first occurrence in self of any character in str. Returns RW_NPOS if there is no match or if there is an embedded null prior to finding any character from str.
size_t first(const wchar_t* str, size_t N) const;
Returns the index of the first occurrence in self of any character in str. Exactly N characters in str are checked including any embedded nulls so str must point to a buffer containing at least N wide characters. Returns RW_NPOS if there is no match.
unsigned hash(caseCompare = RWWString::exact) const;
Returns a suitable hash value.
size_t index(const wchar_t* pat, size_t i=0, caseCompare = RWWString::exact) const;
size_t index(const RWWString& pat, size_t i=0, caseCompare = RWWString::exact) const;
Pattern matching. Starting with index i, searches for the first occurrence of pat in self and returns the index of the start of the match. Returns RW_NPOS if there is no such pattern. Case sensitivity is according to the caseCompare argument; it defaults to RWWString::exact.
size_t index(const wchar_t* pat, size_t patlen, size_t i, caseCompare) const;
size_t index(const RWWString& pat, size_t patlen, size_t i, caseCompare) const;
Pattern matching. Starting with index i, searches for the first occurrence of the first patlen characters from pat in self and returns the index of the start of the match. Returns RW_NPOS if there is no such pattern. Case sensitivity is according to the caseCompare argument.
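A pattern search with optional case folding can be sketched as below. This is a naive scan over std::wstring using towlower(), offered only to illustrate the semantics; it makes no claim about how the library searches. The name indexSketch and the bool parameter standing in for caseCompare are assumptions.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Returns the index of the first match of pat in s at or after i,
// or std::wstring::npos (playing the role of RW_NPOS) if there is none.
std::size_t indexSketch(const std::wstring& s, const std::wstring& pat,
                        std::size_t i = 0, bool ignoreCase = false) {
    for (std::size_t pos = i; pos + pat.size() <= s.size(); ++pos) {
        std::size_t k = 0;
        while (k < pat.size()) {
            wchar_t a = s[pos + k];
            wchar_t b = pat[k];
            if (ignoreCase) {
                a = static_cast<wchar_t>(std::towlower(a));
                b = static_cast<wchar_t>(std::towlower(b));
            }
            if (a != b) break;
            ++k;
        }
        if (k == pat.size()) return pos; // every pattern character matched
    }
    return std::wstring::npos;
}
```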
RWWString& insert(size_t pos, const wchar_t* cs);
Insert a copy of the null-terminated string cs into self at position pos. Returns a reference to self.
RWWString& insert(size_t pos, const wchar_t* cs, size_t N);
Insert a copy of the first N wide characters of cs into self at position pos. Exactly N wide characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N*sizeof(wchar_t) bytes long. Returns a reference to self.
RWWString& insert(size_t pos, const RWWString& str);
Insert a copy of the string str into self at position pos. Returns a reference to self.
RWWString& insert(size_t pos, const RWWString& str, size_t N);
Insert a copy of the first N wide characters or the length of str (whichever is less) of str into self at position pos. Returns a reference to self.
RWBoolean isAscii() const;
Returns TRUE if it is safe to perform the conversion toAscii() (that is, if all characters of self are ASCII characters).
RWBoolean isNull() const;
Returns TRUE if this string has zero length (i.e., the null string).
size_t last(wchar_t c) const;
Returns the index of the last occurrence in the string of the wide character c. Returns RW_NPOS if there is no such character.
size_t length() const;
Return the number of characters in self.
RWWString& prepend(const wchar_t* cs);
Prepend a copy of the null-terminated wide character string pointed to by cs to self. Returns a reference to self.
RWWString& prepend(const wchar_t* cs, size_t N);
Prepend a copy of the wide character string cs to self. Exactly N wide characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N*sizeof(wchar_t) bytes long. Returns a reference to self.
RWWString& prepend(const RWWString& str);
Prepends a copy of the string str to self. Returns a reference to self.
RWWString& prepend(const RWWString& cstr, size_t N);
Prepend the first N wide characters or the length of cstr (whichever is less) of cstr to self. Returns a reference to self.
istream& readFile(istream& s);
Reads characters from the input stream s, replacing the previous contents of self, until EOF is reached. The input stream is treated as a sequence of multibyte characters, each of which is converted to a wide character (using the Standard C library function mbtowc()) before storing. Null characters are treated the same as other characters.
istream& readLine(istream& s, RWBoolean skipWhite = TRUE);
Reads characters from the input stream s, replacing the previous contents of self, until a newline (or an EOF) is encountered. The newline is removed from the input stream but is not stored. The input stream is treated as a sequence of multibyte characters, each of which is converted to a wide character (using the Standard C library function mbtowc()) before storing. Null characters are treated the same as other characters. If the skipWhite argument is TRUE, then whitespace is skipped (using the iostream library manipulator ws) before saving characters.
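The behavior of readLine() can be approximated in standard C++ as shown below: skip leading whitespace with the ws manipulator, read bytes up to the newline, then convert the multibyte bytes to wide characters. This is a sketch under simplifying assumptions, not the library's code. It converts the whole line with mbstowcs() rather than character by character with mbtowc(), so unlike the real function it stops at an embedded null. The name readLineSketch is hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

std::istream& readLineSketch(std::istream& s, std::wstring& out,
                             bool skipWhite = true) {
    if (skipWhite) s >> std::ws;      // discard leading whitespace
    std::string bytes;
    std::getline(s, bytes);           // newline is consumed but not stored
    std::vector<wchar_t> buf(bytes.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), bytes.c_str(), buf.size());
    // On an invalid multibyte sequence, leave out empty (a sketch-only choice).
    out = (n == static_cast<std::size_t>(-1)) ? std::wstring()
                                              : std::wstring(buf.data(), n);
    return s;
}
```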
istream& readString(istream& s);
Reads characters from the input stream s, replacing the previous contents of self, until an EOF or null terminator is encountered. The input stream is treated as a sequence of multibyte characters, each of which is converted to a wide character (using the Standard C library function mbtowc()) before storing.
istream& readToDelim(istream& s, wchar_t delim=(wchar_t)'\n');
Reads characters from the input stream s, replacing the previous contents of self, until an EOF or the delimiting character delim is encountered. The delimiter is removed from the input stream but is not stored. The input stream is treated as a sequence of multibyte characters, each of which is converted to a wide character (using the Standard C library function mbtowc()) before storing. Null characters are treated the same as other characters.
istream& readToken(istream& s);
Whitespace is skipped before storing characters into wide string. Characters are then read from the input stream s, replacing previous contents of self, until trailing whitespace or an EOF is encountered. The trailing whitespace is left on the input stream. Only ASCII whitespace characters are recognized, as defined by the standard C library function isspace(). The input stream is treated as a sequence of multibyte characters, each of which is converted to a wide character (using the Standard C library function mbtowc()) before storing.
RWWString& remove(size_t pos);
Removes the characters from the position pos, which must be no greater than length(), to the end of string. Returns a reference to self.
RWWString& remove(size_t pos, size_t N);
Removes N wide characters or to the end of string (whichever comes first) starting at the position pos, which must be no greater than length(). Returns a reference to self.
RWWString& replace(size_t pos, size_t N, const wchar_t* cs);
Replaces N wide characters or to the end of string (whichever comes first) starting at position pos, which must be no greater than length(), with a copy of the null-terminated string cs. Returns a reference to self.
RWWString& replace(size_t pos, size_t N1,const wchar_t* cs, size_t N2);
Replaces N1 characters or to the end of string (whichever comes first) starting at position pos, which must be no greater than length(), with a copy of the string cs. Exactly N2 characters are copied, including any embedded nulls. Hence, the buffer pointed to by cs must be at least N2*sizeof(wchar_t) bytes long. Returns a reference to self.
RWWString& replace(size_t pos, size_t N, const RWWString& str);
Replaces N characters or to the end of string (whichever comes first) starting at position pos, which must be no greater than length(), with a copy of the string str. Returns a reference to self.
RWWString& replace(size_t pos, size_t N1, const RWWString& str, size_t N2);
Replaces N1 characters or to the end of string (whichever comes first) starting at position pos, which must be no greater than length(), with a copy of the first N2 characters, or the length of str (whichever is less), from str. Returns a reference to self.
void resize(size_t n);
Changes the length of self, adding blanks (i.e., L' ') or truncating as necessary.
RWWSubString strip(stripType s = RWWString::trailing, wchar_t c = L' ');
const RWWSubString strip(stripType s = RWWString::trailing, wchar_t c = L' ') const;
Returns a substring of self where the character c has been stripped off the beginning, end, or both ends of the string. The first variant can be used as an lvalue. The enum stripType can take values:
stripType | Meaning
leading | Remove characters at beginning
trailing | Remove characters at end
both | Remove characters at both ends
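The three stripType values can be illustrated with a small standard C++ function. Unlike strip(), which returns a substring that refers back to the original string, this sketch returns an independent copy. The names StripKind and stripSketch are hypothetical.

```cpp
#include <cstddef>
#include <string>

enum StripKind { leading, trailing, both };

// Returns a copy of s with runs of c removed from the chosen end(s).
std::wstring stripSketch(const std::wstring& s, StripKind kind = trailing,
                         wchar_t c = L' ') {
    std::size_t begin = 0;
    std::size_t end = s.size();
    if (kind == leading || kind == both)
        while (begin < end && s[begin] == c) ++begin;
    if (kind == trailing || kind == both)
        while (end > begin && s[end - 1] == c) --end;
    return s.substr(begin, end - begin);
}
```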
RWWSubString subString(const wchar_t* cs, size_t start=0, caseCompare = RWWString::exact);
const RWWSubString subString(const wchar_t* cs, size_t start=0, caseCompare = RWWString::exact) const;
Returns a substring representing the first occurrence of the null-terminated string pointed to by "cs". Case sensitivity is according to the caseCompare argument; it defaults to RWWString::exact. The first variant can be used as an lvalue.
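Using a substring as an lvalue relies on a proxy object that remembers its parent string and extent. The sketch below shows one way such a proxy could work on top of std::wstring. It is an assumption-laden illustration, not the RWWSubString implementation, and the names SubRef and subStringSketch are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical proxy: refers to [start, start+len) of a parent string.
class SubRef {
public:
    SubRef(std::wstring& owner, std::size_t start, std::size_t len)
        : owner_(owner), start_(start), len_(len) {}

    // True when the search that produced this proxy found nothing.
    bool isNull() const { return start_ == std::wstring::npos; }

    // Assigning to the proxy replaces the referenced span in the parent.
    SubRef& operator=(const std::wstring& repl) {
        if (!isNull()) {
            owner_.replace(start_, len_, repl);
            len_ = repl.size();
        }
        return *this;
    }

private:
    std::wstring& owner_;
    std::size_t start_;
    std::size_t len_;
};

// Proxy for the first occurrence of pat in s (a null proxy if absent).
SubRef subStringSketch(std::wstring& s, const std::wstring& pat) {
    std::size_t pos = s.find(pat);
    return SubRef(s, pos, pos == std::wstring::npos ? 0 : pat.size());
}
```

With this in place, `subStringSketch(a, L"Beantown") = L"Redmond";` edits a directly, and assigning through a null proxy leaves the parent unchanged.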
RWCString toAscii() const;
Returns an RWCString object of the same length as self, containing only ASCII characters. Any non-ASCII characters in self simply have the high bits stripped off. Use isAscii() to determine whether this function is safe to use.
RWCString toMultiByte() const;
Returns an RWCString containing the result of applying the standard C library function wcstombs() to self. This function is always safe to use.
void toLower();
Changes all upper-case letters in self to lower-case. Uses the C library function towlower().
void toUpper();
Changes all lower-case letters in self to upper-case. Uses the C library function towupper().
static unsigned hash(const RWWString& wstr);
Returns the hash value of wstr as returned by wstr.hash(RWWString::exact).
static size_t initialCapacity(size_t ic = 15);
Sets the minimum initial capacity of an RWWString, and returns the old value. The initial setting is 15 wide characters. Larger values will use more memory, but result in fewer resizes when concatenating or reading strings. Smaller values will waste less memory, but result in more resizes.
static size_t maxWaste(size_t mw = 15);
Sets the maximum amount of unused space allowed in a wide string should it shrink, and returns the old value. The initial setting is 15 wide characters. If more than mw characters are wasted, then excess space will be reclaimed.
static size_t resizeIncrement(size_t ri = 16);
Sets the resize increment when more memory is needed to grow a wide string. Returns the old value. The initial setting is 16 wide characters.
RWBoolean operator==(const RWWString&, const wchar_t* );
RWBoolean operator==(const wchar_t*, const RWWString&);
RWBoolean operator==(const RWWString&, const RWWString&);
RWBoolean operator!=(const RWWString&, const wchar_t* );
RWBoolean operator!=(const wchar_t*, const RWWString&);
RWBoolean operator!=(const RWWString&, const RWWString&);
Logical equality and inequality. Case sensitivity is exact.
RWBoolean operator< (const RWWString&, const wchar_t* );
RWBoolean operator< (const wchar_t*, const RWWString&);
RWBoolean operator< (const RWWString&, const RWWString&);
RWBoolean operator> (const RWWString&, const wchar_t* );
RWBoolean operator> (const wchar_t*, const RWWString&);
RWBoolean operator> (const RWWString&, const RWWString&);
RWBoolean operator<=(const RWWString&, const wchar_t* );
RWBoolean operator<=(const wchar_t*, const RWWString&);
RWBoolean operator<=(const RWWString&, const RWWString&);
RWBoolean operator>=(const RWWString&, const wchar_t* );
RWBoolean operator>=(const wchar_t*, const RWWString&);
RWBoolean operator>=(const RWWString&, const RWWString&);
Comparisons are done lexicographically, byte by byte. Case sensitivity is exact. Use member collate() or the global function strXForm() for locale sensitivity.
RWWString operator+(const RWWString&, const RWWString&);
RWWString operator+(const wchar_t*, const RWWString&);
RWWString operator+(const RWWString&, const wchar_t* );
Concatenation operators.
ostream& operator<<(ostream& s, const RWWString& str);
Output an RWWString on ostream s. Each character of str is first converted to a multibyte character before being shifted out to s.
istream& operator>>(istream& s, RWWString& str);
Calls str.readToken(s). That is, a token is read from the input stream s.
RWvostream& operator<<(RWvostream&, const RWWString& str);
RWFile& operator<<(RWFile&, const RWWString& str);
Saves string str to a virtual stream or RWFile, respectively.
RWvistream& operator>>(RWvistream&, RWWString& str);
RWFile& operator>>(RWFile&, RWWString& str);
Restores a wide character string into str from a virtual stream or RWFile, respectively, replacing the previous contents of str.
RWWString strXForm(const RWWString&);
Returns a string transformed by ::wcsxfrm(), to allow quicker collation than RWWString::collate().
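The idea behind a transformed string is that the locale-dependent work is done once, after which plain comparisons give the collation order. A minimal sketch using the Standard C function wcsxfrm() on std::wstring follows; it is not the library's code, strXFormSketch is a hypothetical name, and input is taken only up to the first embedded null.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

std::wstring strXFormSketch(const std::wstring& s) {
    // With a zero-size destination, wcsxfrm only reports the length needed.
    std::size_t n = std::wcsxfrm(nullptr, s.c_str(), 0);
    std::vector<wchar_t> buf(n + 1);
    std::wcsxfrm(buf.data(), s.c_str(), buf.size());
    return std::wstring(buf.data(), n);
}
```

Comparing two transformed strings with an ordinary comparison is then intended to agree with wcscoll() on the originals, which pays off when the same strings are compared many times, for example while sorting.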
RWWString toLower(const RWWString& str);
Returns a version of str where all upper-case characters have been replaced with lower-case characters. Uses the C library function towlower().
RWWString toUpper(const RWWString& str);
Returns a version of str where all lower-case characters have been replaced with upper-case characters. Uses the C library function towupper().
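The copying case conversions can be sketched in standard C++ (C++11 or later) by applying towlower() or towupper() to each character of a std::wstring. The function names below are hypothetical, and the result depends on the current C locale.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Takes s by value, so the caller's string is left unchanged.
std::wstring toUpperSketch(std::wstring s) {
    std::transform(s.begin(), s.end(), s.begin(), [](wchar_t ch) {
        return static_cast<wchar_t>(std::towupper(ch));
    });
    return s;
}

std::wstring toLowerSketch(std::wstring s) {
    std::transform(s.begin(), s.end(), s.begin(), [](wchar_t ch) {
        return static_cast<wchar_t>(std::towlower(ch));
    });
    return s;
}
```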