An Example Using the Internationalization Module with the Threads Module and the C++ Standard Library
This example uses the Internationalization and Threads Modules with the C++ Standard Library to merge two sorted lists into a single sorted list. The input files are encoded in UTF-8, and the merged list is written to standard output, also in UTF-8. Inside the program, the strings are converted to UTF-16, as the classes of the Internationalization Module require. An RWUCollator then compares the strings using the Unicode Collation Algorithm, tailored to the specified locale, so the merge preserves that locale's sort order.
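In the example, the UTF-8 to UTF-16 step is handled by the Internationalization Module's conversion contexts. For intuition only, the following standard C++ sketch shows roughly what such a conversion involves. The function name utf8ToUtf16 is hypothetical and is not part of SourcePro, and the sketch omits checks a production converter needs, such as rejecting overlong encodings and encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a SourcePro API: decodes UTF-8 bytes into
// code points and re-encodes each code point as UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low six bits
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp >= 0x10000) {
            // Outside the Basic Multilingual Plane: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

ASCII words such as those in the sample input map one byte to one UTF-16 code unit; an accented letter takes two bytes in UTF-8 but still a single code unit in UTF-16.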
The standard library supplies the classes for program input and output. The Threads Module implements the producer/consumer model through thread functions (RWThreadFunction) and synchronized queues (RWTPCValQueue). Each input stream is read and converted on its own thread while a third thread merges the results, which can improve performance for large data sets.
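The producer/consumer arrangement described above can be sketched with the C++ standard library alone. BoundedQueue below is a hypothetical stand-in for a synchronized queue and is not the RWTPCValQueue implementation; it assumes that write blocks while the queue is full and read blocks while it is empty. The helper passThrough and the use of an empty string as the end-of-data marker are likewise illustrative choices.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded producer/consumer queue (illustration only)
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Blocks while the queue is full, then appends the value
    void write(T value)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        notEmpty_.notify_one();
    }

    // Blocks while the queue is empty, then removes the oldest value
    T read()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        notFull_.notify_one();
        return value;
    }

private:
    std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
    std::queue<T> items_;
};

// Sends each word from a producer thread to the calling thread.
// An empty string marks the end of the data.
std::vector<std::string> passThrough(const std::vector<std::string>& words,
                                     std::size_t capacity)
{
    BoundedQueue<std::string> queue(capacity);
    std::thread producer([&] {
        for (const std::string& w : words) {
            queue.write(w);
        }
        queue.write(std::string());  // end-of-data marker
    });

    std::vector<std::string> received;
    for (std::string s = queue.read(); !s.empty(); s = queue.read()) {
        received.push_back(s);
    }
    producer.join();
    return received;
}
```

A small capacity keeps a fast producer from running far ahead of the consumer, which bounds memory use regardless of the input size.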
Example 2 – Using the Internationalization Module with the Threads Module and the C++ Standard Library
 
// Concurrent Merge Example
 
// Hard-coded input file names
#define INPUT_FILE_1 "sorted1.dat"
#define INPUT_FILE_2 "sorted2.dat"
 
// From the Threads Module
#include <rw/functor/rwBind.h>
#include <rw/itc/RWTPCValQueue.h>
#include <rw/sync/RWCriticalSection.h>
#include <rw/thread/RWThreadFunction.h>
 
// From the Internationalization Module
#include <rw/i18n/RWUCollator.h>
#include <rw/i18n/RWUFromUnicodeConversionContext.h>
#include <rw/i18n/RWULocale.h>
#include <rw/i18n/RWUString.h>
#include <rw/i18n/RWUToUnicodeConversionContext.h>
 
// From the C++ Standard Library
#include <fstream>
#include <iostream>
using std::cerr;
using std::cout;
using std::endl;
using std::ifstream;
 
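// Sentinel (an empty string) that each producer writes to its
// queue to mark the end of its data
static const RWUString DONE;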
 
// Create a typedef for a synchronized queue of RWUString objects
typedef RWTPCValQueue<RWUString> SQueue;
 
// The Producer function represents the producer in the
// producer/consumer model. The producer uses a specified input
// stream to read strings encoded in UTF-8. The strings are
// converted from UTF-8 to UTF-16 using an
// RWUToUnicodeConversionContext. The converted strings are
// then written to an RWTPCValQueue, for consumption by a consumer.
// Once the end of the file is reached, a well-known string, DONE,
// is written into the queue to tell the consumer that there is no
// more data from this producer. This example creates two producer
// instances, one for each of two sorted input files.
 
void Producer(ifstream& input, SQueue& queue)
{
    // Create the UTF-8 conversion context
    RWUToUnicodeConversionContext context("UTF-8");
    RWUString s;

    // Read strings until extraction fails (end of file or error).
    // Testing the stream after each read, rather than eof() before
    // it, avoids queuing a stale or empty string after the last word.
    while (input >> s) {
        // The string was converted from UTF-8 to UTF-16 as it
        // was read; write it to the queue
        queue.write(s);
    }

    // Write DONE to the queue to mark the end of data
    queue.write(DONE);
}
 
// The Consumer function represents the consumer in the
// producer/consumer model. A conversion context is created to
// convert the producer strings from UTF-16 to UTF-8. As strings
// are read from the producers, two strings (one from each
// producer) are compared using an RWUCollator at Secondary
// strength. (Secondary strength considers differences in
// basic character identity, and possibly diacritics. Differences
// in case are ignored at the secondary level). The lesser of the
// two strings is written to the output stream, preserving the
// ordering of the strings in the final output file. When the
// strings are written to the output stream, the conversion context
// is used to convert the strings from UTF-16 to UTF-8.
 
void Consumer(SQueue& q1, SQueue& q2, const RWULocale& locale)
{
    // Instantiate a conversion context for converting from
    // UTF-16 to UTF-8
    RWUFromUnicodeConversionContext context("UTF-8");

    // Create an RWUCollator using the specified locale, and
    // then set its strength to secondary, which considers
    // differences in basic character identity, and possibly
    // diacritics. Differences in case are ignored at the
    // secondary level
    RWUCollator collator(locale);
    collator.setStrength(RWUCollator::Secondary);

    // Obtain one string from each of the two producers
    RWUString str1 = q1.read();
    RWUString str2 = q2.read();

    // Initialize the done flags, based on the strings
    // read from the queues
    bool q1Done = str1 == DONE;
    bool q2Done = str2 == DONE;

    // As long as there is data in either of the queues...
    while (!(q1Done && q2Done)) {
        // If queue 1 is done, or if the string from queue
        // 1 is greater than or equal to the string from queue 2,
        // then write the string from queue 2. Update the queue 2
        // string and flag
        if (q1Done ||
            (!q2Done && collator.compareTo(str1, str2) >= 0)) {
            cout << str2 << endl;
            str2 = q2.read();
            q2Done = str2 == DONE;
        }
        // Else if queue 2 is done, or if the string from queue
        // 1 is less than the string from queue 2, then write the
        // string from queue 1. Update the queue 1 string and flag
        else if (q2Done ||
                 (!q1Done && collator.compareTo(str1, str2) < 0)) {
            cout << str1 << endl;
            str1 = q1.read();
            q1Done = str1 == DONE;
        }
    }
}
 
// Main
int main(int argc, char* argv[])
{
    // Open the input files
    ifstream input1(INPUT_FILE_1);
    if (!input1) {
        cerr << "Unable to open " << INPUT_FILE_1 << ", aborting."
             << endl;
        return 1;
    }

    ifstream input2(INPUT_FILE_2);
    if (!input2) {
        cerr << "Unable to open " << INPUT_FILE_2 << ", aborting."
             << endl;
        return 1;
    }

    // Create a locale for use in collating the input strings. By
    // default, use the United States English locale. If a locale
    // is given on the command line, then use it.
    RWULocale locale("en_US");
    if (argc > 1) {
        locale = RWULocale(argv[1]);
    }

    // Create two synchronized queues, one for each producer
    SQueue q1(10);
    SQueue q2(10);

    // Create the producers
    RWThread producer1 =
        RWThreadFunction::make(rwBind(Producer,
                                      rwRef(input1),
                                      rwRef(q1)));

    RWThread producer2 =
        RWThreadFunction::make(rwBind(Producer,
                                      rwRef(input2),
                                      rwRef(q2)));

    // Create the consumer
    RWThread consumer =
        RWThreadFunction::make(rwBind(Consumer,
                                      rwRef(q1),
                                      rwRef(q2),
                                      rwRef(locale)));

    // Write a message to show the start of the merge
    cout << "Starting merge..." << endl << endl;

    // Start the producers and the consumer
    producer1.start();
    producer2.start();
    consumer.start();

    // Wait for all threads to terminate
    producer1.join();
    producer2.join();
    consumer.join();

    // Write a message to mark the end of the merge
    cout << endl << "Done." << endl;

    // Return success
    return 0;
}
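The consumer's merge logic can be examined in isolation from the queues and threads. The following is a minimal sketch in standard C++, with vectors standing in for the two queues and a three-way comparison function standing in for the collator's compareTo. The names mergeSorted, Compare, and byteCompare are hypothetical. As in the consumer above, a tie (comparison result of zero) takes the string from the second source first.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Three-way comparison: negative, zero, or positive, in the style
// of a collator's compare function (hypothetical type alias)
using Compare = std::function<int(const std::string&, const std::string&)>;

// Merges two sequences that are already sorted under `compare`
std::vector<std::string> mergeSorted(const std::vector<std::string>& a,
                                     const std::vector<std::string>& b,
                                     const Compare& compare)
{
    assert(compare);
    std::vector<std::string> out;
    out.reserve(a.size() + b.size());

    std::size_t i = 0;
    std::size_t j = 0;
    while (i < a.size() || j < b.size()) {
        const bool aDone = (i == a.size());
        const bool bDone = (j == b.size());

        // Take from b when a is exhausted, or when a's current
        // string is greater than or equal to b's current string
        if (aDone || (!bDone && compare(a[i], b[j]) >= 0)) {
            out.push_back(b[j++]);
        } else {
            // Otherwise b is exhausted or a's string is smaller
            out.push_back(a[i++]);
        }
    }
    return out;
}

// Plain byte-wise comparison, used here in place of a locale-aware
// collator; it is adequate only for lowercase ASCII words
int byteCompare(const std::string& x, const std::string& y)
{
    return x.compare(y);
}
```

Because each step emits the smaller of the two current strings, the output is sorted only if both inputs were already sorted under the same comparison. Swapping in a different comparison function changes the merged order without touching the loop.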
 
Program Input:
sorted1.dat    sorted2.dat
agua           ahora
azul           azur
blanco         blanco
cabeza         caliente
chorizo        curioso
despues        donde
familia        hombre
limpio         llama
luna           luz
madre          mano
nombre         oreja
padre          rojo
rosa
Program output:
The two outputs below show how the merged order depends on the locale. The first uses the default U.S. English locale; the second uses the Spanish traditional locale, which you can specify as the program's command-line argument. The two words that sort differently are chorizo and llama.
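The reason chorizo and llama move is that traditional Spanish ordering treats "ch" as a single letter sorting after "c", and "ll" as a single letter sorting after "l". The sketch below imitates that rule in standard C++ for illustration. It is not how RWUCollator works internally, the names traditionalSpanishKey and traditionalSpanishLess are hypothetical, and it assumes words made only of lowercase ASCII letters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds a sort key with one entry per "letter". Plain letters get
// even weights; the digraphs "ch" and "ll" get the odd weight just
// above 'c' and 'l', so they sort after every other c- or l- word.
// Assumption: the word contains only lowercase ASCII letters a-z.
std::vector<int> traditionalSpanishKey(const std::string& word)
{
    std::vector<int> key;
    for (std::size_t i = 0; i < word.size(); ++i) {
        const char c = word[i];
        assert(c >= 'a' && c <= 'z');

        const bool digraph =
            i + 1 < word.size() &&
            ((c == 'c' && word[i + 1] == 'h') ||
             (c == 'l' && word[i + 1] == 'l'));

        key.push_back(2 * (c - 'a') + (digraph ? 1 : 0));
        if (digraph) {
            ++i;  // the second character belongs to the same letter
        }
    }
    return key;
}

// True if a sorts before b under the simplified traditional rule
bool traditionalSpanishLess(const std::string& a, const std::string& b)
{
    return traditionalSpanishKey(a) < traditionalSpanishKey(b);
}
```

Under plain character-by-character comparison, chorizo precedes curioso and llama precedes luz. Under this rule both pairs reverse, while chorizo still precedes despues and llama still precedes mano.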