|Title||Unicode replication fails|
|Assigned user||Nick Barnes|
|Description||Both Bugzilla and Perforce can store Unicode issue data, but the P4DTI replicator fails when attempting to replicate such data between the two systems.|
|Analysis||The P4DTI should use Unicode strings internally for all issue data.|
To read Unicode data from Perforce correctly, we should specify -C utf8 and then decode the UTF-8-encoded data using Python's Unicode facilities. To write Unicode data, we need to specify -C utf8 and encode the data as UTF-8 on its way out of the replicator. On a non-Unicode server, we should do the UTF-8 encoding and decoding anyway; the server will preserve the 8-bit data. This lets us preserve Unicode data on a round trip from Bugzilla even when the Perforce server is not in Unicode mode.
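The round trip described above can be sketched as follows. This is a minimal illustration in modern Python, not the replicator's actual code: the function names to_perforce and from_perforce are hypothetical, and the Perforce server is stood in for by a plain bytes value, on the assumption that the server returns the bytes it was given.

```python
# Hypothetical helpers: issue data is Unicode (str) inside the replicator
# and UTF-8 bytes at the Perforce boundary.

def to_perforce(value: str) -> bytes:
    """Encode an internal Unicode string as UTF-8 on its way out."""
    return value.encode("utf-8")


def from_perforce(raw: bytes) -> str:
    """Decode UTF-8 bytes read from Perforce into a Unicode string."""
    return raw.decode("utf-8")


# Simulated round trip: an 8-bit-clean server hands back the same bytes.
title = "Unicode r\u00e9plication \u65e5\u672c\u8a9e"
stored = to_perforce(title)
restored = from_perforce(stored)
```

Because the encoding is applied at the boundary in both directions, the internal value is the same whether or not the server itself interprets the bytes as Unicode.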
For the Bugzilla interface, the MySQL Python interface, MySQLdb (version 1.2.1 or later), can handle Unicode objects (and we can specify UTF-8 encoding), but the existing P4DTI usage of the interface embeds all values in the query string, which breaks for Unicode values. We need to use one of the DB API 2.0 parameter-quoting mechanisms (see PEP 249), for instance the format style (using %s as the placeholder), which will allow us to pass the Unicode strings directly to MySQLdb.
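A sketch of the difference between embedding values in the query string and passing them as DB API 2.0 parameters. To keep the example runnable without a MySQL server, it uses the standard-library sqlite3 module as a stand-in; sqlite3 uses ? placeholders where MySQLdb would use %s. The issues table and the helper names are hypothetical, not Bugzilla's schema or P4DTI's code.

```python
import sqlite3


def insert_issue(conn, issue_id, title):
    # The value travels separately from the SQL text, so the driver
    # handles quoting and encoding; no string interpolation is involved.
    conn.execute(
        "INSERT INTO issues (id, title) VALUES (?, ?)",
        (issue_id, title),
    )


def fetch_title(conn, issue_id):
    row = conn.execute(
        "SELECT title FROM issues WHERE id = ?", (issue_id,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical table, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT)")

# A title containing both a quote character and non-ASCII text.
unicode_title = "Caf\u00e9 'quoted' \u65e5\u672c\u8a9e"
insert_issue(conn, 1, unicode_title)
```

With MySQLdb the call would have the same shape, with %s in place of each ? and the values still passed as a separate tuple.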
There are a few other places in which internal issue values interact with the outside world: the messaging subsystem, email, and the automated test suite. These need to be updated to cope with Unicode strings.
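For the email case, one possible approach is to declare the character set explicitly when building the message. The sketch below uses Python's standard email package; the function name build_issue_mail and its arguments are hypothetical and do not reflect how P4DTI constructs its mail.

```python
from email.header import Header
from email.mime.text import MIMEText


def build_issue_mail(subject: str, body: str) -> MIMEText:
    """Build a text/plain message whose subject and body may be Unicode."""
    # The body is encoded as UTF-8 and the charset is declared in the
    # Content-Type header.
    msg = MIMEText(body, "plain", "utf-8")
    # Non-ASCII subjects are sent as RFC 2047 encoded words.
    msg["Subject"] = Header(subject, "utf-8")
    return msg


mail = build_issue_mail(
    "Issue r\u00e9plication failed",
    "Title: \u65e5\u672c\u8a9e\n",
)
```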
|Evidence||I just know.|
|Created by||Nick Barnes|
|Created on||2009-03-03 14:45:19|
|Last modified by||Nick Barnes|
|Last modified on||2009-03-03 14:45:49|
|History||2009-03-03 NB Created.|
|165234||closed||2008-06-11 17:27:32||Nick Barnes||Additional fixes to address test suite issues on unicode branch.|