P4DTI issue job002053

Title	Unicode replication fails
Status	closed
Priority	essential
Assigned user	Nick Barnes
Organization	Ravenbrook
Description	Both Bugzilla and Perforce can store Unicode issue data, but the P4DTI replicator fails when attempting to replicate such data between the two systems.
Analysis	The P4DTI should use Unicode strings internally for all issue data. To read Unicode data from Perforce correctly, we should specify -C utf8 and then decode the UTF-8 encoded values using Python's Unicode facilities. To write Unicode data, we need to specify -C utf8 and encode the values in UTF-8 on their way out of the replicator. On a non-Unicode server, we should do the UTF-8 encoding and decoding anyway, and the server will preserve the 8-bit data. This will let us preserve Unicode data on a round trip from Bugzilla even when the Perforce server is not in Unicode mode. For the Bugzilla interface, the MySQLdb Python interface (version 1.2.1 or later) can handle Unicode objects (and we can specify UTF-8 encoding), but the existing P4DTI code embeds all values in the query string, which breaks for Unicode values. We need to use one of the DB API 2.0 parameter-passing mechanisms (see PEP 249), for instance %s placeholders (which PEP 249 calls the format style), which will allow us to pass the Unicode strings directly to MySQLdb. There are a few other places in which internal issue values interact with the outside world: the messaging subsystem, email, and the automated test suite. These need to be updated to cope with Unicode strings.
How found	customer
Evidence	I just know.
Observed in	2.4.2
Created by	Nick Barnes
Created on	2009-03-03 14:45:19
Last modified by	Nick Barnes
Last modified on	2009-03-03 14:45:49
History	2009-03-03 NB Created.
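
Illustration of the approach described in the Analysis field. This is a minimal sketch, not the P4DTI's actual code. It assumes Python 3 (where str is Unicode and bytes holds encoded data), and every name in it (from_perforce, to_perforce, update_short_desc, RecordingCursor) is hypothetical. The stand-in cursor only records calls so that the sketch runs without a database; with MySQLdb, a real cursor would take its place.

```python
# Sketch only: hypothetical helper names, Python 3 assumed.


def from_perforce(raw: bytes) -> str:
    """Decode UTF-8 bytes read from Perforce into a Unicode string."""
    return raw.decode("utf-8")


def to_perforce(text: str) -> bytes:
    """Encode a Unicode string as UTF-8 on its way out of the replicator."""
    return text.encode("utf-8")


def update_short_desc(cursor, bug_id: int, short_desc: str) -> None:
    """Update a bug summary, passing the values separately from the SQL text.

    The %s markers are DB API placeholders, not Python string formatting:
    the driver quotes and encodes each value itself.
    """
    cursor.execute(
        "UPDATE bugs SET short_desc = %s WHERE bug_id = %s",
        (short_desc, bug_id),
    )


class RecordingCursor:
    """Stand-in for a DB API cursor; it records calls instead of running them."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))


# Non-ASCII issue data survives the encode/decode round trip unchanged.
title = "Crash in caf\u00e9 module"
assert from_perforce(to_perforce(title)) == title

# The Unicode value travels as a parameter and is never spliced into the query.
cursor = RecordingCursor()
update_short_desc(cursor, 42, title)
```

Keeping the values out of the query string leaves quoting and encoding to the database driver, which is the property the Analysis field asks for.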

Fixes

Change	Effect	Date	User	Description
165234	closed	2008-06-11 17:27:32	Nick Barnes	Additional fixes to address test suite issues on unicode branch.