P4DTI issue job002053

Title: Unicode replication fails
Status: closed
Priority: essential
Assigned user: Nick Barnes
Organization: Ravenbrook
Description: Both Bugzilla and Perforce can store Unicode issue data, but the P4DTI replicator fails when it tries to replicate such data between the two systems.
Analysis: The P4DTI should use Unicode strings internally for all issue data.
To read Unicode data correctly from Perforce, we should specify -C utf8 and then decode the UTF-8 encoded data using Python's Unicode facilities. To write Unicode data, we need to specify -C utf8 and encode the values as UTF-8 on their way out of the replicator. On a non-Unicode server, we should do the same UTF-8 encoding and decoding anyway. Because the server preserves 8-bit data unchanged, Unicode data from Bugzilla will then survive a round trip even when the Perforce server is not in Unicode mode.
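A minimal sketch of this boundary handling follows, assuming the replicator keeps issue values as Unicode `str` internally and converts only at the point of contact with Perforce. The helper names `p4_command`, `to_perforce` and `from_perforce` are hypothetical and do not come from the P4DTI source. The choice to omit `-C utf8` when talking to a non-Unicode server is also an assumption. The -C global option applies to Unicode-mode servers, and the UTF-8 encoding and decoding happen in both cases.

```python
# Hypothetical helpers illustrating UTF-8 handling at the Perforce boundary.
# Names are illustrative only; they are not the P4DTI's own functions.
ENCODING = "utf-8"


def p4_command(args, unicode_server):
    """Build a p4 command line. On a Unicode-mode server, request UTF-8
    with the -C global option (assumed to be omitted otherwise)."""
    cmd = ["p4"]
    if unicode_server:
        cmd += ["-C", "utf8"]
    return cmd + list(args)


def to_perforce(value):
    """Encode a Unicode issue value as UTF-8 bytes on the way out."""
    return value.encode(ENCODING)


def from_perforce(data):
    """Decode UTF-8 bytes read from Perforce into a Unicode string."""
    return data.decode(ENCODING)


# A non-Unicode server stores these bytes unchanged, so decoding what
# comes back recovers the original text.
stored = to_perforce("Résumé \u2713")
assert from_perforce(stored) == "Résumé \u2713"
```

Because the encoding is applied in the same way whatever the server mode, the rest of the replicator only ever sees Unicode strings.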
For the Bugzilla interface, the MySQL Python interface (MySQLdb, version 1.2.1 or later) can handle Unicode objects, and we can tell it to use UTF-8 encoding. However, the existing P4DTI code embeds all values directly in the query string, and this breaks for Unicode values. We need to use one of the parameter-passing styles defined by the Python DB-API 2.0 (PEP 249), for instance the format style (%s placeholders, the style MySQLdb uses). This will let us pass Unicode strings directly to MySQLdb.
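The following is a sketch of parameter passing, not the P4DTI's actual query code. Values go to `cursor.execute` as a separate tuple instead of being spliced into the SQL text, and the driver handles quoting and encoding. To keep the example runnable without a MySQL server, it uses the standard library's `sqlite3`, whose paramstyle is `qmark` (`?`). MySQLdb uses the `format` style (`%s`). The `bugs` table with `bug_id` and `short_desc` columns mirrors Bugzilla's schema, and the helper `update_summary` is hypothetical.

```python
import sqlite3

# Placeholder syntax for the two PEP 249 paramstyles relevant here.
# MySQLdb.paramstyle is "format"; sqlite3.paramstyle is "qmark".
PLACEHOLDERS = {"format": "%s", "qmark": "?"}


def update_summary(conn, paramstyle, bug_id, summary):
    """Set a bug's summary, passing values as query parameters rather
    than embedding them in the SQL string (hypothetical helper)."""
    ph = PLACEHOLDERS[paramstyle]
    sql = "UPDATE bugs SET short_desc = {0} WHERE bug_id = {0}".format(ph)
    cursor = conn.cursor()
    cursor.execute(sql, (summary, bug_id))  # driver quotes and encodes
    conn.commit()
    cursor.close()


# Demonstration against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, short_desc TEXT)")
conn.execute("INSERT INTO bugs VALUES (1, 'old summary')")
update_summary(conn, sqlite3.paramstyle, 1, "Überprüfung fehlgeschlagen \u2713")
row = conn.execute("SELECT short_desc FROM bugs WHERE bug_id = 1").fetchone()
```

With MySQLdb, the same helper would be called with `"format"`. The connection would typically be opened with `charset="utf8"` and `use_unicode=True` so that Unicode strings pass in both directions.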
There are a few other places in which internal issue values interact with the outside world: the messaging subsystem, email, and the automated test suite. These need to be updated to cope with Unicode strings.
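As one illustration of the email case, here is a minimal sketch that uses the standard library's `email.message.EmailMessage` (Python 3) to build a notification whose subject and body contain non-ASCII issue data. The helper `make_notification` and the addresses are hypothetical. The P4DTI's own mail code may be structured quite differently.

```python
from email.message import EmailMessage


def make_notification(sender, recipient, subject, body):
    """Build a notification whose headers and body may contain
    non-ASCII text (hypothetical helper, not the P4DTI's own)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject  # emitted as an RFC 2047 encoded word if needed
    msg.set_content(body)     # a str body gets charset=utf-8 by default
    return msg


msg = make_notification(
    "p4dti@example.com",
    "dev@example.com",
    "Job \u00dcnicode \u2713",
    "Zusammenfassung: \u00dcberpr\u00fcfung \u2713",
)
wire = msg.as_bytes()  # bytes ready to hand to an SMTP library
```

The point of the design is the same as for Perforce and MySQL: text stays Unicode until the last moment, and the email library chooses the header and body encodings.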
How found: customer
Evidence: I just know.
Observed in: 2.4.2
Created by: Nick Barnes
Created on: 2009-03-03 14:45:19
Last modified by: Nick Barnes
Last modified on: 2009-03-03 14:45:49
History: 2009-03-03 NB Created.

Fixes

Change Effect Date User Description
165234 closed 2008-06-11 17:27:32 Nick Barnes Additional fixes to address test suite issues on unicode branch.