P4DTI issue job000385

Title	Renumbered changelist causes P4DTI error and deletes fix
Status	closed
Priority	critical
Assigned user	Nick Barnes
Organization	Ravenbrook
Description	If a changelist fixing a job is submitted at the same time as a replication of that job is taking place, a race condition may occur which causes the fix record to be deleted and an error message to be sent by email. Here is such an email:
> Subject: (P4DTI-8603) Job 'job000093' overwritten by issue '87'.
>
> (P4DTI-8658) This is an automatically generated e-mail from the Perforce
> Defect Tracking Integration replicator 'replicator0'.
>
> (P4DTI-8512) The replicator failed to replicate Perforce job 'job000093' to
> defect tracker issue '87', because of the following problem:
>
> (P4DTI-7065) Change 4833 unknown.
> The Perforce client exited with error code 256.
>
> (P4DTI-8614) The replicator has therefore overwritten Perforce job 'job000093'
> with defect tracker issue '87'. See section 2.2 of the P4DTI User Guide for
> more information.
>
> (P4DTI-8625) The job looked like this before being overwritten:
Analysis	This is a race condition, due to Perforce's lack of locks on jobs. The sequence of events would be something like this:
    user:  p4 fix -c 4833 job000093
    P4DTI: -- start replication poll.
    P4DTI: p4 logger -> #job job000093
    P4DTI: -- try to replicate job000093
    P4DTI: -- get fixes for replication:
    P4DTI: p4 fixes -j job000093 -> 4833
    user:  p4 submit (renumbering 4833 as 4835)
    P4DTI: -- get 4833 for replication:
    P4DTI: p4 change -o 4833
This will only occur if such a changelist is submitted between the P4DTI issuing "p4 fixes -j job000093" and "p4 change -o 4833", during a single replication poll. So race conditions such as this are quite rare.
When the P4DTI detects a Perforce failure here, it applies its "defect tracker wins" policy and overwrites the Perforce job record with the defect tracker issue, which includes deleting the fix record for the job (because that fix record was not successfully replicated to the defect tracker).
When race conditions such as this occur, the P4DTI will send an error email and the user or administrator may have to redo a Perforce operation (the fix, in this case).
There are a number of possible solutions for the case where "p4 change -o" fails while replicating fixes for a job:
1. Disregard that fix record and do not replicate it to the defect tracker. The next replicator poll will fix it up (because the submit puts the job into the log again, the "p4 fixes -j" call will produce the new changelist number, and the "p4 change -o" will succeed).
2. Fake a changelist record for replication to the defect tracker. Again, the next replicator poll will fix this up.
3. Backtrack to the "p4 fixes -j" call. If the same changelist number is returned again, there is a serious Perforce error and the P4DTI should report it. If it is not, we have hit this race condition and may proceed once more with replicating the fix records for this job.
Solutions 1 and 2 have the problem that a change in the defect tracker issue before the next poll will cause the fix record for the new changelist number to be deleted (because of the "defect tracker wins" policy).
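Solution 3 could be sketched roughly as follows. This is an illustration only, not the P4DTI's actual code: the names fixes_with_changes, ChangeUnknown and PerforceInconsistency are hypothetical, the two callables stand in for "p4 fixes -j" and "p4 change -o", and the max_attempts limit is an added assumption to stop the loop if changelists keep being renumbered.

```python
class ChangeUnknown(Exception):
    """Hypothetical stand-in for the 'Change N unknown.' failure."""

    def __init__(self, change):
        super().__init__("Change %d unknown." % change)
        self.change = change


class PerforceInconsistency(Exception):
    """A fix still names a changelist that Perforce says is unknown."""


def fixes_with_changes(job, get_fixes, get_change, max_attempts=3):
    """Return {changelist number: change record} for the fixes of job.

    get_fixes(job) stands in for 'p4 fixes -j job' and returns a list of
    changelist numbers.  get_change(n) stands in for 'p4 change -o n' and
    raises ChangeUnknown if n does not exist.  Both are assumptions made
    for this sketch.
    """
    changes = get_fixes(job)
    for _ in range(max_attempts):
        try:
            return {n: get_change(n) for n in changes}
        except ChangeUnknown as err:
            # Backtrack: ask Perforce for the fixes again.
            changes = get_fixes(job)
            if err.change in changes:
                # Same number returned again: not the race, so report it.
                raise PerforceInconsistency(
                    "job %s is fixed by unknown change %d"
                    % (job, err.change)) from err
            # Otherwise the changelist was renumbered; try again.
    raise PerforceInconsistency(
        "gave up on job %s after %d attempts" % (job, max_attempts))
```

With the sequence of events above, the first attempt fails on 4833, the repeated fixes query returns 4835, and the second attempt succeeds without overwriting the job.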
How found	customer
Evidence	<`http://info.ravenbrook.com/mail/2001/08/20/22-27-38/0.txt`>
Observed in	1.1.1
Test procedure	<`http://www.ravenbrook.com/project/p4dti/master/test/test_p4dti.py`>, section 13
Created by	Nick Barnes
Created on	2001-08-21 16:11:53
Last modified by	Nick Barnes
Last modified on	2002-04-09 11:35:23
History	2001-08-21 NB Created.
2001-10-03 GDR Fixed with solution 3 in migration branch in which the customer was experiencing the defect.

Fixes

Change	Effect	Date	User	Description
22964	closed	2001-10-04 15:49:00	Gareth Rees	Merged Bugzilla migration branch back to master sources. This adds migration algorithms to the replicator, and support for migration from Perforce to Bugzilla. Moved config.py from the migration branch (which had the configuration for Xebeo) to test/config_xebeo.py so that it doesn't conflict with the real config.py. The P4DTI unit tests call start_logger() explicitly, because replicator.init() no longer does so. The catalog use test ignores all whitespace differences (not just different numbers of spaces) when comparing messages.
22917	open	2001-10-03 15:36:36	Gareth Rees	Handle renumbered changelist race condition during replication of fixes from Perforce to the defect tracker.