P4DTI issue job000385

Title: Renumbered changelist causes P4DTI error and deletes fix
Status: closed
Priority: critical
Assigned user: Nick Barnes
Organization: Ravenbrook
Description: If a changelist fixing a job is submitted at the same time as a replication of that job is taking place, a race condition may occur which causes the fix record to be deleted and an error message to be sent by email.
Here is such an email:

> Subject: (P4DTI-8603) Job 'job000093' overwritten by issue '87'.
>
> (P4DTI-8658) This is an automatically generated e-mail from the Perforce
> Defect Tracking Integration replicator 'replicator0'.
>
> (P4DTI-8512) The replicator failed to replicate Perforce job 'job000093' to
> defect tracker issue '87', because of the following problem:
>
> (P4DTI-7065) Change 4833 unknown.
> The Perforce client exited with error code 256.
>
> (P4DTI-8614) The replicator has therefore overwritten Perforce job 'job000093'
> with defect tracker issue '87'. See section 2.2 of the P4DTI User Guide for
> more information.
>
> (P4DTI-8625) The job looked like this before being overwritten:
Analysis
This is a race condition, due to Perforce's lack of locks on jobs.
The sequence of events would be something like this:

   user: p4 fix -c 4833 job000093
   P4DTI: -- start replication poll.
   P4DTI: p4 logger -> #job job000093
   P4DTI: -- try to replicate job000093
   P4DTI: -- get fixes for replication:
   P4DTI: p4 fixes -j job000093 -> 4833
   user: p4 submit (renumbering 4833 as 4835)
   P4DTI: -- get 4833 for replication:
   P4DTI: p4 change -o 4833

This occurs only if such a changelist is submitted after the P4DTI issues "p4 fixes -j job000093" and before it issues "p4 change -o 4833", within a single replication poll. That window is short, so race conditions such as this are quite rare.
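The interleaving above can be reproduced with a toy model. The following is a minimal sketch, not P4DTI code: `FakePerforce` and its methods are hypothetical stand-ins for "p4 fixes -j", "p4 submit" and "p4 change -o", and the sketch assumes that a renumbering submit moves the fix record to the new changelist number.

```python
class ChangeUnknown(Exception):
    """Raised when a changelist number does not exist (cf. 'Change 4833 unknown.')."""


class FakePerforce:
    """Hypothetical in-memory stand-in for a Perforce server (not a real API)."""

    def __init__(self):
        self.changes = {4833: "pending"}      # changelist number -> status
        self.fixes = {"job000093": [4833]}    # job name -> fixing changelists

    def fixes_for_job(self, job):
        # Models "p4 fixes -j JOB".
        return list(self.fixes[job])

    def submit(self, old, new):
        # Models "p4 submit" renumbering a pending changelist.
        del self.changes[old]
        self.changes[new] = "submitted"
        for numbers in self.fixes.values():
            numbers[:] = [new if n == old else n for n in numbers]

    def change(self, number):
        # Models "p4 change -o NUMBER".
        if number not in self.changes:
            raise ChangeUnknown("Change %d unknown." % number)
        return {"change": number, "status": self.changes[number]}


def replicate_with_race():
    """Replay the sequence of events and return the resulting error text."""
    p4 = FakePerforce()
    stale = p4.fixes_for_job("job000093")   # P4DTI: p4 fixes -j job000093 -> 4833
    p4.submit(4833, 4835)                   # user: p4 submit (renumbering)
    try:
        p4.change(stale[0])                 # P4DTI: p4 change -o 4833
    except ChangeUnknown as error:
        return str(error)
    return None
```

Calling `replicate_with_race()` returns the same "Change 4833 unknown." text that appears in the email, because the replicator is still holding the changelist number it read before the submit.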
When the P4DTI detects a Perforce failure while replicating a job, it applies its "defect tracker wins" policy and overwrites the Perforce job record with the defect tracker issue. This includes deleting the fix record for the job, because that fix record was not successfully replicated to the defect tracker.
When race conditions such as this occur, the P4DTI will send an error email and the user or administrator may have to redo a Perforce operation (the fix, in this case).
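The effect of the "defect tracker wins" policy on the fix record can be illustrated as follows. This is only a sketch under assumptions: the function name, the dictionary layout and the status values are invented for illustration and are not the P4DTI's actual record format.

```python
def defect_tracker_wins(job, issue):
    """Rebuild a job from its issue, discarding anything not yet replicated.

    Both arguments are plain dicts in this sketch; the field names are
    illustrative assumptions, not the P4DTI's actual record layout.
    """
    # Fixes known to Perforce but not to the defect tracker are lost.
    lost_fixes = [c for c in job["fixes"] if c not in issue["fixes"]]
    overwritten = {
        "job": job["job"],
        "status": issue["status"],
        "fixes": list(issue["fixes"]),
    }
    return overwritten, lost_fixes


# The fix by change 4833 never reached issue '87', so overwriting drops it.
job = {"job": "job000093", "status": "closed", "fixes": [4833]}
issue = {"id": "87", "status": "open", "fixes": []}
overwritten, lost_fixes = defect_tracker_wins(job, issue)
```

Here `lost_fixes` holds the one fix that the user or administrator would have to redo by hand.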
There are a number of possible solutions for the case where "p4 change -o" fails while replicating fixes for a job:
1. Disregard that fix record and do not replicate it to the defect tracker. The next replicator poll will fix it up (because the submit puts the job into the log again, the "p4 fixes -j" call will produce the new changelist number, and the "p4 change -o" will succeed).
2. Fake a changelist record for replication to the defect tracker. Again, the next replicator poll will fix this up.
3. Backtrack to the "p4 fixes -j" call. If the same changelist number is returned again, there is a serious Perforce error and the P4DTI should report it. If it is not, we have hit this race condition and may proceed once more with replicating the fix records for this job.
Solutions 1 and 2 have the problem that a change in the defect tracker issue before the next poll will cause the fix record for the new changelist number to be deleted (because of the "defect tracker wins" policy).
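Solution 3 avoids that problem because nothing is replicated until a consistent set of fixes has been read. A minimal sketch of the backtracking logic follows. It is not the change that was actually made to the replicator: `replicate_fixes`, the `get_fixes` and `get_change` callables (standing in for "p4 fixes -j" and "p4 change -o") and the `max_retries` bound are all assumptions made for illustration.

```python
class ChangeUnknown(Exception):
    """Raised when "p4 change -o" is asked for a nonexistent changelist."""

    def __init__(self, number):
        super().__init__("Change %d unknown." % number)
        self.number = number


def replicate_fixes(get_fixes, get_change, job, max_retries=5):
    """Fetch the changelist record for each fix of a job, backtracking on a race.

    get_fixes(job) models "p4 fixes -j"; get_change(n) models "p4 change -o".
    max_retries is an assumed safety bound, not something the source specifies.
    """
    numbers = get_fixes(job)
    for _ in range(max_retries + 1):
        try:
            return [get_change(n) for n in numbers]
        except ChangeUnknown as error:
            # Backtrack to the "p4 fixes -j" call.
            numbers = get_fixes(job)
            if error.number in numbers:
                # Same changelist number again: a serious Perforce error.
                raise
            # Otherwise the changelist was renumbered; try once more.
    raise RuntimeError("fixes for %s kept changing" % job)


def demo():
    """Replay the race: the first fixes call sees 4833, later calls see 4835."""
    answers = [[4833], [4835]]

    def get_fixes(job):
        return answers.pop(0) if len(answers) > 1 else answers[0]

    def get_change(number):
        if number != 4835:
            raise ChangeUnknown(number)
        return {"change": number, "status": "submitted"}

    return replicate_fixes(get_fixes, get_change, "job000093")
```

In `demo()`, the first attempt fails on change 4833, the backtrack finds 4835 in its place, and the second attempt succeeds. If the fixes list still contained 4833 after the failure, the `ChangeUnknown` error would propagate to be reported.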
How found: customer
Evidence: <http://info.ravenbrook.com/mail/2001/08/20/22-27-38/0.txt>
Observed in: 1.1.1
Test procedure: <http://www.ravenbrook.com/project/p4dti/master/test/test_p4dti.py>, section 13
Created by: Nick Barnes
Created on: 2001-08-21 16:11:53
Last modified by: Nick Barnes
Last modified on: 2002-04-09 11:35:23
History: 2001-08-21 NB Created.
2001-10-03 GDR Fixed with solution 3 in migration branch in which the customer was experiencing the defect.

Fixes

Change Effect Date User Description
22964 closed 2001-10-04 15:49:00 Gareth Rees Merged Bugzilla migration branch back to master sources. This adds migration algorithms to the replicator, and support for migration from Perforce to Bugzilla.
Moved config.py from the migration branch (which had the configuration for Xebeo) to test/config_xebeo.py so that it doesn't conflict with the real config.py.
The P4DTI unit tests call start_logger() explicitly, because replicator.init() no longer does so.
The catalog use test ignores all whitespace differences (not just different numbers of spaces) when comparing messages.
22917 open 2001-10-03 15:36:36 Gareth Rees Handle renumbered changelist race condition during replication of fixes from Perforce to the defect tracker.