P4DTI issue job000135

Title	Replicator makes no more progress if replication to Perforce fails
Status	closed
Priority	essential
Assigned user	Gareth Rees
Organization	Ravenbrook
Description	If replication from the defect tracker to Perforce fails for whatever reason, the replicator gets into a loop where it keeps e-mailing the administrator.
Analysis	This happens because the error is caught in the replicator's run() method, which bypasses the part of replicate() that marks the changed issues as done. Each time the replicator polls, it gets the same changed issue and encounters the same problem. The administrator's mailbox fills up with messages, one every 10 seconds, and no further replication happens until the administrator takes action. However, the replicator shouldn't simply mark the changes as done: the Perforce server might be down, in which case the replicator should wait until the server comes back up, and we don't want to leave the database in an inconsistent state. Backing off (polling at less frequent intervals) might be sensible when this happens, but a naive exponential backoff strategy will mean that the replicator doesn't start up again until 10*2^n seconds after the problem is fixed. See also <`http://www.ravenbrook.com/project/p4dt...2001-02-22/version-1.0-support/#job-135`>.
How found	manual_test
Evidence	Test report for 0.4.0, 2000-12-06, item 2.
Observed in	0.4.0
Created by	Gareth Rees
Created on	2000-12-06 13:35:39
Last modified by	Gareth Rees
Last modified on	2001-12-10 19:12:16
History	2000-12-06 GDR Created.
Support	Advice for all releases. Symptom. The integration administrator receives a sequence of automatically generated e-mails from the replicator that look like this: `The replicator failed to poll successfully, because of the following problem:` `Exception: TeamShare API error: SOCKET_CONNECT_FAILED: Socket Connect failed.` A variety of errors may appear here. Cause. The replicator has become stuck and can make no more progress. Possibly it can't connect to one or both of the servers; possibly it can't replicate issues because it has been misconfigured. The replicator backs off, doubling its waiting time each time it encounters the problem. When the problem is fixed it will continue normally. Solution. The replicator can't continue until the problem is fixed. The administrator must identify and fix the problem. Section 11.2 of the P4DTI Administrator's Guide is a good place to start.

Fixes

Change	Effect	Date	User	Description
8839	closed	2001-02-21 16:56:03	Gareth Rees	The replicator backs off exponentially if it fails to poll successfully, so as not to mailbomb the administrator.
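
The backoff behaviour described in the fix can be sketched roughly as follows. This is a minimal illustration, not the P4DTI replicator's actual code: `backoff_delay`, `run`, `poll` and `report` are hypothetical names, and the `MAX_DELAY` cap is an assumption added here to bound the 10*2^n restart delay that the analysis warns about. Only the 10-second polling period and the doubling on each failure come from the issue record.

```python
import time

POLL_PERIOD = 10   # seconds; the normal polling interval mentioned in the analysis
MAX_DELAY = 600    # seconds; hypothetical cap, not taken from the issue record


def backoff_delay(failures, base=POLL_PERIOD, cap=MAX_DELAY):
    """Return the wait before the next poll after `failures` consecutive failures."""
    return min(base * 2 ** failures, cap)


def run(poll, report, sleep=time.sleep, max_polls=None):
    """Poll repeatedly, doubling the wait after each failure and resetting on success.

    `poll` and `report` are hypothetical callables standing in for the
    replicator's polling step and its e-mail to the administrator.
    `max_polls` exists only so the loop can be exercised in a test.
    """
    failures = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            poll()
            failures = 0       # success: go back to the normal polling period
        except Exception as error:
            failures += 1
            report(error)      # one message per failed poll, at growing intervals
        sleep(backoff_delay(failures))
        polls += 1
```

With these assumed values, three consecutive failures followed by a successful poll give waits of 20, 40, 80 and then 10 seconds. The failed poll never marks anything as done, so the same changed issues are retried once the servers are reachable again.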