P4DTI issue job000135

Title: Replicator makes no more progress if replication to Perforce fails
Status: closed
Priority: essential
Assigned user: Gareth Rees
Organization: Ravenbrook
Description: If replication from the defect tracker to Perforce fails for whatever reason, the replicator gets into a loop where it keeps e-mailing the administrator.
Analysis: The error is caught in the replicator's run() method, which bypasses the part of replicate() that marks the changed issues as done. Each time the replicator polls, it therefore gets the same changed issue and encounters the same problem. The administrator's mailbox fills up with messages, one every 10 seconds, and no further replication happens until the administrator takes action.
However, the replicator shouldn't simply mark the changes as done: the Perforce server might be down, in which case the replicator should wait until the server comes back up. We don't want to leave the database in an inconsistent state.
Backing off (polling at less frequent intervals) might be sensible when this happens, but a naive exponential backoff strategy means that the replicator might not start up again until 10*2^n seconds after the problem is fixed (n being the number of consecutive failed polls).
See also <http://www.ravenbrook.com/project/p4dt...2001-02-22/version-1.0-support/#job-135>.
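The 10*2^n figure in the analysis can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name delay_after_failures is hypothetical and does not come from the P4DTI sources, and the 10-second base is the poll interval mentioned in the analysis.

```python
def delay_after_failures(failures, base_seconds=10):
    """Wait before the next poll under naive doubling.

    Hypothetical helper: assumes the wait starts at base_seconds
    and doubles after every consecutive failed poll.
    """
    return base_seconds * 2 ** failures


# Waits after 0, 1, 2, ... consecutive failures.
schedule = [delay_after_failures(n) for n in range(6)]
```

If the problem is fixed just after the n-th failed poll, the replicator still sleeps for the whole of the next wait, so recovery can lag the fix by up to 10*2^n seconds (for example, 10240 seconds, nearly three hours, after ten failures).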
How found: manual_test
Evidence: Test report for 0.4.0, 2000-12-06, item 2.
Observed in: 0.4.0
Created by: Gareth Rees
Created on: 2000-12-06 13:35:39
Last modified by: Gareth Rees
Last modified on: 2001-12-10 19:12:16
History: 2000-12-06 GDR Created.
Support

Advice for all releases.

Symptom. The integration administrator receives a sequence of automatically generated e-mails from the replicator that look like this:

The replicator failed to poll successfully, because of the following problem:

Exception:
TeamShare API error: SOCKET_CONNECT_FAILED: Socket Connect failed.

A variety of errors may appear in place of the exception shown above.

Cause. The replicator has become stuck and can make no more progress. Possibly it can't connect to one or both of the servers; possibly it can't replicate issues because it has been misconfigured. The replicator backs off, doubling its waiting time each time it encounters the problem. When the problem is fixed it will continue normally.
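The backoff behaviour described above can be sketched as a polling loop. This is a minimal illustration under stated assumptions, not the replicator's actual code: run_with_backoff, poll, notify and max_polls are hypothetical names, the 10-second base interval is taken from the issue analysis, and resetting the interval after a successful poll is an assumed reading of "it will continue normally".

```python
import time


def run_with_backoff(poll, notify, base_interval=10,
                     sleep=time.sleep, max_polls=None):
    """Poll repeatedly, doubling the wait after each failed poll.

    poll      -- callable that performs one replication poll (hypothetical)
    notify    -- callable that reports an error, e.g. by e-mail (hypothetical)
    max_polls -- stop after this many polls; None means run forever
    Returns the last interval waited, which is convenient for testing.
    """
    interval = base_interval
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            poll()
        except Exception as error:
            # Report the failure, then wait twice as long before retrying,
            # so the administrator is not mailed every base_interval seconds.
            notify(error)
            interval *= 2
        else:
            # Assumption: a successful poll restores the normal interval.
            interval = base_interval
        sleep(interval)
        polls += 1
    return interval
```

Passing sleep as a parameter lets the loop be exercised without real delays, for example by supplying a list's append method to record the waits instead of sleeping.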

Solution. The replicator can't continue until the problem is fixed. The administrator must identify and fix the problem. Section 11.2 of the P4DTI Administrator's Guide is a good place to start.

Fixes

Change: 8839
Effect: closed
Date: 2001-02-21 16:56:03
User: Gareth Rees
Description: The replicator backs off exponentially if it fails to poll successfully, so as not to mailbomb the administrator.