P4DTI issue job000277

Title: Consistency checker and refresh script don't work with larger TeamTrack databases
Status: closed
Priority: critical
Assigned user: Gareth Rees
Organization: TeamShare
Description: If a TeamTrack database has several hundred issues in it, you can't run the consistency checker or refresh scripts. The P4DTI hangs for several minutes, then stops with an error from the server.
Using TeamTrack 4509, with Microsoft Access as the underlying database, we edited the sample TeamTrack database and added cases to it until it contained around 140 cases. When we ran the P4DTI consistency checking script it paused for a couple of minutes and then we got the following error from TeamShare. The same thing happened with the refresh script.
Traceback (most recent call last):
  File "C:\PROGRA~1\P4DTI\check.py", line 37, in ?
    r.check_consistency()
  File "C:\PROGRA~1\P4DTI\replicator.py", line 437, in check_consistency
    for issue in self.dt.all_issues():
  File "C:\PROGRA~1\P4DTI\dt_teamtrack.py", line 431, in all_issues
    for c in self.server.query(teamtrack.table['CASES'], query):
TeamShare API error: SERVER_ERROR: (no message from the TeamShare API)
Analysis: The actual Microsoft Access database we discovered this defect in is available from [1].
We ran queries on our TeamTrack database using the ReadListWithWhere program from the samples distributed with TeamShare 4509. This rules out the possibility that the problem lies in P4DTI code or in a mismatch between TeamTrack server and API versions.
We found that the command
./ReadListWIthWhere gannet.ravenbrook.com nick "" 1 "TS_ID < 115"
worked and returned 65 cases, but the command
./ReadListWIthWhere gannet.ravenbrook.com nick "" 1 "TS_ID < 120"
which should have returned 70 cases, just hung.

NDL 2001-03-29: The problem has been reproduced on both gannet and sandpiper in isolation (so it's not a network problem), using the ReadListWithWhere executable which TeamShare sent us. The problem is somewhat unpredictable: running the above with TS_ID < 110 fails the first one or two times and then succeeds; restarting the server brings the problem back.

GDR 2001-04-19: TeamShare were able to reproduce the problem [3]. The cause is that the client runs out of memory while fetching records from the server, and then bugs in the TeamShare API prevent this error being caught and reported. [Note: this analysis is incorrect. GDR 2001-05-17] The TeamShare API builds large data structures: 600k or more for each returned record from the CASES table [5].
TeamShare have fixed the memory handling and socket bugs that made diagnosis difficult [4] [6]. We must rebuild the Python interface to use the updated API.
We must redesign the replicator so that it doesn't try to fetch all the records at once (this would have been necessary anyway once we came across customers with very large databases, but we didn't expect "very large" to be around 100). The modification must take account of the following:
1. The TeamShare API doesn't have a feature like a database cursor but it can be emulated: TeamShare have some advice about this [5], which I think can be improved by making more use of SQL features like sub-queries and functions like MAX, MIN, COUNT.
2. The defect_tracker interface methods "all_issues" and "changed_entities" will need modification (they will need to return cursor-like objects instead of lists). The latter needs special care since it makes two queries that have potentially large result sets (the query on the CHANGES table that gets the set of changed records, and then the query on the CASES table that gets the records themselves).
3. Fetching issues one at a time complicates the replicator algorithms for replication and consistency checking. The current algorithms fetch many issues and jobs and then pair them up. The revised algorithm might fetch issues and jobs in some specified order and pair them up with a merge-sort-like algorithm.
4. Something similar should be done on the Perforce side: we can't rely on being able to do "p4 jobs" into memory. However, this is not so urgent and could wait.
5. The P4DTI lacks requirements for performance and capacity. It lacks documentation of its performance and capacity limitations. (See job000300.)
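
As an illustration of points 1 and 3 only, here is a minimal sketch in modern Python of how a cursor might be emulated with bounded TS_ID ranges and how two ordered streams might then be paired up. It is not the code that went into dt_teamtrack. All names (batched_cursor, pair_up, fetch_range) are hypothetical; fetch_range stands in for a TeamShare API query with a WHERE clause such as "TS_ID > low AND TS_ID <= high", and max_id stands in for the result of a MAX(TS_ID) query. The sketch assumes TS_ID values are positive.

```python
def batched_cursor(fetch_range, max_id, batch_size=50):
    """Yield records in increasing TS_ID order, one bounded query at a time.

    fetch_range(low, high) is a hypothetical callable assumed to return
    the records with low < TS_ID <= high. Assumes TS_ID values are > 0.
    """
    low = 0
    while low < max_id:
        high = low + batch_size
        batch = sorted(fetch_range(low, high), key=lambda r: r["TS_ID"])
        for record in batch:
            yield record
        low = high


def pair_up(issues, jobs, issue_key, job_key):
    """Merge two key-ordered streams, yielding (issue, job) pairs.

    A record with no partner is paired with None. Only one record from
    each stream is held at a time.
    """
    issues, jobs = iter(issues), iter(jobs)
    issue, job = next(issues, None), next(jobs, None)
    while issue is not None or job is not None:
        if job is None or (issue is not None
                           and issue_key(issue) < job_key(job)):
            yield issue, None          # issue with no matching job
            issue = next(issues, None)
        elif issue is None or job_key(job) < issue_key(issue):
            yield None, job            # job with no matching issue
            job = next(jobs, None)
        else:
            yield issue, job           # keys match
            issue, job = next(issues, None), next(jobs, None)


# Made-up data standing in for the CASES table and the Perforce jobs.
cases = [{"TS_ID": i} for i in (1, 2, 5, 120, 121)]
jobs = [{"id": i} for i in (2, 3, 120)]


def fetch_range(low, high):
    # Stand-in for a query like "TS_ID > low AND TS_ID <= high".
    return [c for c in cases if low < c["TS_ID"] <= high]


max_id = max(c["TS_ID"] for c in cases)  # stands in for MAX(TS_ID)
cursor = batched_cursor(fetch_range, max_id, batch_size=50)
pairs = [(i and i["TS_ID"], j and j["id"])
         for i, j in pair_up(cursor, jobs,
                             lambda c: c["TS_ID"], lambda j: j["id"])]
```

Stepping through fixed TS_ID ranges rather than counting rows means each query is bounded even when IDs have gaps, at the cost of some empty batches. Whether this suits the TeamShare API better than the approach advised in [5] is not something this sketch establishes.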

GDR 2001-05-17: TeamShare's analysis of the problem was incorrect. The client was not running out of memory. In fact, the server was failing to respond to a query from the client. The client was hanging in a call to recv, and eventually the network timed out. A defect in the API obscured this, but once that defect was corrected (with the sources sent to us in [4]) we got the error "SERVER_ERROR: Failure to receive return code: -1" resulting from the server's failure to respond, not the memory error implied by the analysis. See my report in section 3.3 of [7].
How found: manual_test
Evidence: [1] <http://www.ravenbrook.com/project/p4dt...-23/teamtrack-testcase/tTrackSample.mdb>
[2] <http://www.ravenbrook.com/project/p4dt...2001-03-23/teamtrack-testcase/local.bmp>
[3] <http://info.ravenbrook.com/mail/2001/04/03/22-29-15/0.txt>
[4] <http://info.ravenbrook.com/mail/2001/04/09/15-39-55/0.txt>
[5] <http://info.ravenbrook.com/mail/2001/04/18/08-28-40/0.txt>
[6] <http://www.ravenbrook.com/project/p4dti/import/2001-04-09/teamshare-api/>
[7] <http://www.ravenbrook.com/project/p4dt...city/design/python-teamtrack-interface/>
Observed in: 1.0.5
Test procedure: <http://www.ravenbrook.com/project/p4dti/master/test/test_teamtrack.py>, section 2.4
Created by: Gareth Rees
Created on: 2001-03-23 17:43:23
Last modified by: Gareth Rees
Last modified on: 2001-12-10 19:35:57
History: 2001-03-23 GDR Created.
2001-03-25 RB Qualified the description to say that the problem is with larger TeamTrack databases.
2001-03-29 NDL Added link to local.bmp. Changed ' to " in commands under Analysis (to eliminate a red herring). Additional analysis.
2001-04-19 GDR Added analysis and references.
2001-05-17 GDR More analysis after thorough investigation.
2001-07-15 GDR Improved description.

Fixes

Change  Effect  Date                 User         Description
12701   closed  2001-05-19 12:02:21  Gareth Rees  Merged work from branch/2001-05-15/capacity to version/1.1.
12665   open    2001-05-18 10:40:23  Gareth Rees  dt_teamtrack uses cursors whenever it makes a query that could return many results.
12656   open    2001-05-17 19:18:01  Gareth Rees  Added cursor implementation to dt_teamtrack.
                                                  Replicator can now accept lists or cursors as result of all_issues and changed_entities. (As a consequence, it no longer produces messages 844-847 so these are removed from catalog, test_p4dti and Administrator's Guide.)
                                                  Integrator's Guide specifies that all_issues and changed_entities return cursors.