Ravenbrook / Projects / Perforce Defect Tracking Integration / Project Documents

Perforce Defect Tracking Integration Project


Performance analysis of TeamShare API workarounds

Gareth Rees, Ravenbrook Limited, 2001-05-16

1. Introduction

This document describes workarounds for the lack of capacity in the TeamShare API, and analyzes their performance.

The intended readership is P4DTI project staff.

This document is not confidential.

2. Problem background

The TeamShare API has inadequate capacity to return a list of cases from a large database; see job000277 and job000278.

This is because the TeamShare API takes around 600kB of memory to represent each case [Schreiber 2001-04-03].

To avoid this, we need to redesign our queries so that they return only a few records at a time. In effect, we must implement a database cursor.

3. Four solutions

  1. Query the database and get all the results at once. This is what we do at the moment.

    The problem with this is that it fails when the query returns many records. A safe limit might be as low as 50.

  2. Break the query down into a sequence of queries with WHERE clauses like this:

    TS_ID > -1 AND TS_ID <= 20 AND query
    TS_ID > 20 AND TS_ID <= 40 AND query
    TS_ID > 40 AND TS_ID <= 60 AND query
    ...

    Each query gets a chunk of records, from zero to n records (here n = 20 for illustration).

    This was recommended by TeamShare [Shaw 2001-04-16].

    A problem with this approach is that individual queries may return no records. So it may take a long time to find query results that are sparsely distributed in a large table.

  3. Break the query down into a sequence of queries with WHERE clauses like this:

    TS_ID = (SELECT MIN(TS_ID) FROM TS_CASES WHERE TS_ID > -1 AND query)
    TS_ID = (SELECT MIN(TS_ID) FROM TS_CASES WHERE TS_ID > 5 AND query)
    TS_ID = (SELECT MIN(TS_ID) FROM TS_CASES WHERE TS_ID > 8 AND query)
    ...

    Each query gets the next record matching the query (here the first two matching records are records 5 and 8).

    A problem with this approach is that it only gets one record at a time, so it has to issue as many queries as there are records that match the query.

  4. Break the query down into a sequence of queries with WHERE clauses like this:

    (query) AND TS_ID BETWEEN (SELECT MIN(TS_ID)      FROM TS_CASES WHERE TS_ID > -1 AND (query))
                          AND (SELECT MIN(TS_ID) + 19 FROM TS_CASES WHERE TS_ID > -1 AND (query))
    (query) AND TS_ID BETWEEN (SELECT MIN(TS_ID)      FROM TS_CASES WHERE TS_ID > 36 AND (query))
                          AND (SELECT MIN(TS_ID) + 19 FROM TS_CASES WHERE TS_ID > 36 AND (query))
    ...

    Each query gets a chunk of records, from one to n records (here n = 20 for illustration).

    Note that because each chunk is anchored at the smallest matching TS_ID, a query can return up to n records even when the first matching record's TS_ID is greater than the chunk size; in the illustration, the last record returned by the first query was record 36, even though n = 20.

    A disadvantage of this approach is that the database must issue three select queries for each chunk of results.

As far as we know, there's no way to write an ANSI SQL query that returns just the first n records matching some criteria [NB 2001-05-16]. Some databases provide a "row number" feature, but this isn't portable.
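To make the chunking strategies concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for the TeamTrack database. The table contents and the flag column are invented for illustration, and only solutions 2 and 4 are shown; the point is that both visit every matching record, but solution 4's MIN-anchored chunks never come back empty while matches remain.

```python
import sqlite3

# Hypothetical stand-in for the TS_CASES table: 100 rows, where every
# 7th TS_ID matches the query (a sparse result set).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TS_CASES (TS_ID INTEGER PRIMARY KEY, flag INTEGER)")
conn.executemany("INSERT INTO TS_CASES VALUES (?, ?)",
                 [(i, 1 if i % 7 == 0 else 0) for i in range(1, 101)])

query = "flag = 1"  # the WHERE clause being broken into chunks
n = 20              # chunk size

def solution2():
    """Fixed TS_ID ranges: chunks may come back empty."""
    ids, low = [], -1
    while low < 100:  # in real use you would first fetch MAX(TS_ID)
        rows = conn.execute(
            f"SELECT TS_ID FROM TS_CASES "
            f"WHERE TS_ID > ? AND TS_ID <= ? AND {query}",
            (low, low + n)).fetchall()
        ids += [r[0] for r in rows]
        low += n
    return ids

def solution4():
    """MIN-anchored ranges: every chunk yields at least one record,
    and the loop stops when the subqueries find no further match."""
    ids, low = [], -1
    while True:
        rows = conn.execute(
            f"SELECT TS_ID FROM TS_CASES WHERE ({query}) AND TS_ID BETWEEN "
            f"(SELECT MIN(TS_ID) FROM TS_CASES "
            f" WHERE TS_ID > ? AND ({query})) AND "
            f"(SELECT MIN(TS_ID) + ? FROM TS_CASES "
            f" WHERE TS_ID > ? AND ({query}))",
            (low, n - 1, low)).fetchall()
        if not rows:
            break  # MIN returned NULL: no matching records remain
        ids += [r[0] for r in rows]
        low = ids[-1]
    return ids

assert solution2() == solution4()  # both visit the same 14 records
```

Note that solution 2 issues six queries here (covering TS_ID ranges of 20 whether or not they contain matches), while solution 4 issues six as well but only because the final one detects the end of the results; on a sparser table the gap widens in solution 4's favour, as the results in section 4 show.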

4. Results

Because the performance of these approaches clearly depends on how the matching records are distributed in the database, I've tested them in several configurations, as shown in table 1.

The tests were run on a Windows NT machine with 128MB of physical memory and 300MB of virtual memory, using TeamTrack 4.5 (build 4509) and the TeamTrack web server. The TeamTrack server ran on the same machine as the test client. Times are given to the nearest second. Really, I should have run each test many times and presented average values, but life is too short. Solutions 1 and 3 don't depend on the chunk size, hence the repeated values.

Table 1. Performance of capacity solutions
Query description                        Results  Chunk   Elapsed time for solution (seconds)
                                                  size    sol. 1   sol. 2   sol. 3   sol. 4
Results are together near the start        50       5       12       14       27       14
of the table.                              50      20        "       15        "       12
                                           50      50        "       12        "       14
Results are together near the end of       50       5       14       34       26       12
the table (there are 1000 earlier          50      20        "       29        "       15
records that are not returned by the       50      50        "       24        "       17
query but which may need to be            100       5       89      103       59       22
skipped).                                 100      20        "       44        "       26
                                          100      50        "       34        "       31
                                          100     100        "      117b       "       87b
                                          200       5      N/Aa     171b     108       44
                                          200      20        "       65        "       40
                                          200      50        "       85        "       56
                                          200     100        "      206b       "      243b
                                          400       5      N/Ac     N/Aa     N/Ac      93
                                          400      20        "      108        "       79
                                          400      50        "      121        "      102
Results are distributed sparsely in        50       5       15       46       33b      27
the table (constituting about 10% of       50      20        "       22        "       14
all the records).                          50      50        "       12        "       12
                                          400       5      N/Ac     N/Aa     N/Ac     203
                                          400      20        "        "        "      132
                                          400      50        "        "        "       88

Notes

  a. This query failed with the error "Failure to receive return code: -1".

  b. This query failed with the error "Failure to receive return code: -1" but was later run successfully.

  c. I didn't run this query.

The second set of tests seeks to find a good chunk size for solution 4 by running that solution with different chunk sizes on a query that returns 200 records. These records are together at the start of the table. The test results are shown in table 2.

Table 2. Performance of solution 4 by chunk size
Chunk size               5   10   15   20   25   30   35   40   45   50
Elapsed time (seconds)  44   42   40   40   47   51   51   46   46   56

5. Conclusions

5.1. Analysis of results

The results are mostly straightforward:

  1. Solution 1 is no good because it simply doesn't work when the result size is large.

  2. Solution 4 is consistently better than 2 and 3, and it's a lot better than solution 2 when the results are at the end of a long table. In fact, it's the only solution that's remotely acceptable when there are many results sparsely distributed in the table.

  3. When the results are together in the table, solution 4 does best with a chunk size of about 20.

  4. When the results are sparse in the table, solution 4 does better the larger the chunk size.

So what are the queries that we issue?

  1. Filespecs for a case in the VCACTIONS table (sparse, small result set).

  2. Fixes for a case in the VCACTIONS table (sparse, small result set).

  3. All issues being replicated (large result set, together, near end of table if using start_date configuration parameter).

  4. Changes that haven't been dealt with (large result set, together, near end of table).

  5. Cases needing replication (large result set, sparse).

  6. Auxiliary table entries: projects, users, fields, selections etc. (large result set, together at start of table).

5.2. Recommendations

  1. TeamShare should extend their API to make it more convenient to process large query results. It would be nice to have a full cursor implementation, but if that's too expensive to implement then we could cope with a function like this:

    ReadInitialRecordListWithWhere(TSRecordList* list, int tableId, const char* whereClause, size_t maxRecords)

    Return the first maxRecords records from the table specified by tableId that match the conditions in whereClause (or all the matching records if there are fewer than maxRecords). The whereClause must either be the empty string (meaning match all records) or else be SQL suitable to appear after WHERE in a SQL query.

  2. Until TeamShare implement recommendation 1, use solution 4 with a chunk size of 20.
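Until then, solution 4 can be packaged as a simple forward-only cursor. The following is an illustrative Python sketch against sqlite3, not the TeamShare API; the function name chunked_cases, the demonstration table, and the state column are all invented for the example, and the chunk size defaults to the recommended 20.

```python
import sqlite3

def chunked_cases(conn, query, chunk_size=20):
    """Yield matching TS_CASES records one at a time, fetching them
    from the database in chunks of at most chunk_size (solution 4)."""
    last_id = -1
    while True:
        where = (
            f"({query}) AND TS_ID BETWEEN "
            f"(SELECT MIN(TS_ID) FROM TS_CASES "
            f" WHERE TS_ID > {last_id} AND ({query})) AND "
            f"(SELECT MIN(TS_ID) + {chunk_size - 1} FROM TS_CASES "
            f" WHERE TS_ID > {last_id} AND ({query}))"
        )
        rows = conn.execute(f"SELECT * FROM TS_CASES WHERE {where}").fetchall()
        if not rows:
            return  # no matching records left
        for row in rows:
            yield row
        last_id = rows[-1][0]  # TS_ID of the last record fetched

# Demonstration against an in-memory table: 60 cases, every 3rd one open.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TS_CASES (TS_ID INTEGER PRIMARY KEY, state TEXT)")
conn.executemany("INSERT INTO TS_CASES VALUES (?, ?)",
                 [(i, "open" if i % 3 == 0 else "closed") for i in range(1, 61)])
open_ids = [row[0] for row in chunked_cases(conn, "state = 'open'")]
```

The caller iterates over records without ever holding more than chunk_size of them in the result list at once, which is the property the TeamShare API currently lacks.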

A. References

[NB 2001-05-16] "Re: Initial results from breaking down the query"; Nick Barnes; Ravenbrook Limited; 2001-05-16.
[Schreiber 2001-04-03] "Re: Serious problems in the TeamShare API" (e-mail message); Royce Schreiber; TeamShare; 2001-04-03.
[Shaw 2001-04-16] "Notes on memory usage in the API" (e-mail message); Kelly Shaw; TeamShare; 2001-04-16.
[TeamShare 2001-05-02] "TeamShare API Reference Guide"; TeamShare; 2001-05-02.

B. Document History

2001-05-16 GDR Created.
2001-05-17 GDR Added recommendation for TeamShare.

Copyright © 2001 Ravenbrook Limited. This document is provided "as is", without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this document. You may make and distribute verbatim copies of this document provided that you do not charge a fee for this document or for its distribution.

$Id: //info.ravenbrook.com/project/p4dti/doc/2001-05-16/teamtrack-performance/index.html#8 $
