36. Tests

36.1. Introduction

.intro: This document contains a guide to the Memory Pool System tests.

.readership: This document is intended for any MPS developer.

36.2. Running tests

.run: Run these commands:

cd code
make -f <makefile> VARIETY=<variety> <target>  # Unix
nmake /f <makefile> VARIETY=<variety> <target> # Windows

where <makefile> is the appropriate makefile for the platform (see manual/build.txt), <variety> is the variety (see design.mps.config.var.codes) and <target> is the collection of tests (see .target below). For example:

make -f lii6ll.gmk VARIETY=cool testrun

If <variety> is omitted, tests are run in both the cool and hot varieties.

36.3. Test targets

.target: The makefiles provide the following targets for common sets of tests:

.target.testall: The testall target runs all test cases (even if known to fail).

.target.testrun: The testrun target runs the “smoke tests”. These are quick checks that the MPS is working. They run quickly enough for it to be practical to run them every time the MPS is built.

.target.testci: The testci target runs the continuous integration tests, the subset of tests that are expected to pass in full-featured build configurations.

.target.testansi: The testansi target runs the subset of the tests that are expected to pass in the generic (“ANSI”) build configuration (see design.mps.config.opt.ansi).

.target.testpollnone: The testpollnone target runs the subset of the tests that are expected to pass in the generic (“ANSI”) build configuration (see design.mps.config.opt.ansi) with the option CONFIG_POLL_NONE (see design.mps.config.opt.poll).

.target.testratio: The testratio target compares the performance of the HOT and RASH varieties. See .ratio.

.target.testscheme: The testscheme target builds the example Scheme interpreter (example/scheme) and runs its test suite.

.target.testmmqa: The testmmqa target runs the tests in the MMQA test suite. See .mmqa.

36.4. Test features

.randomize: Each time a test case is run, it randomly chooses some of its parameters (for example, the sizes of objects, or how many links to create in a graph of references). This allows a fast test to cover many cases over time.

.randomize.seed: The random numbers are chosen pseudo-randomly based on a seed initialized from environmental data (the time and the processor cycle count). The seed is reported at test startup, for example:

code$ xci6ll/cool/apss
xci6ll/cool/apss: randomize(): choosing initial state (v3): 2116709187.
...
xci6ll/cool/apss: Conclusion: Failed to find any defects.

Here, the number 2116709187 is the random seed.

.randomize.specific-seed: Each test can be run with a specified seed by passing the seed on the command line, for example:

code$ xci6ll/cool/apss 2116709187
xci6ll/cool/apss: randomize(): resetting initial state (v3) to: 2116709187.
...
xci6ll/cool/apss: Conclusion: Failed to find any defects.

.randomize.repeatable: This ensures that the single-threaded tests are repeatable. (Multi-threaded tests are not repeatable even if the same seed is used; see job003719.)
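As a minimal sketch of how the seed might be recovered from saved test output, so that a failing single-threaded run can be repeated: the helper name extract_seed is hypothetical and is not part of the MPS tools, and the pattern assumes only the two message formats shown above.

```shell
# extract_seed: hypothetical helper, not part of the MPS tree.
# Reads test output on stdin and prints the seed from the first
# "randomize(): ... initial state (vN)...: SEED." line it finds.
extract_seed() {
    sed -n 's/.*randomize(): .*: \([0-9][0-9]*\)\.$/\1/p' | head -n 1
}

# Sample input copied from the transcript above.
printf '%s\n' \
    'xci6ll/cool/apss: randomize(): choosing initial state (v3): 2116709187.' \
    '...' \
    'xci6ll/cool/apss: Conclusion: Failed to find any defects.' |
    extract_seed
```

The extracted number could then be passed back to the test case as its first argument, as in .randomize.specific-seed.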

36.5. Test list

See manual/code-index for the full list of automated test cases.

.test.finalcv: Registers objects for finalization, makes them unreachable, deregisters them, etc. Churns to provoke minor (nursery) collection.

.test.finaltest: Creates a large binary tree, and registers every node. Drops the top reference, requests collection, and counts the finalization messages.

.test.zcoll: Collection scheduling, and collection feedback.

.test.zmess: Message lifecycle and finalization messages.

36.6. Test database

.db: The automated tests are described in the test database (tool/testcases.txt).

.db.format: This is a self-documenting plain-text database which gives for each test case its name and an optional set of features. For example the feature =P means that the test case requires polling to succeed, and therefore is expected to fail in build configurations without polling (see design.mps.config.opt.poll).

.db.format.simple: The format must be very simple because the test runner on Windows is written as a batch file (.bat), in order to avoid having to depend on any tools that did not come as standard with Windows XP, and batch files are inflexible. (But note that we no longer support Windows XP, so it would now be possible to rewrite the test runner in PowerShell if we thought that made sense.)

.db.testrun: The test runner (tool/testrun.sh on Unix or tool/testrun.bat on Windows) parses the test database to work out which tests to run according to the target. For example the testpollnone target must skip all test cases with the P feature.
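The selection step might be sketched as follows. This is not the code in tool/testrun.sh: the helper name select_tests, the test case names, and the assumed layout (one test case per line, with the name first and then optional features written as =X) are illustrative assumptions, and the real layout is whatever tool/testcases.txt documents.

```shell
# select_tests SKIP: hypothetical helper, not the real tool/testrun.sh.
# Reads a database on stdin (assumed layout: "name [=FLAG ...]" per line)
# and prints the names of test cases that carry none of the flag letters
# listed in SKIP.
select_tests() {
    awk -v skip="$1" '
        NF == 0 { next }                      # ignore blank lines
        {
            for (i = 2; i <= NF; i++) {
                flag = substr($i, 2)          # "=P" -> "P"
                if (substr($i, 1, 1) == "=" && flag != "" &&
                    index(skip, flag) > 0)
                    next                      # flagged for skipping
            }
            print $1
        }'
}

# Made-up database contents; a target such as testpollnone would skip "P".
printf '%s\n' 'alpha' 'beta =P' 'gamma =L' | select_tests P
```

With this made-up input the sketch prints alpha and gamma, leaving out beta because it carries the P feature.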

36.7. Test runner

.runner.req.automated: The test runner must execute without user interaction, so that it can be used for continuous integration.

.runner.req.output.pass: Test cases are expected to pass nearly all the time, and in these cases we almost never want to see the output, so the test runner must suppress the output for passing tests.

.runner.req.output.fail: However, if a test case fails then the test runner must preserve the output from the failing test, including the random seed (see .randomize.seed), so that this can be analyzed and the test repeated. Moreover, it must print the output from the failing test, so that if the test is being run on a continuous integration system (see .ci), then the output of the failing tests is included in the failure report. (See job003489.)
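The two output requirements could be met along the following lines. This is a sketch under assumptions, not the actual test runner: the helper name run_case is hypothetical, mktemp is assumed to be available, and a test case is judged to have passed or failed by its exit status alone.

```shell
# run_case CMD [ARG ...]: hypothetical helper, not the real test runner.
# Runs the command with its output captured in a log file.  A passing
# run prints nothing and removes the log; a failing run prints the log
# (which includes the seed line) and leaves the file in place.
run_case() {
    log=$(mktemp) || return 2
    if "$@" > "$log" 2>&1; then
        rm -f "$log"
        return 0
    fi
    cat "$log"
    echo "output preserved in $log"
    return 1
}

# Stand-ins for test cases: one that passes and one that fails.
run_case sh -c 'echo "randomize(): choosing initial state (v3): 1."'
run_case sh -c 'echo "randomize(): choosing initial state (v3): 2."; exit 1' ||
    echo "test case failed"
```

Only the failing stand-in produces output, so a continuous integration log built this way would contain the seed lines of failing test cases and nothing from passing ones.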

36.8. Performance test

.ratio: The testratio target checks that the hot variety is not too much slower than the rash variety. It works by running gcbench for the AMC pool class and djbench for the MVFF pool class, in the hot variety and the rash variety, computing the ratio of CPU time taken in the two varieties, and testing that this falls under an acceptable limit. A failure of this test usually indicates that there are assertions on the critical path using AVER instead of AVER_CRITICAL (and so on).

.ratio.cpu-time: Note that we use the CPU time (reported by /usr/bin/time) and not the elapsed time (as reported by the benchmark) because we want to be able to run this test on continuous integration machines that might be heavily loaded.
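The comparison itself amounts to a small piece of arithmetic, sketched below. The helper name ratio_ok, the limit of 2, and the timings are illustrative assumptions; they are not taken from the MPS makefiles or from any measurement.

```shell
# ratio_ok HOT RASH LIMIT: hypothetical helper, not the real testratio rule.
# HOT and RASH are CPU times in seconds for the same benchmark in the two
# varieties.  Succeeds if HOT / RASH is at most LIMIT.
ratio_ok() {
    awk -v hot="$1" -v rash="$2" -v limit="$3" \
        'BEGIN { exit !(rash > 0 && hot / rash <= limit) }'
}

# Illustrative numbers only; they are not measurements of the MPS.
if ratio_ok 3.0 2.0 2; then
    echo "ratio within limit"
else
    echo "ratio too high"
fi
```

The sketch treats a rash time of zero as a failure rather than dividing by it, which is a design choice made here for safety and not something the source specifies.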

.ratio.platform: This target is currently supported only on Unix platforms using GNU Makefiles.

36.9. Adding a new test

To add a new test to the MPS, carry out the following steps. (The procedure uses the name “newtest” throughout but you should of course replace this with the name of your test case.)

.new.source: Create a C source file in the code directory, typically named “newtest.c”. In addition to the usual copyright boilerplate, it should contain a call to testlib_init() (this ensures reproducibility of pseudo-random numbers), and a printf() reporting the absence of defects (this output is recognized by the test runner):

#include <stdio.h>
#include "testlib.h"

int main(int argc, char *argv[])
{
  testlib_init(argc, argv);
  /* test happens here */
  printf("%s: Conclusion: Failed to find any defects.\n", argv[0]);
  return 0;
}

.new.unix: If the test case builds on the Unix platforms (FreeBSD, Linux and macOS), edit code/comm.gmk adding the test case to the TEST_TARGETS macro, and adding a rule describing how to build it, typically:

$(PFM)/$(VARIETY)/newtest: $(PFM)/$(VARIETY)/newtest.o \
        $(TESTLIBOBJ) $(PFM)/$(VARIETY)/mps.a

.new.windows: If the test case builds on Windows, edit code/commpre.nmk adding the test case to the TEST_TARGETS macro, and edit code/commpost.nmk adding a rule describing how to build it, typically:

$(PFM)\$(VARIETY)\newtest.exe: $(PFM)\$(VARIETY)\newtest.obj \
        $(PFM)\$(VARIETY)\mps.lib $(FMTTESTOBJ) $(TESTLIBOBJ)

.new.macos: If the test case builds on macOS, open code/mps.xcodeproj/project.pbxproj for edit and open this project in Xcode. If the project navigator is not visible at the left, select View → Navigators → Show Project Navigator (⌘1). Right click on the Tests folder and choose Add Files to “mps”…. Select code/newtest.c and then click Add. Move the new file into alphabetical order in the Tests folder. Click on “mps” at the top of the project navigator to reveal the targets. Select a test target that is similar to the one you have just created. Right click on that target and select Duplicate (⌘D). Select the new target and change its name to “newtest”. Select the “Build Phases” tab and check that “Dependencies” contains the mps library, and that “Compile Sources” contains newtest.c and testlib.c. Close the project.

.new.database: Edit tool/testcases.txt and add the new test case to the database. Use the appropriate flags to indicate the properties of the test case. These flags are used by the test runner to select the appropriate sets of test cases. For example tests marked =P are expected to fail in build configurations without polling (see design.mps.config.opt.poll).

.new.manual: Edit manual/source/code-index.rst and add the new test case to the “Automated test cases” section.

36.10. Continuous integration

[This section might need to become a document in its own right. CI has grown in importance and complexity. RB 2023-01-15]

.ci: Ravenbrook uses both GitHub CI and Travis CI for continuous integration of the MPS via GitHub.

[This section needs: definition of CI goals and requirements, what we need CI to do and why, how the testci target meets those requirements. ‘taint really a design without this. Mention how CI supports the pull request merge procedure (except that exists on a separate branch at the moment). RB 2023-01-15]

[Need to discuss compilers and toolchains. RB 2023-01-15]

.ci.run.posix: On POSIX systems where we have autoconf, the CI services run commands equivalent to:

./configure
make install
make test

which exercises the testci target, as defined by Makefile.in in the root of the MPS tree.

.ci.run.windows: On Windows the CI services run commands that do at least:

nmake /f w3i6mv.nmk all testci

as defined by the .ci.github.config.

.ci.run.other.targets: On some platforms we arrange to run the testansi, testpollnone, testratio, and testscheme targets. [Need to explain why, where, etc. RB 2023-01-15]

.ci.run.other.checks: We could also run various non-build checks using CI to check:

document formatting
shell script syntax

[In the branch of writing, these do not yet exist. They are the subject of GitHub pull request #113 of branch/2023-01-13/rst-check. When merged, they can be linked. RB 2023-01-15]

.ci.when: CI is triggered on the mps GitHub repo by:

commits (pushes)
new pull requests
manually, using tools (see .ci.tools)

.ci.results: CI results are visible via the GitHub web interface:

in pull requests, under “Checks”,
on the branches page as green ticks or red crosses that link to details.

as well as in logs specific to the type of CI.

.ci.results.travis: Results from Travis CI can be found at the Travis CI build history for the MPS GitHub repo.

.ci.results.github: Results from GitHub CI can be found at build and test actions on the Actions tab at the Ravenbrook GitHub repo.

.ci.github: [Insert overview of GitHub CI here. RB 2023-01-15]

.ci.github.platforms: GitHub provides runners for Linux, Windows, and macOS, but only on x86_64. See .ci.travis.platforms for ARM64 and FreeBSD.

.ci.github.config: GitHub CI is configured using the build-and-test.yml file in the .github/workflows directory of the MPS tree.

.ci.travis: [Insert overview of Travis CI here. RB 2023-01-15]

.ci.travis.platforms: Where possible, we use GitHub CI for platforms, because Travis CI is slow and expensive. However GitHub CI does not provide ARM64 or FreeBSD, so we use Travis CI for those.

.ci.travis.config: Travis is configured using the .travis.yml file at top level of the MPS tree.

.ci.tools: The MPS tree contains some simple tools for managing CI without the need to install whole packages such as the GitHub CLI or Travis CI’s Ruby gem.

.ci.tools.kick: tool/github-ci-kick and tool/travis-ci-kick both trigger CI builds without the need to push a change or make a pull request in the mps GitHub repo. In particular, they are useful for applying CI to work that was pushed while CI was disabled, for whatever reason.

36.11. MMQA tests

.mmqa: The Memory Management Quality Assurance test suite is another suite of test cases.

.mmqa.why: The existence of two test suites originates in the departmental structure at Harlequin Ltd where the MPS was originally developed. Tests written by members of the Memory Management Group went into the code directory along with the MPS itself, while tests written by members of the Quality Assurance Group went into the test directory. (Conway’s Law states that “organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations” [Conway_1968].)

.mmqa.run: See test/README for how to run the MMQA tests.

36.12. Other tests

.coverage: The program tool/testcoverage compiles the MPS with coverage enabled, runs the smoke tests (.target.testrun) and outputs a coverage report.

.opendylan: The program tool/testopendylan pulls Open Dylan from GitHub and builds it against the MPS.

36.13. References

[Conway_1968]

“How do Committees Invent?”; Melvin E. Conway; Datamation 14:5, pp. 28–31; April 1968; <http://www.melconway.com/Home/Committees_Paper.html>