MPS/MMREF import procedures

	      Nick Levine, Ravenbrook Limited, 2001-08-15


1. INTRODUCTION

This is the procedure for the one-off import (but see section 1.1 for
a qualification of that) of data from the HOPE/RCS hierarchy
[Global Graphics 2001-08-13] into Ravenbrook's Perforce repository
under /project/mps.

This procedure takes four .tar.gz archives from Global Graphics,
representing the revision history of the MPS and MMRef projects, and
merges those projects into Ravenbrook's Perforce repository.

(This procedure includes the import of data from the MMRef project as it
is related and any faults in the procedure for one project will
probably need fixing identically in the other. Also, we're going to make
interleaving of changes even nastier if we do the two separately.)

The purpose is to allow Ravenbrook to develop and maintain the MPS and
MMRef projects.

The intended readership is Ravenbrook development staff.

This document is not confidential.

1.1. Status

The procedure was first run on 2001-08-17. It took about 2.5 hours,
most of which was spent doing the backups described in section 3.1,
items 3 and 4 below. Sections 3.4, 3.5 and 3.6 were carried out while
this was going on; this part of the procedure took less than an hour.

This document was updated after the procedure was run, as the merge
with our central system had not been fully tested in advance.

Numerous problems emerged after the merge (see for example [NB
2001-08-24]); after investigation we settled on surgery:
  a) of RCS files to remove checkpoint labels, as these confuse the
  rcstoperf script
  b) of the resulting checkpoint, which still contained bogus and
  broken entries [NB 2001-09-10].
We may decide to restore checkpoint labels later as a separate
operation.

This document now also describes how to merge the imported material
into Ravenbrook's Perforce repository given that a previous (but
broken) version of that material is already there.


2. HOW THE MERGE WORKS

2.1. Paths and branches

Paths in the original look like

   mps/src/misc.h
   mmref/src/diagrams/address.png

For the trunk branch (revisions of the form 1.x), these get mapped to

   //info.ravenbrook.com/project/mps/branch/2001-08-13/trunk/src/misc.h
   //info.ravenbrook.com/project/mmref/branch/2001-08-13/trunk/src/diagrams/address.png


2.2. Branches

There are no branches in the MMRef project.  In the MPS, each branch
gets mapped to

   //info.ravenbrook.com/project/mps/branch/DATE/NAME/...

where DATE is NDL's best guess as to when the branch was made [NDL
2001-08-14b] and NAME has been generated by the RCS to Perforce script,
from branch labels in the RCS files.
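
As a sketch of the mapping in sections 2.1 and 2.2, the following
shell functions build a depot path from its parts.  The function names
depot_path and trunk_path are made up for this illustration; they are
not part of the import tools.

```shell
# Sketch only: depot_path and trunk_path are hypothetical helpers.

# Usage: depot_path PROJECT DATE NAME RELATIVE-PATH
depot_path() {
    printf '//info.ravenbrook.com/project/%s/branch/%s/%s/%s\n' \
        "$1" "$2" "$3" "$4"
}

# Usage: trunk_path ORIGINAL-PATH
# Trunk files use the fixed date 2001-08-13 and the name "trunk".
trunk_path() {
    project=${1%%/*}    # text before the first slash, e.g. "mps"
    relative=${1#*/}    # everything after the first slash
    depot_path "$project" 2001-08-13 trunk "$relative"
}

trunk_path mps/src/misc.h
trunk_path mmref/src/diagrams/address.png
```

The two calls at the end print the two trunk paths shown in section
2.1.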


2.3. Changelists and dates

Changelists will run sequentially from the most recent change in
Ravenbrook's repository.  This means that changelists will be out of
sequence with respect to date (but within each project things will be in
order).  Renumbering changelists would be unacceptable because it would
invalidate lots of stuff in the information system, or stuff we've
published.


3. PROCEDURE

The outline is: backup Perforce; unpack the archives from Global
Graphics; run the RCS to Perforce script to fake up a checkpoint and
build a depot; edit the checkpoint to fix paths in the depot; restore
from the edited checkpoint and test; merge checkpoints and depots;
restore.


3.1. Getting started

  Note: second time through (to repair the checkpoint created first
  time), we already know the range of change numbers affected. It is
  not necessary to turn off robots or the Perforce server until the
  merge is ready (section 3.8).

  1. Turn off the infomail robot on raven:

     su infomail
     crontab -r

     Turn off the infosys robot on sparrow:

     su infosys
     crontab -r

  2. Run "p4 counters" and then halt the server (this stops the change
     number from being bumped up while I'm working).  Remember the value
     for the change counter; we'll need it in section 3.5, item 3.

     p4 -u root admin stop

  3. Make a tape backup of sparrow [GDR 2000-10-04].

  4. Build a checkpoint.  For safety, copy the checkpoint, db.* files
     and the depot under /home/p4/repository to a different location.

     export DATE=`date "+%Y%m%dT%T"`
     p4d -r /home/p4/repository -jd checkpt.$DATE
     cp -R /home/p4/repository/{checkpt.$DATE,db.*,info.ravenbrook.com} /home/ndl/backup.$DATE

     Note after the event: this copy took forever. It would probably
     have been quicker to build a tar file.
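
The note in item 4 suggests that a tar file might have been quicker
than cp -R.  A minimal sketch of that alternative follows.  It was not
part of the procedure as run and has not been tried against the real
repository; the function name backup_repository and the archive name
backup.STAMP.tar.gz are assumptions made for this example.

```shell
# Sketch only: backup_repository is a hypothetical helper.
# Usage: backup_repository REPOSITORY DESTINATION-DIR STAMP
# Packs the checkpoint, the db.* files and the depot into one
# gzipped tar file instead of copying them file by file.
backup_repository() {
    repo=$1
    dest=$2
    stamp=$3
    # Run tar from inside the repository so that member names are
    # relative; DESTINATION-DIR should be an absolute path.
    (cd "$repo" && tar cf - "checkpt.$stamp" db.* info.ravenbrook.com) |
        gzip -c > "$dest/backup.$stamp.tar.gz"
}
```

With the variables from item 4 this might be called as

     backup_repository /home/p4/repository /home/ndl "$DATE"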


3.2. Environment

Running on gannet (NT) with cygwin. (I have not tried this on any
other environment.)

I needed to set

     PATH=/usr/bin:$PATH

so that I would get cygwin versions of find and sort (as opposed to
win32 versions).

We made minor fixes to the rcstoperf.sh script, as follows:

  1. An additional command-line option (-halt) which allows you to do a
     copy and nothing else (so that we can check the intermediate
     results).

  2. A minor bug fix (-r missing from a p4d command).

The modified tool is [NDL 2001-08-15].


3.3. Preconditions

  1. An empty directory to do all this in. All relative pathnames are
     relative to this directory. I assume you're in this directory at
     the start of each section below.

  2. A clean (i.e. unused) Perforce server. I used version 2000.2 as it
     matches our central server. Unpack it in p4d/ and add this
     directory to PATH.

  3. The four tar files from
	 /project/mps/import/2001-08-13/MM-rcs/,
	 /project/mmref/import/2001-08-13/MM-rcs/,
     in rcs/

  4. The contents of /project/mps/tool/rcs-import/ and
     /project/mps/tool/second-rcs-import/, in tool/

  5. This procedure uses two scripts written in Common Lisp. These
     have been tested against version 4.1.20 of LispWorks (Windows NT
     "professional" release).


3.4. Install the RCS files:

  1. Unpack and group (as per [NDL 2001-08-14a]):

   cd rcs

   for tar in HOMEmm.tar MMQA.tar MMsrc.tar; do tar xf $tar; done
   mkdir mps
   mv mm mps/home
   mv mmqa mps/qa
   mv src mps

   tar xf MMref.tar
   mkdir mmref
   mv mm mmref/src

   rm *.tar

  2. Remove checkpoint labels:

  Start LispWorks. Compile and load tool/remove-checkpoint-labels.lisp.
  Run

      (rewrite-rcs-files <directory>)

  where <directory> is a string naming the full path to the rcs/
  directory, for example:

      (rewrite-rcs-files "e:/tmp/p4/all/rcs/")

  This function descends through subdirectories, assuming all files it
  meets are RCS files, removing all labels corresponding to
  checkpoints, and rewriting each file in-place.
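
After item 1 above, the grouping can be checked before going any
further.  The following is a minimal sketch; check_rcs_layout is a
name made up for this example, and the directory list is taken from
the mv commands in item 1.

```shell
# Sketch only: check_rcs_layout is a hypothetical helper.
# Usage: check_rcs_layout RCS-DIRECTORY
# Reports each expected subdirectory that is missing and returns
# non-zero if any are.
check_rcs_layout() {
    status=0
    for dir in mps/home mps/qa mps/src mmref/src; do
        if [ ! -d "$1/$dir" ]; then
            echo "missing: $1/$dir" >&2
            status=1
        fi
    done
    return $status
}
```

Run from the working directory as

     check_rcs_layout rcs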


3.5. Run the (modified) rcstoperf script:

I prefer to run it in stages and verify the results as it goes.

  1. tool/rcstoperf.sh -copy -halt rcs p4d

Copies the RCS hierarchy. OK, you could copy these files by hand, but
this is a quick sanity check that things are where they ought to be.
You should end up with a directory p4d/depot/IMPORT with subdirectories
mps/home, mps/qa, mps/src and mmref/src.  (When we edit the checkpoint
in section 3.6, item 6, we'll change all the paths starting IMPORT.)

  2. tool/rcstoperf.sh -nocopy -extract rcs p4d

This builds a (large) file p4d/tmp.extract.  This looks a bit like a
checkpoint; it contains all the metadata from the RCS files.

  3. tool/rcstoperf.sh -nocopy -next CHANGE -changes -meta rcs p4d

Where CHANGE is one of the following:

    a) If the RCS changes are being added to Perforce for the first
    time, CHANGE is 1 plus the highest changelevel in Ravenbrook's
    Perforce server (see section 3.1, item 2).

    b) If the RCS changes have already been added to Perforce and we
    are now attempting to fix problems (as described in section 1.1
    above), CHANGE is the same as last time. In this case, it is
    essential to verify after this operation that the value of the
    change counter in the first line of tmp.meta is the same as last
    time. The operations in section 3.8 below assume this.

This call to rcstoperf builds a checkpoint file p4d/tmp.meta.  On
gannet this checkpoint needs immediate surgery before we can proceed
(see comments about awk in [Perforce 1999]).

  4. The perl script to restore missing "@" characters is in tool/:

        mv p4d/tmp.meta p4d/tmp.meta.at
        perl tool/terminal-ats.pl p4d/tmp.meta.at > p4d/tmp.meta

  5. tool/rcstoperf.sh -nocopy -load rcs p4d

We should now have a more compact checkpoint file (tmp.checkpt)
and a bunch of db.* files, all in p4d/.

  6. Edit the checkpoint, removing systematic errors.

  Start LispWorks (or continue to use the old image). Compile and load
  tool/edit-checkpoint.lisp. Run

      (edit-checkpoint <checkpoint>)

  where <checkpoint> is a string naming the full path to the
  checkpoint file created above (tmp.checkpt). For example:

      (edit-checkpoint "e:/tmp/p4/all/p4d/tmp.checkpt")

For each file / branch combination, this does the following:
a) Remove db.rev and db.revcx entries with lowest changelist
   numbers (they're bogus), confirming that the remaining db.rev
   entry with the lowest changelist number is correctly numbered in
   the "lbrRev" field
b) Subtract 1 from revision values for all remaining entries
c) Change "action" on earliest remaining db.rev / db.revcx entries
   to 0 ("add")
d) Store changelist from earliest remaining db.rev / db.revcx
   entries in db.integ (in place of zero there at present)
e) Change "resolved" to 2 ("automatically as part of branch") in
   all db.integ entries
f) Change "how" from 3 ("branch into") to 11 ("dirty branch into")
   in first of each db.integ pair

   This procedure generates a new checkpoint file, tmp.checkpt.out,
   which is significantly shorter than before (but with the same
   number of changes).

  7. Install this checkpoint.

  cd p4d/
  rm db.*
  p4d -r . -jr tmp.checkpt.out
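
For item 3 above, the value of CHANGE and the check on tmp.meta can be
sketched as follows.  Two assumptions are made here: that the output
of "p4 counters" (saved to a file in section 3.1, item 2) contains a
line of the form "change = N", and that the first line of tmp.meta
holds nothing but the change counter, so that comparing first lines is
the same as comparing counters.  The function names are made up for
this example.

```shell
# Sketch only: next_change and same_first_line are hypothetical
# helpers.

# Usage: next_change COUNTERS-FILE
# COUNTERS-FILE holds saved output of "p4 counters"; assumes a line
# of the form "change = N" and prints N + 1.
next_change() {
    awk '$1 == "change" && $2 == "=" { print $3 + 1 }' "$1"
}

# Usage: same_first_line OLD-META NEW-META
# Succeeds if both files start with the same line (case b of
# item 3: the counter must be the same as last time).
same_first_line() {
    [ "$(head -n 1 "$1")" = "$(head -n 1 "$2")" ]
}
```

For a first import, CHANGE would then be the output of next_change;
for a repeat import, same_first_line compares a kept copy of the
previous tmp.meta with p4d/tmp.meta.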


3.6. Rename the branches

We need to do some relocation within the depot (as per [NDL
2001-08-14b]).

  1. Start the Perforce server.  Create yourself a client.

  2. mkdir tmp

  3. tool/branches.sh > tmp/branches.txt

This creates a file with two columns separated by whitespace; the
branch name and an estimate of the date the branch was created in [ISO
8601] format.  (Perforce got the branch names from labels in the RCS
files at step 5 in section 3.5.)

(This is a non-scalable report; give it a minute or two to run!)

  4. awk '{printf "s+//depot/%s/mps/+//info.ravenbrook.com/project/mps/branch/%s/%s/+\n", $1,$2,$1}' tmp/branches.txt > tmp/uniq.sed

This converts the map from branch name to date into a sed script that
carries out the path rewriting for all the branches.

  5. Edit the tail of tmp/uniq.sed:

In the last line, change

   DATE/main

to

   2001-08-13/trunk

Make a copy of the last line, with "mmref" substituted for the two
occurrences of "mps".

Add one further line:

    s+//depot/IMPORT/+//info.ravenbrook.com/IMPORT/+

In an attempt to learn from earlier defects, I note that an inability
to spell ravenbrook correctly has annoying consequences downstream.

Add the following, which translate users:

    s/@gavinm@/@grm@/
    s/@nickb@/@nb@/
    s/@richard@/@rb@/

(Make sure there's a final newline, otherwise sed won't see the last
line of the file.)

  6. sed -f tmp/uniq.sed p4d/tmp.checkpt.out > p4d/checkpt.mm

This will take around 10 minutes; the new checkpoint file will be
about half as big again as the old one.  While it's running, carry out
step 10.

  7. Halt the Perforce server, remove the db files and journal

	rm p4d/db.* p4d/journal

  8. We need to fix up the (few) binary files in the distribution.

The only binary files have extension .png, and they're all on version
1.1.

Edit p4d/checkpt.mm.  Every line matching the regexp

     db\.rev@.*\.png@

should be changed: the lines have the pattern:

     "@pv@ 3 @db.rev@", $file, $rev, 0, ..., 0

Both zeros shown should be changed to 257.  Retain the trailing space
at the end of the lines.

(Note: [Perforce 1999] says only to change the first 0.  Experiments
have shown that this leads to crlf corruption of binary files.)

Now go to the directory containing these diagrams and unpack them:

     cd p4d/depot/IMPORT/mmref/src/diagrams
     co *,v

(Gannet doesn't have rcs installed. I built a tarball of the ,v files,
ftp'd it to raven, unpacked it there and copied the .png files back.)

Finally, move the files thus:

     for p in *.png; do mkdir $p,d; mv $p $p,d/1.1; done

  9. p4d -r p4d -jr checkpt.mm

  10. mv p4d/depot p4d/info.ravenbrook.com

(Cygwin does this as a deep copy. Use the desktop for real-time
results.)

  11. Restart Perforce. Create a depot info.ravenbrook.com. Create
      yourself a client. Try it out. (Check: for non-corruption of png
      files; for plausibility of files, branches, changes, etc.)
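
Items 4 and 5 above were done with awk followed by hand editing.  As
a sketch of the same result in one pass, the function below writes
the whole sed script.  It assumes that the trunk appears in
tmp/branches.txt under the name "main" (as the DATE/main edit in item
5 suggests); the function name make_uniq_sed is made up for this
example.

```shell
# Sketch only: make_uniq_sed is a hypothetical helper.
# Usage: make_uniq_sed BRANCHES-FILE > SED-SCRIPT
# BRANCHES-FILE has one "NAME DATE" pair per line.
make_uniq_sed() {
    awk '
        $1 == "main" {
            # Trunk: fixed date and the name "trunk", once for each
            # project (assumes the trunk is listed as "main").
            printf "s+//depot/main/mps/+//info.ravenbrook.com/project/mps/branch/2001-08-13/trunk/+\n"
            printf "s+//depot/main/mmref/+//info.ravenbrook.com/project/mmref/branch/2001-08-13/trunk/+\n"
            next
        }
        # Every other branch: the awk command from item 4.
        { printf "s+//depot/%s/mps/+//info.ravenbrook.com/project/mps/branch/%s/%s/+\n", $1, $2, $1 }
    ' "$1"
    # The further lines added by hand in item 5.
    printf '%s\n' 's+//depot/IMPORT/+//info.ravenbrook.com/IMPORT/+'
    printf '%s\n' 's/@gavinm@/@grm@/' 's/@nickb@/@nb@/' 's/@richard@/@rb@/'
}
```

Every line written ends with a newline, so sed sees the last line of
the script.  It would be used as

     make_uniq_sed tmp/branches.txt > tmp/uniq.sed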


3.7. Merge, version A

Follow this section only if merging for the first time. Otherwise
follow section 3.8 below.

  1. Copy the fabricated depot and checkpoint to sparrow and unpack
     them.

     cd p4d
     tar cf - info.ravenbrook.com checkpt.mm | gzip -c > mm.tar.gz

FTP the tarball to sparrow and unpack it.

     cd /home/p4/repository
     gunzip -c mm.tar.gz | tar xvf -

(We know that this won't overwrite any files because the new depot
is all under info.ravenbrook.com/IMPORT/.)

  2. Restore from the RCS checkpoint.

     p4d -r /home/p4/repository -jr checkpt.mm

  3. Start Perforce server and test it.

  4. Restart infomail and infosys.

On raven:

     su infomail
     crontab /home/infomail/etc/crontab

On sparrow:

     su infosys
     crontab /home/infosys/etc/crontab
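
The claim in item 1, that unpacking the tarball cannot overwrite
anything, can be checked before unpacking.  The following is a
minimal sketch; unexpected_members is a name made up for this
example.

```shell
# Sketch only: unexpected_members is a hypothetical helper.
# Usage: unexpected_members TARBALL
# Lists members of the gzipped tarball that are neither under
# info.ravenbrook.com/IMPORT/ nor the checkpoint checkpt.mm.
# Prints nothing if the tarball holds only what item 1 expects.
unexpected_members() {
    gunzip -c "$1" | tar tf - |
        grep -v -e '^info\.ravenbrook\.com/$' \
                -e '^info\.ravenbrook\.com/IMPORT/' \
                -e '^checkpt\.mm$' || true
}
```

On sparrow, before the gunzip in item 1:

     unexpected_members mm.tar.gz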

3.8. Merge, version B

Follow this section only if merging for a second time. First time
around, follow section 3.7 above.

  1. If you haven't already carried out the steps in section 3.1, do
     so now.

  2. Copy the fabricated depot to sparrow and unpack it:

     cd p4d
     tar cf - info.ravenbrook.com | gzip -c > mm.tar.gz

FTP the tarball to sparrow and unpack it.

     cd /home/p4/repository
     mv info.ravenbrook.com/IMPORT ./IMPORT.bak
     gunzip -c mm.tar.gz | tar xvf -

(We know that this won't overwrite any files because both the new and
the previous depots are all under info.ravenbrook.com/IMPORT/, and the
previous one has just been moved aside.)

  3. Set up a working space:

     Make an empty directory "sparrow" on gannet.

     Copy the checkpoint dumped in section 3.1, item 4 above into
     sparrow/ on gannet, naming it checkpt.start

     Copy the fabricated checkpoint from last time into sparrow/ on
     gannet, naming it checkpt.old

     Copy the fabricated checkpoint from this time into sparrow/,
     naming it checkpt.new

  4. We want to generate a new checkpoint file, which has the
     following differences from checkpt.start:

a) all but one of the changes which were introduced by checkpt.old
removed, the exception being the removal of the "change" counter,
which is bogus.

b) all the changes introduced on mps/branch and mmref/branch since
then removed (these are db.have entries caused by people syncing
their clients; note that we exclude additions of m*/branch/index.html
as these two files are not part of the RCS merge).

c) all but one of the changes introduced by checkpt.new added, the
exception being the resetting of the "change" counter, which is now
bogus.

     cd sparrow/
     p4d -r . -jr checkpt.start

     sed 's/^@pv@/@dv@/' checkpt.old | tail -n +2 > checkpt.dv1
     p4d -r . -jr checkpt.dv1

     p4d -r . -jd checkpt.midway
     grep 'project/m[^/]*/branch/[12]' checkpt.midway | sed 's/^@pv@/@dv@/' > checkpt.dv2
     p4d -r . -jr checkpt.dv2

     tail -n +2 checkpt.new > checkpt.new-no-counter
     p4d -r . -jr checkpt.new-no-counter

     p4d -r . -z -jd checkpt.final.gz

  5. Advise users that their client spaces will need fixing up by
     hand.

     awk -F/ '{print $3}' checkpt.dv2 | sort -u

This generates a list of clients which had synced some or all of the
affected branches. Since (a) what they had synced is probably bogus
and (b) checkpt.final does not know about these syncs, the owners of
these clients will need to remove these branch directories by hand:

      rm -rf project/m*/branch/[12]*

  6. Copy (FTP) this new fabricated checkpoint (checkpt.final.gz) to
     /home/p4/repository on sparrow.

  7. Restore from the RCS checkpoint.

     cd /home/p4/repository
     rm db.*
     p4d -r . -z -jr checkpt.final.gz

  8. Start Perforce server and test it.

  9. Restart infomail and infosys, as in item 4 of section 3.7 above.
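
The two text transformations used in items 4 and 5 can be tried on
small files before running them on real checkpoints.  In a checkpoint
each record starts with an operation code; @pv@ puts a value and @dv@
deletes one, so replaying a checkpoint with @pv@ rewritten to @dv@
removes the records it originally added.  The sketch below wraps the
two pipelines as functions; the function names are made up for this
example, and "tail -n +2" is the POSIX spelling of "tail +2".

```shell
# Sketch only: to_delete_records and affected_clients are
# hypothetical helpers.

# Usage: to_delete_records CHECKPOINT
# Turns put records into delete records and drops the first line
# (the counter record in the fabricated checkpoints).
to_delete_records() {
    sed 's/^@pv@/@dv@/' "$1" | tail -n +2
}

# Usage: affected_clients DV-FILE
# DV-FILE holds db.have records whose first path has the form
# //CLIENT/...; prints each client name once.
affected_clients() {
    awk -F/ '{ print $3 }' "$1" | sort -u
}
```

For example, "to_delete_records checkpt.old > checkpt.dv1" gives the
first file replayed in item 4, and "affected_clients checkpt.dv2"
gives the list of clients for item 5.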



A. REFERENCES

[GDR 2000-10-04] "Backup procedures for sparrow.ravenbrook.com";
Gareth Rees; Ravenbrook Limited; 2000-10-04.
<URL:/doc/2000/10/04/backup-procedure/>

[Global Graphics 2001-08-13] "Memory Management RCS files"; Global
Graphics 2001-08-13. <URL:/project/mps/import/2001-08-13/MM-rcs/>

[ISO 8601] "ISO 8601:2000 Data elements and interchange formats --
Information interchange -- Representation of dates and times"; ISO;
1988-06-15.

[NB 2001-08-24] "MPS import to Perforce seems to be broken" (email
message); Nick Barnes; Ravenbrook Limited; 2001-08-24.
<URL://info.ravenbrook.com/mail/2001/08/24/13-35-45/0.txt>

[NB 2001-09-10] "Re: MPS import to Perforce seems to be broken" (email
message); Nick Barnes; Ravenbrook Limited; 2001-09-10.
<URL://info.ravenbrook.com/mail/2001/09/10/11-22-47/0.txt>

[NDL 2001-08-14a] "Re: Pekka P. Pirinen: Re: schedule of items to be
assigned" (email message); Nick Levine; Ravenbrook Limited;
2001-08-14. <URL:
http://info.ravenbrook.com/mail/2001/08/14/14-35-29/0.txt>

[NDL 2001-08-14b] "Re: Pekka P. Pirinen: Re: schedule of items to be
assigned" (email message); Nick Levine; Ravenbrook Limited;
2001-08-14. <URL:
//info.ravenbrook.com/mail/2001/08/14/18-15-36/0.txt>

[NDL 2001-08-15] "RCS to Perforce"; Nick Levine; Ravenbrook Limited;
2001-08-15; <URL:/project/mps/tool/rcs-import/>.

[Perforce 1999] "Problems with the rcstoperf.sh conversion script";
Perforce Inc; 1999. <URL:
/project/mps/import/1999/RCS-to-Perforce/note031.html>


B. DOCUMENT HISTORY

2001-08-15 NDL Created.

2001-08-16 NDL Fixed instructions for binary files.

2001-08-16 GDR Expanded to specify the whole procedure.

2001-08-17 NDL Note procedure has been carried out. Fixed bits we
couldn't test in advance.

2001-09-12 NDL Revised in the light of previous failures.

2001-09-13 NDL Merge process for second pass (section 3.8).

2002-06-20 NDL Removed confidentiality notice and updated the
copyright / license.

C. COPYRIGHT AND LICENSE

Copyright (C) 2001-2002 Ravenbrook Limited <http://www.ravenbrook.com/>.
All rights reserved.  This is an open source license.  Contact
Ravenbrook for commercial licensing options.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Redistributions in any form must be accompanied by information on how
to obtain complete source code for this software and any accompanying
software that uses this software.  The source code must either be
included in the distribution or be available for no more than the cost
of distribution plus a nominal fee, and must be freely redistributable
under reasonable conditions.  For an executable file, complete source
code means the source code for all modules it contains. It does not
include source code for modules or files that typically accompany the
major components of the operating system on which the executable file
runs.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, OR NON-INFRINGEMENT, ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT HOLDERS AND CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


$Id: //info.ravenbrook.com/project/mps/procedure/rcs-import/index.txt#11 $