MPS/MMREF import procedures

Nick Levine, Ravenbrook Limited, 2001-08-15


1. INTRODUCTION

This is the procedure for the one-off (but see section 1.1 for a qualification of that) import of data from the HOPE/RCS hierarchy [Global Graphics 2001-08-13] into Ravenbrook's Perforce repository under /project/mps and /project/mmref.

This procedure takes four .tar.gz archives from Global Graphics, representing the revision history of the MPS and MMRef projects, and merges those projects into Ravenbrook's Perforce repository. (This procedure includes the import of data from the MMRef project because the two projects are related, and any faults in the procedure for one project will probably need fixing identically in the other. Also, importing the two separately would make the interleaving of changes even nastier.)

The purpose is to allow Ravenbrook to develop and maintain the MPS and MMRef projects.

The intended readership is Ravenbrook development staff.

This document is not confidential.


1.1. Status

The procedure was first run on 2001-08-17. It took about 2.5 hours, most of which was spent doing the backups described in section 3.1, items 3 and 4 below. Sections 3.4, 3.5 and 3.6 were carried out while the backups were running; this part of the procedure took less than an hour.

This document was updated after the procedure was run, as the merge with our central system had not been fully tested in advance.

Numerous problems emerged after the merge (see for example [NB 2001-08-24]); after investigation we settled on surgery:

  a) of the RCS files, to remove checkpoint labels, as these confuse the rcstoperf script;

  b) of the resulting checkpoint, which still contained bogus and broken entries [NB 2001-09-10].

We may decide to restore checkpoint labels later as a separate operation.

This document now also describes how to merge the imported material into Ravenbrook's Perforce repository given that a previous (but broken) version of that material is already there.


2. HOW THE MERGE WORKS


2.1. Paths and branches

Paths in the original look like

    mps/src/misc.h
    mmref/src/diagrams/address.png

For the trunk branch (revisions of the form 1.x), these get mapped to

    //info.ravenbrook.com/project/mps/branch/2001-08-13/trunk/src/misc.h
    //info.ravenbrook.com/project/mmref/branch/2001-08-13/trunk/src/diagrams/address.png


2.2. Branches

There are no branches in the MMRef project.

In the MPS, each branch gets mapped to

    //info.ravenbrook.com/project/mps/branch/DATE/NAME/...

where DATE is NDL's best guess as to when the branch was made [NDL 2001-08-14b] and NAME has been generated by the RCS to Perforce script from branch labels in the RCS files.


2.3. Changelists and dates

Changelists will run sequentially from the most recent change in Ravenbrook's repository. This means that changelists will be out of sequence with respect to date (but within each project they will be in order). Renumbering changelists to put them in date order would be unacceptable, because it would invalidate many references in the information system and in material we have published.


3. PROCEDURE

The outline is: back up Perforce; unpack the archives from Global Graphics; run the RCS to Perforce script to fake up a checkpoint and build a depot; edit the checkpoint to fix paths in the depot; restore from the edited checkpoint and test; merge checkpoints and depots; restore.


3.1. Getting started

Note: the second time through (to repair the checkpoint created the first time), we already know the range of change numbers affected. It is not necessary to turn off the robots or the Perforce server until the merge is ready (section 3.8).

1. Turn off the infomail robot on raven:

       su infomail
       crontab -r

   Turn off the infosys robot on sparrow:

       su infosys
       crontab -r

2. Run "p4 counters" and then halt the server (this stops the change number from being bumped up while I'm working). Remember the value of the change counter; we'll need it in section 3.5, item 3.

       p4 counters
       p4 -u root admin stop

3. Make a tape backup of sparrow [GDR 2000-10-04].

4. Build a checkpoint.
   For safety, copy the checkpoint, the db.* files and the depot under /home/p4/repository to a different location (the backup directory must exist before cp can copy several files into it):

       export DATE=`date "+%Y%m%dT%T"`
       p4d -r /home/p4/repository -jd checkpt.$DATE
       mkdir /home/ndl/backup.$DATE
       cp -R /home/p4/repository/{checkpt.$DATE,db.*,info.ravenbrook.com} /home/ndl/backup.$DATE

   Note after the event: this copy took forever. It would probably have been quicker to build a tar file.


3.2. Environment

Running on gannet (NT) with Cygwin. (I have not tried this in any other environment.) I needed to set

    PATH=/usr/bin:$PATH

so that I would get the Cygwin versions of find and sort (as opposed to the Win32 versions).

We made minor fixes to the rcstoperf.sh script, as follows:

1. An additional command-line option (-halt) which allows you to do a copy and nothing else (so that we can check the intermediate results).

2. A minor bug fix (-r missing from a p4d command).

The modified tool is [NDL 2001-08-15].


3.3. Preconditions

1. An empty directory to do all this in. All relative pathnames are relative to this directory. I assume you're in this directory at the start of each section below.

2. A clean (i.e. unused) Perforce server. I used version 2000.2 as it matches our central server. Unpack it in p4d/ and add this directory to PATH.

3. The four tar files from /project/mps/import/2001-08-13/MM-rcs/ and /project/mmref/import/2001-08-13/MM-rcs/, placed in rcs/.

4. The contents of /project/mps/tool/rcs-import/ and /project/mps/tool/second-rcs-import/, placed in tool/.

5. This procedure uses two scripts written in Common Lisp. These have been tested against version 4.1.20 of LispWorks (Windows NT "professional" release).


3.4. Install the RCS files

1. Unpack and group (as per [NDL 2001-08-14a]):

       cd rcs
       for tar in HOMEmm.tar MMQA.tar MMsrc.tar; do tar xf $tar; done
       mkdir mps
       mv mm mps/home
       mv mmqa mps/qa
       mv src mps
       tar xf MMref.tar
       mkdir mmref
       mv mm mmref/src
       rm *.tar

2. Remove checkpoint labels:

   Start LispWorks. Compile and load tool/remove-checkpoint-labels.lisp.
   Run rewrite-rcs-files, passing it a string naming the full path to the rcs/ directory, for example:

       (rewrite-rcs-files "e:/tmp/p4/all/rcs/")

   This function descends through subdirectories, assuming that every file it meets is an RCS file, removes all labels corresponding to checkpoints, and rewrites each file in place.


3.5. Run the (modified) rcstoperf script

I prefer to run it in stages and verify the results as I go.

1. tool/rcstoperf.sh -copy -halt rcs p4d

   Copies the RCS hierarchy. You could copy these files by hand, but this is a quick sanity check that things are where they ought to be. You should end up with a directory p4d/depot/IMPORT with subdirectories mps/home, mps/qa, mps/src and mmref/src. (When we edit the checkpoint in section 3.6, item 6, we'll change all the paths starting with IMPORT.)

2. tool/rcstoperf.sh -nocopy -extract rcs p4d

   This builds a (large) file p4d/tmp.extract. This looks a bit like a checkpoint; it contains all the metadata from the RCS files.

3. tool/rcstoperf.sh -nocopy -next CHANGE -changes -meta rcs p4d

   where CHANGE is one of the following:

   a) If the RCS changes are being added to Perforce for the first time, CHANGE is 1 plus the highest change number in Ravenbrook's Perforce server (see section 3.1, item 2).

   b) If the RCS changes have already been added to Perforce and we are now attempting to fix problems (as described in section 1.1 above), CHANGE is the same as last time. In this case, it is essential to verify after this operation that the value of the change counter in the first line of p4d/tmp.meta is the same as last time. The operations in section 3.8 below assume this.

   This call to rcstoperf builds a checkpoint file p4d/tmp.meta. On gannet this checkpoint needs immediate surgery before we can proceed (see the comments about awk in [Perforce 1999]).

4. The Perl script to restore the missing "@" characters is in tool/:

       mv p4d/tmp.meta p4d/tmp.meta.at
       perl tool/terminal-ats.pl p4d/tmp.meta.at > p4d/tmp.meta

5. tool/rcstoperf.sh -nocopy -load rcs p4d

   We should now have a more compact checkpoint file (tmp.checkpt) and a set of db.* files, all in p4d/.

6. Edit the checkpoint, removing systematic errors.

   Start LispWorks (or continue to use the old image). Compile and load tool/edit-checkpoint.lisp. Run edit-checkpoint, passing it a string naming the full path to the checkpoint file created above (tmp.checkpt), for example:

       (edit-checkpoint "e:/tmp/p4/all/p4d/tmp.checkpt")

   For each file/branch combination, this does the following:

   a) Remove the db.rev and db.revcx entries with the lowest changelist numbers (they're bogus), confirming that the remaining db.rev entry with the lowest changelist number is correctly numbered in the "lbrRev" field.

   b) Subtract 1 from the revision values of all remaining entries.

   c) Change "action" on the earliest remaining db.rev / db.revcx entries to 0 ("add").

   d) Store the changelist from the earliest remaining db.rev / db.revcx entries in db.integ (in place of the zero there at present).

   e) Change "resolved" to 2 ("automatically as part of branch") in all db.integ entries.

   f) Change "how" from 3 ("branch into") to 11 ("dirty branch into") in the first of each db.integ pair.

   This procedure generates a new checkpoint file, tmp.checkpt.out, which is significantly shorter than before (but with the same number of changes).

7. Install this checkpoint:

       cd p4d/
       rm db.*
       p4d -r . -jr tmp.checkpt.out


3.6. Rename the branches

We need to do some relocation within the depot (as per [NDL 2001-08-14b]).

1. Start the Perforce server. Create yourself a client.

2. mkdir tmp

3. tool/branches.sh > tmp/branches.txt

   This creates a file with two columns separated by whitespace: the branch name and an estimate of the date the branch was created, in [ISO 8601] format. (Perforce got the branch names from labels in the RCS files at item 5 of section 3.5.) This is a non-scalable report; give it a minute or two to run!

4. awk '{printf "s+//depot/%s/mps/+//info.ravenbrook.com/project/mps/branch/%s/%s/+\n", $1,$2,$1}' tmp/branches.txt > tmp/uniq.sed

   Convert the table of branch names and dates into a sed script that carries out the path rewriting for all the branches.

5. Edit the tail of tmp/uniq.sed:

   In the last line, change DATE/main to 2001-08-13/trunk.

   Make a copy of the last line, with "mmref" substituted for the two occurrences of "mps".

   Add one further line:

       s+//depot/IMPORT/+//info.ravenbrook.com/IMPORT/+

   In an attempt to learn from earlier defects, I note that an inability to spell ravenbrook correctly has annoying consequences downstream.

   Add the following lines, which translate user names:

       s/@gavinm@/@grm@/
       s/@nickb@/@nb@/
       s/@richard@/@rb@/

   (Make sure there's a final newline, otherwise sed won't see the last line of the file.)

6. sed -f tmp/uniq.sed p4d/tmp.checkpt.out > p4d/checkpt.mm

   This will take around 10 minutes; the new checkpoint file will be about half as big again as the old one. While it's running, carry out item 10.

7. Halt the Perforce server, and remove the db files and journal:

       rm p4d/db.* p4d/journal

8. We need to fix up the (few) binary files in the distribution. The only binary files have the extension .png, and they're all at revision 1.1.

   Edit p4d/checkpt.mm. Every line matching the regexp

       db\.rev@.*\.png@

   should be changed. These lines have the pattern:

       "@pv@ 3 @db.rev@", $file, $rev, 0, ..., 0

   Both zeros shown should be changed to 257. Retain the trailing space at the end of the lines. (Note: [Perforce 1999] says to change only the first 0. Experiments have shown that this leads to CR/LF corruption of binary files.)

   Now go to the directory containing these diagrams and check them out of RCS:

       cd p4d/depot/IMPORT/mmref/src/diagrams
       co *,v

   (Gannet doesn't have RCS installed. I built a tarball of the ,v files, FTP'd it to raven, unpacked it there and copied the .png files back.)

   Finally, move the files thus:

       for p in *.png; do mkdir $p,d; mv $p $p,d/1.1; done

9. p4d -r p4d -jr checkpt.mm

10. mv p4d/depot p4d/info.ravenbrook.com

    (Cygwin does this as a deep copy. Renaming the folder from the Windows desktop is effectively instantaneous.)

11. Restart Perforce. Create a depot info.ravenbrook.com. Create yourself a client. Try it out. (Check the png files for corruption, and the files, branches, changes, etc. for plausibility.)


3.7. Merge, version A

Follow this section only if merging for the first time. Otherwise follow section 3.8 below.

1. Copy the fabricated depot and checkpoint to sparrow and unpack them:

       cd p4d
       tar cf - info.ravenbrook.com checkpt.mm | gzip -c > mm.tar.gz

   FTP the tarball to sparrow and unpack it:

       cd /home/p4/repository
       gunzip -c mm.tar.gz | tar xvf -

   (We know that this won't overwrite any files because the new depot is entirely under info.ravenbrook.com/IMPORT/.)

2. Restore from the RCS checkpoint:

       p4d -r /home/p4/repository -jr checkpt.mm

3. Start the Perforce server and test it.

4. Restart infomail and infosys. On raven:

       su infomail
       crontab /home/infomail/etc/crontab

   On sparrow:

       su infosys
       crontab /home/infosys/etc/crontab


3.8. Merge, version B

Follow this section only if merging for a second time. The first time around, follow section 3.7 above.

1. If you haven't already carried out the steps in section 3.1, do so now.

2. Copy the fabricated depot to sparrow and unpack it:

       cd p4d
       tar cf - info.ravenbrook.com | gzip -c > mm.tar.gz

   FTP the tarball to sparrow and unpack it:

       cd /home/p4/repository
       mv info.ravenbrook.com/IMPORT ./IMPORT.bak
       gunzip -c mm.tar.gz | tar xvf -

   (We know that this won't overwrite any files because both the new and the previous fabricated depots are entirely under info.ravenbrook.com/IMPORT/, and the previous one has just been moved aside.)

3. Set up a working space:

   Make an empty directory "sparrow" on gannet.

   Copy the checkpoint dumped in section 3.1, item 4 above into sparrow/ on gannet, naming it checkpt.start.

   Copy the fabricated checkpoint from last time into sparrow/ on gannet, naming it checkpt.old.

   Copy the fabricated checkpoint from this time into sparrow/, naming it checkpt.new.

4. We want to generate a new checkpoint file which has the following differences from checkpt.start:

   a) All but one of the changes introduced by checkpt.old are removed; the exception is the "change" counter, whose removal would be bogus.

   b) All the changes introduced on mps/branch and mmref/branch since then are removed. (These are db.have entries caused by people syncing their clients. Note that we exclude additions of m*/branch/index.html, as these two files are not part of the RCS merge.)

   c) All but one of the changes introduced by checkpt.new are added; the exception is the resetting of the "change" counter, which is now bogus.

       cd sparrow/
       p4d -r . -jr checkpt.start
       sed 's/^@pv@/@dv@/' checkpt.old | tail -n +2 > checkpt.dv1
       p4d -r . -jr checkpt.dv1
       p4d -r . -jd checkpt.midway
       grep 'project/m[^/]*/branch/[12]' checkpt.midway | sed 's/^@pv@/@dv@/' > checkpt.dv2
       p4d -r . -jr checkpt.dv2
       tail -n +2 checkpt.new > checkpt.new-no-counter
       p4d -r . -jr checkpt.new-no-counter
       p4d -r . -z -jd checkpt.final.gz

5. Advise users that their client workspaces will need fixing up by hand:

       awk -F/ '{print $3}' checkpt.dv2 | sort -u

   This generates a list of clients which had synced some or all of the affected branches. Since (a) what they had synced is probably bogus and (b) checkpt.final.gz does not know about these syncs, the owners of these clients will need to remove these branch directories by hand, from the root of their client workspace:

       rm -rf project/m*/branch/[12]*

6. Copy (FTP) the new checkpoint, checkpt.final.gz, to /home/p4/repository on sparrow.

7. Restore from the merged checkpoint:

       cd /home/p4/repository
       rm db.*
       p4d -r . -z -jr checkpt.final.gz

8. Start the Perforce server and test it.

9. Restart infomail and infosys, as in item 4 of section 3.7 above.


A. REFERENCES

[GDR 2000-10-04] "Backup procedures for sparrow.ravenbrook.com"; Gareth Rees; Ravenbrook Limited; 2000-10-04.
[Global Graphics 2001-08-13] "Memory Management RCS files"; Global Graphics; 2001-08-13.
[ISO 8601] "ISO 8601:2000 Data elements and interchange formats -- Information interchange -- Representation of dates and times"; ISO; 1988-06-15.
[NB 2001-08-24] "MPS import to Perforce seems to be broken" (email message); Nick Barnes; Ravenbrook Limited; 2001-08-24.
[NB 2001-09-10] "Re: MPS import to Perforce seems to be broken" (email message); Nick Barnes; Ravenbrook Limited; 2001-09-10.
[NDL 2001-08-14a] "Re: Pekka P. Pirinen: Re: schedule of items to be assigned" (email message); Nick Levine; Ravenbrook Limited; 2001-08-14.
[NDL 2001-08-14b] "Re: Pekka P. Pirinen: Re: schedule of items to be assigned" (email message); Nick Levine; Ravenbrook Limited; 2001-08-14.
[NDL 2001-08-15] "RCS to Perforce"; Nick Levine; Ravenbrook Limited; 2001-08-15; .
[Perforce 1999] "Problems with the rcstoperf.sh conversion script" /; Perforce Inc; 1999.


B. DOCUMENT HISTORY

2001-08-15  NDL  Created.
2001-08-16  NDL  Fixed instructions for binary files.
2001-08-16  GDR  Expanded to specify the whole procedure.
2001-08-17  NDL  Note procedure has been carried out. Fixed bits we couldn't test in advance.
2001-09-12  NDL  Revise in the light of previous failures.
2001-09-13  NDL  Merge process for second pass (section 3.8).
2002-06-20  NDL  Removed confidentiality notice and updated the copyright / license.


C. COPYRIGHT AND LICENSE

Copyright (C) 2001-2002 Ravenbrook Limited . All rights reserved. This is an open source license. Contact Ravenbrook for commercial licensing options.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Redistributions in any form must be accompanied by information on how to obtain complete source code for this software and any accompanying software that uses this software. The source code must either be included in the distribution or be available for no more than the cost of distribution plus a nominal fee, and must be freely redistributable under reasonable conditions. For an executable file, complete source code means the source code for all modules it contains. It does not include source code for modules or files that typically accompany the major components of the operating system on which the executable file runs.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS AND CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

$Id: //info.ravenbrook.com/project/mps/procedure/rcs-import/index.txt#11 $