Sunday, 1 March 2015

Subversion trickery

At Rapita we use Subversion (SVN) for version control. It is used for source code, tests, documentation, and even office admin. I like it, and I have come to know it well. I thought I'd write a short article describing some lesser-known tricks that are useful to me.

Creating, deleting and copying without checking out

Did you know that commands such as "svn copy", "svn move", "svn delete" and "svn mkdir" do not have to operate on your working copy (i.e. the checked out files)? You can use them directly on the repository itself.

This is really useful if you want to create a new subdirectory "/trunk/data", but you don't want to check out all of "/trunk" in order to do that. You can just give a URL within the repository, as in:
$ svn mkdir https://mysvnserver/svn/repository/trunk/data/
(Because this works on the repository, rather than the working copy, there is an implicit "commit" step: you have to enter the log message at the same time.)

Similarly, you can move, delete and copy files and directories without checking them out. The syntax is similar:
$ svn copy https://mysvnserver/svn/repository/trunk/ https://mysvnserver/svn/repository/branches/bug1234/
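In a script, the log message for the implicit commit can be passed with -m so that no editor is started. The following is only a sketch: the function name make_branch and the REPO variable are hypothetical, and the URL is the placeholder used in the examples above.

```shell
#!/bin/bash
# Hypothetical repository root; substitute the URL of your own repository.
REPO="https://mysvnserver/svn/repository"

# make_branch NAME MESSAGE
# Copy trunk to branches/NAME directly in the repository. No working copy is
# needed, and -m supplies the log message for the implicit commit.
make_branch() {
    local name="$1" message="$2"
    svn copy -m "$message" "$REPO/trunk" "$REPO/branches/$name"
}
```

It might be called as: make_branch bug1234 "Create branch for bug 1234"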

The ^ shortcut

Recent SVN clients (1.6 and later) have a nice shortcut feature that can be used if the current directory is within a working copy. You can write ^/ as shorthand for the root of the repository that the working copy belongs to, so ^/ becomes https://mysvnserver/svn/repository/, or whatever the root of your repository is. The copy, move, delete and mkdir commands can then be written very quickly indeed:
$ svn copy ^/trunk ^/branches/bug1234
$ svn delete ^/branches/bug765
$ svn mkdir ^/tags
This only works from within a working copy, but the working copy does not have to be up to date. It can even be a checkout of a directory that no longer exists in the repository.
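If you are unsure what ^ will expand to, the repository root can be read from "svn info". This is a minimal sketch: the function name repo_root is hypothetical, and it assumes the client prints its messages in English, since it looks for the "Repository Root:" line.

```shell
#!/bin/bash
# repo_root
# Print the repository root URL of the working copy in the current directory,
# which is the URL that ^ stands for. This parses the "Repository Root:" line
# of "svn info", so it assumes the client prints English messages.
repo_root() {
    svn info | sed -n 's/^Repository Root: //p'
}
```

Run from inside a working copy, repo_root prints a single URL, and ^/trunk is then equivalent to that URL followed by /trunk.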

Deep clean: wiping out unversioned files

If you store your source code or tests inside SVN, like we do at Rapita, then your working copy may contain unversioned files after compilation and testing. These can include compiled programs, libraries, object code, temporary input/output files for tests, and auto-generated source code. For a clean build and a clean test, you really need to remove these files, in order to be sure that stale files are not causing (or hiding) problems.

SVN calls them "unversioned files". There are also "unversioned directories". They have not been added to the repository and SVN does not track them, but it does detect them when you run:
$ svn status
Sometimes a Makefile rule allows these files to be removed:
$ make clean
But that rule may not delete everything, not least because the Makefile has to explicitly state which files must be deleted.

I often use a "deep clean" process which removes absolutely all of the unversioned files. Any file that is not known to SVN is deleted. SVN does not include a feature to do this - there is, as yet, no "svn deepclean" or "svn delete_unversioned" - so the job has to be done by a script (or Makefile) of your own devising. This script will typically contain a single line of bash code, such as:
#!/bin/bash -xe
svn status --no-ignore | grep -E '^[?I]' | cut -c9- | xargs -r -d '\n' rm -r --
The steps here are:
  • svn status --no-ignore
    List every item that is modified, added, deleted, unversioned or ignored. Unchanged versioned files are not listed. For example, the output from this command may be:
    M       src/adt2ast/adt2ast.adb
    I       src/adt2ast/obj/adt2ast.o
    ?       src/adt2ast/obj/b~adt2ast.adb

    The first line is a modified, versioned source file (adt2ast.adb - it's Ada source code). The second and third lines are unversioned files. The "I" means that SVN has been told to ignore files matching this pattern (probably "*.o").
  • grep -E '^[?I]'
    Keep only the lines beginning with ? or I. This leaves out lines referring to versioned files that have been modified, added or deleted. Result:
    I       src/adt2ast/obj/adt2ast.o
    ?       src/adt2ast/obj/b~adt2ast.adb
  • cut -c9-
    Remove the first 8 characters from each line (the status columns), leaving only the path. Result:
    src/adt2ast/obj/adt2ast.o
    src/adt2ast/obj/b~adt2ast.adb
  • xargs -r -d '\n' rm -r --
    Run "rm -r --" on the lines received by xargs, treating each line as one file name. This deletes each unversioned or ignored file. The "-r" given to xargs stops it from running "rm" at all when there is nothing to delete, which would otherwise fail with a "missing operand" error. The "--" is important here, because otherwise any unversioned file name beginning with "-" will be interpreted by "rm" as a parameter. (The "-r" and "-d" options are specific to GNU xargs.)
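Since the deletion cannot be undone, it can help to see what would be removed first. The following is only a sketch of the steps above wrapped in two functions, with a read loop in place of xargs so that it does not depend on GNU-specific options. The names list_unversioned and deepclean, and the -n option, are hypothetical.

```shell
#!/bin/bash
# list_unversioned
# Print the path of every unversioned (?) or ignored (I) item reported by
# "svn status --no-ignore", one per line. The path starts at column 9.
list_unversioned() {
    svn status --no-ignore | grep -E '^[?I]' | cut -c9-
}

# deepclean [-n]
# Delete everything printed by list_unversioned. With -n, only print what
# would be deleted. (Both names and the -n option are hypothetical.)
deepclean() {
    local dry_run=no path
    if [ "${1:-}" = "-n" ]; then
        dry_run=yes
    fi
    list_unversioned | while IFS= read -r path; do
        if [ "$dry_run" = yes ]; then
            printf 'would remove %s\n' "$path"
        else
            rm -r -- "$path"
        fi
    done
}
```

Running "deepclean -n" first and reading the list is a cheap way to spot a file that should have been added with "svn add".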
This script is actually a bit dangerous. It is very fast and aggressive. It destroys unversioned files and directories without asking "Are you sure?". You won't be able to recover the files from the Recycle Bin. Before using it, be sure that you have used "svn add" on any files that you want to keep, thus making them versioned. If you are worried that you might have forgotten to add some files, then make a copy of whatever you are working on beforehand. It is very easy to make a mistake and destroy a file that you meant to keep, but forgot to "svn add". This has happened to me. On one occasion I nearly lost an afternoon's work this way. Fortunately the automatic backup program had copied the work before I accidentally wiped it. I think that accidents of this sort may be why "svn" does not include the feature already.

In combination with "svn revert" and "svn update", deepclean is an extremely useful way to put a working copy back to a "known good" state, containing only versioned files at a particular revision. You probably have something like it running on your "continuous integration" machine, or your build server. It's a good thing to have on your workstation, too, because it's useful to be able to start from the same place as the build server.
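One possible ordering of those three steps is sketched below. The function name reset_working_copy is hypothetical. The revert comes first because a reverted "svn add" leaves the file behind as an unversioned file, which the clean-up step then removes.

```shell
#!/bin/bash
# reset_working_copy [REVISION]
# Put the working copy in the current directory back into a known state:
#   1. revert local modifications (reverted "svn add"s become unversioned),
#   2. delete every unversioned or ignored item,
#   3. update to REVISION (default: HEAD).
# The function name is hypothetical. Step 2 uses the same filter as the
# deep clean script, with a read loop so that an empty list is harmless.
reset_working_copy() {
    local revision="${1:-HEAD}" path
    svn revert -R . || return 1
    svn status --no-ignore | grep -E '^[?I]' | cut -c9- |
        while IFS= read -r path; do
            rm -r -- "$path"
        done
    svn update -r "$revision"
}
```

For example, "reset_working_copy 1234" would leave the working copy with only the versioned files of revision 1234. It discards local changes as well as unversioned files, so the warnings above apply doubly.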

Line ending styles

Sometimes an operating system is particular about whether the lines in a file end with CR/LF, or LF. It is not merely that Windows .bat files have to use CR/LF. Bash scripts have to use LF on Linux - if you don't do this, then you get weird behaviour when the script starts up:
$ ./x.sh
bash: ./x.sh: /bin/bash^M: bad interpreter: No such file or directory
Furthermore, tools such as "diff" and "patch" make assumptions about line endings, and may end up peppering source files with unwanted CR characters.
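A quick way to check whether a script has picked up CR/LF endings is to look for a carriage return at the end of any line. This is a minimal sketch, and the function name has_crlf is hypothetical.

```shell
#!/bin/bash
# has_crlf FILE
# Succeed (exit status 0) if any line of FILE ends with a carriage return,
# i.e. the file contains CR/LF line endings. The function name is hypothetical.
has_crlf() {
    local cr
    cr="$(printf '\r')"
    grep -q "${cr}\$" -- "$1"
}
```

It can be used in a condition, for example: if has_crlf x.sh; then echo "x.sh has CR/LF line endings"; fi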

This is not an issue if you only develop software for one OS. But if you are writing software for Windows and Linux, then it is best if SVN is aware of the line-ending requirements for each file. You can set the properties of a file as follows:
  • svn propset svn:eol-style CRLF file.bat
  • svn propset svn:eol-style LF file.sh
  • svn propset svn:eol-style native sourcecode.c
These override the default behaviour, which preserves the line endings that were used when the file was last edited. They force SVN to always treat the file as having a particular format. The native format means that SVN picks the appropriate choice for the OS you are using. If you mess up when editing the file, and accidentally use the wrong style, then SVN will fix it for you. I use a small bash script, called "svn_eol", containing the line:
svn propset svn:eol-style native "$@"
This normalises the properties of source files to "native" form, making it easy to use "diff" and "patch", and trivial to edit the files on both Windows and Linux.