Recommended backup procedure and preserving my data...

John Arbash Meinel john at arbash-meinel.com
Thu Oct 15 17:47:27 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Szakmeister wrote:
> I'm not looking for the "every branch taken is a backup" answer.  I
> get that, but I'm more interested from an enterprise situation.  I have
> data that I want to back up, and I want to make sure it's consistent.
> For instance, how do I go about saving off the data (preferably,
> incrementally) for 50 projects that might use a mix of separated
> branches and shared repositories?  I should add another caveat: the
> resultant data should be consistent, and I don't have to monkey around
> with any of the internals of the backup.  Is my only choice to prevent
> access to the shared repo and/or branch while backing up?  I'd like to
> avoid that as we have folks working odd hours at times.

The easiest way to ensure consistency is to create a second repository,
mirror into that, and then take your backup from the mirror. That way you
control when the mirror is being updated versus when it is being backed
up, and you never have to restrict access to the main location.
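
The one-time setup of such a mirror could look roughly like the sketch
below. The function name, the BZR override (handy for a dry run) and the
example path are assumptions for illustration only; '--no-trees' is just
a suggestion so that the mirror holds history without working trees.

```shell
#!/bin/sh
# One-time setup of a mirror repository that is used only for backups.
# BZR may be overridden (e.g. BZR=echo for a dry run); defaults to bzr.
BZR=${BZR:-bzr}

# init_backup_repo DIR: create a shared repository at DIR unless one exists.
init_backup_repo() {
    repo=$1
    if [ -d "$repo/.bzr" ]; then
        echo "backup repository already exists: $repo"
        return 0
    fi
    mkdir -p "$repo" || return 1
    # --no-trees: branches stored in the mirror get no working trees,
    # so only history ends up in the backup.
    "$BZR" init-repo --no-trees "$repo"
}

# Example (hypothetical path):
#   init_backup_repo /srv/backup/bzr-mirror
```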

> 
> On a similar note: Subversion has the ability to dump a repository
> into a well-known format, that can be used to upgrade your repository,
> or downgrade it.  But I see it as something else: a format that I can
> easily parse and use to migrate to a different tool.  So it's my data
> liberator as well. :-)  I suppose fast-export and fast-import do the
> same for me?

You *can* do a fast-export, but last I heard, it wasn't made as a
'round-tripping' tool. Meaning that you can export, and potentially
import all of that history into a new repository. But the new repository
will likely get all new revision ids and file-ids, and not be compatible
with the old repository.

If you *just* wanted a single-file dump of the whole history, you could do:

bzr init empty-branch
cd trunk
bzr send -o ../big-dump.patch ../empty-branch

It effectively generates the delta of your entire history, and puts it
into the file. It doesn't work with multiple branches, though. (Well, you
can do them each separately, but you'll have lots of really big files
when you are done.)

It also isn't optimized for this case, and will probably be quite slow.
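
If you did want to do each branch separately, a rough sketch might look
like the following. It only wraps the 'bzr send -o' invocation shown
above in a loop; the function names, the BZR override and the
'<name>.patch' naming are assumptions of mine, and all paths need to be
absolute because each dump runs from inside the branch directory.

```shell
#!/bin/sh
# BZR may be overridden (e.g. BZR=echo for a dry run); defaults to bzr.
BZR=${BZR:-bzr}

# dump_branch BRANCH_DIR EMPTY_BRANCH OUT_FILE
# Write the full history of BRANCH_DIR, as a delta against EMPTY_BRANCH,
# into OUT_FILE.
dump_branch() {
    # Subshell, so the caller's working directory is left untouched.
    ( cd "$1" && "$BZR" send -o "$3" "$2" )
}

# dump_all EMPTY_BRANCH OUT_DIR BRANCH_DIR...
# Produce one dump file per branch, named after the branch directory.
dump_all() {
    empty=$1
    out_dir=$2
    shift 2
    mkdir -p "$out_dir" || return 1
    for b in "$@"; do
        name=$(basename "$b")
        dump_branch "$b" "$empty" "$out_dir/$name.patch" || return 1
    done
}

# Example (hypothetical paths):
#   dump_all /srv/bzr/empty-branch /srv/dumps /srv/bzr/trunk /srv/bzr/stable
```

Expect one large file per branch, and branches that share a directory
name would overwrite each other's dump with this naming.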


> 
> Sorry for all the questions, but I'd like to seriously consider
> rolling out Bazaar in our infrastructure.  I can't really do it
> though, unless I can take care of these issues as well.

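#!/bin/sh
# Back up all branches of the current repository via a mirror repository.

CUR_REPO=XXXX
BACKUP_REPO=YYYY

# Stop if the cd fails, so the 'rm -rf' below can never run elsewhere.
cd "$BACKUP_REPO" || exit 1
# Wipe out all of the current state of branches, in case things have
# been deleted, renamed, etc. We leave '.bzr' intact so that we don't
# have to copy all of the repository each time.
rm -rf ./* # assuming '*' doesn't expand '.bzr' on your system

cd "$CUR_REPO" || exit 1
for b in `bzr branches`; do
  # Create the parent directory first, in case branches are nested.
  mkdir -p "`dirname "$BACKUP_REPO/$b"`"
  bzr branch "$b" "$BACKUP_REPO/$b"
done

cd "$BACKUP_REPO" || exit 1
# Placeholder for whatever command actually writes the backup to tape.
$RUN_BACKUP_TO_TAPE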

Note that this should be ~nice to your backup tapes. Bazaar will
autopack the repository from time to time, but does so in an
'exponential backoff' fashion. So the *first* time you run this script,
I would add a "bzr pack" just before $RUN_BACKUP_TO_TAPE.
That should give you a single minimal pack file that gets backed up.

From then on, we will generally generate new .pack files with just the
changes, and you will only be copying those to tape. 'autopack' will
combine small packs into a single larger one, which is slightly more
churn than minimal for your backup system, but should be reasonable.
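
One way to get the "bzr pack on the first run only" behaviour is a marker
file, roughly as sketched here. The function name and the marker file
name are made up for illustration; because the marker starts with a dot,
it would survive the 'rm -rf' of the mirror under the same assumption
that '.bzr' does.

```shell
#!/bin/sh
# BZR may be overridden (e.g. BZR=echo for a dry run); defaults to bzr.
BZR=${BZR:-bzr}

# pack_once REPO: run 'bzr pack' in REPO only if it has not been run
# through this function before.
pack_once() {
    repo=$1
    marker="$repo/.initial-pack-done"   # hypothetical marker file name
    if [ -e "$marker" ]; then
        return 0
    fi
    # Create the marker only after a successful pack, so that a failed
    # pack is retried on the next run.
    ( cd "$repo" && "$BZR" pack ) && : > "$marker"
}

# Example: call this just before the tape backup step.
#   pack_once "$BACKUP_REPO"
```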

> 
> Also, is there any alternative out there that gets us smart server
> access, without setting up individual accounts on the system?  I
> realize that I can proxy everyone through a single account, and use
> something like bzr_access... I was just wondering if there are more
> alternatives out there--other than a code hosting service, which we
> can't use.
> 
> Thanks in advance!
> 
> -John

You can use 'bzr_access', or you can use bzr+http with .htaccess files.
You can also use plain "bzr://" access and rely on firewall rules to
restrict who can actually reach the server.
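
For the plain "bzr://" plus firewall option, something along these lines
could be a starting point. It assumes a Linux host using iptables, the
default bzr:// port of 4155, and made-up subnets and paths; the function
only prints the rules so they can be reviewed before anyone applies them.

```shell
#!/bin/sh
BZR_PORT=4155   # default port for bzr://

# print_firewall_rules SUBNET...
# Print iptables commands that accept bzr:// connections from the given
# subnets and drop all others. Nothing is applied; this is a dry run.
print_firewall_rules() {
    for net in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $BZR_PORT -s $net -j ACCEPT"
    done
    echo "iptables -A INPUT -p tcp --dport $BZR_PORT -j DROP"
}

# Example (hypothetical subnet and directory):
#   print_firewall_rules 10.0.0.0/8
#   bzr serve --directory=/srv/bzr --allow-writes
```

Without --allow-writes, 'bzr serve' gives read-only access.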

It really depends on how much access control you need. I think someone
was also working on adding access control to 'bzr://', but I don't think
that is 'ready' in any sense.

bzr+http might be your best bet here.

John
=:->


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkrXUh8ACgkQJdeBCYSNAAPemQCgh1NTzYTweiSPPGR2hZSkQMlp
oV4AoJEu+vug4lHcfh1QurtJTEvqwi8z
=P4Ku
-----END PGP SIGNATURE-----


