Misusing Bazaar for Data Backup
Moritz Bartl
m.bartl at wiredwings.com
Sun Aug 3 10:14:29 BST 2008
Hello there!
I am currently evaluating different distributed revision control systems
for backup and synchronisation purposes (see
http://www.wiredwings.com/wiki/Untitled_Synchronization_and_Backup_Project
for more info). I know this is not the standard use-case and I don't
expect support for that, but maybe this is of interest to you, too.
Basically, I am looking for feedback on whether this can be done at all. I
know there are dedicated backup tools (rdiff, rsync), but I know of no
non-commercial solution that keeps revisions and syncs the way a DRCS
does.
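For context, the kind of workflow I have in mind looks roughly like this (a minimal sketch; the file name and the commented-out push URL are placeholders, not real infrastructure):

```shell
#!/bin/sh
# Sketch of the backup workflow: snapshot files into a local branch,
# then mirror it elsewhere. Skips gracefully if bzr is not installed.
set -e
command -v bzr >/dev/null 2>&1 || { echo "skipped: bzr not installed"; exit 0; }
export BZR_EMAIL="Demo User <demo@example.com>"  # so commit works in a clean environment

WORK=$(mktemp -d)            # throwaway sandbox for the demo
cd "$WORK"
bzr init .                   # create a standalone branch in place
echo "payload" > data.bin    # stand-in for a real file to back up
bzr add data.bin             # register the file with the branch
bzr commit -m "snapshot $(date +%F)"   # record a revision locally
# Later, sync to a mirror (hypothetical URL):
# bzr push bzr+ssh://backup-host/srv/mirror
echo ok
```

The point is that revisions are created locally, so snapshots work offline and the push to a mirror can happen whenever a connection is available.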
I've performed various tests using Bazaar 1.6b3 on my system (Vista
64-bit, Q6600, 4GB RAM).
One of my test repositories is 861MB (~500 files), with several binaries
larger than 150MB, and it commits fine. Bazaar uses one of my four cores
and up to 600MB of memory on these files, but it works.
Also, I had no problem committing all my 'important files' into a local
repository (41000 files, 6.6GB).
The limit for my system seems to be around 300-500MB for a single file.
When I try to commit a 300MB file, I get the MemoryError described in bug
#193685 (in tuned_gzip), while a 500MB file already breaks knit.py,
never reaching tuned_gzip.py:
bzr: ERROR: exceptions.MemoryError:
Traceback (most recent call last):
  File "bzrlib\commands.pyc", line 857, in run_bzr_catch_errors
  File "bzrlib\commands.pyc", line 797, in run_bzr
  File "bzrlib\commands.pyc", line 499, in run_argv_aliases
  File "bzrlib\builtins.pyc", line 2259, in run
  File "bzrlib\decorators.pyc", line 192, in write_locked
  File "bzrlib\workingtree_4.pyc", line 242, in commit
  File "bzrlib\decorators.pyc", line 192, in write_locked
  File "bzrlib\mutabletree.pyc", line 197, in commit
  File "bzrlib\commit.pyc", line 355, in commit
  File "bzrlib\commit.pyc", line 655, in _update_builder_with_changes
  File "bzrlib\commit.pyc", line 783, in _populate_from_inventory
  File "bzrlib\commit.pyc", line 825, in _record_entry
  File "bzrlib\repository.pyc", line 364, in record_entry_contents
  File "bzrlib\repository.pyc", line 424, in _add_text_to_weave
  File "bzrlib\knit.pyc", line 754, in add_lines
  File "bzrlib\knit.pyc", line 826, in _add
  File "bzrlib\knit.pyc", line 1610, in _record_to_data
MemoryError
Is there any way to estimate, from system specs, roughly how large a
single file can be? How big can a repository get? From what I can tell
(in a very limited way), differences between revisions are stored in
single .pack files; my initial commit is a 6.2GB .pack file. How will
this affect (remote) merging? Would you suggest doing the initial
commit incrementally instead of all at once?
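In case it helps clarify the question, the incremental import I have in mind would look something like this (a sketch with made-up directory names; I haven't settled on this approach):

```shell
#!/bin/sh
# Sketch: commit the initial import one top-level directory at a time
# instead of in a single multi-gigabyte commit. Directory names are
# made up for the demo. Skips gracefully if bzr is not installed.
set -e
command -v bzr >/dev/null 2>&1 || { echo "skipped: bzr not installed"; exit 0; }
export BZR_EMAIL="Demo User <demo@example.com>"  # so commit works in a clean environment

WORK=$(mktemp -d)
cd "$WORK"
bzr init .
mkdir photos docs
echo jpg > photos/a.jpg
echo txt > docs/b.txt
for d in */ ; do
    bzr add "$d"                              # stage just this directory
    bzr commit -m "initial import: $d" "$d"   # selective commit of that path
done
bzr log --line    # one small revision per directory
```

This relies on bzr's selective commit (passing paths to `bzr commit`), so each revision stays small; whether that actually avoids the memory problem is exactly what I'm asking.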
One more thing: even when I specify individual files for a commit, bzr
still scans for changes. Why?
I have been using Subversion for a while to back up and sync my files,
and it had no problems, but I'd rather use a distributed RCS, because
then I can create file revisions even when I'm offline.
Any suggestions? Bad idea? I still haven't tried to sync/merge different
large repositories...
I'm open to any suggestions.
Moritz
More information about the bazaar mailing list