Revision storage question

John Arbash Meinel john at arbash-meinel.com
Wed Feb 7 18:53:34 GMT 2007


Angela wrote:
> On 2/6/07, John Arbash Meinel <john at arbash-meinel.com> wrote:
...

>>
> Hello John,
> 
> Thanks for your suggestions. Basically the problem right now is that there
> are a few directories that need versioning, and they contain binary
> files. All in all the size right now is up to ~800MB, and I've broken it
> down into four chunks. Each chunk has a dozen or two subdirectories under
> it, and still a few under _that_, etc. One could say that it makes more
> sense to create repositories for smaller chunks than the four (i.e., instead
> of a repository for all documentation on a project, you'd have one
> repository for the technical documentation, another repository for the
> user's manual, etc.), although I'm still debating the pros and cons of that
> one -- looking at it, that's a lot of repositories shared over the network.

They can be put in a fully shared repository, and just have separate
branches for everything. Clients will only download the portions they
need. For example:

bzr init-repo --no-trees repo
repo/
  .bzr/repository
  project1-doc/
  project2-doc/
  project3-doc/

Or

repo/
  project1/
    doc/
  project2/
    doc/
  ...

or

repo/
  doc/
    project1/
    project2/
  ...

or

repo/
  user-manuals/
    project1/
    project2/
    ...
  tech-docs/
    project1/
    project2/

or

user-manuals/ (shared repo1)
  project1/
  project2/
tech-docs/ (shared repo2)
  project1/
  project2/

or even more flattened

repo/
  project1-user-manual/
  project1-tech-docs/
  project2-user-manual/
  project2-tech-docs/

There are a lot of ways to lay out your projects and your branches, and
I'm happy to discuss what might work best for you. Ultimately it is your
decision, though.
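
As a rough sketch of the flattened layout (one option among those listed,
not a required setup), the commands below create one shared repository and
an empty branch per documentation set. The branch names are the
placeholders from the listings above, and 'run' is a hypothetical dry-run
helper that only prints each command unless BZR_RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print each command, and execute it only
# when BZR_RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${BZR_RUN:-0}" = "1" ]; then "$@"; fi
}

# Create one shared repository without working trees, then one empty
# branch per documentation set inside it. All names are illustrative.
make_shared_repo() {
    repo=$1
    shift
    run bzr init-repo --no-trees "$repo"
    for branch in "$@"; do
        run bzr init "$repo/$branch"
    done
}

make_shared_repo repo project1-doc project2-doc project3-doc
```

The nested layouts would need the intermediate directories created first
(for example with mkdir -p) before running 'bzr init' in them.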

> 
> I'd like to clarify the last solution you mentioned, about a local shared
> repository. I did that with the actual project, but since Bazaar works
> rather transparently (plus the shared repository has just started) I haven't
> been able to notice any marked difference between that and not having a
> shared repository. How would it impact my (large) repositories right now if
> I accessed them via the network (getting checkouts from another computer on
> the network, storing *my* history in a local shared repository, and then
> committing both on local and the other computer)?
> 
> Thanks!
> 
> 

There is a little more overhead when working with heavy checkouts rather
than lightweight ones, because each commit has to update 2 locations (the
local branch and the one it is bound to).

I don't find it problematic for me, but different people and projects
have different tolerances.

For 800MB of information, I would try to break it up. bzr does support
"binary deltas", but the algorithm isn't very efficient, so a lot of
binary files won't compress very much. That means that with 10 commits,
you start getting a lot of extra data to download.

The main advantage of having a local shared repository shows up if you are
branching a lot (e.g. doing 'bzr branch project project-feature'). For
documentation, you may not do that at all, so there isn't really a net win.
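
To make the branching case concrete, here is a minimal sketch of that
workflow. The directory name 'work', the server URL, and the 'run' dry-run
helper are all assumptions for illustration. Branches created inside the
shared repository store their revisions in one place, so the second
'bzr branch' does not copy the history again.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print each command, and execute it only
# when BZR_RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${BZR_RUN:-0}" = "1" ]; then "$@"; fi
}

# Set up a local shared repository, fetch the project into it once, then
# branch locally for a feature. The URL and directory names are made up.
feature_branch() {
    work=$1
    url=$2
    name=$3
    run bzr init-repo "$work"
    run bzr branch "$url" "$work/$name"
    run bzr branch "$work/$name" "$work/${name}-feature"
}

feature_branch work bzr+ssh://server.example/srv/bzr/project project
```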

For something like this, it does seem that 'bzr checkout --lightweight'
may be your best method. This is very much 'SVN' mode, since you don't
have any history stored locally. (It does mean that you can't commit
while offline, but that doesn't sound important in your situation).

I would definitely recommend using 'bzr+ssh://' if you can install bzr
on your server, as it is already quite a bit faster than sftp, and with
what has been going on on the 'hpss' branch, it should continue to
become much better.
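
Putting the lightweight checkout and 'bzr+ssh://' suggestions together
might look like the sketch below. The host name and server path are
invented for illustration, and 'run' is a hypothetical dry-run helper that
only prints the command unless BZR_RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print each command, and execute it only
# when BZR_RUN=1 is set in the environment.
run() {
    echo "+ $*"
    if [ "${BZR_RUN:-0}" = "1" ]; then "$@"; fi
}

# Lightweight checkout of a single branch over bzr+ssh. Only the working
# files are stored locally; history stays on the server.
# Arguments: host, absolute path on the server, local directory.
checkout_lightweight() {
    host=$1
    path=$2
    dest=$3
    run bzr checkout --lightweight "bzr+ssh://$host$path" "$dest"
}

checkout_lightweight server.example /srv/bzr/repo/project1-doc project1-doc
```

Commits made in such a checkout go straight to the branch on the server,
which is why a network connection is needed to commit.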

Splitting things up still helps, since it means that you can check out
smaller sections. But with lightweight checkouts, a large branch may not
cause as many problems. In any case, the first checkout is
generally going to be more expensive than all future actions.

John
=:->


