Does Bazaar accrete garbage

Teemu Likonen tlikonen at iki.fi
Thu Sep 4 16:43:19 BST 2008


John Arbash Meinel wrote (2008-09-04 10:07 -0500):

> Basically, that means that whenever you update git's index, it stores
> more data into the repository, which may never be committed (and is
> thus cleaned out during gc).
> 
> Further, if I understand git's repository layout correctly, these
> little bits are stored as just gzip compressed fulltexts. (So doing
> 'git add foo' takes the current text of foo, gzips it, computes the
> sha hash, and puts it in the repository as such.)

Although I can't give you technical details, in general you are correct.
That's how it works.
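The storage step John describes can be sketched in a few lines. One detail is worth pinning down. A loose object is compressed with zlib (deflate), not the gzip file format. Its SHA-1 is computed over a short `blob <size>` header followed by the uncompressed content. The function name `loose_object` is hypothetical. The sketch only models the on-disk format and does not write into a real repository.

```python
import hashlib
import zlib


def loose_object(content: bytes):
    """Model how Git stores file contents as a loose blob object.

    Returns (SHA-1 hex id, path relative to .git/, zlib-compressed bytes).
    """
    # The id is the SHA-1 of "blob <size>\0" + raw content (uncompressed).
    store = b"blob %d\x00" % len(content) + content
    oid = hashlib.sha1(store).hexdigest()
    # Loose objects are fanned out into a directory named after the first
    # two hex digits of the id; the file name is the remaining 38 digits.
    path = f"objects/{oid[:2]}/{oid[2:]}"
    # The file body is the same header + content, deflated with zlib.
    return oid, path, zlib.compress(store)


oid, path, data = loose_object(b"hello\n")
print(oid)   # ce013625030ba8dba906f756967f9e9ca394464a
print(path)  # objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

The id printed is the same one `git hash-object` reports for a file containing "hello" plus a newline. That is why running `git add` on unchanged content never creates a second copy.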

> When you later do "git (re?)pack" it will find entries that are
> referenced, and compute deltas, etc. to make them smaller. I don't
> know what it does for unreferenced entries; it might also put them
> into the pack file.

There is no need for "git repack" in normal usage. It's a low-level
command ("plumbing"), and a regular user doesn't need to know or worry
about it.

"git gc" does everything. It packs all referenced loose objects (i.e.,
objects stored in separate files, as you described), computing deltas
etc. "git gc" also unpacks all unreferenced objects (turns them back
into loose objects) if any exist in previously created packs. And
"git gc" also prunes unreferenced loose objects (collects the garbage)
if they are older than a certain number of days (the cutoff is
configurable).
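The behaviour described above is a mark-and-sweep collection with a grace period. Below is a toy model of it, not Git's actual code. `ToyObject` and `toy_gc` are hypothetical names. The two-week expiry in the example is an assumption for illustration, though as far as I know two weeks is the default of the `gc.pruneExpire` setting. Objects reachable from the refs get packed. Unreachable packed objects are turned loose. Unreachable loose objects are deleted once they are older than the expiry.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    """Hypothetical stand-in for a Git object in this toy model."""
    refs: list = field(default_factory=list)  # ids this object points to
    loose: bool = True                         # False = stored in a pack
    mtime: float = 0.0                         # last-modified time, seconds


def toy_gc(objects, roots, now, expire_seconds):
    """Return the object store after one toy 'gc' pass."""
    # Mark: walk everything reachable from the refs (branches, tags, ...).
    live = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in live:
            continue
        live.add(oid)
        stack.extend(objects[oid].refs)

    # Sweep: decide the fate of every object.
    result = {}
    for oid, obj in objects.items():
        if oid in live:
            # Reachable: goes into the pack (delta compression not modelled).
            result[oid] = ToyObject(obj.refs, loose=False, mtime=obj.mtime)
        elif not obj.loose:
            # Unreachable but packed: turned back into a loose object.
            # Assumption of this sketch: its grace period starts at `now`.
            result[oid] = ToyObject(obj.refs, loose=True, mtime=now)
        elif now - obj.mtime <= expire_seconds:
            # Unreachable loose object, still young: kept for now.
            result[oid] = obj
        # Otherwise: unreachable, loose and older than the expiry -> pruned.
    return result


DAY = 24 * 60 * 60
store = {
    "commit": ToyObject(refs=["tree"], mtime=0),
    "tree": ToyObject(mtime=0),
    "stale": ToyObject(mtime=0),          # e.g. an old 'git add', never committed
    "fresh": ToyObject(mtime=13 * DAY),   # a recent 'git add', never committed
    "dropped": ToyObject(loose=False),    # packed, but its branch was deleted
}
after = toy_gc(store, roots=["commit"], now=14 * DAY + 1, expire_seconds=14 * DAY)
print(sorted(after))  # ['commit', 'dropped', 'fresh', 'tree']
```

The grace period is what makes index updates safe. A blob written by a recent `git add` is unreferenced until you commit, so it must survive a gc that happens to run in between.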

So, "git gc" is the one. Actually, Git runs it automatically ("git gc
--auto") from time to time, so it's not necessary to run it manually.
However, by default it does so quite rarely (the defaults are tuned for
big repositories with a fast rate of change), so in practice I tend to
run it manually once a week or so. I should probably just configure it
to trigger more often.
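How rarely is "quite rarely"? Here is a sketch of the loose-object check behind `git gc --auto`, as I understand it. Git does not count every loose object. It samples one fan-out directory, `objects/17`, which should hold about 1/256 of them, and compares that count against `gc.auto` (default 6700) divided by 256. The function name is hypothetical. The sketch also leaves out the second trigger, too many packs (`gc.autoPackLimit`).

```python
import os


def toy_needs_auto_gc(git_dir, gc_auto=6700):
    """Rough model of the loose-object check behind `git gc --auto`."""
    if gc_auto <= 0:
        return False  # gc.auto = 0 disables automatic collection
    sample_dir = os.path.join(git_dir, "objects", "17")
    try:
        names = os.listdir(sample_dir)
    except FileNotFoundError:
        return False  # no loose objects starting with "17" yet
    hex_digits = set("0123456789abcdef")
    # A loose SHA-1 object file is named by the remaining 38 hex digits;
    # anything else (e.g. temporary files) is ignored.
    sample = sum(1 for n in names if len(n) == 38 and set(n) <= hex_digits)
    # SHA-1 ids are spread evenly, so this one directory should hold
    # about 1/256 of all loose objects: compare against gc.auto / 256.
    threshold = -(-gc_auto // 256)  # ceiling division
    return sample > threshold
```

With the default of 6700, the threshold works out to more than 27 files in `objects/17`, or roughly 7000 loose objects overall. Lowering it, for example with `git config gc.auto 1000`, makes the automatic run kick in sooner. That is the tuning suggested above.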


