[RFC] bzr.jrydberg.versionedfile

Goffredo Baroncelli kreijack at alice.it
Wed Dec 21 19:57:08 GMT 2005


On Wednesday 21 December 2005 20:03, you (John Arbash Meinel) wrote:

> > Moreover, I see some disadvantages to having unreferenced revisions.
> > 1) If we want to know the changes related to a file, we can parse only that file; instead,
> > if we allow unreferenced revisions, we have to intersect the revisions related to
> > the file with the "official" ones.
> 
> I don't think there is a large gain here. A list of revisions officially
> merged is small. And you always need to know which revision's changes you
> care about, if only to annotate the current object.
Annotate only shows the added/changed lines, not the deleted ones. Regarding the
size of the revisions, see below.

> Since knits contain the ancestry information for the file, all I need to
> know is the current revision, and then all of those extra changes are
> hidden.
Yes, but then you have to unpack the inventory...

> 
> > 2) If the project is a big project with many contributors who pull from other
> > contributors, the storage size can explode.
> 
> Not really. You rarely will get anything that you wouldn't get anyway.
> The only things that are 'wasted' are merges/pulls which you then decide
> that you don't want.

What you say is true if the repository you merge from is in a clean state. For
example, this is not true for the bazaar one:

$ grep README-20050309040720-8f368abf9f346b9d ../inventory.weave | sed -e 's/^.*revision="//' -e 's/".*//' | wc -l
16
$ grep ^n README-20050309040720-8f368abf9f346b9d.weave | wc -l
26

The example above shows that the README weave file contains 26 revisions, whereas the
inventory references only 16: about 40% of the revisions (10 of 26) are meaningless.

$ grep "file_id" ../inventory.weave | \
     sed -e 's/^.*file_id="/ /' -e 's/".*revision="//' -e 's/".*//'  | \
     sort | uniq | wc -l
6727
$ grep -h ^n */*.weave | wc -l
9025

The example above shows that the inventory references only 6727 of the 9025 revisions
present (about 74%): roughly a quarter of the repository is meaningless. And this
information is replicated in every developer's repository.
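The two measurements above can be reproduced in Python roughly as follows. This is only a sketch under assumptions: the line formats (inventory.weave entries carrying file_id="..." before revision="...", and one "n <name>" line per stored version in each per-file weave) are inferred from the grep/sed pipelines, not from a format specification, and all function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed formats, inferred only from the grep/sed pipelines above:
# - an inventory.weave line describing an entry carries file_id="..."
#   followed later on the same line by revision="...";
# - a per-file weave records one stored version per line starting with "n".
ENTRY_RE = re.compile(r'file_id="([^"]*)".*revision="([^"]*)"')


def referenced_pairs(inventory_lines):
    """Distinct (file_id, revision) pairs mentioned by the inventory."""
    pairs = set()
    for line in inventory_lines:
        m = ENTRY_RE.search(line)
        if m:
            pairs.add(m.groups())
    return pairs


def stored_versions(weave_lines):
    """Version names stored in one per-file weave ("n <name>" lines)."""
    return {line[1:].strip() for line in weave_lines if line.startswith("n")}


def unreferenced(file_id, inventory_lines, weave_lines):
    """Versions stored for file_id that no inventory refers to."""
    wanted = {rev for fid, rev in referenced_pairs(inventory_lines) if fid == file_id}
    return stored_versions(weave_lines) - wanted


def repository_ratio(inventory_path, weaves_dir):
    """(referenced, stored) counts for a whole store, like the second pipeline."""
    inventory = Path(inventory_path).read_text(errors="replace").splitlines()
    referenced = len(referenced_pairs(inventory))
    stored = sum(
        len(stored_versions(p.read_text(errors="replace").splitlines()))
        for p in Path(weaves_dir).glob("*/*.weave")
    )
    return referenced, stored


# Toy data (made up for illustration): rev-b is stored but never referenced.
inv = [
    '<file file_id="readme-1" name="README" revision="rev-a" />',
    '<file file_id="readme-1" name="README" revision="rev-c" />',
]
weave = ["n rev-a", "n rev-b", "n rev-c", "i 0"]
print(sorted(unreferenced("readme-1", inv, weave)))  # ['rev-b']
```

Using sets rather than counting lines means a revision mentioned by several inventory lines is counted once, which is what the "sort | uniq" step does in the second pipeline.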

Even though the other repository passes "bzr check", it may still contain
a lot of useless information.

> Well, I'm not sure about Johan's specific implementation, but it would
> be possible to supply the 'Knit.join()' command a list of revisions
> which I think I'm going to be interested in. And it will only bring
> those in.
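A selective join of this kind could be sketched as an ancestry walk: start from the revisions you intend to merge, follow parents, and stop at revisions already present locally. The sketch below is not Knit.join()'s actual signature (the thread does not give it); the function name and the dict-based ancestry are hypothetical stand-ins for the per-file ancestry a knit records.

```python
from collections import deque


def revisions_to_fetch(parents, tips, present=frozenset()):
    """Return the ancestry closure of `tips`, minus revisions already present.

    parents: hypothetical mapping revision_id -> list of parent revision_ids.
    tips:    the revisions we actually intend to merge.
    present: revisions the local repository already holds; their ancestry is
             assumed to be present too, so the walk stops there.
    """
    needed = set()
    queue = deque(t for t in tips if t not in present)
    while queue:
        rev = queue.popleft()
        if rev in needed:
            continue  # already visited through another path
        needed.add(rev)
        for parent in parents.get(rev, ()):
            if parent not in present and parent not in needed:
                queue.append(parent)
    return needed


# Toy ancestry: "x" is a side branch that was never merged, so it is not fetched.
parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
print(sorted(revisions_to_fetch(parents, ["c"], {"a"})))  # ['b', 'c']
```

With this approach, revisions that only appear on unmerged side branches never reach the local store, so nothing has to be pruned later.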
I thought the same.

> If it turns out that I don't want them (I do 'bzr merge' and then
> realize I don't want those changes), are you asking for them to be deleted?
If I don't bring in what I don't want, I don't have to delete it :-)

> If it is simply "I want to sanitize everything which has not actually
> been added to this branch", we would want a command like that anyway.

For me, the purpose of the pull command is to merge something that
is published. So why should I have to pull something that I don't use?

> Because a repository is going to share knits, and if you make something
> public, you may need to sanitize it.
> But this would be more of a "take what I have, and generate a new
> branch, stripping out all unreferenced information". It would be an
> occasional pruning, not a common operation which might delete stuff.

Again, I don't want to delete anything: I simply don't want to fetch what I don't
want. I don't like the idea, but if you prefer, what about a switch like
'--dont-fetch-all'? Or better, what about a default option to set in the
bzr.conf file?

> John
Goffredo

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack AT inwind.it>
Key fingerprint = CE3C 7E01 6782 30A3 5B87  87C0 BB86 505C 6B2A CFF9