file ancestors

Tue Nov 22 21:11:55 GMT 2005

Johan Rydberg wrote:
> Hi folks,
> 
> The attached patch adds a "file_graph" method to the Branch class
> which returns two maps, ancestors and descendants, for modifications
> to the file.  The idea is that it can be used to implement per-file
> revision logs (think "bzr log FILE").  Aaron suggested the return data
> format, to let the function line up with the revision_graph function.
> 
> To give it a spin I tried to hook it up in _show_log, but without any
> great success.  The problem is two fold:
> 
>  1.  It is not uncommon that revisions that modify the file is not in
>      the linear branch history (i.e., the revisions has been merged).
>      and "bzr log" only shows revisions that have a revision number.
> 
>  2.  file_graph does not (of course) identify changes to the
>      name space (i.e., renames, deletes and moves)
> 
> The first problem can most likely be solved by building a graph of the
> whole branch, and use that to find out which revision merged the
> modifying revision.  Or does anyone else know of any simple solution?
> 
> A workaround could be that "bzr log" could show revisions without
> revision numbers (merged revisions) -- but when the revision was
> merged is most likely as important information as when the actual
> change was made.
> 
> Regarding building full branch graphs:
> 
> Right now revision_graph reads _every_ revision to get parent
> information.  This is quite expensive, esp with a remote revision
> store.  Maybe this information could be cached in a "revision-graph"
> file in ".bzr" directory.  The format could be simple:

Actually, we already have that information. It is in the table of
contents for the inventory.weave file.
If you do:

w = branch.get_inventory_weave()

w._parents <= this maps each index to its parents
w._names <= maps each index to its name

Now, this doesn't handle ghost revisions, which would be present in the
revision text.

> 
>    REVISION-ID:PARENT[,PARENT[,...]]
> 
> If there is no entry for a revision in the file, it should be treated
> as a ghost revision.  
> 
> I have yet to find a solution to the second problem listen above.  All
> suggestions are welcome.  Right now every inventory is examined, which
> is far from optimal.

I thought the weaves got a new no-text revision when there was a change
to their path.

But as far as I know, there is no way to do it any cheaper right now
than reading every inventory. There has been some discussion about
splitting up the inventory into a separate weave per directory, and then
also possibly storing the back pointer of the parent revision along with
the parent id. This would mean you wouldn't have to read the entire
inventory each time, and you would be able to tell when a parent changed.
I'm not sure how this would handle subdir of subdir rename, for example:

x/y/z.txt
XX/y/z.txt

Since XX isn't a direct parent of z.txt. But I guess at most you would
only have to read the inventories for the parents of "z.txt", so you
would have to read 3 smaller inventories, instead of everything.
And at the very least, all of the weaves would be smaller, since they
wouldn't have to contain an entry for *every* revision. I know
performance is O(N_lines) for weave files, I'm not sure how multiple
weaves would function, but I'm guessing the total performance would
still be O(total_N_lines).

John
=:->

> 
> ~j
> 
> 
> 
> ------------------------------------------------------------------------
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 249 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20051122/ddda6295/attachment.pgp