check improvements

Robert Collins robert.collins at canonical.com
Tue May 12 22:54:34 BST 2009


On Tue, 2009-05-12 at 09:26 -0500, John Arbash Meinel wrote:


> I don't think lp:~lifeless/bzr is a branch, and I'm pretty sure you
> don't have a branch named '.' :)

...'check'. Sorry.

> Anyway, I think the biggest bit is figuring out how to get the per-file
> graph from the inventory without having to deserialize full inventories
> for each revision (which is probably your Revs*Trees).

No, we get that reasonably efficiently for xml, though we may need to
specialise for bbc. It's the scan [that I'm about to delete] of the
contents of every tree, to determine duplicate file texts and check shas,
that's the issue. I'm looking at the best interface for a new cheap scan
on the repository interface, to scan every InventoryEntry in the repo.
(basically find_file_ids but deserialising all the way to an IE). This
will allow a single pass over the inventory contents. We have to expand
all the inventories to full texts once, to get their validators; this is
cheap on bbc as they are simply root nodes and self validating downwards
from there; not so cheap on xml. I suspect the interface will be
something that yields the inventory text sha, and inventory entries, in
a structured stream. That will allow one pass; read everything; xml can
get_stream with fulltext set on, and then still process the text delta
through a line matching regex, while bbc can process the entire tree.
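
A rough sketch of what such a structured stream could look like, purely
for illustration. Nothing here is bzrlib API: Entry, scan_inventories,
consume and the record tuples are hypothetical names, and the input is
assumed to be already-expanded (revision_id, fulltext, entries) triples.

```python
import hashlib
from collections import namedtuple

# Hypothetical stand-in for an InventoryEntry; only the fields used here.
Entry = namedtuple("Entry", "file_id revision text_sha1")


def scan_inventories(inventories):
    """Yield a structured stream from (revision_id, fulltext, entries) triples.

    For each inventory, yield one ('inventory', revision_id, sha1) record,
    then one ('entry', revision_id, Entry) record per entry it contains.
    """
    for revision_id, fulltext, entries in inventories:
        # The validator is the sha1 of the inventory full text.
        yield ("inventory", revision_id, hashlib.sha1(fulltext).hexdigest())
        for entry in entries:
            yield ("entry", revision_id, entry)


def consume(stream):
    """Single pass over the stream: collect inventory shas and note which
    (file_id, revision) keys were seen in the inventory that introduced them."""
    inv_shas = {}
    introduced = {}
    for record in stream:
        if record[0] == "inventory":
            _, revision_id, sha = record
            inv_shas[revision_id] = sha
        else:
            _, revision_id, entry = record
            key = (entry.file_id, entry.revision)
            introduced.setdefault(key, False)
            if entry.revision == revision_id:
                introduced[key] = True
    return inv_shas, introduced
```

The point of the shape is that a consumer reads everything exactly once;
how each format produces the records (regex over xml text, tree walk on
bbc) stays hidden behind the generator.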

> One possibility would be to keep the last inventory around, and use the
> delta to update it. Or perhaps a few inventories in an LRUCache, and
> then you peek and find the 'closest' one to the one you want next.
> 
> 
> ...
> 
> Looking at the code in question, we do:
> 
> for _, ie in inv.iter_entries():
>   key = (ie.file_id, ie.revision)
>   result.setdefault(key, False)
>   if ie.revision == inv.revision_id:
>     result[key] = True
> 
> However, I think we could simply change that loop to:
> 
> for _, ie in inv.iter_changes(prev_inv):
>   # the rest is the same
> 
> Because we know we only care about things that are different in this
> inventory, because we know all the keys in the previous inventory have
> been handled.

That code isn't the problem ;). But yes we could use that for bbc;
issues are that it won't cross check the inv sha1 itself; and also we
need to make sure that two ie's in adjacent inventories that claim the
same last-modified but have e.g. different text_sha1 trigger an error.
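
A minimal sketch of that second cross-check, assuming entries expose
file_id, revision and text_sha1. Entry, InconsistentEntry and
check_consistent are hypothetical names for illustration, not bzrlib code.

```python
from collections import namedtuple

# Hypothetical stand-in for an InventoryEntry; only the fields used here.
Entry = namedtuple("Entry", "file_id revision text_sha1")


class InconsistentEntry(Exception):
    """Two entries claim the same last-modified revision but differ."""


def check_consistent(entries, seen=None):
    """Record the text_sha1 for each (file_id, revision) key and raise if a
    later entry with the same key reports a different text_sha1.

    Pass the returned dict back in as `seen` to carry state across inventories.
    """
    seen = {} if seen is None else seen
    for entry in entries:
        key = (entry.file_id, entry.revision)
        previous = seen.setdefault(key, entry.text_sha1)
        if previous != entry.text_sha1:
            raise InconsistentEntry(
                "%r: text_sha1 %r != %r" % (key, entry.text_sha1, previous))
    return seen
```

Feeding only the changed entries of each inventory through this still
catches the mismatch, provided the changed set includes entries whose
text_sha1 differs even when their last-modified revision does not.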

-Rob